python - Pandas DataFrame: how to turn one row into separate rows based on labelled column value

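I am building an entity-based sentiment classifier for forex news analysis. Several currencies may be identified in a single news article, but I am struggling with how to turn one row (for example {'USD': 1, 'JPY': -1}, taken from existing human labels) into separate rows.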

The sample DataFrame currently looks like this:

       sentiment                                               text
0   USD:1,CNY:-1  US economy is improving while China is struggling
1  USD:-1, JPY:1    Unemployment is high for US while low for Japan

I would like to convert it into multiple rows, like this:

  currency sentiment                                               text
0      USD         1  US economy is improving while China is struggling
1      CNY        -1  US economy is improving while China is struggling
2      USD        -1    Unemployment is high for US while low for Japan
3      JPY         1    Unemployment is high for US while low for Japan

Many thanks for your help.

Best answer

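You can split the sentiment column on both the comma and the colon with expand=True, then stack the result into a single long Series.

Then use DataFrame.reindex together with Index.repeat to repeat each row (and with it the text column) once per currency:score pair found in that row.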

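import pandas as pd

# Split the column on both ',' and ':' (tolerating surrounding spaces, as in
# "USD:-1, JPY:1"), then stack into one long Series.
s = df['sentiment'].str.split(r'\s*[,:]\s*', expand=True).stack()

# Repeat each row once per comma-separated pair, then reset the index.
df1 = df.reindex(df.index.repeat(df['sentiment'].fillna("").str.split(',').apply(len)))
df1 = df1.reset_index(drop=True)

# Even positions in s hold the currency codes, odd positions hold the scores.
df1['currency'] = s.iloc[::2].reset_index(drop=True)
df1['sentiment'] = s.iloc[1::2].reset_index(drop=True)

print(df1.sort_index(axis=1))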

Output:

  currency sentiment                                               text
0      USD         1  US economy is improving while China is struggling
1      CNY        -1  US economy is improving while China is struggling
2      USD        -1    Unemployment is high for US while low for Japan
3      JPY         1    Unemployment is high for US while low for Japan
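
As an alternative sketch (not the method used in the answer above), the same reshaping can probably be written more directly with DataFrame.explode, assuming pandas 0.25 or later. The sample frame is rebuilt here from the question's data so the snippet runs on its own; the names pairs, parts and result are illustrative only.

```python
import pandas as pd

# Sample data copied from the question (note the space in "USD:-1, JPY:1").
df = pd.DataFrame({
    "sentiment": ["USD:1,CNY:-1", "USD:-1, JPY:1"],
    "text": [
        "US economy is improving while China is struggling",
        "Unemployment is high for US while low for Japan",
    ],
})

# Turn each label string into a list of "CUR:score" pairs,
# then give every pair its own row.
pairs = (
    df.assign(sentiment=df["sentiment"].str.split(","))
      .explode("sentiment")
      .reset_index(drop=True)
)

# Split each pair into currency and score; strip stray spaces such as " JPY".
parts = pairs["sentiment"].str.strip().str.split(":", expand=True)

result = pairs.assign(
    currency=parts[0].str.strip(),
    sentiment=parts[1].astype(int),  # scores become integers, not strings
)[["currency", "sentiment", "text"]]

print(result)
```

Because each row is exploded independently, this version does not rely on every article carrying the same number of currency labels, and the sentiment column ends up numeric rather than as strings.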

Source: a similar question on Stack Overflow, "Pandas DataFrame: how to turn one row into separate rows based on labelled column value": https://stackoverflow.com/questions/52800063/
