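I am building entity-based sentiment classification for forex news analysis. Several currencies may be identified in each news article, and I am struggling with how to turn one row (for example {'USD':1, "JPY":-1}, taken from the existing human labels) into separate rows.
The current example DataFrame is: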
       sentiment                                               text
0   USD:1,CNY:-1  US economy is improving while China is struggling
1  USD:-1, JPY:1    Unemployment is high for US while low for Japan
and I would like to convert it into multiple rows, like this:
   currency  sentiment                                               text
0       USD          1  US economy is improving while China is struggling
1       CNY         -1  US economy is improving while China is struggling
2       USD         -1    Unemployment is high for US while low for Japan
3       JPY          1    Unemployment is high for US while low for Japan
Thank you very much for your help.
Best answer
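You can split the sentiment column on the regex ,|: and then expand and stack the result. Then use DataFrame.reindex with Index.repeat to repeat the text column once per pair, based on the length of the split.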
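To make the two building blocks easier to follow, here is a minimal sketch that runs them in isolation. The DataFrame is rebuilt from the question's example, and the names `tokens`, `pair_counts`, and `repeated_index` are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample frame rebuilt from the question (note the space after the comma in row 1).
df = pd.DataFrame({
    "sentiment": ["USD:1,CNY:-1", "USD:-1, JPY:1"],
    "text": [
        "US economy is improving while China is struggling",
        "Unemployment is high for US while low for Japan",
    ],
})

# Splitting on "," or ":" with expand=True gives one column per token;
# stack() then flattens those columns into one long Series.
tokens = df["sentiment"].str.split(",|:", expand=True).stack()
print(tokens.tolist())
# ['USD', '1', 'CNY', '-1', 'USD', '-1', ' JPY', '1']

# Index.repeat duplicates each row label once per currency:value pair.
pair_counts = df["sentiment"].str.split(",").str.len()
repeated_index = df.index.repeat(pair_counts)
print(repeated_index.tolist())
# [0, 0, 1, 1]
```

Note the leading space in ' JPY': splitting on ,|: alone keeps whitespace around the separators, so the tokens need stripping (or a whitespace-aware pattern) when the labels are not uniformly formatted.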
# Split the column on both "," and ":" (dropping surrounding whitespace), then stack into one long Series.
s = df['sentiment'].str.split(r'\s*[,:]\s*', expand=True).stack()
# Repeat each row once per currency:value pair, then reset the index.
df1 = df.reindex(df.index.repeat(df['sentiment'].fillna("").str.split(',').apply(len)))
df1 = df1.reset_index(drop=True)
# Even positions of s hold the currencies, odd positions hold the sentiment values.
df1['currency'] = s.iloc[::2].reset_index(drop=True)
df1['sentiment'] = s.iloc[1::2].reset_index(drop=True)
print(df1.sort_index(axis=1))
Output:
   currency  sentiment                                               text
0       USD          1  US economy is improving while China is struggling
1       CNY         -1  US economy is improving while China is struggling
2       USD         -1    Unemployment is high for US while low for Japan
3       JPY          1    Unemployment is high for US while low for Japan
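As an alternative to the answer's approach (not what the answer itself uses), the same reshaping can be sketched with `DataFrame.explode`, which assumes pandas 0.25 or later. The intermediate column name `pair` and the variable names `out` and `parts` are made up for illustration, and the sample data is rebuilt from the question.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "sentiment": ["USD:1,CNY:-1", "USD:-1, JPY:1"],
    "text": [
        "US economy is improving while China is struggling",
        "Unemployment is high for US while low for Japan",
    ],
})

# Turn each "CUR:val,CUR:val" string into a list of "CUR:val" pairs,
# then give every pair its own row; the text column is repeated automatically.
out = (
    df.assign(pair=df["sentiment"].str.split(","))
      .explode("pair")
      .reset_index(drop=True)
)

# Split each pair into a currency code and a numeric sentiment value.
parts = out["pair"].str.split(":", expand=True)
out["currency"] = parts[0].str.strip()   # strip handles "USD:-1, JPY:1"
out["sentiment"] = parts[1].str.strip().astype(int)
out = out[["currency", "sentiment", "text"]]
print(out)
```

This version keeps each currency attached to its own value throughout, so it does not rely on even and odd positions lining up, and it yields an integer sentiment column rather than strings.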
Regarding "python - Pandas DataFrame: how to turn one row into separate rows based on labelled column value", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52800063/