python - Pandas DataFrame: how to turn one row into separate rows based on labelled column value

Tags: python pandas

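I am building an entity-based sentiment classifier for forex news analysis. Several currencies may be identified in each news article, but I am struggling with how to turn one row (for example {'USD':1, "JPY":-1}, taken from the existing human labels) into separate rows.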

The sample DataFrame currently looks like this:

       sentiment                                               text
0   USD:1,CNY:-1  US economy is improving while China is struggling
1  USD:-1, JPY:1    Unemployment is high for US while low for Japan
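
For anyone who wants to reproduce the frame above, here is a minimal sketch. It assumes the sentiment labels are stored as plain strings, which the printed frame suggests but does not confirm.

```python
import pandas as pd

# Sample data copied from the question; the labels are assumed to be plain strings.
df = pd.DataFrame({
    "sentiment": ["USD:1,CNY:-1", "USD:-1, JPY:1"],
    "text": [
        "US economy is improving while China is struggling",
        "Unemployment is high for US while low for Japan",
    ],
})
print(df)
```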

and I would like to convert it into multiple rows, like this:

  currency sentiment                                               text
0      USD         1  US economy is improving while China is struggling
1      CNY        -1  US economy is improving while China is struggling
2      USD        -1    Unemployment is high for US while low for Japan
3      JPY         1    Unemployment is high for US while low for Japan

Thank you very much for your help.

Best answer

You can split the sentiment column on either a comma or a colon, expand the result into columns, and then stack it.

Then use DataFrame.reindex together with Index.repeat to repeat each row (and with it the text column) once per comma-separated item in sentiment.
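
To make the repeat step concrete, here is a small sketch of just that part, run on the two sample rows from the question. The names counts, repeated_index and expanded are illustrative only and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "sentiment": ["USD:1,CNY:-1", "USD:-1, JPY:1"],
    "text": [
        "US economy is improving while China is struggling",
        "Unemployment is high for US while low for Japan",
    ],
})

# Number of comma-separated items in each row.
counts = df["sentiment"].str.split(",").str.len()

# Repeat every index label as many times as its row has items.
repeated_index = df.index.repeat(counts)

# Reindexing on the repeated labels duplicates the rows; then renumber them.
expanded = df.reindex(repeated_index).reset_index(drop=True)
print(expanded)
```

At this point every row still carries the full, unsplit sentiment string; the currency and value are filled in afterwards.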

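import pandas as pd

# Split the column on "," and ":" (dropping surrounding spaces), then stack
# into one long Series: currency, value, currency, value, ...
s = df['sentiment'].str.split(r'\s*[,:]\s*', expand=True).stack()

# Repeat each row once per comma-separated item, then reset the index.
df1 = df.reindex(df.index.repeat(df['sentiment'].fillna("").str.split(',').apply(len)))
df1 = df1.reset_index(drop=True)

# Even positions of s hold the currencies, odd positions the sentiment values.
df1['currency'] = s.iloc[::2].reset_index(drop=True)
df1['sentiment'] = s.iloc[1::2].reset_index(drop=True)

print(df1.sort_index(axis=1))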

Output:

  currency sentiment                                               text
0      USD         1  US economy is improving while China is struggling
1      CNY        -1  US economy is improving while China is struggling
2      USD        -1    Unemployment is high for US while low for Japan
3      JPY         1    Unemployment is high for US while low for Japan
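
As an alternative sketch, not the method used in the answer above, the same reshaping can probably be written with DataFrame.explode. This assumes pandas 0.25 or later, where explode is available, and it casts the sentiment values to integers, which the answer above does not do.

```python
import pandas as pd

df = pd.DataFrame({
    "sentiment": ["USD:1,CNY:-1", "USD:-1, JPY:1"],
    "text": [
        "US economy is improving while China is struggling",
        "Unemployment is high for US while low for Japan",
    ],
})

# Turn each label string into a list, then give every list item its own row.
out = (
    df.assign(sentiment=df["sentiment"].str.split(","))
      .explode("sentiment")
      .reset_index(drop=True)
)

# Each item now looks like "USD:1"; strip stray spaces and split on the colon.
parts = out["sentiment"].str.strip().str.split(":", expand=True)
out["currency"] = parts[0]
out["sentiment"] = parts[1].astype(int)

out = out[["currency", "sentiment", "text"]]
print(out)
```

Because explode keeps the other columns aligned automatically, this version needs no separate reindex or repeat step.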

Regarding python - Pandas DataFrame: how to turn one row into separate rows based on labelled column value, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52800063/
