python - Efficient way to replace each cell value in a Pandas DataFrame

Tags: python pandas numpy

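I have two DataFrames: topic_, which is the target DataFrame, and tw, which is the source DataFrame. topic_ is a topic-by-word matrix in which each cell stores the probability of a word occurring in a particular topic. I initialized the topic_ DataFrame to zeros using numpy.zeros. A sample of the tw DataFrame: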

print(tw)
    topic_id                                     word_prob_pair
0          0  [(customer, 0.061703717964), (team, 0.01724444...
1          1  [(team, 0.0260560163563), (customer, 0.0247838...
2          2  [(customer, 0.0171786268847), (footfall, 0.012...
3          3  [(team, 0.0290787264225), (product, 0.01570401...
4          4  [(team, 0.0197917953222), (data, 0.01343226630...
5          5  [(customer, 0.0263740639141), (team, 0.0251677...
6          6  [(customer, 0.0289764173735), (team, 0.0249938...
7          7  [(client, 0.0265082412402), (want, 0.016477447...
8          8  [(customer, 0.0524006965405), (team, 0.0322975...
9          9  [(generic, 0.0373422774996), (product, 0.01834...
10        10  [(customer, 0.0305256248248), (team, 0.0241559...
11        11  [(customer, 0.0198707090364), (ad, 0.018516805...
12        12  [(team, 0.0159852971954), (customer, 0.0124540...
13        13  [(team, 0.033444510469), (store, 0.01961003290...
14        14  [(team, 0.0344793243818), (customer, 0.0210975...
15        15  [(team, 0.026416114692), (customer, 0.02041691...
16        16  [(campaign, 0.0486186973667), (team, 0.0236024...
17        17  [(customer, 0.0208270072145), (branch, 0.01757...
18        18  [(team, 0.0280889397541), (customer, 0.0127932...
19        19  [(team, 0.0297011415217), (customer, 0.0216007...

My topic_ DataFrame has size num_topics (i.e. 20) by number_of_unique_words (the number of distinct words in the tw DataFrame).
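For illustration, a zero-filled frame of that shape could be set up roughly as follows. This is only a sketch under assumptions: the two-row `tw` is a truncated stand-in for the real data, and the name `unique_words` is hypothetical and does not come from the original code.

```python
import numpy as np
import pandas as pd

# Small stand-in for the `tw` frame shown above (values truncated for brevity).
tw = pd.DataFrame({
    "topic_id": [0, 1],
    "word_prob_pair": [
        [("customer", 0.0617), ("team", 0.0172)],
        [("team", 0.0261), ("product", 0.0157)],
    ],
})

num_topics = len(tw)

# Collect every distinct word across all topics (sorted for a stable column order).
# `unique_words` is a hypothetical name used only in this sketch.
unique_words = sorted({word for pairs in tw["word_prob_pair"] for word, _ in pairs})

# Zero-filled topic-by-word matrix: one row per topic, one column per word.
topic_ = pd.DataFrame(
    np.zeros((num_topics, len(unique_words))),
    index=range(num_topics),
    columns=unique_words,
)
print(topic_.shape)
```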

Below is the code I use to replace each value in the topic_ DataFrame:

for each_topic in range(num_topics):
    pairs = tw['word_prob_pair'].iloc[each_topic]
    for word, prob in pairs:
        # DataFrame.set_value was removed in pandas 1.0; .at is the scalar setter
        topic_.at[each_topic, word] = prob

Is there a better way to accomplish this task?

Best answer

You can use a list comprehension together with the DataFrame constructor, and finally replace NaN with 0 using fillna:

import pandas as pd

# Sample data: one list of (word, probability) tuples per topic
df = pd.DataFrame({'word_prob_pair': [
    [('customer', 0.061703717964), ('team', 0.01724444)],
    [('team', 0.0260560163563), ('customer', 0.0247838)],
    [('customer', 0.0171786268847), ('footfall', 0.012)],
    [('team', 0.0290787264225), ('product', 0.01570401)],
    [('team', 0.0197917953222), ('data', 0.01343226630)],
    [('customer', 0.0263740639141), ('team', 0.0251677)],
    [('customer', 0.0289764173735), ('team', 0.0249938)],
    [('client', 0.0265082412402), ('want', 0.016477447)],
]})
print(df)
                                     word_prob_pair
0  [(customer, 0.061703717964), (team, 0.01724444)]
1  [(team, 0.0260560163563), (customer, 0.0247838)]
2  [(customer, 0.0171786268847), (footfall, 0.012)]
3  [(team, 0.0290787264225), (product, 0.01570401)]
4   [(team, 0.0197917953222), (data, 0.0134322663)]
5  [(customer, 0.0263740639141), (team, 0.0251677)]
6  [(customer, 0.0289764173735), (team, 0.0249938)]
7  [(client, 0.0265082412402), (want, 0.016477447)]

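# dict(x) turns each list of (word, prob) tuples into one row mapping word -> prob
df1 = pd.DataFrame([dict(x) for x in df.word_prob_pair])
# Words absent from a topic come out as NaN; replace them with 0
df1 = df1.fillna(0)
print(df1)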
     client  customer      data  footfall   product      team      want
0  0.000000  0.061704  0.000000     0.000  0.000000  0.017244  0.000000
1  0.000000  0.024784  0.000000     0.000  0.000000  0.026056  0.000000
2  0.000000  0.017179  0.000000     0.012  0.000000  0.000000  0.000000
3  0.000000  0.000000  0.000000     0.000  0.015704  0.029079  0.000000
4  0.000000  0.000000  0.013432     0.000  0.000000  0.019792  0.000000
5  0.000000  0.026374  0.000000     0.000  0.000000  0.025168  0.000000
6  0.000000  0.028976  0.000000     0.000  0.000000  0.024994  0.000000
7  0.026508  0.000000  0.000000     0.000  0.000000  0.000000  0.016477
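Applied to the tw frame from the question, the same idea might look like the sketch below. It assumes tw has the topic_id and word_prob_pair columns shown in the question; the three-row sample is a truncated stand-in, and passing topic_id as the index is an addition not present in the answer above.

```python
import pandas as pd

# Truncated stand-in for the question's `tw` frame.
tw = pd.DataFrame({
    "topic_id": [0, 1, 2],
    "word_prob_pair": [
        [("customer", 0.0617), ("team", 0.0172)],
        [("team", 0.0261), ("customer", 0.0248)],
        [("customer", 0.0172), ("footfall", 0.012)],
    ],
})

# One dict per topic; the constructor aligns the words into columns,
# and words missing from a topic become NaN, which fillna(0) replaces.
topic_ = pd.DataFrame(
    [dict(pairs) for pairs in tw["word_prob_pair"]],
    index=tw["topic_id"],
).fillna(0)
print(topic_)
```

This builds the whole matrix in one constructor call, so neither the zero initialization nor the per-cell loop is needed. The column order may differ between pandas versions; reindexing the columns is one way to force a specific order if it matters.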

This Q&A, "python - Efficient way to replace each cell value in a Pandas DataFrame", is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/42339649/
