python - Pandas: create new columns by dividing the last columns of a large DataFrame

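I have a very large DataFrame with 400 columns and more than 1,000 rows. The DataFrame's columns are fixed and will not change. I want to do something with the last 120 columns: divide the values in those columns by the values of another column in the DataFrame.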

My DataFrame looks like this:

Column1 Column2 Column3 .... Column280...Column400
A       2       6            20          40   
B       4       3            20          20
C       3       3            30          9

I want to divide Column280 through Column400 by Column2 and add the result of each division as a new column, like this:

Column1 Column2 Column3 .... Column280...Column400 .. Column401....Column520
A       2       6            20          40           10           20
B       4       3            20          20           5            5
C       3       3            30          9            10           3


Column401 = Column280/Column2 

Column402 = Column281/Column2 

Column403 = Column282/Column2

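and so on.

I have a list containing the names of the last 120 columns, but I really don't know how to tell pandas to divide these columns and add the results as new columns. I hope someone can help me here!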

Best answer

Setup:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,20,(5,400)), columns=range(1, 401)) \
       .add_prefix('Column')

Solution:

df[['Column{}'.format(i) for i in range(401, 401+(400-280)+1)]] = \
    df.loc[:, 'Column280':'Column400'].div(df['Column2'], axis=0)

Result:

In [42]: df
Out[42]:
   Column1  Column2  Column3  Column4  Column5  Column6  Column7  Column8  Column9  Column10    ...      Column512  \
0        8        7        3        9       11       14       12       18        6         5    ...       2.714286
1        9       12        4        8        8        2       14       16        9        12    ...       0.166667
2       15        8       11        9       15        0        9       15       16         2    ...       0.000000
3       16       17       12       10        0       15       18        9        9        19    ...       1.117647
4        0       16       17        6        8       17        3        4       17         0    ...       0.812500

   Column513  Column514  Column515  Column516  Column517  Column518  Column519  Column520  Column521
0   0.428571   1.857143   1.714286   0.000000   2.142857   2.428571   1.000000   2.285714   0.571429
1   1.416667   0.750000   0.083333   0.916667   0.166667   1.250000   1.083333   0.500000   1.166667
2   2.000000   0.500000   0.125000   1.875000   1.500000   2.000000   1.000000   1.875000   1.875000
3   0.352941   0.882353   0.470588   0.882353   0.176471   1.000000   0.058824   0.588235   0.941176
4   0.562500   0.687500   0.750000   1.000000   0.750000   0.875000   0.687500   1.000000   1.000000

[5 rows x 521 columns]
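Since the question mentions already having a list of the trailing column names, here is a minimal sketch of a variant that starts from such a list. The name `last_cols`, the six-column stand-in frame, and the choice of `pd.concat` to append everything at once are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Small stand-in for the real frame: 3 rows, columns Column1..Column6.
df = pd.DataFrame(
    np.arange(1, 19).reshape(3, 6),
    columns=[f"Column{i}" for i in range(1, 7)],
)

# Hypothetical list of trailing column names (stands in for the asker's list).
last_cols = ["Column4", "Column5", "Column6"]

# Divide each listed column by Column2, row by row.
ratios = df[last_cols].div(df["Column2"], axis=0)

# Rename the results so they continue the existing numbering (Column7..Column9).
start = df.shape[1] + 1
ratios.columns = [f"Column{i}" for i in range(start, start + len(last_cols))]

# Append all new columns in a single concat instead of one insert per column.
result = pd.concat([df, ratios], axis=1)
```

Building the quotients as a separate frame and concatenating once keeps the original `df` untouched, which may be convenient when the number of new columns is large.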

Explanation:

If we want to add several columns to a DataFrame in one step, we can do it like this:

df[['new1','new2','new3']] = array

where array must have shape (len(df), 3), or be three Series of the same length as the DataFrame
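A minimal sketch of that multi-column assignment on a toy frame. The column name `a` and the values are made up for illustration, and the example assumes a reasonably recent pandas version, where assigning a 2-D NumPy array to a list of new column names is accepted.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# One row per DataFrame row, one column per new column name.
array = np.array([[10, 11, 12],
                  [20, 21, 22],
                  [30, 31, 32]])

# Creates the three new columns in one statement.
df[["new1", "new2", "new3"]] = array
```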

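df.loc[:, 'Column280':'Column400'] selects all rows and the columns from 'Column280' through 'Column400' (note: the slice includes both endpoints and follows the order in which the columns appear in the DataFrame, so 'Column280' must come before 'Column400'; with unique labels the columns do not have to be lexicographically sorted)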
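To illustrate the label slice on something small enough to inspect, here is a sketch with five made-up columns. It shows that both endpoint labels are part of the selection.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5]],
    columns=["Column8", "Column9", "Column10", "Column11", "Column12"],
)

# Label slice: every column positioned from 'Column9' through 'Column11',
# with both endpoints included.
selected = df.loc[:, "Column9":"Column11"]
```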

PS: Pandas indexing, including boolean indexing, is very well documented.

.div(df['Column2'], axis=0) divides the DataFrame on the left by df['Column2'] along the index axis, so every value in a row is divided by that row's Column2 value
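As a quick check of the row-wise division, the sketch below rebuilds the three example rows from the question (only the columns shown there) and divides the two trailing columns by Column2. The variable name `quotients` is an assumption for illustration.

```python
import pandas as pd

# The three sample rows from the question, restricted to the columns it shows.
df = pd.DataFrame(
    {
        "Column1": ["A", "B", "C"],
        "Column2": [2, 4, 3],
        "Column280": [20, 20, 30],
        "Column400": [40, 20, 9],
    }
)

# axis=0 aligns df['Column2'] with the row index, so each row is divided
# by its own Column2 value.
quotients = df[["Column280", "Column400"]].div(df["Column2"], axis=0)
```

The quotients match the Column401 and Column520 values in the question's expected output (10, 5, 10 and 20, 5, 3).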

Regarding "python - Pandas: create new columns by dividing the last columns of a large DataFrame", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44023973/
