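python - Compare corresponding columns with each other and store the result in a new column

I have data that was pivoted with the pivot table method, and it now looks like this: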

rule_id  a   b   c
50211    8   0   0
50249    16  0   3
50378    0   2   0
50402    12  9   6

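I have set 'rule_id' as the index. Now I compare each column with the column that follows it and store the result in a new column. The idea is: if the first column has a value other than 0 and the second column (the one it is compared to) has 0, the new column should get 100; in the opposite case (the first column is 0) it should get 'Null'. If both columns are 0, it should also get 'Null'. The last column has nothing to be compared with, so it gets 'Null' if its value is 0 and 100 otherwise. But if both columns have values other than 0 (as in the last row of my data), the comparison for columns a and b should be: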

value_of_b/value_of_a *50 + 50

and for columns b and c:

value_of_c/value_of_b *25 + 25

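Similarly, if there are more columns, the multiplier and the added constant should halve again for the next pair (12.5), and so on.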
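The rules above for a single pair of adjacent values can be sketched as a small function. This is only an illustration: `compare_pair` and its `coeff` parameter are hypothetical names that do not appear in the question, and 'Null' is represented here as NaN.

```python
import math


def compare_pair(first, second, coeff):
    """Score one pair of adjacent values (hypothetical helper, not from the question)."""
    if first == 0:
        return math.nan      # covers "first is 0" and "both are 0"
    if second == 0:
        return 100.0         # first is non-zero, second is 0
    # both non-zero: ratio scaled by the coefficient, plus the coefficient
    return second / first * coeff + coeff


print(compare_pair(12, 9, 50))   # a vs b in the last sample row: 87.5
print(compare_pair(9, 6, 25))    # b vs c in the last sample row: about 41.67
```

The last column is not covered by this helper, since it has no right-hand neighbour; per the description it is simply 100 when non-zero and 'Null' otherwise.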

I was able to achieve all of the above except the last part (the division and multiplication). I used this code:

m = df.eq(df.shift(-1, axis=1))

arr = np.select([df ==0, m], [np.nan, df], 1*100)

df2 = pd.DataFrame(arr, index=df.index).rename(columns=lambda x: f'comp{x+1}')

df3 = df.join(df2)

df is the dataframe that holds the pivoted data shown at the beginning. After running this code, my data looks like this:

   rule_id   a   b   c  comp1 comp2 comp3
    50211    8   0   0   100   NaN   NaN
    50249    16  0   3   100   NaN   100
    50378    0   2   0   NaN   100   NaN
    50402    12  9   6   100   100   100

But I want the data to look like this:

   rule_id   a   b   c  comp1 comp2 comp3
    50211    8   0   0   100   NaN   NaN
    50249    16  0   3   100   NaN   100
    50378    0   2   0   NaN   100   NaN
    50402    12  9   6   87.5  41.67 100

I would appreciate any help in getting the desired result.

Edit: This is what my data looks like:

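Best answer

The problem is that the coefficients used to build the new compx columns do not depend only on the column position. In fact, within each row the coefficient resets to its maximum of 50 after every 0 value, and after a non-0 value it is half the previous coefficient. Such resettable series are hard to vectorize in pandas, especially along rows. Here I would build a companion dataframe that contains only these coefficients, and work directly with the underlying numpy arrays to compute the result as efficiently as possible. The code could be: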

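import numpy as np

# transpose the dataframe to process columns instead of rows;
# cast to float so that halved coefficients such as 12.5 can be stored
coeff = df.T.astype(float)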

# compute the coefficients
for name, s in coeff.items():
    top = 100              # start at 100
    r = []
    for i, v in enumerate(s):
        if v == 0:         # reset to 100 on a 0 value
            top=100
        else:
            top = top/2    # else half the previous value
        r.append(top)
    coeff.loc[:, name] = r # set the whole column in one operation

# transpose back to have a companion dataframe for df
coeff = coeff.T

# build a new column from 2 consecutive ones, using the coeff dataframe
def build_comp(col1, col2, i):
    df['comp{}'.format(i)] = np.where(df[col1] == 0, np.nan,
                                      np.where(df[col2] == 0, 100,
                                               df[col2]/df[col1]*coeff[col1]
                                               +coeff[col1]))

old = df.columns[0]          # store name of first column

# Ok, enumerate all the columns (except first one)
for i, col in enumerate(df.columns[1:], 1):
    build_comp(old, col, i)
    old = col                # keep current column name for next iteration

# special processing for last comp column
df['comp{}'.format(i+1)] = np.where(df[col] == 0, np.nan, 100)
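To make the resettable coefficient series easier to follow, here is a minimal sketch of the same rule applied to a single row in plain Python. The name `row_coefficients` is a hypothetical helper introduced for illustration; it mirrors the inner loop above rather than replacing it.

```python
def row_coefficients(values):
    """Coefficient for each position in one row (illustrative helper)."""
    top = 100.0
    coeffs = []
    for v in values:
        # a 0 resets the running value to 100, a non-zero value halves it
        top = 100.0 if v == 0 else top / 2
        coeffs.append(top)
    return coeffs


print(row_coefficients([12, 9, 6]))      # [50.0, 25.0, 12.5]
print(row_coefficients([0, 12, 9, 6]))   # [100.0, 50.0, 25.0, 12.5]
print(row_coefficients([16, 0, 3]))      # [50.0, 100.0, 50.0]
```

The second call shows why the coefficient cannot be derived from the column position alone: a leading 0 shifts the whole 50, 25, 12.5 sequence one column to the right.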

With this initial dataframe:

date     2019-04-25 15:08:23  2019-04-25 16:14:14  2019-04-25 16:29:05  2019-04-25 16:36:32
rule_id
50402                      0                    0                    9                    0
51121                      0                    1                    0                    0
51147                      0                    1                    0                    0
51183                      2                    0                    0                    0
51283                      0                   12                    9                    6
51684                      0                    1                    0                    0
52035                      0                    4                    3                    2

it gives the expected result:

date     2019-04-25 15:08:23  2019-04-25 16:14:14  2019-04-25 16:29:05  2019-04-25 16:36:32  comp1  comp2       comp3  comp4
rule_id
50402                      0                    0                    9                    0    NaN    NaN  100.000000    NaN
51121                      0                    1                    0                    0    NaN  100.0         NaN    NaN
51147                      0                    1                    0                    0    NaN  100.0         NaN    NaN
51183                      2                    0                    0                    0  100.0    NaN         NaN    NaN
51283                      0                   12                    9                    6    NaN   87.5   41.666667  100.0
51684                      0                    1                    0                    0    NaN  100.0         NaN    NaN
52035                      0                    4                    3                    2    NaN   87.5   41.666667  100.0
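As a cross-check, the same idea can be wrapped in a function and run on the four-row sample from the question. This is a sketch under assumptions, not the answer's exact code: `add_comp_columns` is a hypothetical name, it walks the columns of a NumPy array instead of transposing the frame, and it returns a copy rather than modifying df in place.

```python
import numpy as np
import pandas as pd


def add_comp_columns(df):
    """Return a copy of df with comp1..compN appended (hypothetical helper)."""
    values = df.to_numpy(dtype=float)
    n_rows, n_cols = values.shape

    # Per-cell coefficient: 100 on a zero, otherwise half of the value to its left.
    coeff = np.empty_like(values)
    top = np.full(n_rows, 100.0)
    for j in range(n_cols):
        top = np.where(values[:, j] == 0, 100.0, top / 2)
        coeff[:, j] = top

    out = df.copy()
    for j in range(n_cols):
        first = values[:, j]
        if j + 1 < n_cols:
            second = values[:, j + 1]
            # Avoid dividing by zero; rows where first == 0 become NaN below anyway.
            safe_first = np.where(first == 0, 1.0, first)
            comp = second / safe_first * coeff[:, j] + coeff[:, j]
            comp = np.where(second == 0, 100.0, comp)
        else:
            comp = np.full(n_rows, 100.0)   # last column: 100 unless it is zero
        out[f"comp{j + 1}"] = np.where(first == 0, np.nan, comp)
    return out


# Sample data from the question, with rule_id as the index.
df = pd.DataFrame(
    {"a": [8, 16, 0, 12], "b": [0, 0, 2, 9], "c": [0, 3, 0, 6]},
    index=pd.Index([50211, 50249, 50378, 50402], name="rule_id"),
)
result = add_comp_columns(df)
print(result.round(2))
```

For rule_id 50402 this yields 87.5, about 41.67 and 100, matching the output requested in the question.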

For "python - Compare corresponding columns with each other and store the result in a new column", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56425373/
