python - Compute the minimum of row-wise calculations in pandas

Tags: python pandas

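I have a list of throughput values in one column of a pandas DataFrame. For each value, I want to compute its change relative to a threshold, expressed as a percentage of that threshold.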

So if my thresholds are 2 and 7, I want the row-wise minimum of the following two expressions:

(df.throughput - 2)/2  
(df.throughput - 7)/7
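Written as a formula, for a row with throughput x and the two thresholds 2 and 7, the target value is the smaller of the two relative changes. For example, x = 3 gives min(0.5, -4/7) ≈ -0.5714:

```latex
\mathrm{pct}(x) = \min\left(\frac{x - 2}{2},\ \frac{x - 7}{7}\right),
\qquad
\mathrm{pct}(3) = \min\left(0.5,\ -\tfrac{4}{7}\right) \approx -0.5714
```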

x   throughtput
1   3
4   4
7   9

I tried to create a new column with the following commands, but I keep getting an error. I feel like I'm missing something very obvious here.

df['pct'] =  np.min(  (df.throughput-2)/2,  (df.throughput - 7)/7  )
df['pct'] =  np.min(  (df['throughput']-2)/2,  (df['throughput'] - 7)/7  )
'Series' objects are mutable, thus they cannot be hashed
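The error comes from how `np.min` is called. Its signature is `np.min(a, axis=None, ...)`: it reduces a single array, so the second Series is taken as the `axis` argument. pandas then tries to use that Series as an axis, which involves hashing it, and that produces the message above. The element-wise minimum of two arrays is `np.minimum`. Below is a minimal sketch on the sample data. It assumes the column is spelled `throughtput`, as in the table above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 4, 7], 'throughtput': [3, 4, 9]})

# np.minimum compares two arrays element by element (unlike np.min,
# which reduces one array and treats its second argument as `axis`)
df['pct'] = np.minimum((df['throughtput'] - 2) / 2,
                       (df['throughtput'] - 7) / 7)
print(df)
```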

Best answer

You can create two new Series, compare them, and use numpy.where to build the new column:

a = (df['throughtput'] - 2)/2
b = (df['throughtput'] - 7)/7
df['pct'] = np.where(a < b, a, b)
print (df)
   x  throughtput       pct
0  1            3 -0.571429
1  4            4 -0.428571
2  7            9  0.285714

Solution with concat and DataFrame.min:

a = (df['throughtput'] - 2)/2
b = (df['throughtput'] - 7)/7
df['pct'] = pd.concat([a,b], axis=1).min(axis=1)
print (df)
   x  throughtput       pct
0  1            3 -0.571429
1  4            4 -0.428571
2  7            9  0.285714

Or create a 2D array with numpy.column_stack and take the row-wise minimum with numpy.amin:

a = (df['throughtput'] - 2)/2
b = (df['throughtput'] - 7)/7
df['pct'] = np.amin(np.column_stack([a,b]), axis=1)
print (df)
   x  throughtput       pct
0  1            3 -0.571429
1  4            4 -0.428571
2  7            9  0.285714
Or the same computation on the underlying NumPy arrays via .values (.to_numpy() in newer pandas):
a = (df['throughtput'].values - 2)/2
b = (df['throughtput'].values - 7)/7
df['pct'] = np.amin(np.column_stack([a,b]), axis=1)
print (df)
   x  throughtput       pct
0  1            3 -0.571429
1  4            4 -0.428571
2  7            9  0.285714
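The column_stack idea extends to any number of thresholds through NumPy broadcasting. Subtract a 1-D array of thresholds from a column vector of throughputs, divide, and take the row-wise minimum. This sketch keeps the thresholds in a hypothetical array `thresholds`, which is not part of the original answers:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 4, 7], 'throughtput': [3, 4, 9]})
thresholds = np.array([2, 7])  # hypothetical: any number of thresholds

# shape (n_rows, 1) combined with shape (n_thresholds,)
# broadcasts to (n_rows, n_thresholds)
x = df['throughtput'].to_numpy()[:, None]
rel = (x - thresholds) / thresholds

# smallest relative change per row
df['pct'] = rel.min(axis=1)
print(df)
```

Adding a threshold only means extending the array. The arithmetic stays the same.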

Timings:

N = 1000000
#N = 10
df = pd.DataFrame({'x': np.random.randint(10,size=N),
                   'throughtput':np.random.randint(10,size=N)})
print (df)

In [50]: %%timeit
    ...: a = (df['throughtput'] - 2)/2
    ...: b = (df['throughtput'] - 7)/7
    ...: df['pct'] = np.where(a < b, a, b)
    ...: 
10 loops, best of 3: 21.1 ms per loop

In [51]: %%timeit
    ...: a = (df['throughtput'] - 2)/2
    ...: b = (df['throughtput'] - 7)/7
    ...: df['pct'] = pd.concat([a,b], axis=1).min(axis=1)
    ...: 
10 loops, best of 3: 56.4 ms per loop

In [52]: %%timeit
    ...: a = (df['throughtput'] - 2)/2
    ...: b = (df['throughtput'] - 7)/7
    ...: df['pct'] = np.amin(np.column_stack([a,b]), axis=1)
    ...: 
10 loops, best of 3: 35.1 ms per loop


In [53]: %%timeit
    ...: a = (df['throughtput'].values - 2)/2
    ...: b = (df['throughtput'].values - 7)/7
    ...: df['pct'] = np.amin(np.column_stack([a,b]), axis=1)
    ...: 
10 loops, best of 3: 38.5 ms per loop
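To rerun a comparison like this outside IPython, here is a sketch using the standard timeit module. It first checks that the approaches return the same values. The function names are hypothetical wrappers around the snippets above, and the row count is smaller than in the original session. Absolute timings depend on hardware and library versions, so no expected numbers are given.

```python
import timeit

import numpy as np
import pandas as pd

N = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame({'x': rng.integers(10, size=N),
                   'throughtput': rng.integers(10, size=N)})

def with_where(s):
    a = (s - 2) / 2
    b = (s - 7) / 7
    return np.where(a < b, a, b)

def with_concat(s):
    a = (s - 2) / 2
    b = (s - 7) / 7
    return pd.concat([a, b], axis=1).min(axis=1).to_numpy()

def with_amin(s):
    v = s.to_numpy()
    return np.amin(np.column_stack([(v - 2) / 2, (v - 7) / 7]), axis=1)

s = df['throughtput']
funcs = [with_where, with_concat, with_amin]
results = {f.__name__: f(s) for f in funcs}

# every approach must produce the same column before timing is meaningful
ref = results['with_where']
assert all(np.allclose(ref, r) for r in results.values())

for f in funcs:
    per_loop = timeit.timeit(lambda: f(s), number=10) / 10
    print(f"{f.__name__}: {per_loop * 1000:.1f} ms per loop")
```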

Another answer, by Tiny.D:

In [54]: %%timeit
    ...: df['cal_1'] = (df.throughtput - 2)/2
    ...: df['cal_2'] = (df.throughtput - 7)/7
    ...: df['pct'] = df[['cal_1','cal_2']].min(axis=1)
    ...: df[['x','throughtput','pct']]
    ...: 
10 loops, best of 3: 73.7 ms per loop

In [55]: %%timeit
    ...: df['pct']=[min(i,j) for i,j in (zip((df.throughtput - 2)/2,(df.throughtput - 7)/7))]
    ...: 
1 loop, best of 3: 435 ms per loop

Original question on Stack Overflow: https://stackoverflow.com/questions/44403068/
