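I am trying to compare each company's mean trading volume with that same company's daily trading volumes and compute the differences in pandas. I grouped the data by company and took the mean volume of each company. Now I want to compare that mean with the company's daily volumes.
My code: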
vol_grp.mean()
The output is:
Volume
Company
20MICRONS 947802.086957
3MINDIA 3881.608696
5PAISA 69606.521739
AAKASH 49254.217391
AARON 46435.583333
... ...
ZODJRDMKJ 50541.666667
ZOTA 36271.130435
ZUARI 285558.652174
ZUARIGLOB 149646.347826
ZYDUSWELL 72017.826087
1397 rows × 1 columns
The actual data looks like this:
Date Company Volume
1 03-MAY-2021 20MICRONS 192281
4 03-MAY-2021 3MINDIA 1707
7 03-MAY-2021 5PAISA 81581
16 03-MAY-2021 AAKASH 35865
17 03-MAY-2021 AARON 1255
... ... ... ...
47160 03-JUN-2021 ZODIACLOTH 75966
47162 03-JUN-2021 ZOTA 470978
47163 03-JUN-2021 ZUARI 137563
47164 03-JUN-2021 ZUARIGLOB 51545
47165 03-JUN-2021 ZYDUSWELL 24350
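For example, take the company 20MICRONS: I want to compare its mean with the daily volumes of 20MICRONS. If I have 30 days of volume data for 20MICRONS, the mean should be compared with each of those 30 values, returning 30 differences. The same applies to every other company.
Best answer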
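One common pandas idiom for "one difference per daily row" is `groupby(...).transform("mean")`, which broadcasts each group's mean back onto every row of that group. This is an alternative to the merge-based answer that follows, not what the asker used. A minimal sketch, assuming the question's column names (`Date`, `Company`, `Volume`); the volumes below are hypothetical toy numbers.

```python
import pandas as pd

# Hypothetical toy data in the question's layout; the numbers are made up.
df = pd.DataFrame({
    "Date": ["03-MAY-2021", "04-MAY-2021", "05-MAY-2021"] * 2,
    "Company": ["20MICRONS"] * 3 + ["ZOTA"] * 3,
    "Volume": [100, 200, 300, 10, 20, 60],
})

# transform("mean") returns a Series aligned with df's index: each row gets
# the mean of its own company, so subtraction yields one value per day.
company_mean = df.groupby("Company")["Volume"].transform("mean")
df["Difference"] = df["Volume"] - company_mean
print(df)
```

With 30 daily rows for 20MICRONS this yields 30 differences for that company, and likewise for every other company.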
So I created this sample df:
from datetime import datetime as dt
import pandas as pd
from numpy.random import randint
df = pd.DataFrame(dict(date= [dt(2021,1,i) for i in [1]*4+[2]*4+[3]*4],
company= ["AAPL", "FB", "NVDA", "AMZN"]*3,
volume= [randint(100,10000) for _ in range(12)]))
df
date company volume
0 2021-01-01 AAPL 1470
1 2021-01-01 FB 7478
2 2021-01-01 NVDA 9156
3 2021-01-01 AMZN 5972
4 2021-01-02 AAPL 9836
5 2021-01-02 FB 1990
6 2021-01-02 NVDA 5380
7 2021-01-02 AMZN 1338
8 2021-01-03 AAPL 9235
9 2021-01-03 FB 3708
10 2021-01-03 NVDA 480
11 2021-01-03 AMZN 2805
Then I grouped by company and got each company's mean the same way you did:
grp = df.groupby("company")
grp[["volume"]].mean()  # select volume first so the date column is not averaged too
volume
company
AAPL 6847.000000
AMZN 3371.666667
FB 4392.000000
NVDA 5005.333333
Then I merged the means back onto the original df:
# on="company" matches each row with its company's mean (the means are indexed by "company")
merged = df.merge(grp[["volume"]].mean(), on="company", suffixes=("_daily", "_mean"))
merged
date company volume_daily volume_mean
0 2021-01-01 AAPL 1470 6847.000000
1 2021-01-02 AAPL 9836 6847.000000
2 2021-01-03 AAPL 9235 6847.000000
3 2021-01-01 FB 7478 4392.000000
4 2021-01-02 FB 1990 4392.000000
5 2021-01-03 FB 3708 4392.000000
6 2021-01-01 NVDA 9156 5005.333333
7 2021-01-02 NVDA 5380 5005.333333
8 2021-01-03 NVDA 480 5005.333333
9 2021-01-01 AMZN 5972 3371.666667
10 2021-01-02 AMZN 1338 3371.666667
11 2021-01-03 AMZN 2805 3371.666667
Finally, I created the difference column with a good old subtraction:
merged["difference"] = merged["volume_daily"] - merged["volume_mean"]
merged
date company volume_daily volume_mean difference
0 2021-01-01 AAPL 1470 6847.000000 -5377.000000
1 2021-01-02 AAPL 9836 6847.000000 2989.000000
2 2021-01-03 AAPL 9235 6847.000000 2388.000000
3 2021-01-01 FB 7478 4392.000000 3086.000000
4 2021-01-02 FB 1990 4392.000000 -2402.000000
5 2021-01-03 FB 3708 4392.000000 -684.000000
6 2021-01-01 NVDA 9156 5005.333333 4150.666667
7 2021-01-02 NVDA 5380 5005.333333 374.666667
8 2021-01-03 NVDA 480 5005.333333 -4525.333333
9 2021-01-01 AMZN 5972 3371.666667 2600.333333
10 2021-01-02 AMZN 1338 3371.666667 -2033.666667
11 2021-01-03 AMZN 2805 3371.666667 -566.666667
Percentage difference:
merged["%_diff"] = merged["difference"]/merged["volume_mean"]*100
merged
date company volume_daily volume_mean difference %_diff
0 2021-01-01 AAPL 1470 6847.000000 -5377.000000 -78.530743
1 2021-01-02 AAPL 9836 6847.000000 2989.000000 43.654155
2 2021-01-03 AAPL 9235 6847.000000 2388.000000 34.876588
3 2021-01-01 FB 7478 4392.000000 3086.000000 70.264117
4 2021-01-02 FB 1990 4392.000000 -2402.000000 -54.690346
5 2021-01-03 FB 3708 4392.000000 -684.000000 -15.573770
6 2021-01-01 NVDA 9156 5005.333333 4150.666667 82.924880
7 2021-01-02 NVDA 5380 5005.333333 374.666667 7.485349
8 2021-01-03 NVDA 480 5005.333333 -4525.333333 -90.410229
9 2021-01-01 AMZN 5972 3371.666667 2600.333333 77.123085
10 2021-01-02 AMZN 1338 3371.666667 -2033.666667 -60.316362
11 2021-01-03 AMZN 2805 3371.666667 -566.666667 -16.806723
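The difference and percentage columns can also be built without the merge step by using `transform("mean")`, a swapped-in technique rather than the answer's own. A minimal sketch reusing the volumes from the sample table. It also shows a quick sanity check: because each company's differences are measured against that company's own mean, they should sum to (approximately) zero per company.

```python
import pandas as pd

# Same volumes as the sample table, hard-coded for reproducibility.
df = pd.DataFrame({
    "company": ["AAPL", "FB", "NVDA", "AMZN"] * 3,
    "volume": [1470, 7478, 9156, 5972, 9836, 1990, 5380, 1338, 9235, 3708, 480, 2805],
})

out = df.copy()
# Per-company mean broadcast onto every row (no merge needed).
out["volume_mean"] = out.groupby("company")["volume"].transform("mean")
out["difference"] = out["volume"] - out["volume_mean"]
out["%_diff"] = out["difference"] / out["volume_mean"] * 100

# Deviations from a group's own mean sum to ~0 within each group.
per_company_sum = out.groupby("company")["difference"].sum()
print(out)
print(per_company_sum)
```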
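Source: python - I am trying to compare the mean of a company's trading volume with the same company's daily trading volume and find the difference in pandas. I did groupby on company, ... (Stack Overflow: https://stackoverflow.com/questions/67827101/)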