I have a sample DataFrame and am trying to replace the values in a list of columns with an ascending sorted index:
DF:
df = pd.DataFrame(np.random.randint(0,10,size=(7,3)),columns=["a","b","c"])
df["d1"]=["Apple","Mango","Apple","Mango","Mango","Mango","Apple"]
df["d2"]=["Orange","lemon","lemon","Orange","lemon","Orange","lemon"]
df["date"] = ["2002-01-01","2002-01-01","2002-01-01","2002-01-01","2002-02-01","2002-02-01","2002-02-01"]
df["date"] = pd.to_datetime(df["date"])
   a  b  c     d1      d2       date
0  2  7  9  Apple  Orange 2002-01-01
1  6  0  9  Mango   lemon 2002-01-01
2  8  0  0  Apple   lemon 2002-01-01
3  4  4  4  Mango  Orange 2002-01-01
4  5  0  8  Mango   lemon 2002-02-01
5  6  1  6  Mango  Orange 2002-02-01
6  7  2  7  Apple   lemon 2002-02-01
Step 1:
Group the DataFrame by the "date" column. Sample group for "2002-01-01":
   a  b  c     d1      d2       date
0  2  7  9  Apple  Orange 2002-01-01
1  6  0  9  Mango   lemon 2002-01-01
2  8  0  0  Apple   lemon 2002-01-01
3  4  4  4  Mango  Orange 2002-01-01
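Step 2:
Within each group, replace the values in the columns ["d1", "d2"] with an index (not the DataFrame index) based on the sorted mean of c. For example, in the group above:
mean(c, d1="Apple") = (9 + 0) / 2 = 4.5
mean(c, d1="Mango") = (9 + 4) / 2 = 6.5
So the ascending sorted index is Apple: 0 and Mango: 1, and the values in column d1 are replaced as follows:
   a  b  c  d1      d2       date
0  2  7  9   0  Orange 2002-01-01
1  6  0  9   1   lemon 2002-01-01
2  8  0  0   0   lemon 2002-01-01
3  4  4  4   1  Orange 2002-01-01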
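The arithmetic in Step 2 can be checked on the single "2002-01-01" group. The following is a minimal sketch, not the asker's code: the values of c and d1 are typed in by hand from the sample group, and the names `group`, `means`, `codes`, and `replaced` are illustrative only.

```python
import pandas as pd

# The "2002-01-01" group from the question, reduced to the two columns involved.
group = pd.DataFrame({
    "c": [9, 9, 0, 4],
    "d1": ["Apple", "Mango", "Apple", "Mango"],
})

# Mean of c for each d1 label: Apple -> 4.5, Mango -> 6.5
means = group.groupby("d1")["c"].mean()

# Dense rank of the means in ascending order, shifted so the smallest mean gets 0.
codes = (means.rank(method="dense", ascending=True) - 1).astype(int)

# Replace each label with its code.
replaced = group["d1"].map(codes)
print(replaced.tolist())  # [0, 1, 0, 1]
```

The printed codes match the d1 column in the table above.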
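I want to apply this to the whole df. I have a brute-force approach that iterates over the groups and over every row; any suggestions for a more pandas-based solution would help improve efficiency.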
Best answer
Is this what you are looking for in column d1? You can apply a similar technique to d2. It is not the most elegant solution, though.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,10,size=(7,3)),columns=["a","b","c"])
df["d1"]=["Apple","Mango","Apple","Mango","Mango","Mango","Apple"]
df["d2"]=["Orange","lemon","lemon","Orange","lemon","Orange","lemon"]
df["date"] = ["2002-01-01","2002-01-01","2002-01-01","2002-01-01","2002-02-01","2002-02-01","2002-02-01"]
df["date"] = pd.to_datetime(df["date"])
# Mean of c within each (date, d1) pair, broadcast back to every row
df['mean_value'] = df.groupby(['date', 'd1'])['c'].transform('mean')
# Dense rank of those means within each date, shifted so the lowest mean gets 0
df['rank_value'] = (df.groupby('date')['mean_value'].rank(ascending=True, method='dense') - 1).astype(int)
df['d1'] = df['rank_value']
df = df.drop(columns=['rank_value', 'mean_value'])
df
   a  b  c  d1      d2       date
0  3  1  4   1  Orange 2002-01-01
1  9  7  5   0   lemon 2002-01-01
2  9  9  5   1   lemon 2002-01-01
3  8  1  2   0  Orange 2002-01-01
4  8  0  1   0   lemon 2002-02-01
5  1  8  3   0  Orange 2002-02-01
6  8  0  4   1   lemon 2002-02-01
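To cover d2 as well, the same two steps can be wrapped in a small helper and applied to each column in turn. This is a sketch under stated assumptions rather than part of the original answer: `rank_by_group_mean` is a hypothetical helper name, and the data is the fixed sample printed in the question instead of random values, so the result is reproducible.

```python
import pandas as pd

# Fixed copy of the sample frame printed in the question (not random data).
df = pd.DataFrame({
    "a": [2, 6, 8, 4, 5, 6, 7],
    "b": [7, 0, 0, 4, 0, 1, 2],
    "c": [9, 9, 0, 4, 8, 6, 7],
    "d1": ["Apple", "Mango", "Apple", "Mango", "Mango", "Mango", "Apple"],
    "d2": ["Orange", "lemon", "lemon", "Orange", "lemon", "Orange", "lemon"],
    "date": pd.to_datetime(["2002-01-01"] * 4 + ["2002-02-01"] * 3),
})


def rank_by_group_mean(frame, label_col, value_col="c", group_col="date"):
    """Hypothetical helper: 0-based dense rank of mean(value_col) per label, within each group."""
    # Mean of value_col for every (group, label) pair, aligned to the original rows.
    means = frame.groupby([group_col, label_col])[value_col].transform("mean")
    # Rank those means inside each group; dense ranking leaves no gaps in the codes.
    ranks = means.groupby(frame[group_col]).rank(method="dense", ascending=True)
    return (ranks - 1).astype(int)


# Compute all replacement columns from the original labels, then assign them together.
new_cols = {col: rank_by_group_mean(df, col) for col in ["d1", "d2"]}
df = df.assign(**new_cols)
print(df)
```

Note that with method='dense', labels whose means are equal share a code. In this sample, Apple and Mango both have a mean of 7 for c on 2002-02-01, so every d1 value in that group becomes 0.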
Source: "python - Replace Pandas columns with sorted index" on Stack Overflow: https://stackoverflow.com/questions/62479547/