python - How to replace one column value with multiple values in a pandas dataframe

Tags: python pandas

Input dataframe

Date            Geo         Shipment
2020-01-01      USA         1000
2020-01-01      BRA         5865
2020-01-01      CHN         4789
2020-01-02      EU1         6541
2020-01-02      EU2         3258
..

d = {"EU1": ["ALA", "BEL", "AND", "AUT"], "EU2": ["AUT", "BEL", "BGR", "HRV", "CZE"], "EU3": ["EST", "HRV", "FRA", "DEU"]}

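How can I replace the values in the Geo column so that each key maps to several values (a one-to-many mapping), with the Shipment value repeated for every new row?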

Expected output df

Date            Geo         Shipment
2020-01-01      ALA         1000
2020-01-01      BEL         1000
2020-01-01      AND         1000
2020-01-01      AUT         1000
..
2020-01-01      AUT         5865
2020-01-01      BEL         5865
2020-01-01      HRV         5865
2020-01-01      BGR         5865
2020-01-01      CZE         5865
..
2020-01-01      EST         4789
2020-01-01      HRV         4789
2020-01-01      FRA         4789
2020-01-01      DEU         4789
..

Best answer

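Use the DataFrame constructor to flatten the dictionary into a long mapping frame, join it to the data with an outer DataFrame.merge, and then reassign the Geo column with DataFrame.pop. The output below corresponds to the question's original data, in which every Geo value was a region key (EU1, EU2, EU3):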

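import pandas as pd

# df is the input frame from the question
d =  {"EU1":["ALA", "BEL", "AND", "AUT"] , 
      "EU2": ["AUT", "BEL", "BGR", "HRV", "CZE"] , 
      "EU3": ["EST", "HRV", "FRA", "DEU"]}

# One row per (region, country) pair, e.g. ("EU1", "ALA")
df1 = pd.DataFrame(((k, x) for k, v in d.items() for x in v), columns=['Geo','New'])

# Join on the region key, then replace Geo with the matched country
df = df.merge(df1, on='Geo', how='outer')
df['Geo'] = df.pop('New')
df = df.sort_values('Date', ignore_index=True)

print(df)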
          Date  Geo  Shipment
0   2020-01-01  ALA      1000
1   2020-01-01  HRV      4789
2   2020-01-01  EST      4789
3   2020-01-01  CZE      5865
4   2020-01-01  HRV      5865
5   2020-01-01  FRA      4789
6   2020-01-01  BEL      5865
7   2020-01-01  AUT      5865
8   2020-01-01  BGR      5865
9   2020-01-01  AUT      1000
10  2020-01-01  AND      1000
11  2020-01-01  BEL      1000
12  2020-01-01  DEU      4789
13  2020-01-02  AND      6541
14  2020-01-02  BEL      6541
15  2020-01-02  ALA      6541
16  2020-01-02  AUT      3258
17  2020-01-02  BEL      3258
18  2020-01-02  BGR      3258
19  2020-01-02  HRV      3258
20  2020-01-02  CZE      3258
21  2020-01-02  AUT      6541
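Why the outer join does not fit the updated input: any Geo value with no key in the dictionary (USA, BRA, CHN) gets NaN in New, so after df.pop('New') its Geo becomes NaN, and any key with no matching row (EU3 here) produces rows whose Date and Shipment are NaN. A minimal sketch, assuming only the five input rows shown in the question (the ".." rows are not reproduced):

```python
import pandas as pd

# Only the five sample rows from the question (assumption: the ".." rows are omitted)
df = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"],
    "Geo": ["USA", "BRA", "CHN", "EU1", "EU2"],
    "Shipment": [1000, 5865, 4789, 6541, 3258],
})
d = {"EU1": ["ALA", "BEL", "AND", "AUT"],
     "EU2": ["AUT", "BEL", "BGR", "HRV", "CZE"],
     "EU3": ["EST", "HRV", "FRA", "DEU"]}

# Long mapping table: one (region, country) pair per row, 13 rows in total
df1 = pd.DataFrame(((k, x) for k, v in d.items() for x in v), columns=["Geo", "New"])

out = df.merge(df1, on="Geo", how="outer")
out["Geo"] = out.pop("New")

# USA, BRA and CHN had no match, so their Geo is now NaN
print(out[out["Geo"].isna()])
# EU3 has no input row, so its countries appear with no Date or Shipment
print(out[out["Date"].isna()])
```

This is the motivation for switching to a left join and falling back to the original value, as in the next solution.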

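Solution for the updated data, where Geo also contains values that are not dictionary keys (USA, BRA, CHN): use a left join and keep the original Geo wherever there is no match: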

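import pandas as pd

# df is the input frame from the question
d =  {"EU1":["ALA", "BEL", "AND", "AUT"] , 
      "EU2": ["AUT", "BEL", "BGR", "HRV", "CZE"] , 
      "EU3": ["EST", "HRV", "FRA", "DEU"]}

# One row per (region, country) pair
df1 = pd.DataFrame(((k, x) for k, v in d.items() for x in v), columns=['Geo','New'])

# Left join keeps every input row; unmatched rows get NaN in New
df = df.merge(df1, on='Geo', how='left')
# Use the matched country, falling back to the original Geo value
df['Geo'] = df.pop('New').fillna(df['Geo'])
print(df)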
          Date  Geo  Shipment
0   2020-01-01  USA      1000
1   2020-01-01  BRA      5865
2   2020-01-01  CHN      4789
3   2020-01-02  ALA      6541
4   2020-01-02  BEL      6541
5   2020-01-02  AND      6541
6   2020-01-02  AUT      6541
7   2020-01-02  AUT      3258
8   2020-01-02  BEL      3258
9   2020-01-02  BGR      3258
10  2020-01-02  HRV      3258
11  2020-01-02  CZE      3258
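An alternative that avoids building the helper frame, sketched here as a swapped-in technique rather than the answer's method: look up each Geo value with Series.map and a function that returns the list of countries (or the original value when the key is missing), then expand the lists into rows with DataFrame.explode (the ignore_index parameter needs pandas 1.1 or later). It assumes the same five sample rows.

```python
import pandas as pd

# Only the five sample rows from the question (assumption: the ".." rows are omitted)
df = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"],
    "Geo": ["USA", "BRA", "CHN", "EU1", "EU2"],
    "Shipment": [1000, 5865, 4789, 6541, 3258],
})
d = {"EU1": ["ALA", "BEL", "AND", "AUT"],
     "EU2": ["AUT", "BEL", "BGR", "HRV", "CZE"],
     "EU3": ["EST", "HRV", "FRA", "DEU"]}

# Keys become lists of countries; values missing from d stay as plain strings
df["Geo"] = df["Geo"].map(lambda g: d.get(g, g))

# One row per list element; scalar strings pass through unchanged,
# and Date/Shipment are repeated for each new row
out = df.explode("Geo", ignore_index=True)
print(out)
```

The row order matches the left-join output above because explode keeps the original row order and expands each list in place.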

Source: the original question "python - How to replace one column value with multiple values in a pandas dataframe" on Stack Overflow: https://stackoverflow.com/questions/69114105/
