python - Fill empty cells based on another column

Tags: python pandas dictionary

I want to fill in missing values in a DataFrame by matching/mapping them against another column. For example:

         City         State              Country
      Chicago            IL        United States
       Boston            MA        United States
    San Diego            
  Los Angeles            CA        United States
San Francisco
   Sacramento     
    Vancouver            BC               Canada

So if I want to fill the empty State and Country cells of those three cities with the same values as Los Angeles, how do I do it?

Below is my code, but I'm completely stuck:

CA_cities = ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento']
df.loc[df['City'] == CA_cities, 'State' = 'CA' and 'Country' = 'United States']
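For reference, a minimal sketch of what the attempt above seems to be aiming for, assuming the blank cells are NaN and that overwriting those rows outright is acceptable. The attempt has two problems: comparing a column with == against a list is an element-wise comparison (and fails when the lengths differ), whereas isin tests membership; and 'State' = 'CA' inside the brackets is not valid Python, so the target columns go in a list on the left and the values on the right:

```python
import numpy as np
import pandas as pd

# Sample frame from the question; blank cells assumed to be NaN
df = pd.DataFrame({
    'City': ['Chicago', 'Boston', 'San Diego', 'Los Angeles',
             'San Francisco', 'Sacramento', 'Vancouver'],
    'State': ['IL', 'MA', np.nan, 'CA', np.nan, np.nan, 'BC'],
    'Country': ['United States', 'United States', np.nan, 'United States',
                np.nan, np.nan, 'Canada'],
})

CA_cities = ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento']

# isin gives a boolean mask: True where the city is in the list
mask = df['City'].isin(CA_cities)

# Select both target columns at once; each value goes to its column
# for every selected row (existing values in those rows are overwritten)
df.loc[mask, ['State', 'Country']] = ['CA', 'United States']
print(df)
```

This hard-codes the values; if only the blanks should change, the same mask can be combined with fillna instead of a plain assignment.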

任何帮助将不胜感激。

Best answer

You can use groupby with a mask created by isin, then replace the NaNs by forward and backward filling within each group:

CA_cities = ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento']

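# group_keys=False keeps the original index; pandas 2.0+ would otherwise add the group keys as an extra index level
df = df.groupby(df['City'].isin(CA_cities), group_keys=False).apply(lambda x: x.ffill().bfill())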
print (df)
            City State        Country
0        Chicago    IL  United States
1         Boston    MA  United States
2      San Diego    CA  United States
3    Los Angeles    CA  United States
4  San Francisco    CA  United States
5     Sacramento    CA  United States
6      Vancouver    BC         Canada
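A self-contained sketch of the same mask-and-fill idea, under two assumptions: the blank cells arrive as empty strings (so they are turned into NaN first, since ffill and bfill only fill NaN), and groupby(...).transform is swapped in for apply, because transform returns a frame aligned to the original index without adding group keys:

```python
import numpy as np
import pandas as pd

# Sample frame with blanks stored as empty strings (an assumption)
df = pd.DataFrame({
    'City': ['Chicago', 'Boston', 'San Diego', 'Los Angeles',
             'San Francisco', 'Sacramento', 'Vancouver'],
    'State': ['IL', 'MA', '', 'CA', '', '', 'BC'],
    'Country': ['United States', 'United States', '', 'United States',
                '', '', 'Canada'],
})

# ffill/bfill only see NaN as missing, so convert the empty strings first
df = df.replace('', np.nan)

CA_cities = ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento']
mask = df['City'].isin(CA_cities)

# Within each group (CA cities vs. everything else), fill forward and then
# backward; transform keeps the rows in their original order and index
filled = df.groupby(mask).transform(lambda s: s.ffill().bfill())
print(filled)
```

Note that all non-CA cities share one group here, so a blank in that group would be filled from whichever neighbouring row happens to come first; the dictionary-based grouping avoids that.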

A more general solution is to define the city groups, e.g. in a dictionary, swap its keys and values, and map the column:

print (df)
            City State        Country
0        Chicago    IL  United States
1       Chicago1   NaN            NaN
2         Boston    MA  United States
3      San Diego   NaN            NaN
4    Los Angeles    CA  United States
5  San Francisco   NaN            NaN
6     Sacramento   NaN            NaN
7      Vancouver    BC         Canada

cities = {'CA': ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento'], 
          'IL':['Chicago','Chicago1']}
d = {k: oldk for oldk, oldv in cities.items() for k in oldv}

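# group_keys=False keeps the original index; pandas 2.0+ would otherwise add the group keys as an extra index level
df = df.groupby(df['City'].map(d).fillna(df['City']), group_keys=False).apply(lambda x: x.ffill().bfill())
# slower alternative
#df = df.groupby(df['City'].replace(d), group_keys=False).apply(lambda x: x.ffill().bfill())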
print (df)
            City State        Country
0        Chicago    IL  United States
1       Chicago1    IL  United States
2         Boston    MA  United States
3      San Diego    CA  United States
4    Los Angeles    CA  United States
5  San Francisco    CA  United States
6     Sacramento    CA  United States
7      Vancouver    BC         Canada

Details:

print (df['City'].map(d).fillna(df['City']))
0           IL
1           IL
2       Boston
3           CA
4           CA
5           CA
6           CA
7    Vancouver
Name: City, dtype: object

print (d)
{'San Diego': 'CA', 'Los Angeles': 'CA', 'San Francisco': 'CA', 
 'Sacramento': 'CA', 'Chicago': 'IL', 'Chicago1': 'IL'}
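A different route, not the method used in the answer: since d already maps each city to its state, the missing states can be filled directly with fillna, and the missing countries then taken from any other row with the same state. This is a sketch assuming the second sample frame with NaN blanks; the country step only works if at least one row per state already has a country:

```python
import numpy as np
import pandas as pd

# Second sample frame from the answer, blanks as NaN
df = pd.DataFrame({
    'City': ['Chicago', 'Chicago1', 'Boston', 'San Diego', 'Los Angeles',
             'San Francisco', 'Sacramento', 'Vancouver'],
    'State': ['IL', np.nan, 'MA', np.nan, 'CA', np.nan, np.nan, 'BC'],
    'Country': ['United States', np.nan, 'United States', np.nan,
                'United States', np.nan, np.nan, 'Canada'],
})

# Invert the state -> cities dictionary into a city -> state lookup
cities = {'CA': ['San Diego', 'Los Angeles', 'San Francisco', 'Sacramento'],
          'IL': ['Chicago', 'Chicago1']}
d = {city: state for state, members in cities.items() for city in members}

# Fill only the missing states from the lookup; existing values are kept
df['State'] = df['State'].fillna(df['City'].map(d))

# 'first' skips NaN, so every row gets the first known country of its state
df['Country'] = df['Country'].fillna(
    df.groupby('State')['Country'].transform('first'))
print(df)
```

Unlike forward/backward filling, this does not depend on row order: each blank is filled from the lookup or from its own state's rows, never from an unrelated neighbour.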

Source: the original question, python - Fill empty cells based on another column, on Stack Overflow: https://stackoverflow.com/questions/49479432/
