I have a CSV file:
State, Region
AK, Pacific Non Continuous
HI, Pacific Non Continuous
AL, East South Central
AZ, Mountain
CA, Pacific
OR, Pacific
When I run:
import numpy as np
import pandas as pd
# skipinitialspace=True strips the space after each comma in the CSV; pd.np was removed in pandas 2.0
df = pd.read_csv(r'C:...\input.csv', skipinitialspace=True)
df['SuperRegion'] = np.where(df.Region.str.match("New England|Middle Atlantic|South Atlantic"), "East",
np.where(df.Region.str.match("East North Central|East South Central|West North Central|West South Central"), "Mid West",
np.where(df.Region.str.match("Mountain|Pacific"), "West", "Other")))
df.to_csv(r'C:...\Output.csv', index=False)
I expect the SuperRegion value of the first two rows to be Other:
State, Region, SuperRegion
AK, Pacific Non Continuous, **Other**
HI, Pacific Non Continuous, **Other**
AL, East South Central, Mid West
AZ, Mountain, West
CA, Pacific, West
OR, Pacific, West
But I get:
State, Region, SuperRegion
AK, Pacific Non Continuous, **West**
HI, Pacific Non Continuous, **West**
AL, East South Central, Mid West
AZ, Mountain, West
CA, Pacific, West
OR, Pacific, West
My guess is that, when it runs, it doesn't distinguish between Pacific and Pacific Non Continuous the way I want. Any suggestions?
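The guess is right, and a small sketch makes the cause visible. `Series.str.match` behaves like `re.match`: it only anchors at the start of each string, so the pattern "Mountain|Pacific" also matches the prefix of "Pacific Non Continuous". `Series.str.fullmatch` (available since pandas 1.1) requires the entire string to match. The three sample values are taken from the question.

```python
import pandas as pd

regions = pd.Series(["Pacific Non Continuous", "Pacific", "Mountain"])

# str.match anchors only at the START of each string (like re.match),
# so "Pacific" also matches the beginning of "Pacific Non Continuous".
prefix_hits = regions.str.match("Mountain|Pacific")

# str.fullmatch requires the whole string to match one of the alternatives.
exact_hits = regions.str.fullmatch("Mountain|Pacific")

print(prefix_hits.tolist())  # [True, True, True]
print(exact_hits.tolist())   # [False, True, True]
```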
Best answer
Why not change:
np.where(df.Region.str.match("Mountain|Pacific"), "West", "Other")))
to:
np.where(df.Region.str.match("Mountain|Pacific|Pacific Non Continuous"), "West", "Other")))
Or add a separate case for it, checked before the "Mountain|Pacific" test:
df['SuperRegion'] = np.where(df.Region.str.match("New England|Middle Atlantic|South Atlantic"), "East",
np.where(df.Region.str.match("East North Central|East South Central|West North Central|West South Central"), "Mid West",
np.where(df.Region.str.match("Pacific Non Continuous"), "Other",
np.where(df.Region.str.match("Mountain|Pacific"), "West", "Other"))))
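A self-contained sketch of the separate-case approach, run on the sample rows from the question (with the spaces after the commas removed). It imports numpy directly because `pd.np` no longer exists in pandas 2.0, and it gives the innermost `np.where` its third ("else") argument, which the function requires whenever the second is given.

```python
import numpy as np
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame({
    "State": ["AK", "HI", "AL", "AZ", "CA", "OR"],
    "Region": ["Pacific Non Continuous", "Pacific Non Continuous",
               "East South Central", "Mountain", "Pacific", "Pacific"],
})

# Order matters: the specific "Pacific Non Continuous" test runs before the
# "Mountain|Pacific" prefix test, otherwise those rows are labelled "West".
df["SuperRegion"] = np.where(
    df.Region.str.match("New England|Middle Atlantic|South Atlantic"), "East",
    np.where(df.Region.str.match("East North Central|East South Central|West North Central|West South Central"), "Mid West",
    np.where(df.Region.str.match("Pacific Non Continuous"), "Other",
    np.where(df.Region.str.match("Mountain|Pacific"), "West", "Other"))))

print(df["SuperRegion"].tolist())
# ['Other', 'Other', 'Mid West', 'West', 'West', 'West']
```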
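The ideal solution is to create a dictionary with the regions as keys and the super regions as values, and use:
df['SuperRegion'] = df['Region'].map(region_dict)
where region_dict is that dictionary. map looks up exact keys, so Pacific Non Continuous is not confused with Pacific; regions missing from the dictionary become NaN, which .fillna('Other') can replace.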
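A minimal sketch of the dictionary approach. The name `region_to_super` is hypothetical, and the dictionary lists only the regions named in the question; any region not listed (here, Pacific Non Continuous) falls through to "Other" via `fillna`.

```python
import pandas as pd

# Hypothetical lookup table: exact Region string -> super region.
region_to_super = {
    "New England": "East",
    "Middle Atlantic": "East",
    "South Atlantic": "East",
    "East North Central": "Mid West",
    "East South Central": "Mid West",
    "West North Central": "Mid West",
    "West South Central": "Mid West",
    "Mountain": "West",
    "Pacific": "West",
}

# Sample rows from the question.
df = pd.DataFrame({
    "State": ["AK", "HI", "AL", "AZ", "CA", "OR"],
    "Region": ["Pacific Non Continuous", "Pacific Non Continuous",
               "East South Central", "Mountain", "Pacific", "Pacific"],
})

# map() does an exact key lookup; unmapped regions become NaN -> "Other".
df["SuperRegion"] = df["Region"].map(region_to_super).fillna("Other")

print(df["SuperRegion"].tolist())
# ['Other', 'Other', 'Mid West', 'West', 'West', 'West']
```

Because the lookup is exact, stray whitespace breaks it: when reading the CSV as shown (with a space after each comma), passing skipinitialspace=True to pd.read_csv keeps the values aligned with the dictionary keys.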
Regarding "python - str.match is not an exact match because it doesn't account for the following characters", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46717122/