python - str.match does not match exactly because it ignores trailing characters

Tags: python pandas csv

I have a CSV file:

State,  Region                  
AK,     Pacific Non Continuous
HI,     Pacific Non Continuous 
AL,     East South Central  
AZ,     Mountain                
CA,     Pacific                
OR,     Pacific                

When I run:

df = pd.read_csv('C:...\input.csv')

df['SuperRegion'] = pd.np.where(df.Region.str.match("New England|Middle Atlantic|South Atlantic"), "East",
                pd.np.where(df.Region.str.match("East North Central|East South Central|West North Central|West South Central"), "Mid West",
                pd.np.where(df.Region.str.match("Mountain|Pacific"), "West", "Other")))

df.to_csv('C:...\Output.csv', index=False)

I expect the first two rows to have a SuperRegion value of Other:

State,  Region,                  SuperRegion
AK,     Pacific Non Continuous,  **Other**
HI,     Pacific Non Continuous,  **Other**
AL,     East South Central,      Mid West
AZ,     Mountain,                West
CA,     Pacific,                 West
OR,     Pacific,                 West

But what I get is:

State,  Region,                  SuperRegion
AK,     Pacific Non Continuous,  **West**
HI,     Pacific Non Continuous,  **West**
AL,     East South Central,      Mid West
AZ,     Mountain,                West
CA,     Pacific,                 West
OR,     Pacific,                 West

I assume that when it runs, it does not distinguish between Pacific and Pacific Non Continuous the way I want it to. Any suggestions?

Best answer

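Why not change:

pd.np.where(df.Region.str.match("Mountain|Pacific"), "West", "Other")))

to the following, where the trailing $ anchors the pattern at the end of the string so that Pacific no longer matches Pacific Non Continuous:

pd.np.where(df.Region.str.match("(?:Mountain|Pacific)$"), "West", "Other")))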
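The underlying cause is that str.match behaves like re.match: the pattern is anchored at the start of each string but not at the end, so Pacific also matches Pacific Non Continuous. A minimal sketch of the difference follows, using a small hand-built Series in place of the CSV column. The sample values and the str.strip() call are assumptions (the strip guards against padding spaces around the values in the real file), and Series.str.fullmatch requires pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical sample values standing in for the Region column;
# str.strip() removes padding that would defeat end-of-string anchoring.
region = pd.Series(["Pacific Non Continuous", "Pacific ", "Mountain"]).str.strip()

# Anchored at the start only: every row matches.
prefix_hits = region.str.match("Mountain|Pacific")

# Anchored at both ends via an explicit $ after a non-capturing group.
anchored_hits = region.str.match(r"(?:Mountain|Pacific)$")

# Same effect without editing the pattern: the whole string must match.
full_hits = region.str.fullmatch("Mountain|Pacific")

print(prefix_hits.tolist())    # [True, True, True]
print(anchored_hits.tolist())  # [False, True, True]
print(full_hits.tolist())      # [False, True, True]
```

The non-capturing group matters: without it, "Mountain|Pacific$" would apply the $ to the Pacific alternative only.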

Or add the case separately, checking it before the more general Pacific pattern:

df['SuperRegion'] = pd.np.where(df.Region.str.match("New England|Middle Atlantic|South Atlantic"), "East",
                pd.np.where(df.Region.str.match("East North Central|East South Central|West North Central|West South Central"), "Mid West",
                pd.np.where(df.Region.str.match("Pacific Non Continuous"), "Other",
                pd.np.where(df.Region.str.match("Mountain|Pacific"), "West", "Other"))))
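The nested calls depend on order: the more specific pattern has to be tested before the general one. The same first-match-wins logic can be sketched with numpy.select, which takes a list of conditions and a parallel list of choices. This is a substitute for the nested where calls, not what the answer itself uses. The sample frame is an assumption standing in for the CSV, and numpy is imported directly because the pd.np alias was removed in pandas 2.0.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame mirroring the rows shown in the question.
df = pd.DataFrame({
    "State": ["AK", "HI", "AL", "AZ", "CA", "OR"],
    "Region": ["Pacific Non Continuous", "Pacific Non Continuous",
               "East South Central", "Mountain", "Pacific", "Pacific"],
})

# Conditions are evaluated in order; the first True one picks the choice,
# so the specific "Pacific Non Continuous" case comes before "Pacific".
conditions = [
    df["Region"].str.match("New England|Middle Atlantic|South Atlantic"),
    df["Region"].str.match("East North Central|East South Central|"
                           "West North Central|West South Central"),
    df["Region"].str.match("Pacific Non Continuous"),
    df["Region"].str.match("Mountain|Pacific"),
]
choices = ["East", "Mid West", "Other", "West"]

# Rows matching none of the conditions fall back to the default.
df["SuperRegion"] = np.select(conditions, choices, default="Other")
print(df)
```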

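The ideal solution is to create a dictionary with the regions as keys and the super regions as values, and pass it to map (region_to_super below stands for that dictionary; regions missing from it come back as NaN):

df['Region'].map(region_to_super)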
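A minimal sketch of the dictionary approach, assuming a mapping built from the region names in the question and a small sample frame in place of the CSV. The name region_to_super is hypothetical. Series.map returns NaN for keys that are not in the dictionary, so fillna supplies the Other default, and str.strip() is an assumed guard against padded values.

```python
import pandas as pd

# Hypothetical mapping assembled from the region names in the question.
region_to_super = {
    "New England": "East",
    "Middle Atlantic": "East",
    "South Atlantic": "East",
    "East North Central": "Mid West",
    "East South Central": "Mid West",
    "West North Central": "Mid West",
    "West South Central": "Mid West",
    "Mountain": "West",
    "Pacific": "West",
}

# Hypothetical sample frame mirroring the rows shown in the question.
df = pd.DataFrame({
    "State": ["AK", "HI", "AL", "AZ", "CA", "OR"],
    "Region": ["Pacific Non Continuous", "Pacific Non Continuous",
               "East South Central", "Mountain", "Pacific", "Pacific"],
})

# Exact-key lookup; unmapped regions become NaN and are then set to "Other".
df["SuperRegion"] = df["Region"].str.strip().map(region_to_super).fillna("Other")
print(df)
```

Because the lookup is by exact key rather than by pattern, Pacific Non Continuous cannot be mistaken for Pacific, and the order of the entries does not matter.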

Regarding "python - str.match does not match exactly because it ignores trailing characters", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46717122/
