python - Remove the last four digits from a string - Convert Zip+4 to a zip code

The following code...

import numpy as np
import pandas as pd

data = np.array([['','state','zip_code','collection_status'],
                ['42394','CA','92637-2854', 'NaN'],
                ['58955','IL','60654', 'NaN'],
                ['108365','MI','48021-1319', 'NaN'],
                ['109116','MI','48228', 'NaN'],
                ['110833','IL','60008-4227', 'NaN']])

# The first row holds the column names, the first column holds the index labels
df = pd.DataFrame(data=data[1:,1:],
                  index=data[1:,0],
                  columns=data[0,1:])
print(df)

...gives the following DataFrame:

         state            zip_code    collection_status
42394       CA          92637-2854                  NaN
58955       IL               60654                  NaN
108365      MI          48021-1319                  NaN
109116      MI               48228                  NaN
110833      IL          60008-4227                  NaN

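The goal is to homogenize the 'zip_code' column into the 5-digit format, i.e. I want to remove the last four digits from zip_code whenever that particular data point has 9 digits instead of 5. By the way, the dtype of zip_code is 'object'.

Any ideas?

Best answer

Use indexing with str only, thanks John Galt: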

df['collection_status'] = df['zip_code'].str[:5]
print (df)
       state    zip_code collection_status
42394     CA  92637-2854             92637
58955     IL       60654             60654
108365    MI  48021-1319             48021
109116    MI       48228             48228
110833    IL  60008-4227             60008
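
The answer writes the result into collection_status. A minimal self-contained sketch of the same slice, assuming the intent is instead to overwrite zip_code itself (that assumption comes from the question's wording, not from the answer):

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame(
    {
        "state": ["CA", "IL", "MI", "MI", "IL"],
        "zip_code": ["92637-2854", "60654", "48021-1319", "48228", "60008-4227"],
    },
    index=["42394", "58955", "108365", "109116", "110833"],
)

# .str[:5] slices every string element-wise, keeping the first five characters;
# values that are already five characters long come back unchanged
df["zip_code"] = df["zip_code"].str[:5]
print(df["zip_code"].tolist())
```

Because a 5-character string sliced with [:5] is returned as-is, no condition is needed for this particular data.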

If you need to add a condition, use where or numpy.where:

df['collection_status'] = df['zip_code'].where(df['zip_code'].str.len() == 5, 
                                               df['zip_code'].str[:5])
print (df)
       state    zip_code collection_status
42394     CA  92637-2854             92637
58955     IL       60654             60654
108365    MI  48021-1319             48021
109116    MI       48228             48228
110833    IL  60008-4227             60008
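
For the sample data the condition changes nothing, since every value is either 5 or 10 characters long. A sketch of a case where it could matter, assuming the column may also contain values that are not US zip codes (the Canadian-style value and the names s and is_zip4 are illustrative, not from the question):

```python
import pandas as pd

# The last value is an illustrative non-US postal code
s = pd.Series(["92637-2854", "60654", "K1A 0B1"])

# True only for strings shaped like Zip+4: five digits, a hyphen, four digits
is_zip4 = s.str.match(r"^\d{5}-\d{4}$")

# Series.where keeps the original value where the condition is True
# and takes the value from the second argument everywhere else
result = s.where(~is_zip4, s.str[:5])
print(result.tolist())
```

Only the Zip+4 entry is shortened here; the other two values pass through untouched, which an unconditional .str[:5] would not guarantee.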

df['collection_status'] = np.where(df['zip_code'].str.len() == 5, 
                                   df['zip_code'],
                                   df['zip_code'].str[:5])
print (df)
       state    zip_code collection_status
42394     CA  92637-2854             92637
58955     IL       60654             60654
108365    MI  48021-1319             48021
109116    MI       48228             48228
110833    IL  60008-4227             60008
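
Both variants produce the same column, but they return different types. A small sketch of that difference, using two rows from the question (the names is_five, via_series and via_numpy are illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series(["92637-2854", "60654"], index=["42394", "58955"])
is_five = s.str.len() == 5

# Series.where returns a pandas Series and keeps the index
via_series = s.where(is_five, s.str[:5])

# numpy.where returns a plain NumPy array without an index;
# assigning it to a DataFrame column matches rows by position
via_numpy = np.where(is_five, s, s.str[:5])

print(type(via_series).__name__, type(via_numpy).__name__)
```

When the result goes straight back into the same DataFrame, as in the answer, either form should work; the Series form is the safer choice if the result is later aligned with differently ordered data.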

Source: a similar question on Stack Overflow, "python - Remove the last four digits from a string - Convert Zip+4 to a zip code": https://stackoverflow.com/questions/44776115/
