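I have been working with data that was originally exported to CSV and later re-imported from the same CSV for further EDA. There is an address column with the suburb/locality name appended to each address. I tried to split/extract these suburb names into a separate column using Excel, but I did not get the desired output. Is there a way to do this with Python (for example, with NLTK)?
Here is my sample data.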
**Address column**
4a Mcarthurs Road, Altona north
1 Neal court, Altona North
4 Vermilion Drive, Greenvale
Lot 307 Bonds Lane, Greenvale
430 Blackshaws rd, Altona North
159 Bonds lane, Greenvale
Lot 1105 4 compass Drive Greenvale
6005 Bethany dr tarneet
Lot 655 Potofino Way Wollert
lot 403 Binds Lane, Greenvale
157 Maidstone street Altona
11 Laramie Street, Greenvale
10 Preveli Way Wollert
21 Laramie Street, Greenvale
20 taipan crt tarneit
4 bisect road greenvale
83 everton road truganina
Lot 450 Vermilion Drive, Greenvale
Lot 641 Preveli Way Wollert
648 hogans rd tarneit
Desired output:

| Address | Suburb |
| --- | --- |
| 4a Mcarthurs Road | Altona North |
| 1 Neal court | Altona North |
| 4 Vermilion Drive | Greenvale |
| Lot 307 Bonds Lane | Greenvale |
| 430 Blackshaws rd | Altona North |
| 159 Bonds lane | Greenvale |
| Lot 1105 4 compass Drive | Greenvale |
| 6005 Bethany dr | Tarneet |
| Lot 655 Potofino Way | Wollert |
| lot 403 Binds Lane | Greenvale |
| 157 Maidstone street | Altona |
| 11 Laramie Street | Greenvale |
| 10 Preveli Way | Wollert |
| 21 Laramie Street | Greenvale |
| 20 taipan crt | Tarneit |
| 4 bisect road | Greenvale |
| 83 everton road | Truganina |
| Lot 450 Vermilion Drive | Greenvale |
| Lot 641 Preveli Way | Wollert |
| 648 hogans rd | Tarneit |

Any help with this would be greatly appreciated.
Thanks in advance for your support!
Best answer
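You can try this. The first pattern takes the text after the last comma, and where an address has no comma, the fallback pattern takes the last word: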
df['local'] = df['Address column']\
.str.extract(r'.+\, (.*)')\
.fillna(df['Address column'].str.extract(r'.* (.*)$'))
print(df['local'])
0 Altona north
1 Altona North
2 Greenvale
3 Greenvale
4 Altona North
5 Greenvale
6 Greenvale
7 tarneet
8 Wollert
9 Greenvale
10 Altona
11 Greenvale
12 Wollert
13 Greenvale
14 tarneit
15 greenvale
16 truganina
17 Greenvale
18 Wollert
19 tarneit
Name: local, dtype: object
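
The output above keeps each suburb's capitalization as typed and does not return the street part, whereas the desired output has both a cleaned `Address` and a title-cased `Suburb`. The following is a minimal sketch of one way to get both pieces, using only the standard `re` module. The names `split_address`, `COMMA_SPLIT`, and `LAST_WORD_SPLIT` are hypothetical, and the fallback assumes that a suburb written without a comma is a single word. That holds for the sample rows, but an uncomma'd "Altona North" would be split incorrectly.

```python
import re

# Hypothetical helper patterns (not from the original answer).
# Pattern 1: street part before the last comma, suburb after it.
COMMA_SPLIT = re.compile(r"^(?P<address>.+),\s*(?P<suburb>[^,]+)$")
# Pattern 2 (fallback): assume the suburb is the last whitespace-separated word.
LAST_WORD_SPLIT = re.compile(r"^(?P<address>.+)\s+(?P<suburb>\S+)$")


def split_address(text):
    """Return (address, suburb); suburb is None if no split is possible."""
    text = text.strip()
    # Prefer the comma split; only fall back when there is no comma.
    match = COMMA_SPLIT.match(text) or LAST_WORD_SPLIT.match(text)
    if match is None:
        return text, None
    address = match.group("address").strip()
    # Title-case so "Altona north" and "tarneit" look consistent.
    suburb = match.group("suburb").strip().title()
    return address, suburb


# A few rows copied from the question's sample data.
samples = [
    "4a Mcarthurs Road, Altona north",
    "Lot 1105 4 compass Drive Greenvale",
    "6005 Bethany dr tarneet",
    "157 Maidstone street Altona",
    "20 taipan crt tarneit",
]

for row in samples:
    print(split_address(row))
```

To apply this to the DataFrame from the question, something like `pd.DataFrame(df['Address column'].map(split_address).tolist(), columns=['Address', 'Suburb'], index=df.index)` should give the two columns. If suburbs with more than one word can appear without a comma, matching against a known list of suburb names would probably be more reliable than the last-word fallback.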
Regarding "python - How to split and extract location names from a column in Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/69216951/