python - How to split the column 'location' in a given DataFrame?

Tags: python string python-3.x pandas dataframe

I am working with a dataset that has a column named as in the title ('location'). Its values look like this:

import pandas as pd

df = pd.DataFrame(data={"location":["düsseldorf, nordrhein-westfalen, germany",
                                    "durbanville , cape town, cape town , south africa"]})

I want to split this column into ['city', 'state', 'country']. Note that the second row contains a duplicated entry.

I tried the following, but it does not handle the duplicates:

location = df.location.str.split(', ', n=2, expand=True)

location.columns = ['city', 'state', 'country']

Best answer

You can use the unique_everseen recipe from the itertools docs, which is also available in third-party libraries, e.g. toolz.unique.

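This logic can be incorporated into a list comprehension that iterates over df['location']. This is likely to be more efficient than pandas' string-based methods, which are not truly vectorised.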

from toolz import unique

res = pd.DataFrame([list(unique(map(str.strip, i.split(',')))) for i in df['location']])

res.columns = ['city', 'state', 'country']

print(res)

          city                state       country
0   düsseldorf  nordrhein-westfalen       germany
1  durbanville            cape town  south africa
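If installing toolz is not an option, the same idea can be sketched with the standard library alone. The version below is a simplified take on the unique_everseen recipe (without the optional key argument), and split_location is a hypothetical helper name introduced only for this illustration:

```python
def unique_everseen(iterable):
    """Yield items in first-seen order, skipping repeats.

    Simplified version of the itertools-docs recipe (no `key` argument).
    """
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item


def split_location(value):
    """Hypothetical helper: split on commas, strip whitespace, drop repeats."""
    return list(unique_everseen(part.strip() for part in value.split(',')))


locations = [
    "düsseldorf, nordrhein-westfalen, germany",
    "durbanville , cape town, cape town , south africa",
]

# One list of de-duplicated parts per input string
rows = [split_location(value) for value in locations]
```

The resulting rows can then be passed to pd.DataFrame(rows, columns=['city', 'state', 'country']). This assumes every location reduces to exactly three parts after de-duplication. Rows with more or fewer parts would need separate handling.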

Regarding "python - How to split the column 'location' in a given DataFrame?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52567930/
