只是一个简单的问题。
我有一个包含很多列的 CSV。我有 1 个专栏,名为:美食,有很多值(value)。
name,Cuisine
Real Talent Cafe,"Italian, American, Pizza, Mediterranean, European, Fusion"
Dogma,"International, Mediterranean, Barbecue, Spanish, Fusion"
Taberna El Callejon,"Mediterranean, European, Spanish"
Astor,"International, Mediterranean, European, Fusion"
La Gaditana Castellana,"Spanish, Seafood, International, Diner, Wine Bar"
我想从此 CSV 创建一个新的 CSV,包含 2 列: - 姓名 - 美食(通过拆分第一个 CSV)
这是我创建的脚本,我只选择我感兴趣的两列:名称和美食:
# -*- coding: utf-8 -*-
from itertools import chain
import numpy as np
import pandas as pd
df = pd.read_csv('res_madrid.csv', usecols=['name','Cuisine'])
items_count = df["Cuisine"].str.count(",") +1
pd.DataFrame({"name": np.repeat(df["name"], items_count),
"Cuisine": list(chain.from_iterable(df["Cuisine"].str.split(",")))})
我收到以下错误:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python3.6/site-packages/numpy/core/fromnumeric.py", line 471, in repeat
return _wrapfunc(a, 'repeat', repeats, axis=axis)
File "/usr/lib64/python3.6/site-packages/numpy/core/fromnumeric.py", line 56, in _wrapfunc
return getattr(obj, method)(*args, **kwds)
File "/usr/lib64/python3.6/site-packages/pandas/core/series.py", line 1157, in repeat
new_index = self.index.repeat(repeats)
File "/usr/lib64/python3.6/site-packages/pandas/core/indexes/base.py", line 862, in repeat
return self._shallow_copy(self._values.repeat(repeats))
ValueError: count < 0
请注意,如果您进行测试,复制我分享给您的数据,它将起作用...... 当我加载包含更多列的 CSV 文件并使用“usecols”参数时,会出现此问题。
预期结果如下:
name Cuisine
0 Real Talent Cafe Italian
0 Real Talent Cafe American
0 Real Talent Cafe Pizza
0 Real Talent Cafe Mediterranean
0 Real Talent Cafe European
0 Real Talent Cafe Fusion
1 Dogma International
1 Dogma Mediterranean
1 Dogma Barbecue
1 Dogma Spanish
1 Dogma Fusion
2 Taberna El Callejon Mediterranean
2 Taberna El Callejon European
2 Taberna El Callejon Spanish
3 Astor International
3 Astor Mediterranean
3 Astor European
3 Astor Fusion
4 La Gaditana Castellana Spanish
4 La Gaditana Castellana Seafood
4 La Gaditana Castellana International
4 La Gaditana Castellana Diner
4 La Gaditana Castellana Wine Bar
编辑:出现错误是因为我在 Cuisine 列中具有空值。我怎样才能避免这种情况?
感谢您的帮助:) 问候 亚历山大
最佳答案
data = pd.read_csv(#path to txt file)
数据
name Cuisine
0 Real Talent Cafe Italian, American, Pizza, Mediterranean, Europ...
1 Dogma International, Mediterranean, Barbecue, Spanis...
2 Taberna El Callejon Mediterranean, European, Spanish
3 Astor International, Mediterranean, European, Fusion
4 La Gaditana Castellana Spanish, Seafood, International, Diner, Wine Bar
使用
data.set_index('name')['Cuisine'].apply(lambda x: x.split(',')).apply(pd.Series).stack().reset_index().drop('level_1', axis=1)
data.columns = ['name', 'cusisine']
输出
data.head()
name cusisine
0 Real Talent Cafe Italian
1 Real Talent Cafe American
2 Real Talent Cafe Pizza
3 Real Talent Cafe Mediterranean
4 Real Talent Cafe European
关于python - 使用 panda 将列拆分为 csv,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/54652674/