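I am trying to split the values in each row of a column into multiple rows, while duplicating the corresponding values of the other columns. I am new to Python and am trying to find a way to apply this solution to a larger dataset.
Here is the input file: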
Name Year Subject State
Jack 2003 Math, Sci, Music MA
Sam 2004 Math, PE, Language, Social CA
Nicole 2005 Math, Life Sci, Geography, Music, Computer Sci NY
Here is the output I want:
Name Year Subject State
Jack 2003 Math MA
Jack 2003 Sci MA
Jack 2003 Music MA
Sam 2004 Math CA
Sam 2004 PE CA
Sam 2004 Language CA
Sam 2004 Social CA
Nicole 2005 Math NY
Nicole 2005 Life Sci NY
Nicole 2005 Geography NY
Nicole 2005 Music NY
Nicole 2005 Computer Sci NY
I tried this code:
import pandas as pd
df= pd.read_csv('C:/Users/3216140/Desktop/test.csv', delimiter=',', skiprows = 1, names = ["Name","Year","Subject","State"] )
print(df)
sub = df['Subject'].str.split(',').apply(pd.Series, 1).stack()
sub.index = sub.index.droplevel(-1)
sub.name = 'Subject'
print (sub)
del df['Subject']
df.join(sub)
print(df)
But the join does not seem to work. I just get the input file without the "Subject" column as output.
Best answer
You can use np.repeat and itertools.chain here.
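from itertools import chain
import pandas as pd

# Split each 'Subject' string into a list, removing the column from df.
v = df.pop('Subject').str.split(r'\s*,\s*')
# Repeat every remaining row once per subject in its list.
df_new = pd.DataFrame(
df.values.repeat(v.str.len(), axis=0),
columns=df.columns
)
# Flatten the lists into one column aligned with the repeated rows.
df_new['Subject'] = list(chain.from_iterable(v))
df_new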
Name Year State Subject
0 Jack 2003 MA Math
1 Jack 2003 MA Sci
2 Jack 2003 MA Music
3 Sam 2004 CA Math
4 Sam 2004 CA PE
5 Sam 2004 CA Language
6 Sam 2004 CA Social
7 Nicole 2005 NY Math
8 Nicole 2005 NY Life Sci
9 Nicole 2005 NY Geography
10 Nicole 2005 NY Music
11 Nicole 2005 NY Computer Sci
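As for why the attempt in the question appears to do nothing: DataFrame.join returns a new DataFrame rather than modifying df in place, so its result has to be assigned. The sketch below shows that fix, plus an alternative using DataFrame.explode, which assumes pandas 0.25 or later. The function names split_with_join and split_with_explode and the quoted-CSV layout of the sample data are assumptions made for illustration, not part of the original post.

```python
import io

import pandas as pd

# Sample rows copied from the question. The exact CSV layout
# (Subject field wrapped in quotes) is an assumption.
CSV = '''Name,Year,Subject,State
Jack,2003,"Math, Sci, Music",MA
Sam,2004,"Math, PE, Language, Social",CA
Nicole,2005,"Math, Life Sci, Geography, Music, Computer Sci",NY
'''


def split_with_join(df):
    """Hypothetical helper: the question's approach with the join result kept."""
    # One value per (row, position); dropna removes the padding that
    # apply(pd.Series) adds for rows with fewer subjects.
    sub = df['Subject'].str.split(',').apply(pd.Series).stack().dropna()
    sub = sub.str.strip()
    # Drop the position level so the index matches the original row labels.
    sub.index = sub.index.droplevel(-1)
    sub.name = 'Subject'
    # join returns a new DataFrame, so the result is returned, not discarded.
    return df.drop(columns='Subject').join(sub).reset_index(drop=True)


def split_with_explode(df):
    """Hypothetical helper: same result via DataFrame.explode (pandas >= 0.25)."""
    out = df.assign(Subject=df['Subject'].str.split(','))
    out = out.explode('Subject')
    out['Subject'] = out['Subject'].str.strip()
    return out.reset_index(drop=True)


df = pd.read_csv(io.StringIO(CSV))
joined = split_with_join(df)
exploded = split_with_explode(df)
print(joined)
```

Both helpers leave the input df untouched and return a new frame. The join version puts Subject last, while the explode version keeps the original column order.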
Regarding python - splitting the values in a row of one column while duplicating the other columns' data, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49740410/