I have loaded a table from a PDF into a pandas DataFrame, as shown below:
df_current= pd.DataFrame({'Country': ['NaN','NaN','Nan','NaN','Denmark', 'Sweden',
'Germany'],
'Explained Part':['Personal and job characteristics',
'Education Occupation Job Employment', 'experience contract',
'Employment contract','20 -7 2 0','4 6 2 0', '-9 -6 -1 :']})
Expected (the output I ultimately want):
df_expected = pd.DataFrame({'Country': ['Denmark', 'Sweden',
'Germany'],'Personal and job characteristics':[20 ,4,-9],
'Education Occupation Job Employment':[-7,6,-6],
'experience contract':[2,2,-1],'Employment contract':[0,0,':']})
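The problem: the "Explained Part" column holds the data for 4 columns, and some of the values appear as symbols such as ":".
I was thinking of using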
df[['Personal and job characteristics',
'Education Occupation Job Employment',
'experience contract',
'experience contract']] = df['Explained part'].str.split(" ",expand=True,)
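but I can't get it to work.
I want to split the column into 3, but in some cells the numbers are already split apart. Any ideas?
Thanks in advance. P.S. I have updated the question because I think my first post was too hard to understand. I have now added some data from the actual problem, along with the expected output. Thanks for the feedback so far!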
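Best answer
If the NaN entries are missing values, first remove the rows that contain them with DataFrame.dropna, then apply your solution, using DataFrame.pop to extract the column: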
import numpy as np
import pandas as pd

# Real missing values (np.nan) instead of the string 'NaN', so dropna can find them
df_current = pd.DataFrame({'Country': [np.nan, np.nan, np.nan, np.nan, 'Denmark', 'Sweden',
'Germany'],
'Explained Part': ['Personal and job characteristics',
'Education Occupation Job Employment', 'experience contract',
'Employment contract', '20 -7 2 0', '4 6 2 0', '-9 -6 -1 :']})
print(df_current)
   Country                       Explained Part
0      NaN     Personal and job characteristics
1      NaN  Education Occupation Job Employment
2      NaN                  experience contract
3      NaN                  Employment contract
4  Denmark                            20 -7 2 0
5   Sweden                              4 6 2 0
6  Germany                           -9 -6 -1 :
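# Keep only the rows that have a country; copy so later assignments do not touch df_current
df = df_current.dropna(subset=['Country']).copy()
cols = ['Personal and job characteristics','Education Occupation Job Employment',
'experience contract','Employment contract']
# pop removes 'Explained Part' from df and returns it; split on whitespace into 4 columns
df[cols] = df.pop('Explained Part').str.split(expand=True)
print(df)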
   Country Personal and job characteristics  \
4  Denmark                               20
5   Sweden                                4
6  Germany                               -9

  Education Occupation Job Employment experience contract Employment contract
4                                  -7                   2                   0
5                                   6                   2                   0
6                                  -6                  -1                   :
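Note that str.split returns strings, while the expected output in the question contains integers. A minimal sketch of one possible follow-up step is shown here, assuming that ":" marks a missing value (the question does not say so) and that turning it into NaN is acceptable. The name `numeric` is only illustrative. It uses pandas.to_numeric with errors='coerce', which is not part of the original answer.

```python
import numpy as np
import pandas as pd

df_current = pd.DataFrame({
    'Country': [np.nan, np.nan, np.nan, np.nan, 'Denmark', 'Sweden', 'Germany'],
    'Explained Part': ['Personal and job characteristics',
                       'Education Occupation Job Employment',
                       'experience contract', 'Employment contract',
                       '20 -7 2 0', '4 6 2 0', '-9 -6 -1 :'],
})
cols = ['Personal and job characteristics', 'Education Occupation Job Employment',
        'experience contract', 'Employment contract']

df = df_current.dropna(subset=['Country']).copy()
df[cols] = df.pop('Explained Part').str.split(expand=True)

# Assumption: ':' means "no value". errors='coerce' turns anything
# that cannot be parsed as a number into NaN instead of raising.
numeric = df[cols].apply(pd.to_numeric, errors='coerce')
print(numeric.dtypes)
```

A column that contained a coerced value ends up as float, because NaN cannot be stored in a plain integer column. If ":" has to be kept as text, as in the expected output of the question, convert only the columns that are fully numeric.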
Or without pop:
df = df_current.dropna(subset=['Country']).copy()
cols = ['Personal and job characteristics','Education Occupation Job Employment',
'experience contract','Employment contract']
df[cols] = df['Explained Part'].str.split(expand=True)
df = df.drop('Explained Part', axis=1)
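In this sample the rows with a missing Country hold exactly the four column names, in order. Under that assumption (which may not hold for other tables extracted from a PDF), the names could be read from the data instead of being typed by hand. This is only a sketch and not part of the original answer; the names `header_rows` and `data_rows` are illustrative.

```python
import numpy as np
import pandas as pd

df_current = pd.DataFrame({
    'Country': [np.nan, np.nan, np.nan, np.nan, 'Denmark', 'Sweden', 'Germany'],
    'Explained Part': ['Personal and job characteristics',
                       'Education Occupation Job Employment',
                       'experience contract', 'Employment contract',
                       '20 -7 2 0', '4 6 2 0', '-9 -6 -1 :'],
})

# Assumption: every row without a country is one column name, in order
header_rows = df_current['Country'].isna()
cols = df_current.loc[header_rows, 'Explained Part'].tolist()

data_rows = df_current.loc[~header_rows].copy()
parts = data_rows.pop('Explained Part').str.split(expand=True)

# Fail early if the data rows do not split into one value per column name
if parts.shape[1] != len(cols):
    raise ValueError(f'expected {len(cols)} values per row, got {parts.shape[1]}')

data_rows[cols] = parts
print(data_rows)
```

The explicit length check is there because a row with an extra or missing value would otherwise lead to a less obvious error at the assignment step.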
Regarding "python - splitting one column into multiple columns / cleaning a dataset", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60074155/