python - Split one column into multiple columns / cleaning a dataset

So I have read a table from a PDF into a pandas DataFrame, like this:

import pandas as pd

df_current = pd.DataFrame({'Country': ['NaN', 'NaN', 'Nan', 'NaN', 'Denmark', 'Sweden', 'Germany'],
                           'Explained Part': ['Personal and job characteristics',
                                              'Education Occupation Job Employment',
                                              'experience contract',
                                              'Employment contract',
                                              '20 -7 2 0', '4 6 2 0', '-9 -6 -1 :']})

Expected output (what I'm ultimately aiming for):

df_expected = pd.DataFrame({'Country': ['Denmark', 'Sweden', 'Germany'],
                            'Personal and job characteristics': [20, 4, -9],
                            'Education Occupation Job Employment': [-7, 6, -6],
                            'experience contract': [2, 2, -1],
                            'Employment contract': [0, 0, ':']})

The problem: the "Explained Part" column holds the data for 4 columns, and some values are shown as symbols, e.g. ":".

I was thinking of using

df[['Personal and job characteristics',
    'Education Occupation Job Employment',
    'experience contract',
    'Employment contract']] = df['Explained Part'].str.split(" ", expand=True)

but I can't get it to work.

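I want to split the column into separate columns, but it doesn't work because some of the cells contain split-up numbers. Any ideas?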

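Thanks in advance! P.S. I have updated the question because I think my first post was too hard to understand. I have now added some data from the actual problem and the expected output. Thanks for the feedback so far!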

Best answer

If the NaNs are missing values, first remove the rows that contain them with DataFrame.dropna, then apply your solution, using DataFrame.pop to extract the column:

import numpy as np
import pandas as pd

df_current = pd.DataFrame({'Country': [np.nan, np.nan, np.nan, np.nan, 'Denmark', 'Sweden', 'Germany'],
                           'Explained Part': ['Personal and job characteristics',
                                              'Education Occupation Job Employment',
                                              'experience contract',
                                              'Employment contract',
                                              '20 -7 2 0', '4 6 2 0', '-9 -6 -1 :']})
print (df_current)
   Country                       Explained Part
0      NaN     Personal and job characteristics
1      NaN  Education Occupation Job Employment
2      NaN                  experience contract
3      NaN                  Employment contract
4  Denmark                            20 -7 2 0
5   Sweden                              4 6 2 0
6  Germany                           -9 -6 -1 :
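
The dropna step relies on the placeholders being real missing values (np.nan), as in the frame above. In the question's original data they are the strings 'NaN' and 'Nan', which dropna does not treat as missing. A minimal sketch, assuming those strings are the only placeholders, converts them first:

```python
import pandas as pd

# The question's original data: placeholders are the *strings* 'NaN'/'Nan', not real missing values
df_current = pd.DataFrame({'Country': ['NaN', 'NaN', 'Nan', 'NaN', 'Denmark', 'Sweden', 'Germany'],
                           'Explained Part': ['Personal and job characteristics',
                                              'Education Occupation Job Employment',
                                              'experience contract', 'Employment contract',
                                              '20 -7 2 0', '4 6 2 0', '-9 -6 -1 :']})

# Replace any case variant of the string 'nan' with a real missing value
# (Series.mask fills the positions where the condition is True with NaN by default)
is_placeholder = df_current['Country'].str.lower().eq('nan')
df_current['Country'] = df_current['Country'].mask(is_placeholder)

# Now dropna removes the four header rows, just as with np.nan in the data
df = df_current.dropna(subset=['Country'])
print(df['Country'].tolist())  # ['Denmark', 'Sweden', 'Germany']
```

After this conversion the rest of the answer applies unchanged.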

df = df_current.dropna(subset=['Country']).copy()
cols = ['Personal and job characteristics','Education Occupation Job Employment',
        'experience contract','Employment contract']
df[cols] = df.pop('Explained Part').str.split(expand=True)
print (df)
   Country Personal and job characteristics  \
4  Denmark                               20   
5   Sweden                                4   
6  Germany                               -9   

  Education Occupation Job Employment experience contract Employment contract  
4                                  -7                   2                   0  
5                                   6                   2                   0  
6                                  -6                  -1                   :
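
The split produces strings, so the new columns have dtype object. If numeric columns are needed, one option is pd.to_numeric with errors='coerce'. Note that this turns ':' into NaN, which differs from the expected output in the question, where ':' is kept. A sketch, assuming ':' simply marks an unavailable value:

```python
import numpy as np
import pandas as pd

df_current = pd.DataFrame({'Country': [np.nan, np.nan, np.nan, np.nan, 'Denmark', 'Sweden', 'Germany'],
                           'Explained Part': ['Personal and job characteristics',
                                              'Education Occupation Job Employment',
                                              'experience contract', 'Employment contract',
                                              '20 -7 2 0', '4 6 2 0', '-9 -6 -1 :']})
cols = ['Personal and job characteristics', 'Education Occupation Job Employment',
        'experience contract', 'Employment contract']

df = df_current.dropna(subset=['Country']).copy()
df[cols] = df.pop('Explained Part').str.split(expand=True)

# Assumption: ':' means "not available"; errors='coerce' turns any non-numeric token into NaN
for col in cols:
    df[col] = pd.to_numeric(df[col], errors='coerce')

print(df.dtypes)
```

A column that ends up containing NaN becomes float64 (0.0 instead of 0); the others are parsed as integers.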

Or without pop:

df = df_current.dropna(subset=['Country']).copy()
cols = ['Personal and job characteristics','Education Occupation Job Employment',
        'experience contract','Employment contract']
df[cols] = df['Explained Part'].str.split(expand=True)
df = df.drop('Explained Part', axis=1)
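
Instead of hard-coding cols, the names could also be read from the rows without a country, since in this data those four rows hold exactly the column labels, in order. A sketch under that assumption (it will not hold if the PDF header is laid out differently, e.g. a label wrapped over two rows):

```python
import numpy as np
import pandas as pd

df_current = pd.DataFrame({'Country': [np.nan, np.nan, np.nan, np.nan, 'Denmark', 'Sweden', 'Germany'],
                           'Explained Part': ['Personal and job characteristics',
                                              'Education Occupation Job Employment',
                                              'experience contract', 'Employment contract',
                                              '20 -7 2 0', '4 6 2 0', '-9 -6 -1 :']})

# Assumption: rows with no country are the header labels, one label per row, in column order
is_header = df_current['Country'].isna()
cols = df_current.loc[is_header, 'Explained Part'].tolist()

df = df_current.loc[~is_header].copy()
values = df.pop('Explained Part').str.split(expand=True)
# Raises ValueError if the number of split tokens differs from the number of header rows
values.columns = cols
df = df.join(values)
print(df.columns.tolist())
```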

A similar question, "python - Split one column into multiple columns / cleaning a dataset", can be found on Stack Overflow: https://stackoverflow.com/questions/60074155/
