python - Transform a dataframe with multiple values in one cell

I have a dataframe that looks like this:

id                          value       index
5eb3cbcc434474213e58b49a    [1,2,3,4,6] [0,1,2,3,4]
5eb3f335434474213e58b49d    [1,2,3,4]   [0,2,3,4]
5eb3f853434474213e58b49f    [1,2,3,4]   [0,2,3,4]
5eb40395434474213e58b4a2    [1,2,3,4]   [0,1,2,3]
5eb40425434474213e58b4a5    [1,2]       [0,2]

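I am trying to transform this dataframe so that each entry in index becomes the column header for the corresponding entry in value, so that it looks like this: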

id                          0   1   2   3   4
5eb3cbcc434474213e58b49a    1   2   3   4   6
5eb3f335434474213e58b49d    1   NaN 2   3   4
5eb3f853434474213e58b49f    1   NaN 2   3   4
5eb40395434474213e58b4a2    1   2   3   4   NaN
5eb40425434474213e58b4a5    1   NaN 2   NaN NaN

I first tried splitting the lists:

new_df = pd.DataFrame(df.Value.str.split(',').tolist(), index=df.Index).stack()
new_df = new_df.reset_index([0, 'Index'])
new_df.columns = ['Value', 'Index']

But I got this error:

TypeError: unhashable type: 'list'

What is causing this error?

Best answer

You can use .apply() together with pd.Series(), as follows:

df = df.set_index('id').apply(lambda x: pd.Series(x['value'], index=x['index']), axis=1).reset_index()


print(df)

                         id    0    1    2    3    4
0  5eb3cbcc434474213e58b49a  1.0  2.0  3.0  4.0  6.0
1  5eb3f335434474213e58b49d  1.0  NaN  2.0  3.0  4.0
2  5eb3f853434474213e58b49f  1.0  NaN  2.0  3.0  4.0
3  5eb40395434474213e58b4a2  1.0  2.0  3.0  4.0  NaN
4  5eb40425434474213e58b4a5  1.0  NaN  2.0  NaN  NaN
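
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and applies the same one-liner. It assumes the cells in value and index hold real Python lists (not strings), and the name `wide` is only illustrative.

```python
import pandas as pd

# Sample data from the question; each cell is assumed to be a real Python list.
df = pd.DataFrame({
    "id": [
        "5eb3cbcc434474213e58b49a",
        "5eb3f335434474213e58b49d",
        "5eb3f853434474213e58b49f",
        "5eb40395434474213e58b4a2",
        "5eb40425434474213e58b4a5",
    ],
    "value": [[1, 2, 3, 4, 6], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2]],
    "index": [[0, 1, 2, 3, 4], [0, 2, 3, 4], [0, 2, 3, 4], [0, 1, 2, 3], [0, 2]],
})

# For every row, build a Series whose labels come from "index" and whose
# values come from "value". Because the function returns a Series, apply()
# expands it into columns, and labels missing from a row become NaN.
# Note: row["index"] (bracket access) is needed, since row.index would be
# the Series' own index attribute.
wide = (
    df.set_index("id")
      .apply(lambda row: pd.Series(row["value"], index=row["index"]), axis=1)
      .reset_index()
)

print(wide)
```

The new columns are labelled with the integers 0 to 4, and the values become floats because NaN is used for the missing positions.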

This takes advantage of a feature of .apply(), described in the documentation as follows:

The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a Series of those. However if the apply function returns a Series these are expanded to columns.

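This feature is very handy: it gives a simple solution for problems where data needs to be expanded into columns, and it merges the new columns into the existing data by keeping the existing row index and carrying it over to the new columns. I have used it to provide a simple answer to a classic question: How to merge a Series and DataFrame.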
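
The difference described in that quote can be seen in a small sketch. The frame `demo` and its column names are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical two-row frame, used only to show the two return shapes.
demo = pd.DataFrame({"a": [1, 2], "b": [10, 20]}, index=["x", "y"])

# Returning a plain list: the result is a Series whose elements are lists.
as_lists = demo.apply(lambda row: [row["a"], row["b"], row["a"] + row["b"]], axis=1)

# Returning a Series: its labels are expanded into columns of a DataFrame.
as_columns = demo.apply(
    lambda row: pd.Series({"a": row["a"], "total": row["a"] + row["b"]}),
    axis=1,
)

print(type(as_lists).__name__)    # Series
print(type(as_columns).__name__)  # DataFrame
print(as_columns)
```

In both cases the original row index ("x", "y") is kept, which is what makes it easy to join the expanded columns back onto the source frame.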

Regarding "python - Transform a dataframe with multiple values in one cell", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/67225274/
