python - Splitting strings in a dataframe

Tags: python python-3.x pandas python-2.7 dataframe

I have a dataframe like this:

col1|col2
{"test":"23","test1":"12"}|1992
{"test":"24","test1":"19","test3":"24"}|1993
{"test":"27","test1":"20","test3":"21","test4":"40"}|1994

I want a dataframe like this:

col1_a|col1_b|col2
test|23|1992
test1|12|1992
test|24|1993
test1|19|1993
...

How can I achieve this? Although the values in col1 look like dictionaries, they are stored as strings in the dataframe.

Best answer

Consider the following df as an example:

In [2063]: df = pd.DataFrame({'col1':[{"test":"23","test1":"12"}, {"test":"24","test1":"19","test3":"24"}, {"test":"27","test1":"20","test3":"21","test4":"40"}], 'col2':[1992, 1993, 1994]})

In [2064]: df
Out[2064]: 
                                                col1  col2
0                      {'test': '23', 'test1': '12'}  1992
1       {'test': '24', 'test1': '19', 'test3': '24'}  1993
2  {'test': '27', 'test1': '20', 'test3': '21', '...  1994
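
Note that the example above builds col1 from real dict objects, whereas the question says the values are stored as strings. If that is the case, they probably need to be parsed first. A minimal sketch, assuming every string is valid JSON (as in the question's sample); the frame name raw is only for illustration:

```python
import json

import pandas as pd

# Hypothetical frame mirroring the question: col1 holds JSON text, not dicts.
raw = pd.DataFrame({
    'col1': ['{"test":"23","test1":"12"}',
             '{"test":"24","test1":"19","test3":"24"}'],
    'col2': [1992, 1993],
})

# Parse each string into a dict so that .items() can be called on it later.
raw['col1'] = raw['col1'].apply(json.loads)
```

If the strings use single quotes (a Python dict repr rather than JSON), ast.literal_eval from the standard library may be a better fit than json.loads.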

You can use apply on the column together with df.explode() (available since pandas 0.25):

In [2085]: df.col1 = df.col1.apply(lambda x: list(x.items()))

In [2086]: df = df.explode('col1')

In [2091]: df[['col1_a', 'col1_b']] = pd.DataFrame(df.col1.tolist(), index=df.index)

In [2093]: df = df[['col1_a', 'col1_b', 'col2']]

In [2094]: df
Out[2094]: 
  col1_a col1_b  col2
0   test     23  1992
0  test1     12  1992
1   test     24  1993
1  test1     19  1993
1  test3     24  1993
2   test     27  1994
2  test1     20  1994
2  test3     21  1994
2  test4     40  1994
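
The steps above can also be bundled into one helper. This is only a sketch: the function name explode_dict_column is made up for illustration, it assumes string values are valid JSON and that no dict is empty, and unlike the session above it resets the index so the rows are numbered 0..n-1.

```python
import json

import pandas as pd


def explode_dict_column(df, col='col1'):
    """Return one row per key/value pair of the dicts (or JSON strings) in `col`."""
    out = df.copy()
    # Parse JSON strings; values that are already dicts are left untouched.
    out[col] = out[col].apply(lambda v: json.loads(v) if isinstance(v, str) else v)
    # Turn each dict into a list of (key, value) tuples, then one row per tuple.
    out[col] = out[col].apply(lambda d: list(d.items()))
    out = out.explode(col).reset_index(drop=True)
    # Split the tuples into two new columns named after the original column.
    pairs = pd.DataFrame(out[col].tolist(), columns=[col + '_a', col + '_b'])
    other = [c for c in out.columns if c != col]
    return pd.concat([pairs, out[other]], axis=1)


df = pd.DataFrame({
    'col1': ['{"test":"23","test1":"12"}',
             '{"test":"24","test1":"19","test3":"24"}'],
    'col2': [1992, 1993],
})
result = explode_dict_column(df)
```

Working on a copy keeps the caller's frame unchanged, which the in-place assignments in the session above do not.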

Source: a similar question on Stack Overflow, "python - Splitting strings in a dataframe": https://stackoverflow.com/questions/64847859/
