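python - Converting JSON Lines data to a pandas DataFrame with read_json?

Scrapy and several other Python libraries have started writing and reading JSON files in the JSON Lines format.

I am trying to load a JSON Lines file into a pandas DataFrame with the read_json(...) function.

My file "input.json" looks similar to this, with one JSON object per line: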

{"A": {"page": 1, "name": "foo", "url": "xxx"}, "B": {"page": 1, "name": "bar", "url": "http://xxx"}, "C": {"page": 3, "name": "foo", "url": "http://xxx"}}
{"D": {"page": 2, "name": "bar", "url": "xxx"}, "E": {"page": 2, "name": "bar", "url": "http://xxx"}, "F": {"page": 3, "name": "foo", "url": "http://xxx"}}

The output I want:

   page name         url
A     1  foo         xxx
B     1  bar  http://xxx
C     3  foo  http://xxx
D     2  bar         xxx
E     2  bar  http://xxx
F     3  foo  http://xxx

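Initially I tried this, but the result is not what I expected:

import pandas as pd

print(pd.read_json("file:///input.json", orient='index', lines=True))

The pandas documentation says that orient='index' expects the layout {index -> {column -> value}}, but the result suggests I am misunderstanding something: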

                                                 0                                                1
A         {'page': 1, 'url': 'xxx', 'name': 'foo'}                                              NaN
B  {'page': 1, 'url': 'http://xxx', 'name': 'bar'}                                              NaN
C  {'page': 3, 'url': 'http://xxx', 'name': 'foo'}                                              NaN
D                                              NaN         {'page': 2, 'url': 'xxx', 'name': 'bar'}
E                                              NaN  {'page': 2, 'url': 'http://xxx', 'name': 'bar'}
F                                              NaN  {'page': 3, 'url': 'http://xxx', 'name': 'foo'}

Best answer

You can combine stack(), reset_index() and apply() to get the result you want. It only takes two lines:

df = pd.read_json("file:///input.json", orient='index', lines=True).stack().reset_index(level=1, drop=True)

# Here the .stack() basically flattens your extraneous columns into one.
# .reset_index() is to remove the extra index level that was added by stack()
#
# df
#
# A           {'page': 1, 'name': 'foo', 'url': 'xxx'}
# B    {'page': 1, 'name': 'bar', 'url': 'http://xxx'}
# C    {'page': 3, 'name': 'foo', 'url': 'http://xxx'}
# D           {'page': 2, 'name': 'bar', 'url': 'xxx'}
# E    {'page': 2, 'name': 'bar', 'url': 'http://xxx'}
# F    {'page': 3, 'name': 'foo', 'url': 'http://xxx'}
# dtype: object
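
To try the stack() step without the input file, here is a minimal sketch that rebuilds the wide frame in memory. The names line1, line2, wide and flat are made up for illustration, the records are shortened, and the extra dropna() call is an assumption added as a precaution for pandas versions whose stack() keeps missing cells.

```python
import pandas as pd

# Hypothetical stand-ins for two parsed input lines (shortened for brevity).
line1 = {"A": {"page": 1, "name": "foo"}, "B": {"page": 1, "name": "bar"}}
line2 = {"D": {"page": 2, "name": "bar"}}

# Rebuild the wide shape from the question: one column per line, NaN elsewhere.
wide = pd.concat([pd.Series(line1), pd.Series(line2)], axis=1)

# stack() moves the column labels into an inner index level.
# dropna() discards the empty cells if stack() did not already do so.
# reset_index(level=1, drop=True) then removes that inner level again.
flat = wide.stack().dropna().reset_index(level=1, drop=True)

print(flat)
```

The result is a Series indexed by the record keys, with one dictionary per entry, which is the shape the next step expects.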

# .iloc[0] picks the first dictionary by position rather than by label.
df = df.apply(pd.Series, index=df.iloc[0].keys())

# Here you use .apply() to extract the dictionary into columns by applying them as a Series.
# the index keyword is to sort it per the keys of first dictionary in the df.
#
# df
#
#        page name         url
#  A        1  foo         xxx
#  B        1  bar  http://xxx
#  C        3  foo  http://xxx
#  D        2  bar         xxx
#  E        2  bar  http://xxx
#  F        3  foo  http://xxx
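
As a possible alternative to apply(pd.Series), the dictionaries can also be handed to the DataFrame constructor directly. This is only a sketch and not part of the original answer; the names flat and table are illustrative.

```python
import pandas as pd

# A Series of dictionaries, like the one produced by the stack() step.
flat = pd.Series({
    "A": {"page": 1, "name": "foo", "url": "xxx"},
    "B": {"page": 1, "name": "bar", "url": "http://xxx"},
    "C": {"page": 3, "name": "foo", "url": "http://xxx"},
})

# A list of dicts becomes one row per dict; reusing flat.index keeps the labels.
table = pd.DataFrame(flat.tolist(), index=flat.index)

print(table)
```

This builds the table in a single constructor call instead of creating one intermediate Series per row.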

It is a bit of a hack, but it lets you interpret JSON Lines correctly without writing a loop.
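
For comparison, if an explicit loop is acceptable, a plain-Python sketch could merge the lines first and build the frame once. This is a different technique from the answer above, not the answerer's method. The io.StringIO object stands in for the opened file, and the names sample and records are made up for illustration.

```python
import io
import json

import pandas as pd

# Stand-in for open("input.json"); the lines mirror part of the question's sample.
sample = io.StringIO(
    '{"A": {"page": 1, "name": "foo", "url": "xxx"}, '
    '"B": {"page": 1, "name": "bar", "url": "http://xxx"}}\n'
    '{"D": {"page": 2, "name": "bar", "url": "xxx"}}\n'
)

# Merge every line's top-level keys into one {index -> {column -> value}} dict.
records = {}
for line in sample:
    line = line.strip()
    if line:  # skip blank lines
        records.update(json.loads(line))

# orient="index" turns the outer keys into row labels.
df = pd.DataFrame.from_dict(records, orient="index")

print(df)
```

Note that update() lets a later line overwrite an earlier one if the same top-level key appears twice.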

This question, "python - Converting JSON Lines data to a pandas DataFrame with read_json?", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/48692190/
