python - Extracting data from a complex data structure in Python

I have a data structure like this:

[ {'uid': 'test_subject145', 'class':'?',  'data':[  {'chunk':1, 'writing':[ ['this is exciting'],[ 'you are good' ]... ]}  ]  },
  {'uid': 'test_subject166', 'class':'?',  'data':[  {'chunk':2, 'writing':[ ['he died'],[ 'go ahead' ]... ]}  ] }, ...]

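It is a list containing many dictionaries. Each dictionary has three key-value pairs: 'uid': 'test_subject145', 'class':'?', 'data':[]. In the last pair, 'data', the value is a list that in turn contains a dictionary with two pairs, 'chunk':1 and 'writing':[]. In the 'writing' pair, the value is a list that itself contains many lists. What I want to extract is the content of all these sentences, such as 'this is exciting' and 'you are good', and put them into one simple list. The final form should be list_final = ['this is exciting', 'you are good', 'he died',... ]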

Best answer

Given that your original list is named input, you can simply use a list comprehension:

[elem for dic in input
      for dat in dic.get('data',())
      for writing in dat.get('writing',())
      for elem in writing]

Using .get(..,()) means the comprehension still works when a key is missing: in that case we get back the empty tuple (), so there is nothing to iterate over.
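To illustrate the effect of the () default, here is a minimal sketch. The records list and the extract_sentences helper are hypothetical names made up for this example and are not part of the original question; the second record has no 'data' key and the third has a chunk without 'writing'.

```python
# Hypothetical records: the second one has no 'data' key,
# and the third has a chunk with no 'writing' key.
records = [
    {'uid': 'a', 'data': [{'chunk': 1, 'writing': [['first sentence'], ['second sentence']]}]},
    {'uid': 'b'},
    {'uid': 'c', 'data': [{'chunk': 2}]},
]


def extract_sentences(records):
    """Flatten every sentence found under data -> writing into one list."""
    return [elem for dic in records
            for dat in dic.get('data', ())
            for writing in dat.get('writing', ())
            for elem in writing]


sentences = extract_sentences(records)
print(sentences)  # ['first sentence', 'second sentence']

# With plain indexing, the record without 'data' raises KeyError instead.
try:
    [dat for dic in records for dat in dic['data']]
except KeyError as exc:
    print('missing key:', exc)
```

The incomplete records are skipped silently, which may or may not be what you want; if a missing key should count as an error, plain indexing is the stricter choice.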

With your sample input, we get:

>>> input = [ {'uid': 'test_subject145', 'class':'?',  'data':[  {'chunk':1, 'writing':[ ['this is exciting'],[ 'you are good' ]]}  ]  },
...       {'uid': 'test_subject166', 'class':'?',  'data':[  {'chunk':2, 'writing':[ ['he died'],[ 'go ahead' ] ]}  ] }]
>>> 
>>> [elem for dic in input
...       for dat in dic.get('data',())
...       for writing in dat.get('writing',())
...       for elem in writing]
['this is exciting', 'you are good', 'he died', 'go ahead']
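If the nested comprehension is hard to read, the same traversal can be written with explicit loops. This is only an equivalent sketch: the function name flatten_writing and the variable sample are assumptions introduced here, and the elided parts of the original data are left out rather than guessed.

```python
def flatten_writing(records):
    """Loop-based equivalent of the nested list comprehension."""
    result = []
    for dic in records:
        for dat in dic.get('data', ()):
            for writing in dat.get('writing', ()):
                # Each 'writing' entry is itself a list of sentences.
                result.extend(writing)
    return result


sample = [
    {'uid': 'test_subject145', 'class': '?',
     'data': [{'chunk': 1, 'writing': [['this is exciting'], ['you are good']]}]},
    {'uid': 'test_subject166', 'class': '?',
     'data': [{'chunk': 2, 'writing': [['he died'], ['go ahead']]}]},
]

list_final = flatten_writing(sample)
print(list_final)  # ['this is exciting', 'you are good', 'he died', 'go ahead']
```

Calling the variable sample rather than input also avoids shadowing Python's built-in input() function.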

Regarding python - extracting data from a complex data structure in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43000315/
