python - Unpacking a list_iterator from nltk.stanford.StanfordDependencyParser inside a pandas DataFrame

I am trying to use the Stanford dependency parser inside a pandas DataFrame.

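from nltk.parse import stanford
import pandas as pd

# Assumes NLTK can locate the Stanford parser jars (e.g. via CLASSPATH).
dep_parser = stanford.StanfordDependencyParser()
df = pd.DataFrame({'ID': [0, 1, 2],
                   # Note: 'isn''t' is two adjacent string literals, which Python joins into "isnt".
                   'sentence': ['This is the first s.', 'This is the 2nd s.', 'This isn''t the third s.']})
# raw_parse returns an iterator, so each cell ends up holding a list_iterator.
df['parsed'] = df.sentence.apply(dep_parser.raw_parse)
print(df)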

   ID                sentence                                        parsed
0   0    This is the first s.  <list_iterator object at 0x000000000E849C18>
1   1      This is the 2nd s.  <list_iterator object at 0x000000000E8691D0>
2   2  This isnt the third s.  <list_iterator object at 0x000000000E8696A0>

But I would prefer a textual representation of the dependency graph in the DataFrame column rather than an iterator, like this:

    ID                sentence                                        parsed
0   0    This is the first s.  [[(('s.', 'NN'), 'nsubj', ('This', 'DT')),(('s.', 'NN'), 'cop', ('is', 'VBZ')), (('s.', 'NN'), 'det', ('the', 'DT')),(('s.', 'NN'), 'amod', ('first', 'JJ'))]]
                   ...

I tried to follow the steps from the nltk documentation inside pandas, but that raises an AttributeError:

 df['dep'] = [list(parse.triples()) for parse in df.parsed]
 AttributeError: 'list_iterator' object has no attribute 'triples'

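Is there a way to unpack the iterators that show up as values in the DataFrame? Any help is appreciated.

Best answer

A list_iterator is a mechanism for producing the items of a list "on demand". It indeed has no triples() method, but the list it generates in your case is a list of triples, so converting each iterator to a list is enough: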

df['dep'] = [list(parse) for parse in df['parsed']]
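
To make the mechanism concrete without needing the Stanford jars, here is a minimal sketch in plain Python. `FakeGraph` and `fake_raw_parse` are hypothetical stand-ins invented for illustration (they are not NLTK classes or functions), and a plain list stands in for the DataFrame column. The sketch assumes each iterator yields graph-like objects that have a triples() method, which is the pattern the nltk documentation example relies on; if your iterators already yield triples, the one-liner above is sufficient.

```python
class FakeGraph:
    """Hypothetical stand-in for a parsed dependency graph (not an NLTK class)."""

    def __init__(self, triples):
        self._triples = triples

    def triples(self):
        # Yield (head, relation, dependent) tuples one at a time.
        yield from self._triples


def fake_raw_parse(sentence):
    """Hypothetical parser: returns a list_iterator over one graph per sentence."""
    words = sentence.split()
    head = (words[-1], 'NN')
    graph = FakeGraph([(head, 'dep', (w, 'X')) for w in words[:-1]])
    return iter([graph])


sentences = ['This is the first s.', 'This is the 2nd s.']

# Storing the iterators directly reproduces the situation in the question.
parsed = [fake_raw_parse(s) for s in sentences]
has_triples = hasattr(parsed[0], 'triples')  # the iterator itself has no triples()

# Walk each iterator and call triples() on the items it yields.
dep = [[list(graph.triples()) for graph in parse] for parse in parsed]

# Iterators are single-use: a second pass over the same objects finds nothing.
second_pass = [list(parse) for parse in parsed]

print(has_triples)
print(dep[0])
print(second_pass)
```

Because the stored iterators are exhausted after one pass, it may be simpler to materialize the result while parsing instead of storing iterators in the column at all, for example with something like df.sentence.apply(lambda s: [list(g.triples()) for g in dep_parser.raw_parse(s)]), again assuming the parser yields objects with a triples() method.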

Source: this question on Stack Overflow, https://stackoverflow.com/questions/41274728/
