python - How do I fix an NLTK chunking error?

Tags: python python-2.7 python-3.x nltk

I am trying to train my own NLTK chunker by following the tutorial at http://streamhacker.com/2008/12/29/how-to-train-a-nltk-chunker/

I wrote the code as follows:

>>> import nltk
>>> import nltk.chunk
>>> def conll_tag_chunks(chunk_sents):
    tag_sents = [nltk.chunk.tree2conlltags(tree) for tree in chunk_sents]
    return [[(t, c) for (w, t, c) in chunk_tags] for chunk_tags in tag_sents]

>>> import nltk.corpus, nltk.tag
>>> from nltk.metrics import accuracy
>>> def ubt_conll_chunk_accuracy(train_sents, test_sents):
    train_chunks = conll_tag_chunks(train_sents)
    test_chunks = conll_tag_chunks(test_sents)

    u_chunker = nltk.tag.UnigramTagger(train_chunks)
    print 'u:', accuracy(u_chunker, test_chunks)

    ub_chunker = nltk.tag.BigramTagger(train_chunks, backoff=u_chunker)
    print 'ub:', accuracy(ub_chunker, test_chunks)

    ubt_chunker = nltk.tag.TrigramTagger(train_chunks, backoff=ub_chunker)
    print 'ubt:', accuracy(ubt_chunker, test_chunks)

    ut_chunker = nltk.tag.TrigramTagger(train_chunks, backoff=u_chunker)
    print 'ut:', accuracy(ut_chunker, test_chunks)

    utb_chunker = nltk.tag.BigramTagger(train_chunks, backoff=ut_chunker)
    print 'utb:', accuracy(utb_chunker, test_chunks)


>>> conll_train = nltk.corpus.conll2000.chunked_sents('train.txt')
>>> conll_test = nltk.corpus.conll2000.chunked_sents('test.txt')
>>> ubt_conll_chunk_accuracy(conll_train, conll_test)

But when I run it, I get the following error:

>>> ubt_conll_chunk_accuracy(conll_train, conll_test)
u:

Traceback (most recent call last):
  File "<pyshell#10>", line 1, in <module>
    ubt_conll_chunk_accuracy(conll_train, conll_test)
  File "<pyshell#7>", line 6, in ubt_conll_chunk_accuracy
    print 'u:', accuracy(u_chunker, test_chunks)
  File "C:\Python27\lib\site-packages\nltk\metrics\scores.py", line 38, in accuracy
    if len(reference) != len(test):
TypeError: object of type 'UnigramTagger' has no len()
>>> treebank_sents = nltk.corpus.treebank_chunk.chunked_sents()
>>> ubt_conll_chunk_accuracy(treebank_sents[:2000], treebank_sents[2000:])
u:

Traceback (most recent call last):
  File "<pyshell#12>", line 1, in <module>
    ubt_conll_chunk_accuracy(treebank_sents[:2000], treebank_sents[2000:])
  File "<pyshell#7>", line 6, in ubt_conll_chunk_accuracy
    print 'u:', accuracy(u_chunker, test_chunks)
  File "C:\Python27\lib\site-packages\nltk\metrics\scores.py", line 38, in accuracy
    if len(reference) != len(test):
TypeError: object of type 'UnigramTagger' has no len()
>>> 

Could anyone kindly suggest how I can resolve this error? Thanks in advance. I am using NLTK 3.1 with Python 2.7.11 on MS Windows 10.

Best answer

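Look at the documentation for the accuracy function in the nltk package. It compares two lists of values, so passing a tagger object as the first argument fails at the len() check shown in the traceback: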

nltk.metrics.scores.accuracy(reference, test)

Given a list of reference values and a corresponding list of test values, return the fraction of corresponding values that are equal. In particular, return the fraction of indices 0<i<=len(test) such that test[i] == reference[i].

Parameters:
- reference (list) – An ordered list of reference values.
- test (list) – A list of values to compare against the corresponding reference values.
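To make the "two lists" requirement concrete, here is a minimal sketch of one way to score a chunk tagger: tag the test sentences first, then compare flat lists of gold and predicted pairs. So that it runs without NLTK or the corpora, `list_accuracy` and `MostFrequentTagger` are hypothetical stand-ins (for `nltk.metrics.accuracy` and `nltk.tag.UnigramTagger` respectively), `chunk_tag_accuracy` is a helper name invented here, and the training data is made up for illustration.

```python
from collections import Counter, defaultdict


def list_accuracy(reference, test):
    """Stand-in mirroring the documented contract of nltk.metrics.accuracy:
    two equal-length lists in, fraction of matching positions out."""
    if len(reference) != len(test):
        raise ValueError("Lists must have the same length.")
    return sum(r == t for r, t in zip(reference, test)) / float(len(test))


class MostFrequentTagger(object):
    """Hypothetical stand-in for nltk.tag.UnigramTagger: learns the most
    frequent chunk tag per POS tag and exposes a tag() method."""

    def __init__(self, train_sents):
        counts = defaultdict(Counter)
        for sent in train_sents:
            for pos, chunk in sent:
                counts[pos][chunk] += 1
        self._best = {pos: c.most_common(1)[0][0] for pos, c in counts.items()}

    def tag(self, tokens):
        # POS tags never seen in training are tagged None.
        return [(tok, self._best.get(tok)) for tok in tokens]


def chunk_tag_accuracy(tagger, gold_sents, metric=list_accuracy):
    """Tag the gold sentences, then compare flat lists of (pos, chunk) pairs."""
    reference, test = [], []
    for gold in gold_sents:
        pos_tags = [pos for pos, _ in gold]   # strip the gold chunk tags
        reference.extend(gold)                # what the tagger should say
        test.extend(tagger.tag(pos_tags))     # what the tagger actually says
    return metric(reference, test)


# Toy (POS tag, IOB chunk tag) sentences, invented for illustration only.
train_chunks = [
    [("DT", "B-NP"), ("NN", "I-NP"), ("VBD", "O")],
    [("DT", "B-NP"), ("JJ", "I-NP"), ("NN", "I-NP")],
]
test_chunks = [[("DT", "B-NP"), ("NN", "I-NP"), ("IN", "O")]]

u_chunker = MostFrequentTagger(train_chunks)
score = chunk_tag_accuracy(u_chunker, test_chunks)
print("u: %.3f" % score)
```

With NLTK installed, the same helper should work with the question's objects, for example `chunk_tag_accuracy(u_chunker, test_chunks, metric=nltk.metrics.accuracy)` where `u_chunker` is a real `UnigramTagger`. As far as I know, NLTK 3.x taggers also provide an `evaluate(gold)` method that performs these steps internally, so `u_chunker.evaluate(test_chunks)` may be the shortest fix; check the documentation for your installed version.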

Regarding "python - How do I fix an NLTK chunking error?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35520579/
