python - Gensim: ValueError: failed to create intent(cache|hide)|optional array-- must have defined dimensions but got (0,)

Tags: python gensim latent-semantic-indexing

I am trying to simulate streaming a set of documents and to update the LSI model as each additional document streams in. I get this error:

Traceback (most recent call last):
  File "gensimStreamGen_tutorial5.py", line 57, in <module>
    for vector in corpus_memory_friendly: # load one vector into memory at a time
  File "gensimStreamGen_tutorial5.py", line 44, in __iter__
    lsi = models.LsiModel(corpus, num_topics=10) # initialize an LSI transformation
  File "/Users/Desktop/gensim-0.12.0/gensim/models/lsimodel.py", line 331, in __init__
    self.add_documents(corpus)
  File "/Users/Desktop/gensim-0.12.0/gensim/models/lsimodel.py", line 388, in add_documents
    update = Projection(self.num_terms, self.num_topics, job, extra_dims=self.extra_samples, power_iters=self.power_iters)
  File "/Users/Desktop/gensim-0.12.0/gensim/models/lsimodel.py", line 126, in __init__
    extra_dims=self.extra_dims)
  File "/Users/Desktop/gensim-0.12.0/gensim/models/lsimodel.py", line 677, in stochastic_svd
    q, _ = matutils.qr_destroy(y) # orthonormalize the range
  File "/Users/Desktop/gensim-0.12.0/gensim/matutils.py", line 398, in qr_destroy
    qr, tau, work, info = geqrf(a, lwork=-1, overwrite_a=True)
ValueError: failed to create intent(cache|hide)|optional array-- must have defined dimensions but got (0,)

The code that streams the documents and updates the LSI model:

class MyCorpus(object):
    def __iter__(self):
        for document in documents:
            # Stream-in documents and build TF-IDF model to construct new_vec
            yield new_vec
            corpus.append(new_vec)
            tfidf = models.TfidfModel(corpus)
            corpus_tfidf = tfidf[corpus]
            lsi = models.LsiModel(corpus_tfidf,  num_topics=2)
            corpus_lsi = lsi[corpus_tfidf]
            lsi.print_topics(2)
            for doc in corpus_lsi:
                print(doc)

corpus_memory_friendly = MyCorpus()
for vector in corpus_memory_friendly:
    print(vector)

The corpus gains one new new_vec on every iteration. The new_vec yielded at each iteration:

[]
[(0, 1)]
[(1, 1), (2, 1), (3, 1)]
[(3, 2), (4, 1), (5, 1)]
[(2, 1), (6, 1), (7, 1)]
[]
[(8, 1)]
[(8, 1), (9, 1)]
[(9, 1), (10, 1), (11, 1)]

The error occurs on the first iteration (the first row of the new_vec list above). The remaining rows are the expected output for new_vec.

Best answer

I think it is because some of your documents are empty. Try adding

if document != [] and document != [[]]:
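To make the suggestion concrete, here is a minimal sketch of one way to apply that check to the stream from the question. It assumes the problem is the empty first new_vec, which leaves the corpus with no terms when the model is built. The helper names has_terms and stream_non_empty are hypothetical and not part of gensim; the gensim calls are only mentioned in a comment so the sketch runs without the library installed.

```python
def has_terms(bow):
    """Return True if a bag-of-words vector holds at least one (term_id, count) pair."""
    return len(bow) > 0


def stream_non_empty(documents):
    """Yield only the non-empty bag-of-words vectors from an iterable (hypothetical helper)."""
    for bow in documents:
        if has_terms(bow):
            yield bow


# The new_vec values listed in the question, including the two empty ones.
new_vecs = [
    [],
    [(0, 1)],
    [(1, 1), (2, 1), (3, 1)],
    [(3, 2), (4, 1), (5, 1)],
    [(2, 1), (6, 1), (7, 1)],
    [],
    [(8, 1)],
    [(8, 1), (9, 1)],
    [(9, 1), (10, 1), (11, 1)],
]

corpus = []
for new_vec in stream_non_empty(new_vecs):
    corpus.append(new_vec)
    # The corpus now contains at least one term, so this is where the models
    # from the question could be rebuilt, e.g. models.TfidfModel(corpus)
    # followed by models.LsiModel(tfidf[corpus], num_topics=2).

print(len(corpus))  # number of documents kept after skipping the empty ones
```

Skipping empty vectors before they reach the corpus means the model is never built from a corpus that contains no terms at all, which appears to be what produces the zero-sized array in the traceback.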

Regarding python - Gensim: ValueError: failed to create intent(cache|hide)|optional array-- must have defined dimensions but got (0,), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31512853/
