I followed the instructions from here. In slide no. 9, tolower has an issue in package tm 0.6 and above, so I used:
myCorpus <- tm_map(myCorpus, content_transformer(tolower))
This is a duplicate of this stackoverflow question, but I still get an error when running stemCompletion:
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary = myCorpusCopy)
I then followed this instruction and converted the variables myCorpus and myCorpusCopy to PlainTextDocument:
corpus <- tm_map(corpus, PlainTextDocument)
After that I was able to execute
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary = myCorpusCopy)
but I received 50 warnings:
There were 50 or more warnings (use warnings() to see the first 50)
Running warnings() shows the same message repeated:
1: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
2: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
3: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
4: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
5: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
6: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
7: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
8: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
9: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
10: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
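The warning text itself can be reproduced in base R without tm, which may help narrow down the cause. The sketch below assumes a made-up dictionary and made-up stems; it only shows that sprintf() is vectorised, so whenever `w` holds more than one string, grep() receives a pattern of length > 1. That is presumably what happens when tm_map hands stemCompletion a whole document instead of a vector of single stems.

```r
# Hypothetical dictionary, chosen only for illustration.
dictionary <- c("instruction", "issue", "package")

# One stem per call: sprintf() yields a single pattern, so grep() is silent.
single <- grep(sprintf("^%s", "instruct"), dictionary, value = TRUE)

# Several stems at once: sprintf() returns two patterns.
# grep() uses only the first one and signals a warning, which is captured here.
msg <- tryCatch(
  grep(sprintf("^%s", c("issu", "packag")), dictionary, value = TRUE),
  warning = function(w) conditionMessage(w)
)

print(single)
print(msg)
```

If this is indeed the cause, the stems need to reach the completion step one at a time, as a plain character vector.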
I tried ignoring the warnings and creating a TermDocumentMatrix():
tdm <- TermDocumentMatrix(myCorpus, control = list(wordLengths = c(1, Inf)))
I get the error:
Error: inherits(doc, "TextDocument") is not TRUE
Best answer
Here is how to create a stemmed term-document matrix and then complete the stemmed tokens afterwards:
library(tm)  # stemming = TRUE also relies on the SnowballC package being installed
txt <- " was followed the instruction from here In slide no. 9 tolower has issue in package tm 0.6 and above I have used "
myCorpus <- Corpus(VectorSource(txt))
myCorpus <- tm_map(myCorpus, content_transformer(tolower))
tdm <- TermDocumentMatrix(myCorpus, control = list(stemming = TRUE))
cbind(stems = rownames(tdm), completed = stemCompletion(rownames(tdm), myCorpus))
#          stems      completed
# 0.6      "0.6"      "0.6"
# abov     "abov"     "above"
# and      "and"      "and"
# follow   "follow"   "followed"
# from     "from"     "from"
# has      "has"      "has"
# have     "have"     "have"
# here     "here"     "here"
# instruct "instruct" "instruction"
# issu     "issu"     "issue"
# no.      "no."      "no."
# packag   "packag"   "package"
# slide    "slide"    "slide"
# the      "the"      "the"
# tolow    "tolow"    "tolower"
# use      "use"      "used"
# was      "was"      "was"
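If the completed words are needed back inside the text itself rather than as a list of terms, one possible approach is to complete each document token by token and then rebuild the string. The sketch below is base R only and rests on assumptions: `complete_stem()` and `complete_text()` are hypothetical helpers, not part of tm, and the shortest-prefix-match rule is a simplified stand-in for what stemCompletion does. With tm, a function of this shape would probably need to be wrapped in content_transformer() so that the corpus elements remain text documents.

```r
# complete_stem() is a hypothetical, simplified stand-in for tm::stemCompletion():
# it returns the shortest dictionary word that starts with the stem, or the
# stem itself when no dictionary word matches.
complete_stem <- function(stem, dictionary) {
  hits <- dictionary[startsWith(dictionary, stem)]
  if (length(hits) == 0) return(stem)
  hits[which.min(nchar(hits))]
}

# Split one document into tokens, complete each token on its own,
# and paste the completed tokens back into a single string.
complete_text <- function(text, dictionary) {
  tokens <- unlist(strsplit(text, "[[:space:]]+"))
  tokens <- tokens[nzchar(tokens)]
  completed <- vapply(tokens, complete_stem, character(1),
                      dictionary = dictionary, USE.NAMES = FALSE)
  paste(completed, collapse = " ")
}

# Made-up dictionary and stemmed text, for illustration only.
dictionary <- c("instruction", "issue", "package", "followed")
result <- complete_text("follow the instruct from packag", dictionary)
print(result)
```

Prefix matching of this kind can over-complete short tokens (a token such as "in" would match "instruction"), so a real dictionary and matching rule need some care.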
Regarding the warning in R stemCompletion and the error in TermDocumentMatrix, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/30321770/