Using the Description feature from the Online Retail dataset, I created a word cloud.
library(tm)
library(wordcloud)
# Build a corpus from the product descriptions
descCorpus <- Corpus(VectorSource(without_weird$Description))
descCorpus <- tm_map(descCorpus, removePunctuation)
# Drop English stop words plus a couple of extras
descCorpus <- tm_map(descCorpus, removeWords, c('the', 'this',
                                                 stopwords('english')))
descCorpus <- tm_map(descCorpus, stemDocument)
wordcloud(descCorpus, max.words = 100, random.order = FALSE)
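However, I want the defining feature of the word cloud to be sales rather than frequency, so the higher the sales, the larger the word.
Reproducible example: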
description <- c("36 PENCILS TUBE RED RETROSPOT","HANGING HEART JAR T-LIGHT HOLDER","VICTORIAN SEWING BOX LARGE","CINAMMON SET OF 9 T-LIGHTS","ZINC T-LIGHT HOLDER STARS SMALL","T-LIGHT HOLDER","RABBIT NIGHT LIGHT","WHITE SOAP RACK WITH 2 BOTTLES","BOUDOIR SQUARE TISSUE BOX", "WHITE SKULL HOT WATER BOTTLE","STRAWBERRY CERAMIC TRINKET POT")
sales <-c(4.56,24.96,11.40,15.00,17.85,10.50,20.40,27.04,20.40,15.00,13.00)
df <- data.frame(description, sales)
Best answer
Here is an example using the excellent wordcloud2 package.
Using your small example data, we have
description <- c("36 PENCILS TUBE RED RETROSPOT","HANGING HEART JAR T-LIGHT HOLDER","VICTORIAN SEWING BOX LARGE","CINAMMON SET OF 9 T-LIGHTS","ZINC T-LIGHT HOLDER STARS SMALL","T-LIGHT HOLDER","RABBIT NIGHT LIGHT","WHITE SOAP RACK WITH 2 BOTTLES","BOUDOIR SQUARE TISSUE BOX", "WHITE SKULL HOT WATER BOTTLE","STRAWBERRY CERAMIC TRINKET POT")
sales <-c(4.56,24.96,11.40,15.00,17.85,10.50,20.40,27.04,20.40,15.00,13.00)
df <- data.frame(description, sales)
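The wordcloud2 function expects a data frame with the columns word and freq, so we rename the variables accordingly. The descriptions are long, so I used the size argument to shrink the overall size.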
library(dplyr)
library(wordcloud2)
df %>% rename(word=description, freq=sales) %>% wordcloud2(size=.1)
This produces the following (and it is an interactive HTML widget to boot!)
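The call above sizes each whole description by its sales. If individual words should be sized by sales instead, as the question asks, one possible approach is to split the descriptions into words and sum the sales of every row a word appears in. The following is only a minimal base-R sketch under assumptions: whitespace tokenisation, lower-casing, and the names tokens, word_sales and word_freq are illustrative choices, not part of the original answer.

```r
# Small example data from the question
description <- c("36 PENCILS TUBE RED RETROSPOT", "HANGING HEART JAR T-LIGHT HOLDER",
                 "VICTORIAN SEWING BOX LARGE", "CINAMMON SET OF 9 T-LIGHTS",
                 "ZINC T-LIGHT HOLDER STARS SMALL", "T-LIGHT HOLDER",
                 "RABBIT NIGHT LIGHT", "WHITE SOAP RACK WITH 2 BOTTLES",
                 "BOUDOIR SQUARE TISSUE BOX", "WHITE SKULL HOT WATER BOTTLE",
                 "STRAWBERRY CERAMIC TRINKET POT")
sales <- c(4.56, 24.96, 11.40, 15.00, 17.85, 10.50, 20.40, 27.04, 20.40, 15.00, 13.00)
df <- data.frame(description, sales, stringsAsFactors = FALSE)

# Assumption: split on whitespace only, so "t-light" stays a single token
tokens <- strsplit(tolower(df$description), "\\s+")

# Repeat each row's sales once per token, then sum the sales per word
word_sales <- tapply(rep(df$sales, lengths(tokens)), unlist(tokens), sum)

# wordcloud2 expects a word column followed by a freq column
word_freq <- data.frame(word = names(word_sales),
                        freq = as.numeric(word_sales),
                        stringsAsFactors = FALSE)
word_freq <- word_freq[order(-word_freq$freq), ]

# Draw the cloud only if wordcloud2 is installed
if (requireNamespace("wordcloud2", quietly = TRUE)) {
  wordcloud2::wordcloud2(word_freq, size = 0.5)
}
```

With this tokenisation a word that occurs in several descriptions, such as "holder", accumulates the sales of all of them, which is what makes it larger in the cloud.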
From your original data I got a result like this (I am not sure this is the specific data wrangling you are after; indata is the Excel file read in)
# Sum Quantity per Description so that freq reflects the units sold
indata %>% group_by(Description) %>% tally(Quantity) %>%
rename(freq=n, word=Description) %>%
wordcloud2(size=1, minSize=3)
which looks like this
Update: if you want to count the individual words and display them, I would use tidytext:
library(tidytext)
indata %>% unnest_tokens(word, Description, token="words") %>%
group_by(word) %>% tally(Quantity) %>% rename(freq=n) %>% ungroup() %>%
wordcloud2(minSize=5)
This results in
You may need to jump through a few hoops to remove the numbers and stop words, as you already hinted at in the OP.
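As a rough illustration of that clean-up step, the sketch below filters a word/freq table before it is handed to wordcloud2. The helper drop_numbers_and_stopwords and the short my_stop_words vector are hypothetical names made up for this example; a fuller stop word list such as tm's stopwords('english'), used in the question, could be substituted. The sample rows are per-word sales sums derived from the question's small example data with simple whitespace splitting.

```r
# Hypothetical helper (not part of tm, tidytext or wordcloud2):
# drops rows whose word is purely numeric or is in the stop word list
drop_numbers_and_stopwords <- function(word_freq, stop_words) {
  is_number <- grepl("^[0-9]+$", word_freq$word)
  is_stop <- tolower(word_freq$word) %in% stop_words
  word_freq[!(is_number | is_stop), , drop = FALSE]
}

# Assumption: a tiny illustrative stop word list
my_stop_words <- c("of", "with", "the", "this")

# A few per-word sales sums from the question's small example
word_freq <- data.frame(
  word = c("holder", "white", "box", "with", "of", "36"),
  freq = c(53.31, 42.04, 31.80, 27.04, 15.00, 4.56),
  stringsAsFactors = FALSE
)

cleaned <- drop_numbers_and_stopwords(word_freq, my_stop_words)

# Draw the cloud only if wordcloud2 is installed
if (requireNamespace("wordcloud2", quietly = TRUE)) {
  wordcloud2::wordcloud2(cleaned, size = 0.5)
}
```

Here "with", "of" and "36" are dropped, leaving "holder", "white" and "box".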
Regarding r - Wordcloud in R using a different feature, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46118149/