excel - Most common words in a list

Tags: excel list frequency

I have a list in Excel, part of which looks like this:

Food and Human Nutrition
Food and Human Nutrition with Placement
Food and Nutrition with Professional Experience
Food Marketing and Nutrition
Food Marketing and Nutrition with Placement
Food, Nutrition and Health

I want to find the n most frequently used words in this list. I tried this formula to find the most common word:

=INDEX(rng,MODE(MATCH(rng,rng,0)))

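The problem is that it treats each cell as a single string, and since each of the 6 rows is different, it cannot find a most common word. What I would like it to do is output "Food", "Nutrition" and "and" as the most common words, followed by "Marketing", "Placement", "with", and so on.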

Best answer

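Here is a VBA macro that provides what you seem to want. It:

  • uses a Dictionary object to test for uniqueness
  • keeps the counts in the dictionary
  • then sorts the results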

Read the comments in the code carefully to understand the assumptions being made, and the reference that needs to be set.
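If setting the reference is not convenient, the same counting idea can be sketched with late binding. This is only an illustration, not the macro from the answer: `CountWords` and `DemoCountWords` are hypothetical names, and it assumes Windows, where `Scripting.Dictionary` is available through `CreateObject`.

```vba
' Hypothetical helper: count words in a list of strings using a
' late-bound dictionary, so no project reference is required.
Function CountWords(ByVal vLines As Variant) As Object
    Dim dict As Object
    Dim vLine As Variant, vWord As Variant

    Set dict = CreateObject("Scripting.Dictionary")
    dict.CompareMode = 1    '1 = text compare, so "Food" and "food" share a key

    For Each vLine In vLines
        For Each vWord In Split(CStr(vLine), " ")
            If Len(vWord) > 0 Then
                'Reading a missing key creates it as Empty, and Empty + 1 = 1
                dict(vWord) = dict(vWord) + 1
            End If
        Next vWord
    Next vLine

    Set CountWords = dict
End Function

Sub DemoCountWords()
    Dim d As Object
    Set d = CountWords(Array("Food and Nutrition", "Food Marketing"))
    Debug.Print d("food")   'prints 2
End Sub
```

Late binding gives up IntelliSense and the named `TextCompare` constant, which is why the literal `1` is used for `CompareMode`.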

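Also note that punctuation can cause the same word to be counted as different words (for example, "Food," and "Food"). If that is likely to be a problem, the source data just needs to be split differently: either remove all punctuation before splitting on spaces, or split with a regular expression.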
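A minimal sketch of the first option, replacing punctuation with spaces before splitting. `SplitWords` is a hypothetical helper, not part of the macro below; it assumes Windows (for `VBScript.RegExp`) and treats anything other than ASCII letters, digits and apostrophes as a separator, which may need adjusting for other data.

```vba
' Hypothetical helper: return the words in sText with punctuation removed.
Function SplitWords(ByVal sText As String) As Variant
    Dim re As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.Pattern = "[^A-Za-z0-9']+"   'any run of separator characters

    'Collapse each run of separators to one space, then trim the ends
    sText = Trim$(re.Replace(sText, " "))

    SplitWords = Split(sText, " ")
End Function
```

With this in place, the loop in the macro could iterate over `SplitWords(vSrc(I, 1))` instead of `Split(vSrc(I, 1))`, so that "Food," and "Food" fall under the same key.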

'Set Reference to Microsoft Scripting Runtime
'  (in the VBA editor: Tools > References)

Option Explicit
Sub UniqueWordCounts()
    Dim wsSrc As Worksheet, wsRes As Worksheet
    Dim rSrc As Range, rRes As Range
    Dim vSrc As Variant, vRes As Variant
    Dim vWords As Variant
    Dim dWords As Dictionary
    Dim I As Long, J As Long
    Dim V As Variant, vKey As Variant

    'Assume source data is in column 1, starting at A1
    '  Could easily be anyplace
    Set wsSrc = Worksheets("sheet2")
    With wsSrc
        Set rSrc = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp))
    End With

    'Results to go a few columns over
    Set wsRes = Worksheets("sheet2")
    Set rRes = rSrc(1, 1).Offset(0, 2)

    'Read source data into vba array (for processing speed)
    vSrc = rSrc

    'Collect individual words and counts into dictionary
    Set dWords = New Dictionary
    dWords.CompareMode = TextCompare

    For I = 1 To UBound(vSrc, 1)

        'Split the sentence into individual words
        For Each vKey In Split(vSrc(I, 1))
            If Not dWords.Exists(vKey) Then
                dWords.Add Key:=vKey, Item:=1
            Else
                dWords(vKey) = dWords(vKey) + 1
            End If
        Next vKey
    Next I

    'Size results array
    ReDim vRes(0 To dWords.Count, 1 To 2)

    'Column headers
    vRes(0, 1) = "Word"
    vRes(0, 2) = "Count"

    'Populate the columns
    I = 0
    For Each V In dWords.Keys
        I = I + 1
        vRes(I, 1) = V
        vRes(I, 2) = dWords(V)
    Next V

    'Size results range
    Set rRes = rRes.Resize(UBound(vRes, 1) + 1, UBound(vRes, 2))

    'Populate, format and sort the Results range
    With rRes
        .EntireColumn.Clear
        .Value = vRes
        With .Rows(1)
            .Font.Bold = True
            .HorizontalAlignment = xlCenter
        End With
        .EntireColumn.AutoFit
        .Sort key1:=.Columns(2), order1:=xlDescending, key2:=.Columns(1), order2:=xlAscending, MatchCase:=False, Header:=xlYes
    End With

End Sub
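
The macro lists every word, sorted by count. To look at only the n most common ones, as the question asks, one option is to read the first n rows of that output. The sketch below assumes the macro has already been run with the defaults above, so the results sit in columns C and D of "sheet2" with headers in row 1; `ShowTopWords` is a hypothetical name.

```vba
' Hypothetical helper: print the n most frequent words from the sorted results.
Sub ShowTopWords(ByVal n As Long)
    Dim ws As Worksheet
    Dim lastRow As Long, r As Long

    Set ws = Worksheets("sheet2")   'same sheet the macro writes to
    lastRow = ws.Cells(ws.Rows.Count, 3).End(xlUp).Row

    'Row 1 holds the headers, so the data starts at row 2
    For r = 2 To Application.Min(n + 1, lastRow)
        Debug.Print ws.Cells(r, 3).Value, ws.Cells(r, 4).Value
    Next r
End Sub
```

For example, `ShowTopWords 3` writes the three highest-count rows to the Immediate window.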

For excel - Most common words in a list, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/47771738/
