我得到的印象是,这在单词上是不可能的,但是我认为,如果您要查找的是一个很长的论文中任何地方出现的以相同顺序排列的3-4个单词,那么我会发现相同短语的重复。
我从以前的论文中复制并粘贴了很多文档,希望能找到一种简单的方法来查找此40多页文档中的重复信息,其中有很多不同的格式,但是我愿意暂时摆脱格式的限制寻找重复的信息。
最佳答案
要突出显示所有重复的句子,您还可以使用ActiveDocument.Sentences(i)
。这是一个例子
LOGIC
1)从单词文档中获取数组中的所有句子
2)对数组进行排序
3)提取重复项
4)突出显示重复项
CODE
Option Explicit
Sub Sample()
Dim MyArray() As String
Dim n As Long, i As Long
Dim Col As New Collection
Dim itm
n = 0
'~~> Get all the sentences from the word document in an array
For i = 1 To ActiveDocument.Sentences.Count
n = n + 1
ReDim Preserve MyArray(n)
MyArray(n) = Trim(ActiveDocument.Sentences(i).Text)
Next
'~~> Sort the array
SortArray MyArray, 0, UBound(MyArray)
'~~> Extract Duplicates
For i = 1 To UBound(MyArray)
If i = UBound(MyArray) Then Exit For
If InStr(1, MyArray(i + 1), MyArray(i), vbTextCompare) Then
On Error Resume Next
Col.Add MyArray(i), """" & MyArray(i) & """"
On Error GoTo 0
End If
Next i
'~~> Highlight duplicates
For Each itm In Col
Selection.Find.ClearFormatting
Selection.HomeKey wdStory, wdMove
Selection.Find.Execute itm
Do Until Selection.Find.Found = False
Selection.Range.HighlightColorIndex = wdPink
Selection.Find.Execute
Loop
Next
End Sub
'~~> Sort the array
Public Sub SortArray(vArray As Variant, i As Long, j As Long)
Dim tmp As Variant, tmpSwap As Variant
Dim ii As Long, jj As Long
ii = i: jj = j: tmp = vArray((i + j) \ 2)
While (ii <= jj)
While (vArray(ii) < tmp And ii < j)
ii = ii + 1
Wend
While (tmp < vArray(jj) And jj > i)
jj = jj - 1
Wend
If (ii <= jj) Then
tmpSwap = vArray(ii)
vArray(ii) = vArray(jj): vArray(jj) = tmpSwap
ii = ii + 1: jj = jj - 1
End If
Wend
If (i < jj) Then SortArray vArray, i, jj
If (ii < j) Then SortArray vArray, ii, j
End Sub
SNAPSHOTS
之前
之后
关于vba - 突出显示(而不是删除)重复的句子或短语,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/10301009/