regex - 移动主题与特定 RegEx 匹配的电子邮件

标签 regex vba outlook

我想循环浏览文件夹(“abc@outlook.com”的“收件箱”)中的电子邮件,并将主题与特定 RegEx 匹配的电子邮件移动到不同的文件夹。

Sub RegExpMoveEmailToFolderSO()
    Dim MyFolder As Outlook.Folder
    Dim MyNS As NameSpace
    Dim MyEmail As Outlook.MailItem
    Dim MyItems As Outlook.Items
    Dim CountMatches As Integer
    Dim MySubject As String
    Dim MyRegExp As RegExp
    Dim MyDestinationFolder As Outlook.Folder
           
    Set MyNS = Application.GetNamespace("MAPI")
    Set MyFolder = MyNS.Folders("xyz@abc.com").Folders("Inbox")
    Set MyDestinationFolder = MyNS.Folders("uvw@def.com").Folders("Inbox")
    Set MyItems = MyFolder.Items
    Set MyRegExp = New RegExp
        
    CountMatches = 1
    MyRegExp.Pattern = "(Reg).*(Exp)"
        
    For Each Item In MyItems
        MySubject = Item.Subject
        If MyRegExp.Test(MySubject) Then
            Item.Move MyDestinationFolder
            CountMatches = CountMatches + 1
        End If
    Next
            
    MsgBox "The total number of emails moved is: " & CountMatches & "."
End Sub
与 Outlook 中的类似规则相比,这相当慢,并在我的 i7 机器上启动风扇。
这段代码有什么明显低效的吗?

最佳答案

我不是正则表达式专家,所以我使用测试工具来帮助我开发模式。我尝试将您的模式和一些变体与与您的主题相匹配的一些字符串进行匹配。我之前没有想过为不同的模式计时,但现在我已将其添加为我的测试工具的一个选项。下面的结果并不像我预期的那样。

Pattern        Text                   Duration

(Reg).*(Exp)   xxxRegyyyExpzzz        0.00000216
(Reg).*(Exp)   xxxxRegExpzzz          0.00000212
(Reg).*(Exp)   xxxxxRegyEyyExpzzz     0.00000220
(Reg).*(Exp)   xxxxxxRegyyExyExpzzz   0.00000220

Reg.*Exp       xxxRegyyyExpzzz        0.00000199
Reg.*Exp       xxxxRegExpzzz          0.00000198
Reg.*Exp       xxxxxRegyEyyExpzzz     0.00000204
Reg.*Exp       xxxxxxRegyyExyExpzzz   0.00000205

Reg.*?Exp      xxxRegyyyExpzzz        0.00000205
Reg.*?Exp      xxxxRegExpzzz          0.00000188
Reg.*?Exp      xxxxxRegyEyyExpzzz     0.00000214
Reg.*?Exp      xxxxxxRegyyExyExpzzz   0.00000220

为 VBA 例程计时很困难,因为后台解释器和操作系统例程会显着影响计时。在总持续时间足以让我认为平均持续时间可靠之前,我必须将重复次数增加到 10,000,000。

正如您所看到的,删除捕获括号可以节省一点时间,尽管在您注意到之前您需要数以千计的电子邮件。只有“Reg”和“Exp”之间的字符数似乎有很大影响。

我不明白为什么前两种模式有效。 .*据说很贪心。它应该匹配直到字符串末尾或下一个换行符的每个字符。该模式不应找到“Exp”,因为它们与 .* 匹配.只有懒人.*?当它找到“Exp”时应该停止匹配字符。要么我误解了贪婪匹配与惰性匹配,要么 VBA 正则表达式引擎不处理 .*作为贪婪。

我的结论是正则表达式匹配不是你的例行程序缓慢的原因。我建议你试试蒂姆的建议。 IAmANerd2000 添加了一个演示 Tim 建议的例程,但他/她已将其删除。 (我可以看到删除的答案,因为我的声誉超过 10K。)也许 Tim 想添加一个答案来证明他的建议。

我在下面包含了我的测试工具,以防您发现它有帮助。每个模式和文本的输出是:
===========================================
   Pattern: "(Reg).*(Exp)"
      Text: "xxxRegyyyExpzzz"
Av Durat'n: 0.00000216
-------------------------------------------
     Match: 1
     Value: "RegyyyExp"
    Length: 9
FirstIndex: 3
  SubMatch: 1 "Reg"
  SubMatch: 2 "Exp"
===========================================

Option Explicit
Sub Test2()

  Dim Patterns As Variant
  Dim Texts As Variant

  Texts = Array("xxxRegyyyExpzzz", _
                "xxxxRegExpzzz", _
                "xxxxxRegyEyyExpzzz", _
                "xxxxxxRegyyExyExpzzz")

  Patterns = Array("(Reg).*(Exp)", _
                   "Reg.*Exp", _
                   "Reg.*?Exp")

  Call TestCapture(Patterns, Texts, True)

End Sub
Sub TestCapture(ByRef Patterns As Variant, ByRef Texts As Variant, _
                Optional ByVal TimeDuration As Boolean = False)

  ' Patterns      an array of patterns to be tested
  ' Texts         an array of text to be matched against the patterns
  ' TimeDuration  if True, record the average duration of the match

  ' Attempts to match each text against each pattern and reports on the result
  ' If TimeDuration is True, repeats the match 10,000,000 times and reports the
  ' average duration so the efficiency of different patterns can be determined

  Dim CountCrnt As Long
  Dim CountMax As Long
  Dim InxM As Long
  Dim InxS As Long
  Dim Matches As MatchCollection
  Dim PatternCrnt As Variant
  Dim RegEx As New RegExp
  Dim TimeEnd As Double
  Dim TimeStart As Double
  Dim SubMatchCrnt As Variant
  Dim TextCrnt As Variant

  With RegEx
    .Global = True         ' Find all matches
    .MultiLine = False     ' Match cannot extend across linebreak
    .IgnoreCase = True

    For Each PatternCrnt In Patterns
     .Pattern = PatternCrnt

      For Each TextCrnt In Texts
        Debug.Print "==========================================="
        Debug.Print "   Pattern: """ & PatternCrnt & """"
        Debug.Print "      Text: """ & TidyTextForDspl(TextCrnt) & """"
        If TimeDuration Then
          CountMax = 10000000
          TimeStart = Timer
        Else
          CountMax = 1
        End If
        For CountCrnt = 1 To CountMax
          If Not .test(TextCrnt) Then
            Debug.Print Space(12) & "Text does not match pattern"
            Exit For
          Else
            Set Matches = .Execute(TextCrnt)
            If CountCrnt = CountMax Then
              TimeEnd = Timer
              If TimeDuration Then
                Debug.Print "Av Durat'n: " & Format((TimeEnd - TimeStart) / CountMax, "0.00000000")
              End If
              If Matches.Count = 0 Then
                Debug.Print Space(12) & "Match but no captures"
              Else
                For InxM = 0 To Matches.Count - 1
                  Debug.Print "-------------------------------------------"
                  With Matches(InxM)
                    Debug.Print "     Match: " & InxM + 1
                    Debug.Print "     Value: """ & TidyTextForDspl(.Value) & """"
                    Debug.Print "    Length: " & .Length
                    Debug.Print "FirstIndex: " & .FirstIndex
                    For InxS = 0 To .SubMatches.Count - 1
                      Debug.Print "  SubMatch: " & InxS + 1 & " """ & _
                                  TidyTextForDspl(.SubMatches(InxS)) & """"
                    Next
                  End With
                Next InxM
              End If
            End If
          End If
        Next CountCrnt
      Next TextCrnt
    Next PatternCrnt
    Debug.Print "==========================================="

  End With

End Sub
Public Function TidyTextForDspl(ByVal Text As String) As String

  ' Tidy Text for dsplay by replacing white space with visible strings:
  '   Replace spaces by          ‹s› or ‹n s›
  '   Replace line feed by       ‹lf› or ‹n lf›
  '   Replace carriage return by ‹cr› or ‹n cr›
  '   Replace tab by             ‹tb› or ‹n tb›
  '   Replace non-break space by ‹nbs› or {n nbs›
  ' Where n is a count if the character repeats

  ' 15Mar16  Coded
  '  3Feb19  Replaced "{" (\x7B) and "}" (\x7D) by "‹" (\u2039) and "›" (\u203A)
  '          on the grounds that the angle quotation marks were not likely to
  '          appear in text to be displayed.

  Dim InsStr As String
  Dim InxWsChar As Long
  Dim NumWsChar As Long
  Dim PosWsChar As Long
  Dim RetnVal As String
  Dim WsCharCrnt As Variant
  Dim WsCharValue As Variant
  Dim WsCharDspl As Variant

  WsCharValue = Array(" ", vbLf, vbCr, vbTab, Chr(160))
  WsCharDspl = Array("s", "lf", "cr", "tb", "nbs")

  RetnVal = Text
  For InxWsChar = LBound(WsCharValue) To UBound(WsCharValue)
    Do While True
      PosWsChar = InStr(1, RetnVal, WsCharValue(InxWsChar))
      If PosWsChar = 0 Then
        Exit Do
      End If
      NumWsChar = 1
      Do While Mid(RetnVal, PosWsChar + NumWsChar, 1) = WsCharValue(InxWsChar)
        NumWsChar = NumWsChar + 1
      Loop
      If NumWsChar = 1 Then
        InsStr = "‹" & WsCharDspl(InxWsChar) & "›"
      Else
        InsStr = "‹" & NumWsChar & WsCharDspl(InxWsChar) & "›"
      End If
      RetnVal = Mid(RetnVal, 1, PosWsChar - 1) & InsStr & Mid(RetnVal, PosWsChar + NumWsChar)
    Loop
  Next

  TidyTextForDspl = RetnVal

End Function

关于regex - 移动主题与特定 RegEx 匹配的电子邮件,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/54504274/

相关文章:

java - 为什么 "3.5".matches ("[0-9]+") 返回 false?

正则表达式匹配中断字符的总数

sql - Access VBA Query中的recordset.GetString在结果后返回一个额外的字符

c# - 如何使用 add-in express 获取 outlook 电子邮件主题

regex - 正则表达式无法正确匹配 "</p>"

javascript - 在 RegEx 中为一个组获取多个匹配项

vba - 如何查找项目符号列表并设置项目符号格式(VBA _ Word 文档)

vba - Excel VBA - 我的 VBA 代码搞砸了 : why the spanish characters (á, é,í,ó,ú,ñ,¡,¿) 突变?

c# - 使用 GetProcessesByName 是检查进程是否正在运行的最佳方法吗?

outlook - 发生更新时刷新 .ICS 文件