python - What is a good regex for identifying the "original message" prefix in Gmail?

Tags: python regex roundup

A sample attribution line might be:

On Tue, Mar 20, 2012 at 2:38 PM, Johnny Walker <johnny.talker@gmail.com> wrote:

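followed by the quoted reply. I have a distinct feeling that this is locale-specific, which makes me a sad programmer.

The reason I ask is that roundup does not strip these lines correctly when someone replies to an issue via Gmail. I think origmsg_re is the config.ini variable I need to set, together with keep_quoted_text = no, to fix this.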

Right now it is at the default value: origmsg_re = ^[>|\s]*-----\s?Original Message\s?-----$

Edit: I am now using origmsg_re = ^On[^<]+<.+@.+>[ \n]wrote:[\n] which also works for some Gmail clients that wrap the line when it gets too long.
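To see how the two patterns from the question behave, here is a minimal sketch that runs them with Python's re module. It does not assume anything about how Roundup itself compiles or applies origmsg_re. The wrapped sample line and the helper name `matches` are made up for illustration.

```python
import re

# Patterns copied from the question; how Roundup compiles them is not assumed here.
DEFAULT_RE = re.compile(r"^[>|\s]*-----\s?Original Message\s?-----$")
GMAIL_RE = re.compile(r"^On[^<]+<.+@.+>[ \n]wrote:[\n]")

# Sample inputs; the wrapped variant is a made-up illustration of a long line
# that a mail client broke before "wrote:".
dashed_marker = "-----Original Message-----"
one_line = "On Tue, Mar 20, 2012 at 2:38 PM, Johnny Walker <johnny.talker@gmail.com> wrote:\n"
wrapped = "On Tue, Mar 20, 2012 at 2:38 PM, Johnny Walker <johnny.talker@gmail.com>\nwrote:\n"

def matches(pattern, text):
    """Return True if the pattern matches at the start of text (hypothetical helper)."""
    return pattern.match(text) is not None

print(matches(DEFAULT_RE, dashed_marker))  # default pattern on the dashed marker
print(matches(DEFAULT_RE, one_line))       # default pattern on the Gmail line
print(matches(GMAIL_RE, one_line))         # edited pattern, single-line form
print(matches(GMAIL_RE, wrapped))          # edited pattern, wrapped form
```

The `[ \n]` before `wrote:` is what lets the edited pattern accept either a space or a line break at that position.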

Best answer

The following regex will match Gmail's prefix in a very safe way. It makes sure there are 3 commas and the literal text On ... wrote:

On([^,]+,){3}.*?wrote:

If the regex should match case-insensitively, don't forget to add the modifier.

import re

# `subject` is the text to search, e.g. the body of the reply.
if re.search(r"On([^,]+,){3}.*?wrote:", subject, re.IGNORECASE):
    print("Successful match")
else:
    print("Match attempt failed")
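As a possible use of this pattern, the sketch below cuts a reply off at the first Gmail-style attribution line. The helper name `strip_quoted_reply` and the sample body are hypothetical. The re.DOTALL flag is an addition that is not part of the answer: without it, `.*?` cannot cross a line break, so a client that wraps the line before "wrote:" would not be matched.

```python
import re

# Pattern from the answer. re.DOTALL is added here as an assumption so that
# ".*?" can also span a line break in a wrapped attribution line.
ATTRIBUTION_RE = re.compile(r"On([^,]+,){3}.*?wrote:", re.IGNORECASE | re.DOTALL)

def strip_quoted_reply(body):
    """Return the text before the first attribution line (hypothetical helper)."""
    match = ATTRIBUTION_RE.search(body)
    if match is None:
        # No attribution line found: leave the message untouched.
        return body
    return body[:match.start()].rstrip()

body = (
    "Thanks, that fixed it.\n"
    "\n"
    "On Tue, Mar 20, 2012 at 2:38 PM, Johnny Walker <johnny.talker@gmail.com>\n"
    "wrote:\n"
    "> Did you try restarting?\n"
)
print(strip_quoted_reply(body))
```

Because the pattern is not anchored and is case-insensitive here, it can also start at an "on" inside ordinary text that happens to be followed by three commas and "wrote:", so anchoring it to the start of a line may be worth considering.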

Kind regards, Buckley

Match the characters “On” literally «On»
Match the regular expression below and capture its match into backreference number 1 «([^,]+,){3}»
   Exactly 3 times «{3}»
   Note: You repeated the capturing group itself.  The group will capture only the last iteration.  Put a capturing group around the repeated group to capture all iterations. «{3}»
   Match any character that is NOT a “,” «[^,]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
   Match the character “,” literally «,»
Match any single character that is not a line break character «.*?»
   Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the characters “wrote:” literally «wrote:»

Created with RegexBuddy
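The same breakdown can be kept next to the pattern itself using Python's re.VERBOSE flag. This is only a sketch: the name `GMAIL_PREFIX` is made up, and the non-capturing group `(?:...)` is a substitution for the original capturing group, since the captured text is not used.

```python
import re

# Annotated form of the answer's pattern. The non-capturing group (?:...) is a
# substitution made here; it matches the same text as the original group.
GMAIL_PREFIX = re.compile(
    r"""
    On              # the literal characters "On"
    (?:[^,]+,){3}   # three runs of non-comma characters, each ending in a comma
    .*?             # as few characters as possible, excluding line breaks
    wrote:          # the literal characters "wrote:"
    """,
    re.VERBOSE,
)

line = "On Tue, Mar 20, 2012 at 2:38 PM, Johnny Walker <johnny.talker@gmail.com> wrote:"
match = GMAIL_PREFIX.search(line)
print(match is not None and match.group(0) == line)
```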

Regarding "python - What is a good regex for identifying the "original message" prefix in Gmail?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/9787427/
