python - How to load multiple regex patterns from a file and match a given string?

Tags: python regex python-3.x string regex-negation


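Based on the code below (simplified for this post), can someone show me how to load a list of regex patterns (if "list" is the right type to use) from a text file and match them against a single string?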

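There are plenty of examples of loading text from a file and matching a regex pattern against it, but not of the reverse: matching many regex patterns against one text string.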

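As you can see in the code, if I create the list manually and run re.compile, I can use the list of patterns to match the string. When loading from a file, however, where does re.compile fit in?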

import regex as re

fname = 'regex_strings_short.txt'

string_to_match = 'onload=alert'

# Create a manual list of regexes
manual_regexes = [
    re.compile(r'(?i)\bHP\b(?:[^.,;]{1,20}?)\bnumber\b'),
    re.compile(r'(?i)\bgmail\b(?:[^.,;]{1,20}?)\bnumber\b'),
    re.compile(r'(?i)\bearthlink\b(?:[^.,;]{1,20}?)\bnumber\b'),
    re.compile(r'(?i)onload=alert')
]

# Create a text file with these five example patterns
'''
(?i)\bHP\b(?:[^.,;]{1,20}?)\bnumber\b
(?i)\bgmail\b(?:[^.,;]{1,20}?)\bnumber\b
(?i)\bearthlink\b(?:[^.,;]{1,20}?)\bnumber\b
(?i)onload=alert
(?i)hello
'''

# Import a list of regex patterns from the created file
with open(fname, 'r') as file:
    imported_regexes = file.readlines()

# Notice the difference in the formatting of the manual list with 'regex.Regex' and 'flags=regex.I | regex.V0' wrapping each item
print(manual_regexes)
print('---')
print(imported_regexes)

# A match is found in the manual list, but no match found in the imported list
if re.match(imported_regexes[3], string_to_match):
    print('Match found in imported_regexes.')
else:
    print('No match in imported_regexes.')

print('---')

if re.match(manual_regexes[3], string_to_match):
    print('Match found in manual_regexes.')
else:
    print('No match in manual_regexes.')

No match is found with imported_regexes, but one is found with manual_regexes.

Update: the code below is the final solution that worked for me. I'm posting it in case it helps someone who lands here looking for a solution.

# Use the third-party regex module (import regex as re) rather than the standard
# library re: the last pattern below uses \p{Lo}, which the standard re module does not support
import regex as re

# Add the post/string to match below
my_string = '<p>HP Support number</p>'

fname = 'regex_strings.txt'

# Contents of the text file are similar to the lines below,
# but without the leading "# " (that's only because they are inline comments here)
# (?i)\bHP\b(?:[^.,;]{1,20}?)\bnumber\b
# (?i)\bgmail\b(?:[^.,;]{1,20}?)\bnumber\b
# (?i)】\b(?:[^.,;]{1,1000}?)\p{Lo}

# Import a list of regex patterns from a file;
# splitlines() drops the trailing newline that readlines() would keep
with open(fname, 'r', encoding="utf8") as f:
    loaded_patterns = f.read().splitlines()

# print(loaded_patterns)
print(len(loaded_patterns))

found = False
for pattern in loaded_patterns:
    # findall returns a non-empty (truthy) list when the pattern occurs anywhere in the string
    if re.findall(pattern, my_string):
        print('Match found. ' + pattern)
        found = True

if not found:
    print('No matching regex found.')
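
A pattern file edited by hand can easily contain blank lines or a pattern that fails to compile. The following is a minimal sketch of one way to guard against that, not part of the original post. The helper names `compile_patterns` and `matching_patterns` and the sample data are hypothetical, and the standard library `re` is used only so the sketch runs without extra packages; with `\p{...}` patterns you would presumably switch to `import regex as re`.

```python
import re  # assumption: swap for `import regex as re` if the patterns use \p{...}


def compile_patterns(lines):
    """Compile each non-empty line; return (compiled, rejected) lists."""
    compiled, rejected = [], []
    for line_number, line in enumerate(lines, start=1):
        source = line.rstrip('\r\n')  # drop any line ending left by readlines()
        if not source.strip():
            continue  # skip blank lines
        try:
            compiled.append(re.compile(source))
        except re.error as exc:
            # Remember which line failed instead of aborting the whole load
            rejected.append((line_number, source, str(exc)))
    return compiled, rejected


def matching_patterns(compiled, text):
    """Return the source of every compiled pattern found anywhere in text."""
    return [p.pattern for p in compiled if p.search(text)]


# Hypothetical sample data standing in for the lines of the pattern file
sample_lines = [
    r'(?i)\bHP\b(?:[^.,;]{1,20}?)\bnumber\b',
    '',
    r'(?i)\bgmail\b(?:[^.,;]{1,20}?)\bnumber\b',
    r'(unclosed',
]
compiled, rejected = compile_patterns(sample_lines)
hits = matching_patterns(compiled, '<p>HP Support number</p>')
print(hits)
print(rejected)
```

Compiling once up front also means each pattern is parsed a single time, which may matter if the same list is checked against many strings.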

Best answer

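re.match accepts either a string or a compiled regex as its pattern argument, and internally converts a string into a compiled regex object. You can call re.compile as an optimization (when the same regex is used many times), but it is optional as far as program correctness is concerned.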
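
As a small illustration of that point, the sketch below matches the same pattern once as a plain string and once as a precompiled object. The variable names are illustrative, and the standard library `re` is used here on the assumption that it behaves the same way as the `regex` module for this simple pattern.

```python
import re  # standard library; assumed equivalent to `regex` for this pattern

text = 'onload=alert'
pattern_source = r'(?i)onload=alert'

# Pattern passed as a plain string: the module compiles it internally
m1 = re.match(pattern_source, text)

# Pattern compiled explicitly first: same result, and the object can be reused
compiled = re.compile(pattern_source)
m2 = compiled.match(text)

same_result = m1 is not None and m2 is not None and m1.group(0) == m2.group(0)
print(same_result)
```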

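If the program does not report a match for the imported regex, it is because readlines() keeps the trailing '\n' on each string. As a result, re.match('(?i)onload=alert\n', string_to_match) returns None. Strip the newline first; whether you then call re.compile on the cleaned string is up to you.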

with open(fname, 'r') as file:
    imported_regexes = file.readlines()
print(re.match(imported_regexes[3].strip('\n'), string_to_match))

This prints a match object.
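
To see the difference directly, the following sketch writes two of the sample patterns to a temporary file (a stand-in for the real pattern file, so the file name and contents here are assumptions) and compares what readlines() and read().splitlines() return. It uses the standard library `re`; the newline behaviour comes from the file reading, not from the regex module.

```python
import os
import re
import tempfile

string_to_match = 'onload=alert'

# Write two sample patterns to a temporary file (stand-in for the real pattern file)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False, encoding='utf8') as tmp:
    tmp.write('(?i)onload=alert\n(?i)hello\n')
    tmp_path = tmp.name

try:
    with open(tmp_path, 'r', encoding='utf8') as f:
        raw_lines = f.readlines()            # keeps the trailing '\n'
    with open(tmp_path, 'r', encoding='utf8') as f:
        clean_lines = f.read().splitlines()  # drops the line endings
finally:
    os.remove(tmp_path)

# repr() makes the otherwise invisible newline visible
print(repr(raw_lines[0]))
print(repr(clean_lines[0]))

# The newline is treated as part of the pattern, so the raw line cannot match
raw_match = re.match(raw_lines[0], string_to_match)
clean_match = re.match(clean_lines[0], string_to_match)
print(raw_match is None, clean_match is not None)
```

Printing patterns with repr() is a quick way to spot stray whitespace of this kind when a pattern that looks correct refuses to match.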

Regarding "python - How to load multiple regex patterns from a file and match a given string?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56094010/
