python - Search for multiple regular expressions across multiple files, then output each match with its respective file

Tags: python regex nlp

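I am trying to format the output as a table. For example, each file with matches should be a column, and the matched instances should be the rows.

Here is my code: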

import glob
import re
folder_path = "/home/e136320"
file_pattern = "/*.txt"

match_list = []

folder_contents = glob.glob(folder_path + file_pattern)

#Search for Emails
regex1= re.compile(r'\S+@\S+')
#Search for Phone Numbers
regex2 = re.compile(r'\d\d\d[-]\d\d\d[-]\d\d\d\d')
#Search for Physician's Name
regex3=re.compile(r'\b\w\w\.\w+\b')


for file in folder_contents:
    read_file = open(file, 'rt').read()
    words=read_file.split()
    for line in words:
        email=regex1.findall(line)
        phone=regex2.findall(line)
        for word in email:
            print(file,email)
        for word in phone:
            print(file,phone)

Here is my output:

('/home/e136320/sample.txt', ['bcbs@aol.com'])
('/home/e136320/sample.txt', ['James@aol.com'])
('/home/e136320/sample.txt', ['248-981-3420'])
('/home/e136320/wow.txt', ['soccerfif@yahoo.com'])
('/home/e136320/wow.txt', ['313-806-6666'])
('/home/e136320/wow.txt', ['444-444-4444'])
('/home/e136320/wow.txt', ['248-805-6233'])
('/home/e136320/wow.txt', ['maliva@gmail.com'])

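Any ideas?

Best answer

I would append the items you find to a list, so that the results stay organized and persist across loop iterations. You can then print them out. You could try something like this: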

import glob
import re

folder_path = "/home/e136320"
file_pattern = "/*.txt"

match_list = []

folder_contents = glob.glob(folder_path + file_pattern)

# Search for Emails
regex1= re.compile(r'\S+@\S+')

# Search for Phone Numbers
regex2 = re.compile(r'\d\d\d[-]\d\d\d[-]\d\d\d\d')

# Search for Physician's Name
regex3=re.compile(r'\b\w\w\.\w+\b')

results = {}

for file in folder_contents:
    # Use a context manager so the file handle is closed after reading
    with open(file, 'rt') as handle:
        read_file = handle.read()
    words=read_file.split()
    current_results = []

    for line in words:
        email=regex1.findall(line)
        phone=regex2.findall(line)

        for word in email:
            # Append email Regex matches to a list
            current_results.append(word)

        for word in phone:
            # Append phone Regex matches to a list
            current_results.append(word)

    # Save results per file in a dictionary
    # The file name is the key.
    results[file] = current_results

for key in results.keys():
    print(key, [str(item) for item in results[key]])
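
The dictionary above groups matches by file, but it still prints one list per file rather than the table the question asks for. One way to get files as columns and matches as rows is sketched below. It assumes Python 3.7+ (so the dictionary keeps insertion order); `format_table` is a hypothetical helper name, not part of the original answer, and the sample data is copied from the output shown in the question.

```python
from itertools import zip_longest


def format_table(results):
    """Render {file_name: [matches]} as text with one column per file."""
    headers = list(results)
    columns = [results[name] for name in headers]
    # Column width: the longest cell in that column, header included.
    widths = [max(len(str(cell)) for cell in [name] + list(col))
              for name, col in zip(headers, columns)]
    lines = ["  ".join(name.ljust(w) for name, w in zip(headers, widths)).rstrip()]
    # zip_longest pads the shorter columns so every row has one cell per file.
    for row in zip_longest(*columns, fillvalue=""):
        cells = (str(cell).ljust(w) for cell, w in zip(row, widths))
        lines.append("  ".join(cells).rstrip())
    return "\n".join(lines)


# Sample data taken from the question's output (hypothetical stand-in for `results`).
sample = {
    "/home/e136320/sample.txt": ["bcbs@aol.com", "James@aol.com", "248-981-3420"],
    "/home/e136320/wow.txt": ["soccerfif@yahoo.com", "313-806-6666", "444-444-4444",
                              "248-805-6233", "maliva@gmail.com"],
}
print(format_table(sample))
```

In the answer's script you would call `print(format_table(results))` instead of the final loop. Files with fewer matches simply get blank cells in the later rows.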

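Regarding "python - Search for multiple regular expressions across multiple files, then output each match with its respective file", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52995272/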
