Python, iterate over lines in a file; if a line equals a line in another file, return the original line

Text file 1 has the following format:

'WORD': 1
'MULTIPLE WORDS': 1
'WORD': 2

etc.

That is, a quoted word (or phrase), a colon, and then a number.
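A minimal sketch of splitting one file-1 line into its word and its count, assuming each line holds a single-quoted word or phrase, a colon, and an integer. `parse_line` and `is_single_word` are hypothetical helper names, not part of the question's code.

```python
def parse_line(line):
    """Split a line like "'WORD': 1" into ('WORD', 1).

    rpartition splits on the last colon, so a colon inside the key stays in the key.
    """
    key, _, count = line.rpartition(":")
    word = key.strip().strip("'")  # drop surrounding whitespace, then the quotes
    return word, int(count)        # int() ignores the leading space and trailing newline


def is_single_word(word):
    """True when the key is one word rather than a multi-word phrase."""
    return len(word.split()) == 1


print(parse_line("'WORD': 1\n"))          # ('WORD', 1)
print(parse_line("'MULTIPLE WORDS': 1"))  # ('MULTIPLE WORDS', 1)
print(is_single_word("MULTIPLE WORDS"))   # False
```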

Text file 2 has the following format:

'WORD'
'WORD'

etc.

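I need to extract the single words from file 1 (i.e., only entries that are one word, not multi-word phrases) and, if they match a word in file 2, return the word from file 1 together with its value.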

I have some code that doesn't work well:

import re

def GetCounts(file1, file2):
    with open(file1) as f:
        target_contents = f.readlines()     # file 1 as list --> 'WORD': n
    with open(file2) as f:
        match_me_contents = f.readlines()   # file 2 as list --> 'WORD'
    ls_stripped = [x.strip('\n') for x in match_me_contents]  # get rid of newlines

    # One alternation of all file-2 words; note that it also matches substrings
    match_me_as_regex = re.compile("|".join(ls_stripped))

    for line in target_contents:
        first_column = line.split(':')[0]  # the word(s) before the colon
        number = line.split(':')[1]        # the number associated with the word
        if len(first_column.split()) == 1:  # single word only, no multi-word entries
            # Does the word from target_contents match a word from
            # match_me_contents? If so, print it with its number.
            if re.findall(match_me_as_regex, first_column):
                print(first_column, number)

# OUTPUT: WORD, n
#         WORD, n
#         etc.

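Because of the regex, the output is unreliable. For example, the code returns "asset, 2" because re.findall() matches "set" from match_me inside "asset". I need to compare target_word against whole words in match_me so that partial regex matches stop producing wrong output.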
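One way to keep the regex approach while requiring whole-word matches is to escape each word and call `re.fullmatch`, which succeeds only when the entire string matches one of the alternatives. This is a sketch assuming the file-2 words have already been stripped of quotes and newlines; the `words` list is made up for illustration.

```python
import re

words = ["set", "word"]  # hypothetical file-2 contents, already stripped

# re.escape guards against words that contain regex metacharacters.
pattern = re.compile("|".join(map(re.escape, words)))

print(bool(pattern.search("asset")))     # True: substring match, the reported bug
print(bool(pattern.fullmatch("asset")))  # False: the whole string must match
print(bool(pattern.fullmatch("set")))    # True
```

If no pattern features are needed, a plain set lookup gives the same whole-word comparison more cheaply.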

Best answer

If file2 isn't very large, load its words into a set and test membership:

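with open("file2") as f:
    file2 = {w.strip("'") for w in f.read().split()}  # file-2 words, quotes removed
with open("file1") as f:
    for line in f:
        word = line.split(":")[0].strip("'")  # first column without quotes
        if word in file2:                     # exact whole-word membership test
            print(line, end="")               # line already ends with a newline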
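The same set-based idea can be wrapped in a function that returns the matches instead of printing them. In this sketch `get_counts` is a hypothetical name; it accepts any iterables of lines (open file objects or lists), assumes the `'WORD': n` layout from the question, skips multi-word keys explicitly, and parses the count as an integer.

```python
def get_counts(file1_lines, file2_lines):
    """Return [(word, count), ...] for single-word file-1 entries found in file 2."""
    # Set of file-2 words with whitespace and quotes removed, for O(1) lookups.
    targets = {w.strip().strip("'") for w in file2_lines if w.strip()}
    matches = []
    for line in file1_lines:
        key, sep, number = line.rpartition(":")
        if not sep:
            continue  # skip blank or malformed lines without a colon
        word = key.strip().strip("'")
        # Exact equality, so 'asset' does not match 'set'.
        if len(word.split()) == 1 and word in targets:
            matches.append((word, int(number)))
    return matches


file1 = ["'WORD': 1\n", "'MULTIPLE WORDS': 1\n", "'asset': 2\n", "'set': 3\n"]
file2 = ["'WORD'\n", "'set'\n"]
print(get_counts(file1, file2))  # [('WORD', 1), ('set', 3)]
```

With real files, pass the open file objects directly, e.g. `with open("file1") as f1, open("file2") as f2: print(get_counts(f1, f2))`. Returning a list rather than a dict keeps repeated words such as the two `'WORD'` entries in the question's example.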

A similar question, "Python, iterate over lines in a file; if a line equals a line in another file, return the original line", can be found on Stack Overflow: https://stackoverflow.com/questions/7218643/
