python - Replace words with words from another file

Tags: python translation nltk

The words in my text file (mytext.txt) need to be replaced with other words supplied in a second text file (replace.txt).

cat mytext.txt
this is here. and it should be there. 
me is this will become you is that.

cat replace.txt
this that
here there
me you

The following code does not work as expected.

with open('mytext.txt', 'r') as myf:
    with open('replace.txt' , 'r') as myr:
        for line in myf.readlines():
            for l2 in myr.readlines():
                original, replace = l2.split()
                print line.replace(original, replace)

Expected output:

that is there. and it should be there. 
you is that will become you is that.

Best answer

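Edit: I stand corrected. The OP is asking for word-by-word replacement rather than simple string replacement ('become' -> 'become', not 'becoyou'). I guess a dict-based version might look like this, using the regex split method found in the comments on the accepted answer to "Splitting a string into words and punctuation":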

import re

def clean_split(string_input):
    """
    Split a string into its component tokens and return them as a list.
    Spaces and punctuation, including in-word apostrophes, are separate tokens.

    >>> clean_split("it's a good day today!")
    ['it', "'", 's', ' ', 'a', ' ', 'good', ' ', 'day', ' ', 'today', '!']
    """
    return re.findall(r"[\w]+|[^\w]", string_input)

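# Build an {original: replacement} lookup from lines such as "this that"
with open('replace.txt', 'r') as myr:
    replacements = dict(tuple(line.split()) for line in myr)

with open('mytext.txt', 'r') as myf:
    for line in myf:
        # Python 2 print statement; each line keeps its own newline token,
        # and the trailing comma stops print from adding another one
        print ''.join(replacements.get(word, word) for word in clean_split(line)),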
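For readers on Python 3, the same token-by-token idea can be sketched as plain functions. This is a minimal sketch rather than the answer's original code: the names load_replacements and replace_words are made up for illustration, the replacement pairs are inlined instead of read from replace.txt, and skipping blank or malformed lines is an added assumption.

```python
import re


def clean_split(text):
    """Split text into word tokens and single non-word characters (spaces included)."""
    return re.findall(r"\w+|\W", text)


def load_replacements(lines):
    """Build an {original: replacement} dict from 'original replacement' lines."""
    pairs = (line.split() for line in lines)
    # Assumption: lines that do not hold exactly two words are skipped.
    return {p[0]: p[1] for p in pairs if len(p) == 2}


def replace_words(text, replacements):
    """Swap whole tokens only, so 'become' is untouched even though it contains 'me'."""
    return "".join(replacements.get(tok, tok) for tok in clean_split(text))


# Same pairs as replace.txt in the question, inlined for the example.
replacements = load_replacements(["this that", "here there", "me you"])
result = replace_words("me is this will become you is that.", replacements)
print(result)  # you is that will become you is that.
```

To process real files, pass the open replace.txt handle to load_replacements and call replace_words on each line of mytext.txt, printing with end='' because every line keeps its own newline token.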

I can't reason well about the efficiency of re; if anyone can point out an obvious inefficiency, I'd be grateful.
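One alternative that could be compared against the tokenizing approach is a single precompiled pattern with re.sub and a callback. This is a different technique from the code above, sketched in Python 3 under the assumptions that the dictionary is non-empty and that every key consists only of word characters; replace_words_sub is a made-up name, and whether it is actually faster would need to be measured on real data.

```python
import re

# Same pairs as replace.txt in the question, inlined for the example.
replacements = {"this": "that", "here": "there", "me": "you"}

# One alternation of the escaped keys, wrapped in \b word boundaries so that
# 'me' inside 'become' does not match. Assumes at least one key.
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, replacements)) + r")\b")


def replace_words_sub(text):
    """Replace every whole-word match by looking it up in the dictionary."""
    return pattern.sub(lambda m: replacements[m.group(0)], text)


print(replace_words_sub("me is this will become you is that."))
```

The text is scanned once and no intermediate token list is built, which is the main design difference from splitting and re-joining.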

Edit 2: OK, I was inserting spaces between words and punctuation. That is now fixed by treating spaces as tokens and using ''.join() instead of ' '.join().
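The difference between the two joins can be seen in a small Python 3 comparison. The whitespace-dropping pattern below is an assumption chosen for illustration, not necessarily the pattern used in the earlier version of this answer.

```python
import re

text = "this is here."

# Tokenizer that drops whitespace: only words and punctuation survive.
no_space_tokens = re.findall(r"\w+|[^\w\s]", text)
# Tokenizer that keeps every non-word character, spaces included.
all_tokens = re.findall(r"\w+|\W", text)

spaced = " ".join(no_space_tokens)  # a space ends up before the period
exact = "".join(all_tokens)         # reproduces the input exactly

print(repr(spaced))  # 'this is here .'
print(repr(exact))   # 'this is here.'
```

Keeping whitespace as tokens means the original spacing, including trailing spaces and newlines, survives the round trip unchanged.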

Source: for "python - Replace words with words from another file", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/27773802/
