python - In-place replacement (multiple times) in a file

I am trying to perform some replacements in a file:

'\t' --> '◊'
'⁞' --> '\t'

This question recommends the following procedure:

import fileinput

with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
    for line in file:
        line = line.replace('\t','◊')
        print(line.replace('⁞','\t'), end='')

I am not allowed to comment there, but when I run this code I get an error message:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 10: character maps to <undefined>

I have fixed this kind of error before by adding encoding='utf-8'. The problem is that fileinput.FileInput() does not accept an encoding argument.
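
The error message can be reproduced in isolation, which helps explain where byte 0x81 comes from. The snippet below is a minimal sketch that assumes the platform default encoding is a charmap codec such as cp1252 (common on Windows); the original post does not state which codec was in use.

```python
# '⁞' (U+205E) is three bytes in UTF-8; the middle byte is 0x81.
raw = '⁞'.encode('utf-8')
print(raw)  # b'\xe2\x81\x9e'

# Assumption: the default codec is cp1252, in which byte 0x81 is unassigned.
try:
    raw.decode('cp1252')
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)
print(error_message)

# Decoding with the encoding the file was actually written in succeeds.
print(raw.decode('utf-8') == '⁞')
```

So the file itself is most likely fine; it is only being read with the wrong codec.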

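Question: how can I get rid of this error?

I would be very happy with the solution above if it works and is comparable in speed to the approach below. It appears to do the replacement in place, which is how it should be done.

I have also tried: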

replacements = {'\t':'◊', '⁞':'\t'}
with open(filename, encoding='utf-8') as inFile:
    contents = inFile.read()
with open(filename, mode='w', encoding='utf-8') as outFile:
    for i in replacements.keys():
        contents = contents.replace(i, replacements[i])
    outFile.write(contents)

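It is relatively fast, but very greedy in terms of memory, because the whole file is read into a single string.

For UNIX users: I need something that does the following: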
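
One way to keep memory use low while still specifying the encoding is to stream the file line by line into a temporary file and then swap it into place. This is a sketch rather than a definitive implementation: the function name replace_streaming and the TABLE constant are hypothetical, and it swaps the chained replace() calls for str.translate so that both substitutions happen in one pass.

```python
import os
import tempfile

# One translation table applies both replacements in a single pass, so a tab
# produced by the second rule is never touched by the first rule.
TABLE = str.maketrans({'\t': '◊', '⁞': '\t'})

def replace_streaming(path, encoding='utf-8'):
    """Rewrite `path` line by line through a temp file in the same directory."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        # newline='' keeps the original line endings untouched.
        with open(fd, mode='w', encoding=encoding, newline='') as dst, \
             open(path, encoding=encoding, newline='') as src:
            for line in src:
                dst.write(line.translate(TABLE))
        # Replace the original only after the copy was written completely.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Only one line is held in memory at a time. Note that the rewritten file gets the permissions of the temporary file, not those of the original.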

sed -i 's/\t/◊/g' 'file.csv'
sed -i 's/⁞/\t/g' 'file.csv'

This turned out to be rather slow.

Best answer

Normally, with FileInput, you can specify the encoding by passing fileinput.hook_encoded as the openhook parameter:

import fileinput

with fileinput.FileInput(filename, openhook=fileinput.hook_encoded('utf-8')) as file:
    # ...

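However, this does not work with inplace=True. In that case, you can treat the file as binary and decode/encode the strings yourself. For reading, it is enough to specify mode='rb', which gives you bytes lines instead of str. Writing is a bit more complicated, because print always works with str, or converts its input to str, so passing bytes will not work as expected. However, you can write binary data to sys.stdout directly, and that will work: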

import sys
import fileinput

filename = '...'
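with fileinput.FileInput(filename, mode='rb', inplace=True, backup='.bak') as file:
    for line in file:
        # Depending on the Python version, sys.stdout is either a text stream
        # (with a .buffer attribute) or already a binary stream at this point.
        out = getattr(sys.stdout, 'buffer', sys.stdout)
        line = line.decode('utf-8')
        line = line.replace('\t', '◊')
        line = line.replace('⁞', '\t')
        out.write(line.encode('utf-8'))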

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/49633831/
