python - sed to python: replacing extra delimiters

sed 's/\t/_tab_/3g'

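I have a sed command that replaces every extra tab delimiter in my text documents. Each document should have 3 columns, but occasionally a line contains an extra delimiter. I have no control over how these files are produced.

I use the command above to clean up the documents. However, everything else I do with these files is done in Python. Is there a way to do what the sed command above does in Python?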

Sample input:

Column1   Column2         Column3
James     1,203.33        comment1
Mike      -3,434.09       testing testing 123
Sarah     1,343,342.23    there   here

Sample output:

Column1   Column2         Column3
James     1,203.33        comment1
Mike      -3,434.09       testing_tab_testing_tab_123
Sarah     1,343,342.23    there_tab_here

Best answer

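You can read the file line by line and split each line on tabs. If a line yields more than 3 items, join the third item and everything after it with _tab_: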

lines = []
with open('inputfile.txt', 'r') as fr:
    for line in fr:
        split = line.split('\t')
        if len(split) > 3:
            tmp = split[:2]                      # Slice the first two items
            tmp.append("_tab_".join(split[2:]))  # Append the rest joined with _tab_
            lines.append("\t".join(tmp))         # Use the updated line
        else:
            lines.append(line)                   # Else, put the line as is

See the Python demo.

The lines variable will then contain something like the following:

Mike    -3,434.09   testing_tab_testing_tab_123
Mike    -3,434.09   testing_tab_256
No  operation   here
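The same idea can be written more compactly with the maxsplit argument of str.split, which leaves any extra tabs inside the last piece so they can be swapped out with str.replace. This is only a sketch: the names replace_extra_tabs and clean_file, the columns and marker parameters, and the file paths are made up for illustration and are not part of the original answer. It is meant to mirror what the 3g flag does in GNU sed (replace the third and every later match on a line).

```python
def replace_extra_tabs(line, columns=3, marker="_tab_"):
    """Keep the first columns-1 tabs; replace every later tab with marker."""
    # maxsplit=columns-1 stops splitting after the expected delimiters,
    # so the last piece still holds any extra tabs (and the newline).
    parts = line.split("\t", columns - 1)
    parts[-1] = parts[-1].replace("\t", marker)
    return "\t".join(parts)


def clean_file(src_path, dst_path):
    """Hypothetical helper: write a cleaned copy of src_path to dst_path."""
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(replace_extra_tabs(line))
```

Lines with three or fewer columns pass through unchanged, so no separate length check is needed. Writing each line out as it is read also avoids holding the whole file in memory, which may matter for large inputs.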

A similar question on this topic can be found on Stack Overflow: https://stackoverflow.com/questions/51317050/