python - How to replace blank entries in a text table with 0 in Python?

Tags: python text

My table looks like this:

text = """
ID = 1234

Hello World              135,343    117,668    81,228
Another line of text    (30,632)              (48,063)
More text                  0         11,205       0    
Even more text                       1,447       681

ID = 18372

Another table                        35,323              38,302      909,381
Another line with text                 13                  15
More text here                                              7           0    
Even more text here                   7,011               1,447        681
"""

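Is there a way to replace the "blank" entries in each table with 0? I tried to insert a delimiter between the entries, but the following code can't cope with the blank spots in the table:

for line in text.splitlines():
    if 'ID' not in line:
        # Treat the last three whitespace-separated tokens as the numeric columns
        line1 = line.split()
        line = '|'.join((' '.join(line1[:-3]), '|'.join(line1[-3:])))
        print(line)
    else:
        print(line)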

The output is:

ID = 1234
|
Hello World|135,343|117,668|81,228
Another line of|text|(30,632)|(48,063)
More text|0|11,205|0
Even more|text|1,447|681
|
ID = 18372
|
Another table|35,323|38,302|909,381
Another line with|text|13|15
More text|here|7|0
Even more text here|7,011|1,447|681

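As you can see, the first problem shows up in the second line of the first table: the word "text" is treated as the first numeric column. Is there a way in Python to fix this and replace the blank entries with 0?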
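The misalignment comes from `split()` discarding where each token sat in the line. A small check on two of the question's lines (copied into standalone strings `line_a` and `line_b`, names chosen here for illustration) shows the effect, and shows that comparing fixed character offsets instead exposes the blank cell. This is only a sketch of the diagnosis, not a full parser.

```python
# Two lines from the question, copied into standalone strings.
line_a = "Hello World              135,343    117,668    81,228"
line_b = "Another line of text    (30,632)              (48,063)"

# The question's code treats the last three whitespace-separated tokens as
# the numeric columns; for line_b that pulls the label word "text" in.
print(line_a.split()[-3:])  # ['135,343', '117,668', '81,228']
print(line_b.split()[-3:])  # ['text', '(30,632)', '(48,063)']

# Using character offsets instead: the span under "117,668" in line_a
# is empty in line_b, i.e. a blank cell that could be replaced by 0.
start = line_a.index("117,668")
middle = line_b[start:start + len("117,668")]
print(middle.strip() == "")  # True
```

In other words, any fix has to work with column positions rather than with the token count per line.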

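Best answer

Here is a function that finds the columns in a set of lines. The second argument, pat, defines what counts as a column separator and can be any regular expression.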

import itertools as it
import re

def find_columns(lines, pat = r' '):
    '''
    Usage:
    widths = find_columns(lines)
    for line in lines:
        if not line: continue
        vals = [ line[widths[i]:widths[i+1]].strip() for i in range(len(widths)-1) ]
    '''
    widths = []
    maxlen = max(len(line) for line in lines)
    for line in lines:
        line = ''.join([line, ' '*(maxlen-len(line))])
        candidates = []
        for match in re.finditer(pat, line):
            candidates.extend(range(match.start(), match.end()+1))
        widths.append(set(candidates))
    widths = sorted(set.intersection(*widths))
    diffs = [widths[i+1]-widths[i] for i in range(len(widths)-1)]
    diffs = [None]+diffs
    widths = [w for d, w in zip(diffs, widths) if d != 1]
    if widths[0] != 0: widths = [0]+widths
    return widths

def report(text):
    for key, group in it.groupby(text.splitlines(), lambda line:line.startswith('ID')):
        lines = list(group)
        if key:
            print('\n'.join(lines))
        else:
            # r'\s(?![a-zA-Z])' treats any whitespace character that is
            # not followed by a letter as part of a column separator.
            widths = find_columns(lines, pat = r'\s(?![a-zA-Z])')
            for line in lines:
                if not line: continue
                vals = [ line[widths[i]:widths[i+1]] for i in range(len(widths)-1) ]
                vals = [v if v.strip() else v[1:]+'0' for v in vals]
                print('|'.join(vals))

text = """\
ID = 1234

Hello World              135,343    117,668    81,228
Another line of text    (30,632)              (48,063)
More text                  0         11,205       0    
Even more text                       1,447       681

ID = 18372

Another table                        35,323              38,302      909,381
Another line with text                 13                  15
More text here                                              7           0    
Even more text here                   7,011               1,447        681
"""

report(text)

yields

ID = 1234
Hello World         |     135,343|    117,668|    81,228
Another line of text|    (30,632)|          0|   (48,063)
More text           |       0    |     11,205|       0   
Even more text      |           0|     1,447 |      681
ID = 18372
Another table         |               35,323|              38,302|      909,381
Another line with text|                 13  |                15|0
More text here        |                    0|                 7  |         0   
Even more text here   |                7,011|               1,447|        681
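The answer's key ingredients can be checked in isolation. The sketch below uses a hypothetical helper `separator_positions` and made-up toy rows to show (1) that the pattern r'\s(?![a-zA-Z])' skips the single spaces between words of a label, (2) that intersecting separator positions across lines keeps only gaps shared by every line, which is what `find_columns` does with `set.intersection` (simplified here: no padding to equal width, no collapsing of consecutive offsets), and (3) how the expression `v[1:]+'0'` in `report` fills a blank cell without changing its width.

```python
import re

# Separator pattern from report(): a whitespace character NOT followed by
# an ASCII letter (negative lookahead), so single spaces inside a label such
# as "Another line of text" are not treated as column breaks.
SEP = re.compile(r'\s(?![a-zA-Z])')


def separator_positions(line):
    """Offsets in `line` where SEP matches (hypothetical helper)."""
    return {m.start() for m in SEP.finditer(line)}


# (1) Only the gap after the label matches, not the gaps between words.
label_line = "Another line of text" + " " * 4 + "(30,632)"
print(sorted(separator_positions(label_line)))  # [20, 21, 22, 23]

# (2) Simplified version of find_columns' intersection step: keep only
# offsets that are separators on every line.
rows = ["ab  12", "c   34"]  # toy data, not from the question
common = set.intersection(*(separator_positions(r) for r in rows))
print(sorted(common))  # [2, 3]

# (3) The blank-filling expression from report(): a cell that is all
# whitespace becomes a same-width string ending in "0".
cell = " " * 5
filled = cell if cell.strip() else cell[1:] + "0"
print(repr(filled))  # '    0'
```

Offset 1 is blank only in the second toy row, so the intersection drops it; this is why a label word sitting where another line has a number cannot be mistaken for a column break.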

A similar question was found on Stack Overflow: https://stackoverflow.com/questions/7700081/