python - Parsing variable lines in Python

From this link: Splitlines in Python a table with empty spaces

It works well, but a problem appears when the column widths change:

COMMAND     PID       USER   FD      TYPE DEVICE  SIZE/OFF   NODE NAME
init          1       root  cwd   unknown                         /proc/1/cwd (readlink: Permission denied)
init          1       root  rtd   unknown                         /proc/1/root

Here the problem starts at the DEVICE or SIZE/OFF column, but in other cases it can occur in any column.

COMMAND     PID       USER   FD      TYPE             DEVICE  SIZE/OFF       NODE NAME
init          1       root  cwd       DIR                8,1      4096          2 /
init          1       root  rtd       DIR                8,1      4096          2 /
init          1       root  txt       REG                8,1     36992     139325 /sbin/init
init          1       root  mem       REG                8,1     14696     190970 /lib/libdl-2.11.3.so
init          1       root  mem       REG                8,1   1437064     190958 /lib/libc-2.11.3.so
python    30077     carlos    1u      CHR                1,3       0t0        700 /dev/null

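Note that the first line is always the same: the first column starts at the C of COMMAND, the second column ends at the D of PID, the fourth column ends one character after the D of FD, and so on. Is there a way to count the spaces in the first line and use them to fill in this code, so that the other lines can be parsed?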

import struct

# note: variable-length NAME field at the end intentionally omitted
base_format = '8s 1x 6s 1x 10s 1x 4s 1x 9s 1x 6s 1x 9s 1x 6s 1x'
base_format_size = struct.calcsize(base_format)

Is there any way to solve this?

Best answer

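After looking at the other thread, I did some reading on lsof -F and found that it produces output that is easy to parse. Here is a quick demo of the general idea. It parses the output and prints a small part of the result to show the format. Could you use -F for your use case?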

import pprint
import subprocess


def get_rows(output_to_parse, whitelist_keys):
    """Group `lsof -F` lines into dicts; a repeated key starts a new row."""
    rows = []
    row = {}
    for line in output_to_parse.splitlines():
        if not line:
            continue
        # Each line is a one-character field identifier followed by its value.
        key, value = line[0], line[1:]
        if key not in whitelist_keys:
            raise ValueError(key)
        if key in row:
            # The key was already seen, so the previous row is complete.
            # The current line is kept and opens the next row.
            rows.append(row)
            row = {}
        row[key] = value
    if row:
        rows.append(row)
    return rows


if __name__ == "__main__":
    # `lsof -F ?` prints the list of field identifiers on stderr.
    # universal_newlines=True makes the pipes return str instead of bytes.
    identifiers = subprocess.Popen(
        ["lsof", "-F", "?"], stderr=subprocess.PIPE, universal_newlines=True
    ).communicate()

    # Skip the header line, then keep the first character of each entry.
    keys = set([line.strip()[0] for line in identifiers[1].split("\n") if line.strip()][1:])

    lsof_output = subprocess.check_output(["lsof", "-F"], universal_newlines=True)
    rows = get_rows(lsof_output, whitelist_keys=keys)
    pprint.pprint(rows[:20])
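
If -F is not an option, the header-based idea from the question could be sketched roughly as follows. This is only a sketch under assumptions read off the sample output above: every column except COMMAND and NAME is right-aligned under its header word, so each field ends where its header word ends, and NAME runs to the end of the line. The names column_slices, parse_line and the overhang parameter are hypothetical helpers, not part of lsof or the standard library. The overhang for FD reflects the "+1" mentioned in the question (the mode character after the descriptor).

```python
import re


def column_slices(header, overhang=None):
    """Return (name, start, end) per column, cutting at the end of each header word.

    overhang maps a column name to extra characters that its values may
    occupy past the end of the header word (an assumption, e.g. {"FD": 1}).
    """
    overhang = overhang or {}
    matches = list(re.finditer(r"\S+", header))
    spans = []
    start = 0
    for index, match in enumerate(matches):
        name = match.group()
        if index == len(matches) - 1:
            end = None  # last column (NAME) runs to the end of the line
        else:
            end = match.end() + overhang.get(name, 0)
        spans.append((name, start, end))
        start = end
    return spans


def parse_line(line, spans):
    """Slice one data line with the spans computed from the header."""
    return {name: line[start:end].strip() for name, start, end in spans}
```

Calling column_slices once on the header line and parse_line on every following line yields one dict per row, with empty strings for blank fields such as DEVICE in the first sample. A COMMAND value longer than its header column would spill into the PID slice, so this approach is more fragile than lsof -F.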

Regarding python - Parsing variable lines in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20303558/
