python - Split a string into separate lists via regex matching

Tags: python regex

I have a text file that looks like this:

127.0.0.1
  159.187.32.13, 3:00:15, flags: S
    Incoming interface: Ethernet51/1
    RPF route: [U] 151.177.45.0/27 [20/0] via 190.150.1.2
    Outgoing interface list:
      Vlan4054
  159.187.32.20, 2:20:11, flags: S
    Incoming interface: Ethernet51/1
    RPF route: [U] 151.177.45.59/27 [20/0] via 190.150.1.2
    Outgoing interface list:
      Vlan4054
      Vlan4056
  198.140.45.77, 2:36:15, flags: S
    Incoming interface: Ethernet51/1
    RPF route: [U] 151.177.45.88/27 [20/0] via 190.150.1.2
    Outgoing interface list:
      Vlan4054
127.0.0.2
  188.125.45.13, 3:00:15, flags: S
    Incoming interface: Ethernet51/1
    RPF route: [U] 199.150.45.0/27 [20/0] via 195.32.1.2
    Outgoing interface list:
      Vlan4054
      Vlan4056
  221.125.45.77, 2:20:11, flags: S
    Incoming interface: Ethernet51/1
    RPF route: [U] 199.150.45.10/27 [20/0] via 195.32.1.2
    Outgoing interface list:
      Vlan4054
      Vlan4056

I'm trying to build a dictionary from this data so I can parse it, and I'm currently trying to do that with regular expressions:

import copy
import re

content = []
content_dict = {}

# Raw strings so that "\d" and "\S" reach the regex engine unchanged
group_ip = re.compile(r"^(\d+\.\d+\.\d+\.\d+$)")
ip_subnet = re.compile(r"^(\d+\.\d+\.\d+\.\d+/\d+)")
two_space_start = re.compile(r"^( {2})\S")
four_space_start = re.compile(r"^( {4})\S")
six_space_start = re.compile(r"^( {6})\S")
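Instead of one pattern per indentation depth, the leading spaces can be captured once and their count mapped to the role of the line. This is a minimal sketch that assumes the indentation is always exactly 0, 2, 4 or 6 spaces, as in the sample. `INDENT`, `LEVELS` and `classify` are hypothetical names used only for illustration.

```python
import re

# Capture the leading spaces; the line must contain at least one non-space character.
INDENT = re.compile(r"^( *)\S")

# Hypothetical mapping from indent width to the role of the line in the sample file.
LEVELS = {0: "group", 2: "source", 4: "detail", 6: "interface"}


def classify(line):
    """Return the role of a line based on its indentation, or None for blank lines."""
    m = INDENT.match(line)
    if m is None:
        return None  # empty or whitespace-only line
    return LEVELS.get(len(m.group(1)), "unknown")


sample = [
    "127.0.0.1",
    "  159.187.32.13, 3:00:15, flags: S",
    "    Incoming interface: Ethernet51/1",
    "      Vlan4054",
    "",
]
for line in sample:
    print(repr(line), "->", classify(line))
```

Knowing the level of each line tells you which dictionary it belongs to: a level-0 line starts a new group, and a level-2 line starts a new source inside the current group.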

I planned to apply the regexes to each line and build the dictionary like this:

if group_ip.match(line):
    content_dict["group"] = line.strip()

elif two_space_start.match(line) and "RP" in line:
    line = line.split(",")

    content_dict["source"] = line[0].strip()
    content_dict["uptime"] = line[1].strip()
    content_dict["rp"] = line[2].split(" ")[-1]
    content_dict["source_flags"] = line[-1].split(":")[-1].strip()

content.append(copy.copy(content_dict))

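But I've realized this won't scale. Each group IP (127.0.0.1, 127.0.0.2) has a variable number of sub-entries, and with a single flat dictionary I keep overwriting them. What I want to end up with is: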

"127.0.0.1": {
    "159.187.32.13": {
        "uptime": "3:00:15",
        "flags": "S",
        "rpf_ip": "151.177.45.0/27",
        "via": "190.150.1.2",
        "outgoing_interface": ["Vlan4054"]
    },
    "159.187.32.20": {
        "uptime": "2:20:11",
        "flags": "S",
        "rpf_ip": "151.177.45.59/27",
        "via": "190.150.1.2",
        "outgoing_interface": ["Vlan4054", "Vlan4056"]
    }
}
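The overwriting comes from reusing one flat dictionary for every record. If the outer dict is keyed by group IP and the inner dict by source IP, each source gets its own record. The minimal sketch below uses values hard-coded from the sample rather than parsed from the file. It relies on `dict.setdefault` to create a missing level the first time it is needed.

```python
import json

content = {}

# (group, source, uptime, flags) tuples copied by hand from the sample file.
records = [
    ("127.0.0.1", "159.187.32.13", "3:00:15", "S"),
    ("127.0.0.1", "159.187.32.20", "2:20:11", "S"),
    ("127.0.0.2", "188.125.45.13", "3:00:15", "S"),
]

for group, source, uptime, flags in records:
    # setdefault returns the existing inner dict for this group, or inserts and
    # returns a new empty one, so sources accumulate instead of overwriting.
    group_dict = content.setdefault(group, {})
    group_dict[source] = {"uptime": uptime, "flags": flags, "outgoing_interface": []}

# Interfaces are appended to the list belonging to their own source.
content["127.0.0.1"]["159.187.32.20"]["outgoing_interface"].extend(["Vlan4054", "Vlan4056"])

print(json.dumps(content, indent=2))
```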

Is it possible to get this data structure out of the text, with regular expressions or some other way?

Best answer

Since the input is easy to tokenize, regular expressions are probably overkill. You can use str.startswith, str.isdigit and str.split instead:

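from pprint import pprint

content = {}
with open('file.txt', 'r') as f:
    for line in f:
        line = line.rstrip()
        if not line:
            continue  # skip blank lines; line[0] would raise IndexError on them
        if line[0].isdigit():
            # Group IP at column 0, e.g. "127.0.0.1": start a new group
            group = line
            content[group] = {}
        elif line.startswith('  ') and line[2].isdigit():
            # Source line, e.g. "  159.187.32.13, 3:00:15, flags: S"
            ip, uptime, flags = line.lstrip().split(', ')
            _, flags = flags.split()  # "flags: S" -> "S"
            content[group][ip] = {'uptime': uptime, 'flags': flags, 'outgoing_interface': []}
        elif line.startswith('    RPF route:'):
            # "    RPF route: [U] <rpf_ip> [20/0] via <via>"
            _, _, _, rpf_ip, _, _, via = line.split()
            content[group][ip]['rpf_ip'] = rpf_ip
            content[group][ip]['via'] = via
        elif line.startswith('      '):
            # Six-space indent: one outgoing interface name per line
            content[group][ip]['outgoing_interface'].append(line.lstrip())
pprint(content)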

With your sample input, this outputs:

{'127.0.0.1': {'159.187.32.13': {'flags': 'S',
                                 'outgoing_interface': ['Vlan4054'],
                                 'rpf_ip': '151.177.45.0/27',
                                 'uptime': '3:00:15',
                                 'via': '190.150.1.2'},
               '159.187.32.20': {'flags': 'S',
                                 'outgoing_interface': ['Vlan4054', 'Vlan4056'],
                                 'rpf_ip': '151.177.45.59/27',
                                 'uptime': '2:20:11',
                                 'via': '190.150.1.2'},
               '198.140.45.77': {'flags': 'S',
                                 'outgoing_interface': ['Vlan4054'],
                                 'rpf_ip': '151.177.45.88/27',
                                 'uptime': '2:36:15',
                                 'via': '190.150.1.2'}},
 '127.0.0.2': {'188.125.45.13': {'flags': 'S',
                                 'outgoing_interface': ['Vlan4054', 'Vlan4056'],
                                 'rpf_ip': '199.150.45.0/27',
                                 'uptime': '3:00:15',
                                 'via': '195.32.1.2'},
               '221.125.45.77': {'flags': 'S',
                                 'outgoing_interface': ['Vlan4054', 'Vlan4056'],
                                 'rpf_ip': '199.150.45.10/27',
                                 'uptime': '2:20:11',
                                 'via': '195.32.1.2'}}}
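The question also asked whether regular expressions can do this. They can, if each line type gets its own pattern with named groups. This minimal sketch assumes the same four line shapes as the sample: a group IP at column 0, a source line indented 2 spaces, a `RPF route:` line indented 4 spaces, and interface names indented 6 spaces. `parse` and the pattern names are hypothetical. Lines that match none of the patterns, such as `Incoming interface:`, are skipped. It uses the `:=` operator, so it needs Python 3.8 or later.

```python
import re
from pprint import pprint

# One pattern per line type; named groups become dictionary values.
GROUP_RE = re.compile(r"^(?P<group>\d+(?:\.\d+){3})\s*$")
SOURCE_RE = re.compile(
    r"^ {2}(?P<ip>\d+(?:\.\d+){3}), (?P<uptime>[\d:]+), flags: (?P<flags>\S+)\s*$"
)
RPF_RE = re.compile(
    r"^ {4}RPF route: \S+ (?P<rpf_ip>\d+(?:\.\d+){3}/\d+) \S+ via (?P<via>\d+(?:\.\d+){3})\s*$"
)
IFACE_RE = re.compile(r"^ {6}(?P<iface>\S+)\s*$")


def parse(text):
    """Build {group: {source: {...}}} from text in the format shown in the question."""
    content = {}
    group = source = None
    for line in text.splitlines():
        if m := GROUP_RE.match(line):
            group = m["group"]
            content[group] = {}
        elif m := SOURCE_RE.match(line):
            source = m["ip"]
            content[group][source] = {
                "uptime": m["uptime"],
                "flags": m["flags"],
                "outgoing_interface": [],
            }
        elif m := RPF_RE.match(line):
            content[group][source].update(rpf_ip=m["rpf_ip"], via=m["via"])
        elif m := IFACE_RE.match(line):
            content[group][source]["outgoing_interface"].append(m["iface"])
    return content


sample = """127.0.0.1
  159.187.32.13, 3:00:15, flags: S
    Incoming interface: Ethernet51/1
    RPF route: [U] 151.177.45.0/27 [20/0] via 190.150.1.2
    Outgoing interface list:
      Vlan4054
  159.187.32.20, 2:20:11, flags: S
    Incoming interface: Ethernet51/1
    RPF route: [U] 151.177.45.59/27 [20/0] via 190.150.1.2
    Outgoing interface list:
      Vlan4054
      Vlan4056
127.0.0.2
  188.125.45.13, 3:00:15, flags: S
    Incoming interface: Ethernet51/1
    RPF route: [U] 199.150.45.0/27 [20/0] via 195.32.1.2
    Outgoing interface list:
      Vlan4054
      Vlan4056
"""
pprint(parse(sample))
```

The `str.split` version raises `ValueError` when a line has an unexpected number of fields. This version simply ignores any line that doesn't match a pattern. That is more forgiving, but it can hide format changes in the input.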

Stack Overflow question "python - Split a string into separate lists via regex matching": https://stackoverflow.com/questions/52368431/
