python - Regex to find search terms and put the results into another data file?

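I'm not sure the title of this question is right, so let me give some background. I have two text files: one named data.txt and the other named results.txt. The data file holds the output of "show version" from a Cisco network device. It looks like this: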

Cisco IOS Software, s72033_rp Software (s72033_rp-ADVIPSERVICESK9_WAN-M), Version 12.2(33)SXI4, RELEASE SOFTWARE (fc3)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2010 by Cisco Systems, Inc.
Compiled Sat 29-May-10 17:54 by prod_rel_team

ROM: System Bootstrap, Version 12.2(17r)SX6, RELEASE SOFTWARE (fc1)

 core-router uptime is 2 years, 5 weeks, 1 day, 5 hours, 47 minutes
Uptime for this control processor is 2 years, 5 weeks, 1 day, 4 hours, 50 minutes
Time since san-qrc1 switched to active is 2 years, 5 weeks, 1 day, 4 hours, 56 minutes
System returned to ROM by reload at 16:12:08 PDT Fri Aug 27 2010 (SP by reload)
System restarted at 16:19:33 PDT Fri Aug 27 2010
System image file is "sup-bootdisk:s72033-advipservicesk9_wan-mz.122-33.SXI4.bin"
Last reload reason: Reload Command



This product contains cryptographic features and is subject to United
States and local country laws governing import, export, transfer and
use. Delivery of Cisco cryptographic products does not imply
third-party authority to import, export, distribute or use encryption.
Importers, exporters, distributors and users are responsible for
compliance with U.S. and local country laws. By using this product you
agree to comply with applicable laws and regulations. If you are unable
to comply with U.S. and local laws, return this product immediately.

A summary of U.S. laws governing Cisco cryptographic products may be found at:
http://www.cisco.com/wwl/export/crypto/tool/stqrg.html

If you require further assistance please contact us by sending email to
export@cisco.com.

cisco WS-C6509-E (R7000) processor (revision 1.5) with 983008K/65536K bytes of memory.
Processor board ID XXXXXXXXXX
SR71000 CPU at 600Mhz, Implementation 0x504, Rev 1.2, 512KB L2 Cache
Last reset from s/w reset
35 Virtual Ethernet interfaces
51 Gigabit Ethernet interfaces
26 Ten Gigabit Ethernet interfaces
1917K bytes of non-volatile configuration memory.
8192K bytes of packet buffer memory.

65536K bytes of Flash internal SIMM (Sector size 512K).
Configuration register is 0x2102

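Put simply, I want to read data.txt, extract certain strings, and put them into results.txt. CSV format would be nice, but I'd be happy just to get the data extracted.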

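For example, the script would pull out the relevant data, such as the device's hostname (core-router in this case), the system image file name (s72033-advipservicesk9_wan-mz.122-33.SXI4.bin), the uptime (2 years, 5 weeks, 1 day, 5 hours, 47 minutes), the serial number (XXXXXXXXXX), and the model (WS-C6509-E). All of this would go into results.txt in tab-delimited format.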

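In the future, different data.txt files could be used, with the data appended to results.txt, giving me a running record of data points. I hope this makes sense. I've searched for what I'm after, but most of what I found either finds whole lines in a text file or returns the index at which a word occurs.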

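One last thing: depending on the model of the Cisco device, all of these items will differ. The text around them is usually the same, but the items I'm looking for will vary. Any help would be appreciated. Thanks in advance.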

Update

I used the script you provided, but I still see the following:

python
Python 2.7.3 (default, Aug  1 2012, 05:16:07) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> with open("data.txt") as infile:
...     text = infile.read()
... 
>>> import re
>>> regex = re.compile(
...     r"""^(?P<device>\S*)           # Match non-whitespace device name
...     \suptime\sis\s                 # Match " uptime is "
...     (?P<uptime>[^\r\n]*)           # Match until end of line --> uptime
...     .*?^System\simage\sfile\sis\s  # Match intervening text
...     "[^:]*:                        # Match from quote to colon
...     (?P<sifilename>[^"]*)          # Match everything until quote --> filename
...     .*?^cisco\s                    # Match intervening text
...     (?P<model>\S*)                 # Match non-whitespace model name
...     .*?^Processor\sboard\sID\s     # Match intervening text
...     (?P<serialno>[^\r\n]*)         # Match until end of line --> serial no""", 
...     re.DOTALL | re.MULTILINE | re.VERBOSE)
>>> match = regex.search(text)
>>> match.groups()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'groups'

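The problem is that the hostname is indented by one space. The indent disappeared when I pasted the output, but it is there when the "show version" command is issued. Running the code above against the indented output breaks the script; removing the space makes it work.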

Best answer

You can read the file into a string like this:

with open("data.txt") as infile:
    text = infile.read()

Then you can use a regular expression to extract the relevant information:

import re
regex = re.compile(
    r"""^(?P<device>\S*)           # Match non-whitespace device name
    \suptime\sis\s                 # Match " uptime is "
    (?P<uptime>[^\r\n]*)           # Match until end of line --> uptime
    .*?^System\simage\sfile\sis\s  # Match intervening text
    "[^:]*:                        # Match from quote to colon
    (?P<sifilename>[^"]*)          # Match everything until quote --> filename
    .*?^cisco\s                    # Match intervening text
    (?P<model>\S*)                 # Match non-whitespace model name
    .*?^Processor\sboard\sID\s     # Match intervening text
    (?P<serialno>[^\r\n]*)         # Match until end of line --> serial no""", 
    re.DOTALL | re.MULTILINE | re.VERBOSE)
match = regex.search(text)

Now match.groups() contains:

>>> match.groups()
('core-router', '2 years, 5 weeks, 1 day, 5 hours, 47 minutes', 
's72033-advipservicesk9_wan-mz.122-33.SXI4.bin', 'WS-C6509-E', 'XXXXXXXXXX')
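
As the update in the question points out, real "show version" output indents the hostname line by one space. In that case `^(?P<device>\S*)` cannot match at the start of the line, and `regex.search` returns `None`. Here is a minimal sketch of one way around this, assuming the indent consists only of spaces or tabs: allow optional leading whitespace before the device name, and check for `None` before reading the groups. The names `SHOW_VERSION_RE` and `parse_show_version` are made up for illustration.

```python
import re

# Same pattern as above, except that the first line tolerates leading
# spaces/tabs and the device name must be at least one character long.
SHOW_VERSION_RE = re.compile(
    r"""^[ \t]*(?P<device>\S+)     # Optional indent, then the device name
    \suptime\sis\s                 # Match " uptime is "
    (?P<uptime>[^\r\n]*)           # Match until end of line --> uptime
    .*?^System\simage\sfile\sis\s  # Match intervening text
    "[^:]*:                        # Match from quote to colon
    (?P<sifilename>[^"]*)          # Match everything until quote --> filename
    .*?^cisco\s                    # Match intervening text
    (?P<model>\S*)                 # Match non-whitespace model name
    .*?^Processor\sboard\sID\s     # Match intervening text
    (?P<serialno>[^\r\n]*)         # Match until end of line --> serial no""",
    re.DOTALL | re.MULTILINE | re.VERBOSE)


def parse_show_version(text):
    """Return a dict of the five named fields, or None if nothing matches."""
    match = SHOW_VERSION_RE.search(text)
    return match.groupdict() if match else None
```

Returning `None` instead of raising lets the caller decide whether to skip a file that does not look like "show version" output or to report it.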

You can use this to write to a csv file as follows:

import csv
with open("results.txt", "a") as outfile:
    outcsv = csv.writer(outfile)
    outcsv.writerow(match.groups())
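
The question asks for tab-delimited output that grows as more data files are processed. The following sketch does that with the `csv` module's `delimiter` argument. It assumes Python 3, where the file should be opened with `newline=""`; on the Python 2.7 interpreter shown in the question, open the file with mode `"ab"` instead. The helper name `append_row` is hypothetical.

```python
import csv


def append_row(fields, path="results.txt"):
    """Append one tab-separated row to `path`, creating the file if needed."""
    # newline="" lets the csv module control line endings (Python 3).
    with open(path, "a", newline="") as outfile:
        writer = csv.writer(outfile, delimiter="\t")
        writer.writerow(fields)
```

Call it as `append_row(match.groups())`, but only after checking that `match` is not `None`, so a file that fails to match does not raise `AttributeError`.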

Regarding "python - Regex to find search terms and put the results into another data file?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/12685368/