python - How do I select a certain string in a txt file and list it in a csv file?

Tags: python csv text-files export-to-csv

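This is the content of my text file. I only want to extract the sha1 and the description and write them to a csv file. Using a prefix and a delimiter, I trim each string and take the sha1 between "\" and "->"; after that I also want to get the description.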

         +----------------------------------------------------+
         |          VSCAN32            Ver 2.00-1655          |
         |                                                    |
         |     Copyright (c) 1990 - 2012 xxx xxx xxx Inc.     |
         |                                                    |
         |    Maintained by xxxxxxxxx  QA for VSAPI Testing   |
         +----------------------------------------------------+

Setting Process Priority to NORMAL: Success 1

Successfully setting POL Flag to 0
VSGetVirusPatternInformation is invoked
Reading virus pattern from lpt$vpn.527 (2018/09/25) (1452700)


Scanning samples_extracted\88330686ae94a9b97e1d4f5d4cbc010933f90f9a->(MS Office 2007 Word 4045-1)
->Found Virus [TROJ_FRS.VSN11I18]



Scanning samples_extracted\8d286d610f26f368e7a18d82a21dd68b68935d6d->(Microsoft RTF 6008-0)
->Found Virus [Possible_SMCCVE20170199]



Scanning samples_extracted\a10e5f964eea1036d8ec50810f1d87a794e2ae8c->(ASCII text 18-0)
->Found Virus [Trojan.VBS.NYMAIM.AA]


18 files have been checked.
 Found 16 files containing viruses.
(malloc count, malloc total, free total) = (0, 35, 35)

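Here is my code so far. It still outputs many unwanted strings, but I only need the sha1 and the description in the csv. I use split to select the sha1 between "\" and "->". The sha1 does end up in its own column, but the description is not trimmed and the rest of the file content is still there.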

import csv

INPUTFILE = 'input.txt'
OUTPUTFILE = 'output.csv'
PREFIX = '\\'
DELIMITER = '->'

def read_text_file(inputfile):
    data = []
    with open(inputfile, 'r') as f:
        lines = f.readlines()

    for line in lines:
        line = line.rstrip('\n')
        if not line == '':
            line = line.split(PREFIX, 1)[-1]
            parts = line.split(DELIMITER)
            data.append(parts)

    return data

def write_csv_file(data, outputfile):
    # Python 3: the csv module expects a text-mode file opened with newline=''
    with open(outputfile, 'w', newline='') as csvfile:
        csvwriter = csv.writer(csvfile, delimiter=',', quotechar='"',
                                quoting=csv.QUOTE_ALL)
        for row in data:
            csvwriter.writerow(row)

def main():
    data = read_text_file(INPUTFILE)
    write_csv_file(data, OUTPUTFILE)

if __name__ == '__main__':
    main()

This is what I want in the csv: only the sha1 and the description. My output file, however, contains the whole text file; it only splits off the sha1 and puts it in its own column.

Edit: At first it worked, but some entries span multiple lines, like the one below. How can these be put into the csv file as well? Any answers?

Scanning samples_extracted\0191a23ee122bdb0c69008971e365ec530bf03f5
 - Invoice_No_94497.doc->Found Virus [Trojan.4FEC5F36]->(MIME 6010-0)

 - Found 1/3 Viruses in samples_extracted\0191a23ee122bdb0c69008971e365ec530bf03f5

Best answer

With minimal changes, you can use this piece of code:

for line in lines:
    line = line.rstrip('\n')
    # Keep only lines that contain "->" and are not "->Found Virus [...]" lines
    if line != '' and DELIMITER in line and "Found" not in line: # <---
        line = line.split(PREFIX, 1)[-1]
        parts = line.split(DELIMITER)
        data.append(parts)

But I would prefer to use a regular expression:

import re

for line in lines:
    line = line.rstrip('\n')
    if re.search(r'[a-zA-Z0-9]{40}->\(', line): # <----
        line = line.split(PREFIX, 1)[-1]
        parts = line.split(DELIMITER)
        data.append(parts)
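
As a variation on the regex approach, the two fields can be pulled out directly with capture groups instead of a separate split. This is only a sketch: the names `RECORD_RE`, `parse_lines` and `write_rows` are made up for illustration, and it assumes the sha1 is exactly 40 hexadecimal characters and that the description is the parenthesized text at the end of the line.

```python
import csv
import re

# Assumed line shape: ...\<40 hex chars>->(<description>)
RECORD_RE = re.compile(r'\\([0-9a-fA-F]{40})->(\(.*\))\s*$')


def parse_lines(lines):
    """Return [sha1, description] pairs for the lines that match RECORD_RE."""
    rows = []
    for line in lines:
        match = RECORD_RE.search(line)
        if match:
            rows.append([match.group(1), match.group(2)])
    return rows


def write_rows(rows, outputfile):
    """Write the rows as a fully quoted csv file (Python 3 file mode)."""
    with open(outputfile, 'w', newline='') as csvfile:
        csv.writer(csvfile, quoting=csv.QUOTE_ALL).writerows(rows)
```

It could be called as `write_rows(parse_lines(open('input.txt')), 'output.csv')`. Lines such as `->Found Virus [...]` and the banner are skipped because they do not contain a backslash followed by a 40-character hash and `->(`.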

The result will be:

cat output.csv
"88330686ae94a9b97e1d4f5d4cbc010933f90f9a","(MS Office 2007 Word 4045-1)"
"8d286d610f26f368e7a18d82a21dd68b68935d6d","(Microsoft RTF 6008-0)"
"a10e5f964eea1036d8ec50810f1d87a794e2ae8c","(ASCII text 18-0)"
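
The multi-line entry from the question's edit is not covered by the code above, because there the sha1 and the description sit on different lines. One possible way to handle it is to remember the sha1 from a `Scanning` line that has no description and pair it with descriptions found on the following lines. This is a hedged sketch, not part of the original answer: `SCAN_RE`, `DESC_RE` and `extract_rows` are hypothetical names, and it assumes every description is a trailing `->(...)` and that each such continuation line should produce its own row.

```python
import re

# "Scanning ...\<sha1>" with an optional "->(description)" on the same line
SCAN_RE = re.compile(r'^Scanning .*\\([0-9a-fA-F]{40})(?:->(\(.*\)))?\s*$')
# A trailing "->(description)" on a continuation line (assumed format)
DESC_RE = re.compile(r'->(\([^()]*\))\s*$')


def extract_rows(lines):
    """Return [sha1, description] rows for single-line and multi-line entries."""
    rows = []
    current = None  # sha1 of a "Scanning" line that had no inline description
    for line in lines:
        scan = SCAN_RE.match(line)
        if scan:
            sha1, desc = scan.groups()
            if desc:
                rows.append([sha1, desc])
                current = None
            else:
                current = sha1
            continue
        if current:
            found = DESC_RE.search(line)
            if found:
                rows.append([current, found.group(1)])
    return rows
```

The resulting rows can be written with the same `csv.writer` loop as in the question. If only one row per sha1 is wanted, `current` could be reset to `None` after the first description is appended.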

Regarding "python - How do I select a certain string in a txt file and list it in a csv file?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52549102/
