python - Counting occurrences of a string in a flat file using Python

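I am trying to complete an online class, and one of the problems is to count how many times the word "fantastic" appears in a large file. When an occurrence is found, the first element of the row (the id) needs to be stored, so as to build a list of the ids of the rows that contain the word. So far I have the code below, which reads the rows correctly, but I don't know how to check whether "fantastic" is in the row, in upper or lower case. I tried row.count('fantastic'), which doesn't work, and I'm not sure how the csv reader stores rows. If I could count the occurrences, I could add the id to an array whenever a row has one or more of them and print the array at the end.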

#!/usr/bin/python
import sys
import csv

def main():
    f = open("test_file.txt", 'rt')
    filereader = csv.reader(f, delimiter='\t', quotechar='"')
    for row in filereader:
        print row[0]
        print row.count('fantastic')

if __name__ == "__main__":
    main()

Below is a very small sample set, to which I have added a few occurrences of "fantastic".

"6361"  "When will unit 2 be online? fantastic"   "cs101 unit2"   "100003292"     "<p>When will unit 2 be online?</p>"    "question"      "\N"    "\N"    "2012-02-26 15:47:12.522262+00" "0"     "(closed)"      "51919" "100003292"     "2012-03-03 10:12:27.41521+00"  "21196" "\N"    "\N"    "186"   "t"
"7185"  "Hungarian group"       "cs101 hungarian nationalities" "100003268"     "<p>Hi there! This is FANTASTIC</p>
<p>Any Hungarians doing the course? We could form a group!<br>
;)</p>" "question"      "\N"    "\N"    "2012-02-27 15:09:11.184434+00" "0"     ""      "\N"    "100003268"     "2012-02-27 15:09:11.184434+00" "9322"  "\N"    "\N"    "106"   "f"
"26454" "Course Application."   "cs101 application."    "100003192"     "<p>Please tell about the Course Application.  How to use the Course for higher education and jobs?</p>" "question"      "\N"    "\N"    "2012-03-08 08:34:06.704674+00" "-1"    ""      "\N"    "100003192"     "2012-03-08 08:34:06.704674+00" "34477" "\N"    "\N"    "73"    "f"

I expect the output to be 6361, 7185

Best answer

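The default quote character is already ", so you don't need to specify it. If your file is tab-delimited, though, passing '\t' as the delimiter will parse the columns correctly.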
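As for why row.count('fantastic') reports 0 in the question: csv.reader yields each row as a list of strings, and list.count only counts elements that are exactly equal to its argument. A minimal sketch of this, assuming Python 3 and a made-up two-row sample (shortened from the data in the question) held in memory rather than in a file:

```python
import csv
import io

# Hypothetical tab-separated sample, shortened from the data in the question.
sample = (
    '"6361"\t"When will unit 2 be online? fantastic"\t"cs101 unit2"\n'
    '"7185"\t"Hungarian group"\t"<p>This is FANTASTIC</p>\n<p>Second line</p>"\n'
)

rows = list(csv.reader(io.StringIO(sample), delimiter='\t'))

# Each row is a plain list of strings; a quoted field that contains
# a newline stays together in one column.
first, second = rows

# list.count looks for elements equal to 'fantastic', so it finds none.
exact_matches = first.count('fantastic')

# A substring test on each lower-cased column does find the word.
substring_matches = sum('fantastic' in col.lower() for col in first)
```

Here exact_matches is 0 and substring_matches is 1, which is why the check has to be done per column with `in` (or str.count) rather than with list.count on the row.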

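What you can do is build a generator that filters rows according to whether the substring 'fantastic' appears in any column after the ID, then use a list comprehension to extract the IDs, for example: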

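import csv

with open('test_file.txt') as fin:
    csvin = csv.reader(fin, delimiter='\t')
    # Keep rows where any column after the ID contains 'fantastic', ignoring case
    has_fantastic = (row for row in csvin if any('fantastic' in col.lower() for col in row[1:]))
    # Collect the ID (first column) of each matching row
    ids = [row[0] for row in has_fantastic]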
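If the total number of occurrences is wanted as well as the ids, the same idea can be written as a loop. The following is only a sketch under stated assumptions: Python 3, a hypothetical helper named ids_containing, and a shortened in-memory copy of the sample rows instead of the real file.

```python
import csv
import io

def ids_containing(lines, word='fantastic'):
    """Return (ids, total): ids of rows mentioning `word`, and the total occurrence count."""
    word = word.lower()
    ids = []
    total = 0
    for row in csv.reader(lines, delimiter='\t'):
        if not row:
            continue  # skip blank lines
        # Count case-insensitive substring occurrences in every column after the ID.
        hits = sum(col.lower().count(word) for col in row[1:])
        if hits:
            ids.append(row[0])
            total += hits
    return ids, total

# Hypothetical sample, shortened from the rows in the question.
sample = io.StringIO(
    '"6361"\t"When will unit 2 be online? fantastic"\t"cs101 unit2"\n'
    '"7185"\t"Hungarian group"\t"<p>This is FANTASTIC</p>"\n'
    '"26454"\t"Course Application."\t"cs101 application."\n'
)
ids, total = ids_containing(sample)
print(', '.join(ids))  # 6361, 7185
```

For the real file, an open file object such as open('test_file.txt', newline='') can be passed in place of the StringIO. Note that this is a substring match, so a word like "fantastically" would also be counted.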

For more on "python - Counting occurrences of a string in a flat file using Python", see the similar question on Stack Overflow: https://stackoverflow.com/questions/33013592/
