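I am working through an online class, and one of the problems is to count the occurrences of the word "fantastic" in a large file. Whenever an occurrence is found, I need to store the first element of that row (the id) so that I end up with a list of the ids of the rows that contain the word. What I have so far reads the rows correctly, but I don't know how to check whether "fantastic" appears in a row regardless of upper/lower case. I tried using row.count('fantastic').
That does not work, and I am not sure how the csv reader stores the rows. If I could count the occurrences, I could append the id to an array whenever a row has one or more of them, and print the array at the end.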
#!/usr/bin/python
import sys
import csv

def main():
    f = open("test_file.txt", 'rt')
    filereader = csv.reader(f, delimiter=' ', quotechar='"')
    for row in filereader:
        print row[0]
        print row.count('fantastic')

if __name__ == "__main__":
    main()
Below is a very small sample set, to which I have added a couple of occurrences of "fantastic".
"6361" "When will unit 2 be online? fantastic" "cs101 unit2" "100003292" "<p>When will unit 2 be online?</p>" "question" "\N" "\N" "2012-02-26 15:47:12.522262+00" "0" "(closed)" "51919" "100003292" "2012-03-03 10:12:27.41521+00" "21196" "\N" "\N" "186" "t"
"7185" "Hungarian group" "cs101 hungarian nationalities" "100003268" "<p>Hi there! This is FANTASTIC</p>
<p>Any Hungarians doing the course? We could form a group!<br>
;)</p>" "question" "\N" "\N" "2012-02-27 15:09:11.184434+00" "0" "" "\N" "100003268" "2012-02-27 15:09:11.184434+00" "9322" "\N" "\N" "106" "f"
"26454" "Course Application." "cs101 application." "100003192" "<p>Please tell about the Course Application. How to use the Course for higher education and jobs?</p>" "question" "\N" "\N" "2012-03-08 08:34:06.704674+00" "-1" "" "\N" "100003192" "2012-03-08 08:34:06.704674+00" "34477" "\N" "\N" "73" "f"
I expect the output to be 6361, 7185
Best answer
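The default quote character is already ", so you don't need to specify it. If your file is tab-delimited, however, passing '\t' as the delimiter will make the reader interpret the columns correctly.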
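To illustrate the point about the delimiter, here is a minimal sketch in Python 3. It assumes the real file is tab-delimited, which the question does not confirm, and it uses a shortened, hypothetical in-memory sample instead of test_file.txt. Only delimiter is passed, and the quoted field that spans two physical lines still comes back as a single column.

```python
import csv
import io

# Hypothetical, shortened sample: two tab-delimited rows. The last field of
# the second row is quoted and contains an embedded newline.
sample = (
    '"6361"\t"When will unit 2 be online? fantastic"\t"question"\n'
    '"7185"\t"Hungarian group"\t"<p>This is FANTASTIC</p>\n<p>Any Hungarians?</p>"\n'
)

# quotechar is left at its default ('"'); only the delimiter is specified.
rows = list(csv.reader(io.StringIO(sample), delimiter='\t'))

for row in rows:
    # Each row is a plain list of strings, one string per column.
    print(len(row), row[0])
```

Both rows should be parsed into three columns each, with the id in row[0].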
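What you can do is build a generator that filters the rows according to whether the substring 'fantastic' appears in any column after the ID, and then use a list comprehension to extract the IDs, for example: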
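import csv

with open('test_file.txt') as fin:
    csvin = csv.reader(fin, delimiter='\t')
    # Lazily keep rows where any column after the ID contains 'fantastic', ignoring case
    has_fantastic = (row for row in csvin if any('fantastic' in col.lower() for col in row[1:]))
    # Consume the generator while the file is still open
    ids = [row[0] for row in has_fantastic]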
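As for why row.count('fantastic') returned 0 in the question: csv.reader yields each row as a list of strings, and list.count only counts elements that are exactly equal to its argument, so a word inside a longer column never matches. The sketch below (Python 3) is one way to both collect the ids and count the occurrences per row. The helper name ids_containing and the shortened in-memory sample are hypothetical, and tab-delimited input is an assumption. Matching is on substrings, so a word such as "fantastically" would also be counted.

```python
import csv
import io

def ids_containing(lines, word, delimiter='\t'):
    """Return (id, hits) pairs for rows whose non-id columns contain `word`, ignoring case."""
    needle = word.lower()
    result = []
    for row in csv.reader(lines, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        # str.count counts non-overlapping substring occurrences within one column.
        hits = sum(col.lower().count(needle) for col in row[1:])
        if hits:
            result.append((row[0], hits))
    return result

# Hypothetical, shortened version of the sample rows from the question.
sample = io.StringIO(
    '"6361"\t"When will unit 2 be online? fantastic"\t"question"\n'
    '"7185"\t"Hungarian group"\t"<p>Hi there! This is FANTASTIC</p>\n<p>Any Hungarians?</p>"\n'
    '"26454"\t"Course Application."\t"question"\n'
)

matches = ids_containing(sample, 'fantastic')
print(', '.join(row_id for row_id, _ in matches))  # 6361, 7185
print(sum(hits for _, hits in matches))            # 2
```

For a real file, the same helper could be called with an open file object in place of the StringIO sample.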
Regarding "python - Counting occurrences of a string in a flat file using Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33013592/