python - Counting occurrences of a string in a column of a csv file

Tags: python string csv pandas

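I have a large csv file (more than 66k rows), and I want to count how many times a string occurs in each row. I am interested in one column in particular, where every row holds a short sentence, like this: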

Example of data:
Sam ate an apple and she felt great
Jill thinks the sky is purple but Bob says it's blue
Ralph wants to go apple picking this fall

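I know how to do this for a text file, but I am having trouble applying the same technique to a csv. I have been using pandas and have tried several approaches, but they either raise errors or return empty DataFrames.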

import pandas

my_file = "NEISS2014.csv"
df = pandas.read_csv(my_file)

df.groupby(df['sentence'].map(lambda x:'apple' if 'apple' in x else x)).sum()
df[df['sentence'].str.contains("apple") == True]




print(df)
#                                            sentence
#0    Sam ate an apple and she felt great apple apple
#1  Jill thinks the sky is purple but Bob says it'...
#2          Ralph wants to go apple picking this fall

print(df.columns)
#Index(['sentence'], dtype='object')

# str.count returns, for each row, how many times the pattern matches
df['count'] = df['sentence'].str.count('apple')
print(df)
#                                            sentence  count
#0    Sam ate an apple and she felt great apple apple      3
#1  Jill thinks the sky is purple but Bob says it'...      0
#2          Ralph wants to go apple picking this fall      1
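
A few details can trip this up on a real file: str.count treats its pattern as a regular expression, matching is case-sensitive by default, and empty cells come back as NaN rather than 0. Below is a minimal sketch of one way to handle all three. The sample rows (including the capitalized "Apple" and the missing value), the variable term, and the names total_occurrences and rows_with_term are made up for illustration and are not from the original post.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for the real csv column.
df = pd.DataFrame({
    "sentence": [
        "Sam ate an apple and she felt great apple apple",
        "Jill thinks the sky is purple but Bob says it's blue",
        "Ralph wants to go Apple picking this fall",
        None,  # a missing value, as can appear in a real csv file
    ]
})

term = "apple"  # assumed search term

# str.count interprets the pattern as a regex, so escape the term to match
# it literally; re.IGNORECASE makes "Apple" count as well.
counts = df["sentence"].str.count(re.escape(term), flags=re.IGNORECASE)

# Rows with a missing sentence yield NaN; treat them as zero occurrences.
df["count"] = counts.fillna(0).astype(int)

# Total occurrences across the column, and number of rows with at least one.
total_occurrences = int(df["count"].sum())
rows_with_term = int((df["count"] > 0).sum())

print(df)
print(total_occurrences, rows_with_term)
```

Summing the count column gives the total number of occurrences, while comparing it to zero gives the number of rows that mention the term at all; which of the two is wanted depends on the question being asked of the data.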

