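I have a large CSV file (over 66k rows), and I want to count how many times a string occurs in each row. I'm specifically interested in one column, where every row holds a short sentence, like this: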
Example of data:
Sam ate an apple and she felt great
Jill thinks the sky is purple but Bob says it's blue
Ralph wants to go apple picking this fall
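I know how to do this for a text file, but I'm having trouble applying the same technique to a CSV. I've been using pandas and have tried several approaches, but they either return errors or an empty DataFrame.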
Attempts:
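import pandas

my_file = "NEISS2014.csv"
df = pandas.read_csv(my_file)
# Attempt 1: group on a key derived from whether the sentence mentions "apple"
df.groupby(df['sentence'].map(lambda x:'apple' if 'apple' in x else x)).sum()
# Attempt 2: keep only the rows whose sentence contains "apple"
df[df['sentence'].str.contains("apple") == True]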
If anyone could help me debug this, I'd really appreciate it!
Best answer
I think you can use str.count on the sentence column:
print(df)
#                                             sentence
# 0    Sam ate an apple and she felt great apple apple
# 1  Jill thinks the sky is purple but Bob says it'...
# 2          Ralph wants to go apple picking this fall

print(df.columns)
# Index(['sentence'], dtype='object')

# Count how many times 'apple' occurs in each row of the column
df['count'] = df['sentence'].str.count('apple')
print(df)
#                                             sentence  count
# 0    Sam ate an apple and she felt great apple apple      3
# 1  Jill thinks the sky is purple but Bob says it'...      0
# 2          Ralph wants to go apple picking this fall      1
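For anyone who wants to try this without the original file, here is a minimal self-contained sketch of the same str.count approach. The inline CSV text, the id column, the extra rows and the helper name count_occurrences are all made up for illustration; they are not from the question's data. Two things it assumes may matter in a real file: str.count treats its argument as a regular expression, so the search string is escaped first, and missing values are replaced with empty strings so they count as 0 instead of NaN.

```python
import io
import re

import pandas as pd

# Hypothetical stand-in for the real CSV; row 4 has a missing sentence on purpose.
csv_text = """id,sentence
1,Sam ate an apple and she felt great apple apple
2,Jill thinks the sky is purple but Bob says it's blue
3,Ralph wants to go apple picking this fall
4,
5,Pineapple juice
"""
df = pd.read_csv(io.StringIO(csv_text))


def count_occurrences(sentences, needle, whole_word=False):
    """Count occurrences of `needle` in each string of a Series (illustrative helper)."""
    # str.count expects a regex, so escape the literal search string.
    pattern = re.escape(needle)
    if whole_word:
        # Only match the needle when it stands alone as a word.
        pattern = r"\b" + pattern + r"\b"
    # Treat missing sentences as empty strings so they yield 0 rather than NaN.
    return sentences.fillna("").str.count(pattern).astype(int)


df["count"] = count_occurrences(df["sentence"], "apple")
df["count_word"] = count_occurrences(df["sentence"], "apple", whole_word=True)
total = int(df["count"].sum())  # occurrences across the whole column
print(df[["sentence", "count", "count_word"]])
print(total)
```

The plain substring count also matches "apple" inside "Pineapple", while the whole_word variant does not, so which one is right depends on what you mean by an occurrence.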
Regarding python - counting occurrences of a string in a column of a CSV file, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36905967/