python - Join a list of words in a column

Tags: python pandas join

Is it possible to join words in pandas? I have lists of words and I am trying to turn them back into phrases.

Data

0    [hello, she, can, seem, to, form, something, like, a, coherent,...
1    [not, any, more,...
2    [it, is, unclear, if, any, better, deal,...
3    [but, few, in, her, party, seem, inclined ...
4    [it, is, unclear, if, the, basic, conditions, for, any,...
Name: Data, dtype: object

from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))

# new words to add to the stop list
new_stopwords = {'hello'}

new_list = stop_words.union(new_stopwords)

# words to remove from the NLTK stop list
not_stopwords = {'no', 'not', 'any'}

stopwords_list = set([word for word in new_list if word not in not_stopwords])

# attempted join (this is the line that fails)
df['Data'] = df['Data'].' '.join([wrd for wrd in Data if wrd not in stopwords_list])

Output:

File "<ipython-input-281-498b9daa386f>", line 1
    df['Description_pretraites'] = df['Description_pretraites'].' '.join([wrd for wrd in replace_hour_token if wrd not in stopwords_list])
                                                              ^
SyntaxError: invalid syntax

Desired output

0    [can seem form something like coherent...
1    [not any more...
2    [is unclear any better deal...
3    [few party seem inclined ...
4    [is unclear basic conditions any...
Name: Data, dtype: object

As far as I know, join in pandas is used to join columns. Is it possible to join within a single column?

Best answer

Use .apply with a generator expression:

df['Data']=df['Data'].apply(lambda x: ' '.join(wrd for wrd in x if wrd not in stopwords_list))

Or use a list comprehension with a nested generator expression:

df['Data'] =  [' '.join(wrd for wrd in x if wrd not in stopwords_list) for x in df['Data']]
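
For context on the SyntaxError in the question: join here is a method of the separator string ' ', not of the column, so it cannot follow a dot after df['Data'] and has to be called once per row. A minimal sketch on a single row, where both the words and the small stop list are made up for illustration:

```python
# One row of the column is an ordinary Python list of words.
row = ['it', 'is', 'unclear', 'if', 'any', 'better', 'deal']

# Hypothetical stop list for illustration only (not the NLTK list).
stopwords_list = {'it', 'if'}

# str.join is called on the separator and receives the filtered words.
phrase = ' '.join(wrd for wrd in row if wrd not in stopwords_list)
print(phrase)  # is unclear any better deal
```

Both snippets above apply exactly this expression to every row of the column.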

Example:

import pandas as pd

d = {'Data':[['hello', 'she', 'can'],
             ['not', 'no', 'more', 'to']]}
df = pd.DataFrame(data=d)
print (df)
                  Data
0    [hello, she, can]
1  [not, no, more, to]

stopwords_list = set(['no','not'])
df['Data'] =  [' '.join(wrd for wrd in x if wrd not in stopwords_list) for x in df['Data']]
print (df)
            Data
0  hello she can
1        more to
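
A possible variant, not part of the original answer: filter each list first, then let the pandas Series.str.join method join each remaining list into one string. The helper name remove_stopwords is hypothetical, and the data is the same toy sample as above.

```python
import pandas as pd

df = pd.DataFrame({'Data': [['hello', 'she', 'can'],
                            ['not', 'no', 'more', 'to']]})
stopwords_list = {'no', 'not'}


def remove_stopwords(words):
    """Return the words that are not in stopwords_list (hypothetical helper)."""
    return [wrd for wrd in words if wrd not in stopwords_list]


# Filter each row's list, then join each list into a single string.
df['Data'] = df['Data'].apply(remove_stopwords).str.join(' ')
print(df['Data'].tolist())  # ['hello she can', 'more to']
```

Splitting the filter from the join keeps the filtered lists available if they are needed before being turned into phrases.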

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/54196953/
