python - Searching a Pandas column for substrings from another column

Tags: python string pandas dataframe substring

I have an example.csv, imported as df.csv, as follows:

    Ethnicity, Description
  0 French, Irish Dance Company
  1 Italian, Moroccan/Algerian
  2 Danish, Company in Netherlands
  3 Dutch, French
  4 English, EnglishFrench
  5 Irish, Irish-American

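I want to check pandas test1['Description'] for the strings in test1['Ethnicity']. This should return rows 0, 3, 4 and 5, because those description strings contain a string from the Ethnicity column.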

So far I have tried:

df[df['Ethnicity'].str.contains('French')]['Description']

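This returns the matches for one specific string, but I want to go through all of them without searching for each ethnicity value by hand. I also tried converting the columns to lists and iterating over them, but I couldn't find a way to return the rows, because the result is no longer a DataFrame().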

Thanks in advance!

Best answer

You can use str.contains with the values of the Ethnicity column converted by tolist and then joined with |, which means "or" in a regex:

print ('|'.join(df.Ethnicity.tolist()))
French|Italian|Danish|Dutch|English|Irish

mask = df.Description.str.contains('|'.join(df.Ethnicity.tolist()))
print (mask)
0     True
1    False
2    False
3     True
4     True
5     True
Name: Description, dtype: bool

#boolean-indexing
print (df[mask])
  Ethnicity          Description
0    French  Irish Dance Company
3     Dutch               French
4   English        EnglishFrench
5     Irish       Irish-American
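
For anyone who wants to run the steps above without the CSV file, here is a minimal self-contained sketch. It assumes the sample data from the question is typed in by hand as a DataFrame (the original was loaded from example.csv), and it simply repeats the join-and-contains approach from the answer.

```python
import pandas as pd

# Sample data copied from the question; built inline instead of read from example.csv.
df = pd.DataFrame({
    "Ethnicity": ["French", "Italian", "Danish", "Dutch", "English", "Irish"],
    "Description": [
        "Irish Dance Company",
        "Moroccan/Algerian",
        "Company in Netherlands",
        "French",
        "EnglishFrench",
        "Irish-American",
    ],
})

# Build one regex alternation out of every Ethnicity value: "French|Italian|...".
pattern = "|".join(df["Ethnicity"].tolist())

# True where a Description contains at least one of the Ethnicity values.
mask = df["Description"].str.contains(pattern)

# Boolean indexing keeps only the matching rows.
matches = df[mask]
print(matches)
```

Note that the mask tests each Description against all Ethnicity values, not only the value in the same row, which is why row 0 ("Irish Dance Company") is included.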

It looks like you can omit tolist():

print (df[df.Description.str.contains('|'.join(df.Ethnicity))])
  Ethnicity          Description
0    French  Irish Dance Company
3     Dutch               French
4   English        EnglishFrench
5     Irish       Irish-American
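
One caveat that the answer does not cover: str.contains treats the joined string as a regular expression by default, so an Ethnicity value containing characters such as ".", "+" or "(" would not be matched literally. The sketch below is a hedged variation, not part of the original answer. It escapes each value with re.escape and also shows a regex-free alternative using plain Python substring tests. The sample data is retyped from the question, and the variable names are illustrative.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Ethnicity": ["French", "Italian", "Danish", "Dutch", "English", "Irish"],
    "Description": [
        "Irish Dance Company",
        "Moroccan/Algerian",
        "Company in Netherlands",
        "French",
        "EnglishFrench",
        "Irish-American",
    ],
})

# Escape every value so regex metacharacters are matched as literal text.
pattern = "|".join(re.escape(value) for value in df["Ethnicity"])
mask_regex = df["Description"].str.contains(pattern)

# Regex-free alternative: plain substring test against every Ethnicity value.
values = df["Ethnicity"].tolist()
mask_plain = df["Description"].apply(
    lambda text: any(value in text for value in values)
)

print(df[mask_regex])
```

On this data both masks select the same rows as the answer. The apply-based version is likely slower on large frames, so it is mainly useful when avoiding regular expressions matters more than speed.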

This question and answer, "python - Searching a Pandas column for substrings from another column", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/38128353/
