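python - Filter a Pandas DataFrame based on a list of substrings

I have a Pandas DataFrame with several columns of strings. I now want to check one particular column against a list of allowed substrings and get a new subset containing the matching rows.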

import pandas as pd

substr = ['A', 'C', 'D']
df = pd.read_excel('output.xlsx')
df = df.dropna()
# now filter all rows where the string in the 2nd column doesn't contain one of the substrings

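The only way I have found is to build a list from the relevant column and run a list comprehension over it, but then I lose the other columns. Can I use a list comprehension as part of, for example, df.str.contains()?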

year  type     value   price
2000  ty-A     500     10000
2002  ty-Q     200     84600
2003  ty-R     500     56000
2003  ty-B     500     18000
2006  ty-C     500     12500
2012  ty-A     500     65000
2018  ty-F     500     86000
2019  ty-D     500     51900

Expected output:

year  type     value   price
2000  ty-A     500     10000
2006  ty-C     500     12500
2012  ty-A     500     65000
2019  ty-D     500     51900

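Best answer

You can use pandas.Series.isin, which keeps the rows whose value is exactly equal to one of the entries in the list. This works when the type column holds the bare values A, C, D, as in the output shown here. It does not match values such as ty-A, because isin compares whole values rather than substrings; for that case a substring test such as df['type'].str.contains('|'.join(substr)) is needed.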

>>> df.loc[df['type'].isin(substr)]
   year type  value  price
0  2000    A    500  10000
4  2006    C    500  12500
5  2012    A    500  65000
7  2019    D    500  51900
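
For the data as posted in the question, where the type column holds values like ty-A, a substring match is probably what is wanted. The following is a minimal sketch, not part of the original answer: it rebuilds the sample table inline instead of reading output.xlsx, joins the substrings into one regular expression, and uses Series.str.contains to build a boolean mask. The names pattern, mask and result are illustrative only.

```python
import re

import pandas as pd

substr = ['A', 'C', 'D']

# Sample data copied from the question (stands in for output.xlsx).
df = pd.DataFrame({
    'year': [2000, 2002, 2003, 2003, 2006, 2012, 2018, 2019],
    'type': ['ty-A', 'ty-Q', 'ty-R', 'ty-B', 'ty-C', 'ty-A', 'ty-F', 'ty-D'],
    'value': [500, 200, 500, 500, 500, 500, 500, 500],
    'price': [10000, 84600, 56000, 18000, 12500, 65000, 86000, 51900],
})

# Build one alternation pattern, e.g. 'A|C|D'. re.escape guards against
# substrings that contain regex metacharacters such as '.' or '+'.
pattern = '|'.join(re.escape(s) for s in substr)

# True where 'type' contains at least one substring; na=False treats
# missing values as non-matching.
mask = df['type'].str.contains(pattern, na=False)

# Boolean indexing keeps every column of the matching rows.
result = df.loc[mask]
print(result)

# For comparison: isin tests whole-value equality, so nothing matches here.
exact_matches = df['type'].isin(substr).sum()
```

On this sample the mask keeps the rows for 2000, 2006, 2012 and 2019, which is the expected output from the question, while exact_matches is 0. Because the mask is applied to the whole DataFrame, the other columns are preserved, which addresses the problem of losing them with a plain list comprehension.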

A similar question on filtering a Pandas DataFrame based on a list of substrings can be found on Stack Overflow: https://stackoverflow.com/questions/57785687/
