python - Extracting rows that contain specific characters with Pandas

Tags: python string python-3.x pandas dataframe

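I want to extract the rows in which one column contains the string from another column.

For example, the target is a value of the form "string from the other column + 3 digits".
My attempt raises an error. I want to get the TARGET row.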

import pandas as pd

df = pd.DataFrame({'col1': ['xxxx', 'yyyy', 'zzzz'], 'col2': ['xxxx123', 'yyyy1234', 'aaa123']})

col1 | col2
xxxx | xxxx123   <- TARGET
yyyy | yyyy1234  <- Not TARGET
zzzz | aaa123    <- Not TARGET

This is my code, which does not work.

print(df[df['col1'].str.match(df['col2'] + [0-9][0-9][0-9])])

I have already tried str.contains, str.match, and isin. Perhaps I just don't know how to use them correctly.

Please tell me how to do this.

Best answer

Build two pattern-matching conditions, then use both to filter the DataFrame.

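# cond1: the leading letters of col2 must equal col1
cond1 = df.col2.str.extract(r'([A-Za-z]+)\d', expand=False).eq(df.col1)
# cond2: col2 must end with a letter followed by exactly three digits
cond2 = df.col2.str.contains(r'[A-Za-z]\d{3}$')

df[cond1 & cond2]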

   col1     col2
0  xxxx  xxxx123
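
The answer above assumes col1 contains only ASCII letters. As a hedged alternative, here is a minimal sketch that builds the pattern row by row with the standard re module. The helper name is_target is hypothetical and not part of the original answer; re.escape is used on the assumption that col1 might contain regex metacharacters.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'col1': ['xxxx', 'yyyy', 'zzzz'],
    'col2': ['xxxx123', 'yyyy1234', 'aaa123'],
})

def is_target(row):
    # Hypothetical helper: per-row pattern made of the literal col1 text
    # followed by exactly three digits.
    pattern = re.escape(row['col1']) + r'\d{3}'
    # fullmatch requires the whole col2 value to match, so 'yyyy1234' is rejected.
    return re.fullmatch(pattern, row['col2']) is not None

# Evaluate the helper on every row to obtain a boolean mask.
mask = df.apply(is_target, axis=1)
result = df[mask]
print(result)
```

Applying a Python function row by row is slower than the vectorized str methods on large frames, but it keeps the "col1 + 3 digits" rule in a single readable place.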

For this question (python - Extracting rows that contain specific characters with Pandas), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53056395/
