python-3.x - Merge when two string columns are both substrings of a column in another dataframe

Tags: python-3.x pandas dataframe

Given two dataframes as follows:

df1:

   id                                      address  price
0   1         8563 Parker Ave. Lexington, NC 27292      3
1   2         242 Bellevue Lane Appleton, WI 54911      3
2   3       771 Greenview Rd. Greenfield, IN 46140      5
3   4       93 Hawthorne Street Lakeland, FL 33801      6
4   5  8952 Green Hill Street Gettysburg, PA 17325      3
5   6    7331 S. Sherwood Dr. New Castle, PA 16101      4

df2:

  state            street  quantity
0    PA       S. Sherwood        12
1    IN  Hawthorne Street         3
2    NC       Parker Ave.         7

If both state and street from a row of df2 are contained in address in df1, merge that row of df2 into df1.

How can I do this in Pandas? Thanks.

Expected result df:

   id                                      address  ...       street quantity
0   1         8563 Parker Ave. Lexington, NC 27292  ...  Parker Ave.     7.00
1   2         242 Bellevue Lane Appleton, WI 54911  ...          NaN      NaN
2   3       771 Greenview Rd. Greenfield, IN 46140  ...          NaN      NaN
3   4       93 Hawthorne Street Lakeland, FL 33801  ...          NaN      NaN
4   5  8952 Green Hill Street Gettysburg, PA 17325  ...          NaN      NaN
5   6    7331 S. Sherwood Dr. New Castle, PA 16101  ...  S. Sherwood    12.00

[6 rows x 6 columns]

My test code:

df2['addr'] = df2['state'].astype(str) + df2['street'].astype(str)

pat = '|'.join(r'\b{}\b'.format(x) for x in df2['addr'])
df1['addr'] = df1['address'].str.extract('(' + pat + ')', expand=False)

df = df1.merge(df2, on='addr', how='left')

Output:

   id                                      address  ...  street_y quantity_y
0   1         8563 Parker Ave. Lexington, NC 27292  ...       NaN        nan
1   2         242 Bellevue Lane Appleton, WI 54911  ...       NaN        nan
2   3       771 Greenview Rd. Greenfield, IN 46140  ...       NaN        nan
3   4       93 Hawthorne Street Lakeland, FL 33801  ...       NaN        nan
4   5  8952 Green Hill Street Gettysburg, PA 17325  ...       NaN        nan
5   6    7331 S. Sherwood Dr. New Castle, PA 16101  ...       NaN        nan

[6 rows x 10 columns]
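
A likely reason every row comes back NaN: the test code concatenates state and street into a single key (for example 'PAS. Sherwood'), and that joined string never occurs in any address, even when both parts do. The sketch below checks this for one row taken from the tables above; the variable names are illustrative only.

```python
# One address from df1 and the matching row from df2 (values copied from the tables above)
address = "7331 S. Sherwood Dr. New Castle, PA 16101"
state, street = "PA", "S. Sherwood"

# The test code builds its search key by joining the two columns
key = state + street

print(repr(key))                               # the joined key
print(key in address)                          # joined key as a substring of the address
print(state in address and street in address)  # each part checked on its own
```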

Best answer

Try:

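import re
import numpy as np

# One alternation pattern per column; re.escape keeps the '.' in values like 'Ave.' literal
pat_state = f"({'|'.join(map(re.escape, df2['state']))})"
pat_street = f"({'|'.join(map(re.escape, df2['street']))})"
# expand=False returns a Series, so each result fits a single column
df1['street'] = df1['address'].str.extract(pat_street, expand=False)
df1['state'] = df1['address'].str.extract(pat_state, expand=False)
# Keep a match only when both state and street were found in the address
df1.loc[df1['state'].isna(), 'street'] = np.nan
df1.loc[df1['street'].isna(), 'state'] = np.nan
df1 = df1.merge(df2, on=['state', 'street'], how='left')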
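
As an alternative sketch (not part of the answer above), the "both are substrings" condition can also be checked pair by pair, which avoids extracting a state and a street that belong to different rows of df2. The helper name `merge_on_substrings` is hypothetical, `how="cross"` assumes pandas 1.2 or later, and the sample frames are rebuilt from the tables in the question.

```python
import pandas as pd

# Sample data rebuilt from the question's tables
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "address": [
        "8563 Parker Ave. Lexington, NC 27292",
        "242 Bellevue Lane Appleton, WI 54911",
        "771 Greenview Rd. Greenfield, IN 46140",
        "93 Hawthorne Street Lakeland, FL 33801",
        "8952 Green Hill Street Gettysburg, PA 17325",
        "7331 S. Sherwood Dr. New Castle, PA 16101",
    ],
    "price": [3, 3, 5, 6, 3, 4],
})
df2 = pd.DataFrame({
    "state": ["PA", "IN", "NC"],
    "street": ["S. Sherwood", "Hawthorne Street", "Parker Ave."],
    "quantity": [12, 3, 7],
})


def merge_on_substrings(left, right):
    """Left-join `right` onto `left` where state and street both occur in address.

    Hypothetical helper for illustration; assumes `left` has a unique 'id' column.
    """
    # Pair every address with every (state, street, quantity) row
    pairs = left[["id", "address"]].merge(right, how="cross")
    # Keep a pair only when both values are plain, case-sensitive substrings
    both = [
        (state in addr) and (street in addr)
        for addr, state, street in zip(pairs["address"], pairs["state"], pairs["street"])
    ]
    matched = pairs.loc[both, ["id", "state", "street", "quantity"]]
    # Attach the matched columns back; unmatched ids get NaN
    return left.merge(matched, on="id", how="left")


df = merge_on_substrings(df1, df2)
print(df[["id", "state", "street", "quantity"]])
```

The cross join builds len(df1) * len(df2) rows, so this is only practical for modest sizes. The substring test is also literal and case-sensitive, so a short code such as 'PA' would match any address that contains those two uppercase letters anywhere.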

A similar question to "python-3.x - Merge when two string columns are both substrings of a column in another dataframe" can be found on Stack Overflow: https://stackoverflow.com/questions/67415605/
