python-3.x - 如果两个字符串列是 Python 中另一个数据框中的一列的子字符串，则合并

给定两个数据框如下:

df1:

   id                                      address  price
0   1         8563 Parker Ave. Lexington, NC 27292      3
1   2         242 Bellevue Lane Appleton, WI 54911      3
2   3       771 Greenview Rd. Greenfield, IN 46140      5
3   4       93 Hawthorne Street Lakeland, FL 33801      6
4   5  8952 Green Hill Street Gettysburg, PA 17325      3
5   6    7331 S. Sherwood Dr. New Castle, PA 16101      4

df2:

  state            street  quantity
0    PA       S. Sherwood        12
1    IN  Hawthorne Street         3
2    NC       Parker Ave.         7

假设 df2 中的 state 和 street 都包含在 df2< 中的 address 中，然后合并df2到df1。

我怎么能在 Pandas 中做到这一点？谢谢。

预期结果df:

   id                                      address  ...       street quantity
0   1         8563 Parker Ave. Lexington, NC 27292  ...  Parker Ave.     7.00
1   2         242 Bellevue Lane Appleton, WI 54911  ...          NaN      NaN
2   3       771 Greenview Rd. Greenfield, IN 46140  ...          NaN      NaN
3   4       93 Hawthorne Street Lakeland, FL 33801  ...          NaN      NaN
4   5  8952 Green Hill Street Gettysburg, PA 17325  ...          NaN      NaN
5   6    7331 S. Sherwood Dr. New Castle, PA 16101  ...  S. Sherwood    12.00

[6 rows x 6 columns]

我的测试代码:

df2['addr'] = df2['state'].astype(str) + df2['street'].astype(str)

pat = '|'.join(r'\b{}\b'.format(x) for x in df2['addr'])
df1['addr']= df1['address'].str.extract('\('+ pat + ')', expand=False)

df = df1.merge(df2, on='addr', how='left')

输出:

   id                                      address  ...  street_y quantity_y
0   1         8563 Parker Ave. Lexington, NC 27292  ...       NaN        nan
1   2         242 Bellevue Lane Appleton, WI 54911  ...       NaN        nan
2   3       771 Greenview Rd. Greenfield, IN 46140  ...       NaN        nan
3   4       93 Hawthorne Street Lakeland, FL 33801  ...       NaN        nan
4   5  8952 Green Hill Street Gettysburg, PA 17325  ...       NaN        nan
5   6    7331 S. Sherwood Dr. New Castle, PA 16101  ...       NaN        nan

[6 rows x 10 columns]

最佳答案

尝试:

pat_state = f"({'|'.join(df2['state'])})"
pat_street = f"({'|'.join(df2['street'])})"
df1['street'] = df1['address'].str.extract(pat=pat_street) 
df1['state'] = df1['address'].str.extract(pat=pat_state) 
df1.loc[df1['state'].isna(),'street'] = np.NAN
df1.loc[df1['street'].isna(),'state'] = np.NAN
df1 = df1.merge(df2, left_on=['state','street'], right_on=['state','street'], how ='left')

关于python-3.x - 如果两个字符串列是 Python 中另一个数据框中的一列的子字符串，则合并，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/67415605/

python-3.x - 如果两个字符串列是 Python 中另一个数据框中的一列的子字符串，则合并

上一篇：python - 使用 tempmute 命令时机器人取消成员静音

下一篇：javascript - Elm:从 Browser.element 切换到 Browser.application - JS 方面需要进行哪些更改？