Given the following two dataframes:
df1:
id address price
0 1 8563 Parker Ave. Lexington, NC 27292 3
1 2 242 Bellevue Lane Appleton, WI 54911 3
2 3 771 Greenview Rd. Greenfield, IN 46140 5
3 4 93 Hawthorne Street Lakeland, FL 33801 6
4 5 8952 Green Hill Street Gettysburg, PA 17325 3
5 6 7331 S. Sherwood Dr. New Castle, PA 16101 4
df2:
state street quantity
0 PA S. Sherwood 12
1 IN Hawthorne Street 3
2 NC Parker Ave. 7
If both state and street from df2 are contained in address from df1, then merge df2 into df1.
How can I do this in Pandas? Thanks.
Expected result df:
id address ... street quantity
0 1 8563 Parker Ave. Lexington, NC 27292 ... Parker Ave. 7.00
1 2 242 Bellevue Lane Appleton, WI 54911 ... NaN NaN
2 3 771 Greenview Rd. Greenfield, IN 46140 ... NaN NaN
3 4 93 Hawthorne Street Lakeland, FL 33801 ... NaN NaN
4 5 8952 Green Hill Street Gettysburg, PA 17325 ... NaN NaN
5 6 7331 S. Sherwood Dr. New Castle, PA 16101 ... S. Sherwood 12.00
[6 rows x 6 columns]
My test code:
df2['addr'] = df2['state'].astype(str) + df2['street'].astype(str)
pat = '|'.join(r'\b{}\b'.format(x) for x in df2['addr'])
df1['addr'] = df1['address'].str.extract('(' + pat + ')', expand=False)
df = df1.merge(df2, on='addr', how='left')
Output:
id address ... street_y quantity_y
0 1 8563 Parker Ave. Lexington, NC 27292 ... NaN nan
1 2 242 Bellevue Lane Appleton, WI 54911 ... NaN nan
2 3 771 Greenview Rd. Greenfield, IN 46140 ... NaN nan
3 4 93 Hawthorne Street Lakeland, FL 33801 ... NaN nan
4 5 8952 Green Hill Street Gettysburg, PA 17325 ... NaN nan
5 6 7331 S. Sherwood Dr. New Castle, PA 16101 ... NaN nan
[6 rows x 10 columns]
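One likely reason every row comes back as NaN: the test code builds each search key by gluing state and street together with no separator, and a key such as 'NCParker Ave.' never occurs in an address, where the street comes first and the state much later. The sketch below illustrates this on the first address only; the variable names `address`, `concat_hits` and `separate_hits` are made up for the illustration.

```python
import pandas as pd

df2 = pd.DataFrame({
    "state": ["PA", "IN", "NC"],
    "street": ["S. Sherwood", "Hawthorne Street", "Parker Ave."],
    "quantity": [12, 3, 7],
})
address = "8563 Parker Ave. Lexington, NC 27292"  # first row of df1

# Same construction as in the test code: state and street glued together.
addr = df2["state"].astype(str) + df2["street"].astype(str)
print(addr.tolist())

# None of the concatenated keys occurs in the address as a substring ...
concat_hits = [key in address for key in addr]

# ... although the state and the street of the third df2 row both do.
separate_hits = [
    (state in address) and (street in address)
    for state, street in zip(df2["state"], df2["street"])
]
print(concat_hits, separate_hits)
```

So state and street have to be searched for separately rather than as one concatenated string.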
Best answer
Try:
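import re
import numpy as np
# One alternation pattern per column; re.escape keeps the dots in names like "Ave." literal
pat_state = f"({'|'.join(map(re.escape, df2['state']))})"
pat_street = f"({'|'.join(map(re.escape, df2['street']))})"
# expand=False returns a Series holding the first match (NaN when nothing matches)
df1['street'] = df1['address'].str.extract(pat_street, expand=False)
df1['state'] = df1['address'].str.extract(pat_state, expand=False)
# Keep a state/street pair only when both were found in the address
df1.loc[df1['state'].isna(), 'street'] = np.nan
df1.loc[df1['street'].isna(), 'state'] = np.nan
df1 = df1.merge(df2, on=['state', 'street'], how='left')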
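The answer above extracts state and street independently, so an address that happens to contain the state of one df2 row and the street of another would still get both columns filled in, only with a NaN quantity. As a possible alternative (not part of the original answer), here is a minimal self-contained sketch that checks each (address, df2 row) pair directly. It assumes pandas 1.2 or later for `how="cross"`, assumes `id` is unique in df1, and uses plain case-sensitive substring tests; the names `pairs`, `both` and `matches` are made up for the illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "address": [
        "8563 Parker Ave. Lexington, NC 27292",
        "242 Bellevue Lane Appleton, WI 54911",
        "771 Greenview Rd. Greenfield, IN 46140",
        "93 Hawthorne Street Lakeland, FL 33801",
        "8952 Green Hill Street Gettysburg, PA 17325",
        "7331 S. Sherwood Dr. New Castle, PA 16101",
    ],
    "price": [3, 3, 5, 6, 3, 4],
})
df2 = pd.DataFrame({
    "state": ["PA", "IN", "NC"],
    "street": ["S. Sherwood", "Hawthorne Street", "Parker Ave."],
    "quantity": [12, 3, 7],
})

# Pair every address with every df2 row (how="cross" needs pandas >= 1.2).
pairs = df1[["id", "address"]].merge(df2, how="cross")

# Keep a pair only if the state AND the street of the same df2 row
# both occur in the address (plain, case-sensitive substring test).
both = [
    (state in address) and (street in address)
    for address, state, street in zip(pairs["address"], pairs["state"], pairs["street"])
]
matches = pairs.loc[both, ["id", "state", "street", "quantity"]]

# Left-merge back on id so that unmatched addresses keep NaN.
df = df1.merge(matches, on="id", how="left")
print(df)
```

The cross join grows as len(df1) * len(df2), so this is only reasonable for small frames. If one address matches several df2 rows, it will appear once per match in the result.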
About python-3.x - Merge if two string columns are substrings of one column from another dataframe in Python: a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/67415605/