I have a dataframe df as follows:
import pandas as pd

d = {'letter_num' :['Nr. 1', 'Nr. 2', 'Nr. 3', 'Nr. 3']}
df = pd.DataFrame(d)
print(df)
letter_num
0 Nr. 1
1 Nr. 2
2 Nr. 3
3 Nr. 3
letters = pd.DataFrame(d, columns=['letter_num'])
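I want to add the keys and values of the following dictionary to the dataframe above as new columns, on the condition that the number in the key matches the number in the existing letter_num column of df.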
labels = {'[1]': 'budget', '[2]': 'budget', '[3 a]': 'expensive', '[3 b]': 'sport'}
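import re

def apply_and_concat(dataframe, field, func, column_names):
    # Apply func to every cell of `field`, turn each returned tuple into a row
    # with the given column names, and append those columns to dataframe
    return pd.concat((
        dataframe,
        dataframe[field].apply(
            lambda cell: pd.Series(func(cell), index=column_names))), axis=1)

def matcher(k):
    # Return the first (key, value) pair in labels whose number appears in k
    for i, j in labels.items():
        num = re.search(r'(\d+)', i).group()
        if num in k.split(' '):
            return i, j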
apply_and_concat(df, 'letter_num', matcher, ['letters','content'])
The code above gives the following output:
letter_num letters content
0 Nr. 1 [1] budget
1 Nr. 2 [2] budget
2 Nr. 3 [3 a] expensive
3 Nr. 3 [3 a] expensive
Expected Output:
letter_num letters content
0 Nr. 1 [1] budget
1 Nr. 2 [2] budget
2 Nr. 3 [3 a] expensive
3 Nr. 3 [3 b] sport
Can anyone help me?
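The question's `matcher` always returns the first label whose number matches, so every `Nr. 3` row gets `[3 a]`. If you want to keep the question's function-per-cell style, one possible sketch is to make the matcher stateful, so that it hands out the labels for each number in dictionary order. The helper name `make_matcher` is hypothetical, and the sketch assumes every cell and every key contains a number and that the dict's insertion order is the intended pairing order.

```python
import re
from collections import defaultdict

import pandas as pd

d = {'letter_num': ['Nr. 1', 'Nr. 2', 'Nr. 3', 'Nr. 3']}
df = pd.DataFrame(d)
labels = {'[1]': 'budget', '[2]': 'budget', '[3 a]': 'expensive', '[3 b]': 'sport'}


def make_matcher(labels):
    """Hypothetical helper: return a matcher that hands out the next unused label per number."""
    # Group label items by the number in the key, keeping dict insertion order,
    # e.g. '3' -> [('[3 a]', 'expensive'), ('[3 b]', 'sport')]
    by_num = defaultdict(list)
    for key, value in labels.items():
        by_num[re.search(r'\d+', key).group()].append((key, value))
    remaining = {num: iter(items) for num, items in by_num.items()}

    def matcher(cell):
        # Assumes every cell contains a number (re.search would return None otherwise)
        num = re.search(r'\d+', cell).group()
        # (None, None) once a number has no labels left, instead of StopIteration
        return next(remaining.get(num, iter(())), (None, None))

    return matcher


matcher = make_matcher(labels)
# A plain loop calls matcher exactly once per row, in row order,
# which the stateful matcher relies on
rows = [matcher(cell) for cell in df['letter_num']]
new_cols = pd.DataFrame(rows, columns=['letters', 'content'], index=df.index)
result = pd.concat([df, new_cols], axis=1)
print(result)
```

Because the matcher consumes its iterators, it is single-use: build a fresh one with `make_matcher` for each DataFrame you label.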
Best answer
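Use a slightly different approach: the idea is to create a new DataFrame from labels, extract the numbers into a new Series with Series.str.extract, and add a per-number counter with GroupBy.cumcount. In this solution the counter and the number are concatenated with Series.str.cat and set as the index of both DataFrames, so DataFrame.join can be used at the end: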
d = {'letter_num' :['Nr. 1', 'Nr. 2', 'Nr. 3', 'Nr. 3']}
letters = pd.DataFrame(d, columns=['letter_num'])
labels = {'[1]': 'budget', '[2]': 'budget', '[3 a]': 'expensive', '[3 b]': 'sport'}
# Build from list(labels.items()), not a set: a set has no guaranteed order,
# so cumcount could number '[3 b]' before '[3 a]' and swap the matches
df1 = pd.DataFrame(list(labels.items()), columns=['letters','content'])
num = df1['letters'].str.extract(r'(\d+)', expand=False)
df1.index = df1.groupby(num).cumcount().astype(str).str.cat(num, sep='|')
print (df1)
letters content
0|1 [1] budget
0|2 [2] budget
0|3 [3 a] expensive
1|3 [3 b] sport
df = pd.DataFrame(d)
num = df['letter_num'].str.extract(r'(\d+)', expand=False)
df.index = df.groupby(num).cumcount().astype(str).str.cat(num, sep='|')
print (df)
letter_num
0|1 Nr. 1
0|2 Nr. 2
0|3 Nr. 3
1|3 Nr. 3
df = df.join(df1).reset_index(drop=True)
print (df)
letter_num letters content
0 Nr. 1 [1] budget
1 Nr. 2 [2] budget
2 Nr. 3 [3 a] expensive
3 Nr. 3 [3 b] sport
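Both steps above build the same kind of key, so the n-th `Nr. 3` row pairs with the n-th `[3 x]` label. As a sketch, the join variant can be wrapped in a function that keeps the caller's original index instead of resetting it; the helper names `occurrence_key` and `add_labels_join` are hypothetical, not part of the answer. Rows that have no label left (here a third `Nr. 3`) come back as NaN.

```python
import pandas as pd


def occurrence_key(s: pd.Series) -> pd.Series:
    """Map each value to '<occurrence>|<number>', e.g. the 2nd 'Nr. 3' -> '1|3'."""
    num = s.str.extract(r'(\d+)', expand=False)
    return s.groupby(num).cumcount().astype(str).str.cat(num, sep='|')


def add_labels_join(df: pd.DataFrame, labels: dict, col: str = 'letter_num') -> pd.DataFrame:
    # list(labels.items()) keeps dict insertion order, so '[3 a]' is numbered before '[3 b]'
    lab = pd.DataFrame(list(labels.items()), columns=['letters', 'content'])
    lab.index = occurrence_key(lab['letters'])
    keyed = df.copy()
    keyed.index = occurrence_key(keyed[col])
    # Left join on the shared 'occurrence|number' index; unmatched rows get NaN
    out = keyed.join(lab)
    # The keys are unique on both sides, so the join keeps the left rows one-to-one
    # and in order, which lets us restore the original index
    out.index = df.index
    return out


labels = {'[1]': 'budget', '[2]': 'budget', '[3 a]': 'expensive', '[3 b]': 'sport'}
df = pd.DataFrame({'letter_num': ['Nr. 1', 'Nr. 2', 'Nr. 3', 'Nr. 3', 'Nr. 3']},
                  index=[10, 11, 12, 13, 14])
print(add_labels_join(df, labels))
```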
Or create new helper columns and use DataFrame.merge with a left join:
d = {'letter_num' :['Nr. 1', 'Nr. 2', 'Nr. 3', 'Nr. 3']}
letters = pd.DataFrame(d, columns=['letter_num'])
labels = {'[1]': 'budget', '[2]': 'budget', '[3 a]': 'expensive', '[3 b]': 'sport'}
# list(...) keeps the dict's insertion order; a set would not
df1 = pd.DataFrame(list(labels.items()), columns=['letters','content'])
df1['num'] = df1['letters'].str.extract(r'(\d+)', expand=False)
df1['g'] = df1.groupby('num').cumcount()
print (df1)
letters content num g
0 [1] budget 1 0
1 [2] budget 2 0
2 [3 a] expensive 3 0
3 [3 b] sport 3 1
df = pd.DataFrame(d)
#print (df)
df['num'] = df['letter_num'].str.extract(r'(\d+)', expand=False)
df['g'] = df.groupby('num').cumcount()
print (df)
letter_num num g
0 Nr. 1 1 0
1 Nr. 2 2 0
2 Nr. 3 3 0
3 Nr. 3 3 1
df = df.merge(df1, on=['num','g'], how='left')
print (df)
letter_num num g letters content
0 Nr. 1 1 0 [1] budget
1 Nr. 2 2 0 [2] budget
2 Nr. 3 3 0 [3 a] expensive
3 Nr. 3 3 1 [3 b] sport
df = df.drop(['num','g'], axis=1)
print (df)
letter_num letters content
0 Nr. 1 [1] budget
1 Nr. 2 [2] budget
2 Nr. 3 [3 a] expensive
3 Nr. 3 [3 b] sport
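The merge variant can also report which rows found no label. A minimal sketch, assuming the same key columns as above: `validate='one_to_one'` makes pandas raise a MergeError if a `(num, g)` key is ever duplicated, and `indicator=True` adds a `_merge` column marking each row as `both` or `left_only`. The function name `add_labels_merge` is hypothetical.

```python
import pandas as pd


def add_labels_merge(df: pd.DataFrame, labels: dict, col: str = 'letter_num') -> pd.DataFrame:
    # Label side: number in the key plus its occurrence counter within that number
    lab = pd.DataFrame(list(labels.items()), columns=['letters', 'content'])
    lab['num'] = lab['letters'].str.extract(r'(\d+)', expand=False)
    lab['g'] = lab.groupby('num').cumcount()

    # Data side: same two key columns, computed on a copy so df is not modified
    left = df.copy()
    left['num'] = left[col].str.extract(r'(\d+)', expand=False)
    left['g'] = left.groupby('num').cumcount()

    # how='left' keeps every row of df in order; indicator=True records match status
    merged = left.merge(lab, on=['num', 'g'], how='left',
                        validate='one_to_one', indicator=True)
    return merged.drop(columns=['num', 'g'])


labels = {'[1]': 'budget', '[2]': 'budget', '[3 a]': 'expensive', '[3 b]': 'sport'}
# A third 'Nr. 3' and an 'Nr. 7' have no matching label
df = pd.DataFrame({'letter_num': ['Nr. 1', 'Nr. 3', 'Nr. 3', 'Nr. 3', 'Nr. 7']})
result = add_labels_merge(df, labels)
print(result)
```

Filtering on `result['_merge'] == 'left_only'` then lists the rows that still need a label.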
For "python - Using apply() to add new values to a dataframe when there are multiple matches", see the similar question on Stack Overflow: https://stackoverflow.com/questions/58444231/