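I have a case where I am trying to append a computed percentage value, in a readable format, to a column of my DataFrame df. By a readable format I mean that the column should contain values such as '40% Matched', as in the example below.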
import pandas as pd
from collections import Counter

df = pd.DataFrame({
    'Col1': [['Phone', 'Watch', 'Pen', 'Pencil', 'Knife'],
             ['apple', 'orange', 'mango', 'cherry', 'banana', 'kiwi', 'tomato', 'avocado']],
    'Col2': [['Phone', 'Watch', 'Pen', 'Pencil', 'fork'],
             ['orange', 'avocado', 'kiwi', 'mango', 'grape', 'lemon', 'tomato']],
})
df['Matched Percent'] = 'No Match'
for index, (lst1, lst2) in enumerate(zip(df['Col1'], df['Col2'])):
    if lst1 == lst2:
        print('100% Matched')
    else:
        # Count the items in each list, then keep the keys present in both.
        c1 = Counter(lst1)
        c2 = Counter(lst2)
        matching = {k: c1[k] + c2[k] for k in c1.keys() if k in c2}
        text = '% Matched'
        # Divide by the length of the longer list.
        if len(lst1) > len(lst2):
            out = round(len(matching) / len(lst1) * 100)
            # Uncommenting the next line is the attempt that raises the TypeError below.
            #df['Matched Percent'].append(out,'% Matched')
            print(out, '% Matched')
        else:
            out = round(len(matching) / len(lst2) * 100)
            #df['Matched Percent'].append(out,'% Matched')
            print(out, '% Matched')
80 % Matched
62 % Matched
TypeError: cannot concatenate object of type "<class 'int'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
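I keep getting a TypeError. I have tried several approaches without success. I can print the values on screen the way I want, as shown above, but it fails when I try to append them to my DataFrame df. Any suggestions on how to fix this would be appreciated.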
Best answer
Your logic looks verbose. You can use a list comprehension instead:
zipper = zip(map(set, df['Col1']), map(set, df['Col2']))
df['Matched Percent'] = [len(c1 & c2) / max(len(c1), len(c2)) for c1, c2 in zipper]
print(df)
                                                Col1  \
0                 [Phone, Watch, Pen, Pencil, Knife]
1  [apple, orange, mango, cherry, banana, kiwi, t...

                                                Col2  Matched Percent
0                  [Phone, Watch, Pen, Pencil, fork]            0.800
1  [orange, avocado, kiwi, mango, grape, lemon, t...            0.625
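To make the ratio concrete, here is a minimal sketch that works through the second row by hand, using plain Python sets and no Pandas. The variable names (`col1`, `col2`, `common`, `ratio`) are chosen for illustration only and do not come from the answer.

```python
# Row 1 of the example data, copied from the question.
col1 = ['apple', 'orange', 'mango', 'cherry', 'banana', 'kiwi', 'tomato', 'avocado']
col2 = ['orange', 'avocado', 'kiwi', 'mango', 'grape', 'lemon', 'tomato']

s1, s2 = set(col1), set(col2)

# Items that appear in both lists (set intersection).
common = s1 & s2

# Share of matches relative to the larger of the two sets.
ratio = len(common) / max(len(s1), len(s2))

print(sorted(common))  # ['avocado', 'kiwi', 'mango', 'orange', 'tomato']
print(ratio)           # 0.625
```

Because sets discard duplicates, this counts each distinct item once. That matches the example data, where no list repeats an item.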
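Note that there is little room to optimize this kind of calculation with Pandas, because Pandas is not designed for series of lists. If you want "pretty" output, you can use f-strings, which are supported in Python 3.6+: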
print((df['Matched Percent']*100).map(lambda x: f'{x:.0f}% Matched'))
0    80% Matched
1    62% Matched
Name: Matched Percent, dtype: object
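The question asks for the formatted text to be stored in the column, not only printed. One possible way, sketched here as an assumption about the goal and not as part of the original answer, is to build the whole list of strings first and assign it to the column in one step. This avoids calling `Series.append` inside the loop, which accepts only Series objects (hence the TypeError) and was removed in pandas 2.0. The names `pairs` and `ratios` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Col1': [['Phone', 'Watch', 'Pen', 'Pencil', 'Knife'],
             ['apple', 'orange', 'mango', 'cherry', 'banana', 'kiwi', 'tomato', 'avocado']],
    'Col2': [['Phone', 'Watch', 'Pen', 'Pencil', 'fork'],
             ['orange', 'avocado', 'kiwi', 'mango', 'grape', 'lemon', 'tomato']],
})

# Same ratio as in the answer: shared items divided by the larger set size.
pairs = zip(map(set, df['Col1']), map(set, df['Col2']))
ratios = [len(a & b) / max(len(a), len(b)) for a, b in pairs]

# Format every ratio as text and assign the whole column at once.
df['Matched Percent'] = [f'{r * 100:.0f}% Matched' for r in ratios]

print(df['Matched Percent'].tolist())  # ['80% Matched', '62% Matched']
```

Keeping the numeric ratios in a separate column may be worthwhile if you later need to sort or filter by them, since the formatted column holds plain strings.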
Regarding python - appending a column with string values to a DataFrame, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52661564/