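I have a pandas DataFrame that holds many answers for each question, and I am trying to turn it into a list so that I can run a cosine similarity calculation.
In the DataFrame, each question is linked to its answers through parent_id = q_id, as shown below: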
print(df)
   q_id      q_body  parent_id    a_body
0     1  question 1          1  answer 1
1     1  question 1          1  answer 2
2     1  question 1          1  answer 3
3     2  question 2          2  answer 1
4     2  question 2          2  answer 2
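For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame. The values are copied from the printout above; the integer dtype of q_id and parent_id is an assumption, since the printout does not show dtypes.

```python
import pandas as pd

# Sample data copied from the printed frame; the dtypes are assumed.
df = pd.DataFrame({
    "q_id": [1, 1, 1, 2, 2],
    "q_body": ["question 1"] * 3 + ["question 2"] * 2,
    "parent_id": [1, 1, 1, 2, 2],
    "a_body": ["answer 1", "answer 2", "answer 3", "answer 1", "answer 2"],
})
print(df)
```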
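The output I am looking for is:
("question 1", "answer 1", "answer 2", "answer 3")
("question 2", "answer 1", "answer 2")
Any help would be greatly appreciated. Many thanks.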
Best answer
I think you need groupby with apply:
# output is a tuple that starts with the question value
# (assigned to a new name so the original df stays available)
out = df.groupby('q_body')['a_body'].apply(lambda x: tuple([x.name] + list(x)))
print(out)
q_body
question 1    (question 1, answer 1, answer 2, answer 3)
question 2              (question 2, answer 1, answer 2)
Name: a_body, dtype: object
# output is a list that starts with the question value
out = df.groupby('q_body')['a_body'].apply(lambda x: [x.name] + list(x))
print(out)
q_body
question 1    [question 1, answer 1, answer 2, answer 3]
question 2              [question 2, answer 1, answer 2]
Name: a_body, dtype: object
# output is a list without the question value
out = df.groupby('q_body')['a_body'].apply(list)
print(out)
q_body
question 1    [answer 1, answer 2, answer 3]
question 2              [answer 1, answer 2]
Name: a_body, dtype: object
# grouping by parent_id, without the question value
out = df.groupby('parent_id')['a_body'].apply(list)
print(out)
parent_id
1    [answer 1, answer 2, answer 3]
2              [answer 1, answer 2]
Name: a_body, dtype: object
# output is a string, values are concatenated with ', '
out = df.groupby('parent_id')['a_body'].apply(', '.join)
print(out)
parent_id
1    answer 1, answer 2, answer 3
2              answer 1, answer 2
Name: a_body, dtype: object
But if you need the result as a plain Python list, add tolist() (this assumes df is still the original DataFrame):
L = df.groupby('q_body')['a_body'].apply(lambda x: tuple([x.name] + list(x))).tolist()
print(L)
[('question 1', 'answer 1', 'answer 2', 'answer 3'), ('question 2', 'answer 1', 'answer 2')]
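Since the question asks for an ordered list, note that groupby sorts the group keys by default. If the questions should instead stay in the order they first appear in the frame, passing sort=False is one option. The sketch below is an alternative to the apply-based version, not part of the original answer: it iterates over the groups directly, and the helper name question_answer_tuples and the reordered sample rows are made up for illustration.

```python
import pandas as pd

def question_answer_tuples(frame):
    """Return one (question, answer, answer, ...) tuple per question.

    sort=False keeps questions in first-appearance order; answers keep
    their row order within each group.
    """
    grouped = frame.groupby("q_body", sort=False)["a_body"]
    # Iterating a grouped Series yields (group key, Series of values) pairs.
    return [(question, *answers) for question, answers in grouped]

# Hypothetical rows where "question 2" appears before "question 1".
df = pd.DataFrame({
    "q_body": ["question 2", "question 1", "question 2", "question 1"],
    "a_body": ["answer 1", "answer 1", "answer 2", "answer 2"],
})
print(question_answer_tuples(df))
```

With the default sort=True, the same call would list "question 1" first instead.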
Regarding "python - How to convert a pandas dataframe to an ordered list with a many-to-one relationship?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42663646/