python - How to convert a pandas dataframe to an ordered list with a many-to-one relationship?

Tags: python list pandas many-to-one

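I currently have a pandas dataframe that holds many answers for each question, and I am trying to turn it into a list so that I can run cosine similarity calculations on it.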

In the dataframe, each question is linked to its answers through parent_id = q_id, as shown below:

print(df)
   q_id      q_body  parent_id    a_body
0     1  question 1          1  answer 1
1     1  question 1          1  answer 2
2     1  question 1          1  answer 3
3     2  question 2          2  answer 1
4     2  question 2          2  answer 2
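
For anyone who wants to reproduce this, the frame above can be rebuilt with a minimal sketch like the one below. The values are copied from the printed output; building it from a dict of lists is only an assumption about how the data might be loaded.

```python
import pandas as pd

# Rebuild the sample frame shown above; values are copied from the printed output.
df = pd.DataFrame({
    "q_id": [1, 1, 1, 2, 2],
    "q_body": ["question 1"] * 3 + ["question 2"] * 2,
    "parent_id": [1, 1, 1, 2, 2],
    "a_body": ["answer 1", "answer 2", "answer 3", "answer 1", "answer 2"],
})
print(df)
```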

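The output I am looking for is:

("question 1", "answer 1", "answer 2", "answer 3")

("question 2", "answer 1", "answer 2")

Any help would be greatly appreciated. Thank you very much.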

Best answer

I think you need groupby with apply:

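# output is a tuple that starts with the question text (x.name is the group key)
# assign to a new name so df stays available for the later examples
out = df.groupby('q_body')['a_body'].apply(lambda x: tuple([x.name] + list(x)))
print(out)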
q_body
question 1    (question 1, answer 1, answer 2, answer 3)
question 2              (question 2, answer 1, answer 2)
Name: a_body, dtype: object

# output is a list that starts with the question text
out = df.groupby('q_body')['a_body'].apply(lambda x: [x.name] + list(x))
print(out)
q_body
question 1    [question 1, answer 1, answer 2, answer 3]
question 2              [question 2, answer 1, answer 2]
Name: a_body, dtype: object

# output is a list without the question text
out = df.groupby('q_body')['a_body'].apply(list)
print(out)
q_body
question 1    [answer 1, answer 2, answer 3]
question 2              [answer 1, answer 2]
Name: a_body, dtype: object

# grouping by parent_id, without the question text
out = df.groupby('parent_id')['a_body'].apply(list)
print(out)
parent_id
1    [answer 1, answer 2, answer 3]
2              [answer 1, answer 2]
Name: a_body, dtype: object

# output is a string, values are concatenated with ', '
out = df.groupby('parent_id')['a_body'].apply(', '.join)
print(out)
parent_id
1    answer 1, answer 2, answer 3
2              answer 1, answer 2
Name: a_body, dtype: object

But if you need the output as a plain Python list, add tolist:

L = df.groupby('q_body')['a_body'].apply(lambda x: tuple([x.name] + list(x))).tolist()
print(L)
[('question 1', 'answer 1', 'answer 2', 'answer 3'), ('question 2', 'answer 1', 'answer 2')]
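
One caveat, since the question asks for an ordered list: groupby sorts the group keys by default, so the results come back in sorted key order rather than in the order the questions first appear. The sketch below uses a hypothetical frame (df2, not from the question) in which "question 2" appears first, and passes sort=False to keep the order of first appearance. It builds the tuples from apply(list) and items() rather than x.name, which is just one way to write it.

```python
import pandas as pd

# Hypothetical sample (not from the question): "question 2" appears first.
df2 = pd.DataFrame({
    "q_body": ["question 2", "question 2", "question 1"],
    "a_body": ["answer 1", "answer 2", "answer 1"],
})

# Default behaviour: group keys are sorted, so "question 1" comes first.
default = df2.groupby("q_body")["a_body"].apply(list)

# sort=False keeps the groups in order of first appearance.
grouped = df2.groupby("q_body", sort=False)["a_body"].apply(list)

# Prepend each question to its answers to get one tuple per question.
ordered = [(question, *answers) for question, answers in grouped.items()]
print(ordered)
# [('question 2', 'answer 1', 'answer 2'), ('question 1', 'answer 1')]
```

Within each group the answers keep their original row order in both cases; sort only affects the order of the groups themselves.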

For python - How to convert a pandas dataframe to an ordered list with a many-to-one relationship?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42663646/
