python - Mapping data from one dataframe to another to extract per-user information

I have two dataframes. One lists customers and the songs they like; the other lists the users and the cluster each one belongs to.

Data 1 (df1):

user    song
A   11
A   22
B   22
B   33
C   11
D   44
C   33
E   11
D   33

Data 2 (df2):

user    cluster
A   1
B   2
C   1
D   2
E   1
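To reproduce the examples, the two tables can be built as pandas DataFrames. This is a minimal sketch that copies the rows shown above; the names df1 and df2 follow the answer below.

```python
import pandas as pd

# Data 1: which user liked which song (rows copied from the table above)
df1 = pd.DataFrame({
    'user': ['A', 'A', 'B', 'B', 'C', 'D', 'C', 'E', 'D'],
    'song': [11, 22, 22, 33, 11, 44, 33, 11, 33],
})

# Data 2: the cluster each user belongs to
df2 = pd.DataFrame({
    'user': ['A', 'B', 'C', 'D', 'E'],
    'cluster': [1, 2, 1, 2, 1],
})

print(df1)
print(df2)
```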

I have already collected all the songs listened to within each cluster, as shown below:

cluster songs
1   11, 22, 33
2   22, 33, 44

For each user, I want to output the songs that were listened to in that user's cluster but not by the user.

Expected output:

user    song
A   [33]
B   [44]
C   [22]
D   [22]
E   [22,33]
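Put another way, each user's result is a set difference: the songs of the user's cluster minus the songs the user has listened to. A minimal plain-Python sketch of that logic, assuming the rows are available as a list of (user, song) pairs and a user-to-cluster dict (the names likes, clusters, user_songs, cluster_songs and missing are hypothetical):

```python
from collections import defaultdict

# (user, song) pairs from Data 1 and the user -> cluster mapping from Data 2
likes = [('A', 11), ('A', 22), ('B', 22), ('B', 33), ('C', 11),
         ('D', 44), ('C', 33), ('E', 11), ('D', 33)]
clusters = {'A': 1, 'B': 2, 'C': 1, 'D': 2, 'E': 1}

user_songs = defaultdict(set)     # songs each user has listened to
cluster_songs = defaultdict(set)  # songs listened to by anyone in the cluster
for user, song in likes:
    user_songs[user].add(song)
    cluster_songs[clusters[user]].add(song)

# songs of the user's cluster that the user has not listened to
missing = {user: sorted(cluster_songs[c] - user_songs[user])
           for user, c in clusters.items()}
print(missing)  # {'A': [33], 'B': [44], 'C': [22], 'D': [22], 'E': [22, 33]}
```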

Best answer

Use merge with a left join, then drop_duplicates:

import pandas as pd

df = pd.merge(df1, df2, on='user', how='left').drop_duplicates(['cluster','song'])
print(df)
  user  song  cluster
0    A    11        1
1    A    22        1
2    B    22        2
3    B    33        2
5    D    44        2
6    C    33        1

Then aggregate with join; the songs must be converted to strings first:

out = df['song'].astype(str).groupby(df['cluster']).apply(', '.join).reset_index()
print(out)
   cluster        song
0        1  11, 22, 33
1        2  22, 33, 44

Or, if you need lists:

out = df.groupby('cluster')['song'].apply(list).reset_index()
# same as
# out = df['song'].groupby(df['cluster']).apply(list).reset_index()

print(out)
   cluster          song
0        1  [11, 22, 33]
1        2  [22, 33, 44]
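The drop_duplicates step can also be folded into the aggregation. A sketch, assuming the same df1 and df2 as in the question (rebuilt here so the block runs on its own), that uses SeriesGroupBy.unique to keep each song once per cluster in order of first appearance:

```python
import pandas as pd

df1 = pd.DataFrame({'user': ['A', 'A', 'B', 'B', 'C', 'D', 'C', 'E', 'D'],
                    'song': [11, 22, 22, 33, 11, 44, 33, 11, 33]})
df2 = pd.DataFrame({'user': ['A', 'B', 'C', 'D', 'E'],
                    'cluster': [1, 2, 1, 2, 1]})

merged = df1.merge(df2, on='user', how='left')
# unique() returns one array of distinct songs per cluster;
# apply(list) turns each array into a plain list
out = (merged.groupby('cluster')['song']
             .unique()
             .apply(list)
             .reset_index())
print(out)
```

Unlike the merge-then-drop_duplicates version, this leaves the merged frame intact, which is convenient if the per-user rows are still needed afterwards.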

Edit (songs from each user's cluster that the user has not listened to):

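df = pd.merge(df1, df2, on='user', how='left').drop_duplicates(['user','song'])
pivoted = df.pivot(index='user', columns='song', values='cluster')

# True where a user has not listened to a song
df3 = pivoted.isnull().stack().reset_index(name='val')
# keep only songs that were listened to in the user's own cluster
cluster_songs = df[['cluster','song']].drop_duplicates()
df3 = df3.merge(df2, on='user').merge(cluster_songs, on=['cluster','song'])
df3 = df3[df3['val']].groupby('user')['song'].apply(list).reindex(df2['user'])
print(df3)
user
A        [33]
B        [44]
C        [22]
D        [22]
E    [22, 33]
Name: song, dtype: object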
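The same per-user result can also be expressed as an anti-join: build every candidate (user, song) pair from the user's cluster, then keep the pairs that have no match in df1. This is a sketch using merge(..., indicator=True) rather than the pivot above; the names cluster_songs, candidates, flagged and missing are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'user': ['A', 'A', 'B', 'B', 'C', 'D', 'C', 'E', 'D'],
                    'song': [11, 22, 22, 33, 11, 44, 33, 11, 33]})
df2 = pd.DataFrame({'user': ['A', 'B', 'C', 'D', 'E'],
                    'cluster': [1, 2, 1, 2, 1]})

# distinct songs per cluster
cluster_songs = (df1.merge(df2, on='user')[['cluster', 'song']]
                    .drop_duplicates())
# every (user, song) pair drawn from the songs of the user's cluster
candidates = df2.merge(cluster_songs, on='cluster')
# anti-join: '_merge' is 'left_only' where the pair does not occur in df1
flagged = candidates.merge(df1, on=['user', 'song'], how='left', indicator=True)
missing = (flagged[flagged['_merge'] == 'left_only']
           .sort_values(['user', 'song'])
           .groupby('user')['song']
           .apply(list)
           .reindex(df2['user']))
print(missing)
```

The anti-join never materialises the full user × song matrix, only the pairs within each cluster, which may matter when there are many songs.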

A similar question, "python - Mapping data from one dataframe to another to extract per-user information", is on Stack Overflow: https://stackoverflow.com/questions/48922296/

