mysql - How do I write an SQL query with multiple inner joins?

Sample record:

    Row(user_id='KxGeqg5ccByhaZfQRI4Nnw', gender='male', year='2015', month='September', day='20',
        hour='16', weekday='Sunday', reviewClass='place love back', business_id='S75Lf-Q3bCCckQ3w7mSN2g',
        business_name='Notorious Burgers', city='Scottsdale',
        categories='Nightlife, American (New), Burgers, Comfort Food, Cocktail Bars, Restaurants, Food, Bars, American (Traditional)',
        user_funny='1', review_sentiment='Positive', friend_id='my4q3Sy6Ei45V58N2l8VGw')

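The table has more than 100 million records. My SQL query should do the following:

For a specific user who visited a specific business, select the most frequent review_sentiment among that user's friends (friend_id), along with the most frequent gender among those friends.

Every friend_id is ultimately a user_id.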

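Example scenario:

One user
Has visited 4 businesses
Has 10 friends
5 of those friends have visited businesses 1 and 2, the other 5 have visited only the third business, and none have visited the fourth
For business 1 the 5 friends have more positive than negative sentiments, for business 2 they have more negative than positive, and for business 3 the sentiments are all negative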

I want the following output:

**user_id | business_id | friend_common_sentiment | mostCommonGender | .... otherCols**

user_id_1 | business_id_1 | positive | male | .... otherCols
user_id_1 | business_id_2 | negative | female | .... otherCols
user_id_1 | business_id_3 | negative | female | .... otherCols

Here is a simple query I wrote for this in pyspark:

SELECT user_id, gender, year, month, day, hour, weekday, reviewClass, business_id, business_name, city,
categories, user_funny, review_sentiment FROM events1 GROUP BY user_id, friend_id, business_id ORDER BY
COUNT(review_sentiment) DESC LIMIT 1

This query does not give the expected result, and I'm not sure exactly how to work an INNER JOIN into it.

Best answer

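Your data structure really does make this difficult. But let's break it into steps:

Self-join the table to get the friends' data
Once you have the friends' data, use aggregate functions to count each possible value, grouped by user and business
Wrap the above in a subquery so you can choose between the values based on the counts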

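I'll simply call your table "tags", so the join looks like the following. Sadly, just as in real life, we can't assume everyone has friends, and since you didn't say to exclude users without any, we need a left join to keep them.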

From tags as user
left outer join tags as friends on user.friend_id = friends.user_id
    and friends.business_id = user.business_id
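
To see what this self-join returns, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The toy rows, the reduced column set, and the aliases `u` and `f` are assumptions made for illustration; only the table name `tags` comes from the answer. SQLite stands in for MySQL or Spark SQL purely because it needs no setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tags (user_id text, gender text, business_id text,"
    " review_sentiment text, friend_id text)"
)
# Hypothetical toy data: one row per (user, business, friend), as in the sample record.
rows = [
    ("u1", "male", "b1", "Positive", "f1"),
    ("u1", "male", "b1", "Positive", "f2"),
    ("f1", "female", "b1", "Negative", "u1"),  # f1 reviewed b1
    ("f2", "male", "b2", "Positive", "u1"),    # f2 reviewed only b2
    ("u2", "female", "b1", "Positive", None),  # u2 has no friends
]
conn.executemany("insert into tags values (?, ?, ?, ?, ?)", rows)

# Same join as above: match each user's friend_id to that friend's own
# rows for the same business; the left join keeps unmatched users.
joined = conn.execute(
    """
    select u.user_id, u.friend_id, f.gender, f.review_sentiment
    from tags as u
    left outer join tags as f
      on u.friend_id = f.user_id and f.business_id = u.business_id
    where u.business_id = 'b1'
    order by u.user_id, u.friend_id
    """
).fetchall()

for row in joined:
    print(row)
```

In this toy data, u2 (no friends) and the u1/f2 pair (friend did not review b1) still appear, with NULLs on the friend side. Note also that f1's friend u1 is matched twice, because u1 has one row per friend. With a table shaped like the sample record, counts built on this join may therefore need de-duplication on the friends side.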

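Next, you have to work out the most common gender and review sentiment for a given user and business combination. This is where the data structure really hurts. Some clever window functions could do it in one step, but I want this answer to be easy to follow, so I'll use a subquery and case statements. For simplicity I assume two gender values, but you can follow the same pattern for any others your application records.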

select user.user_id, user.business_id
-- lower() so that 'male' in the sample record matches even where string comparison is case-sensitive (e.g. Spark SQL)
, sum(case when lower(friends.gender) = 'male' then 1 else 0 end) as MaleFriends
, sum(case when lower(friends.gender) = 'female' then 1 else 0 end) as FemaleFriends
, sum(case when friends.review_sentiment = 'Positive' then 1 else 0 end) as FriendsPositive
, sum(case when friends.review_sentiment = 'Negative' then 1 else 0 end) as FriendsNegative
from tags as user
-- left join keeps users whose friends have no rows for this business
left outer join tags as friends on user.friend_id = friends.user_id
  and friends.business_id = user.business_id
where user.business_id = <<your business id here>>
group by user.user_id, user.business_id
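
The conditional sums can be checked on a handful of rows. The following is a minimal sketch using Python's built-in sqlite3 module; the toy data, the aliases `u` and `f`, and the business id `b1` are assumptions for illustration, and the friend rows carry a NULL friend_id only to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tags (user_id text, gender text, business_id text,"
    " review_sentiment text, friend_id text)"
)
# Hypothetical toy data: u1 has three friends who all reviewed b1.
conn.executemany(
    "insert into tags values (?, ?, ?, ?, ?)",
    [
        ("u1", "male", "b1", "Positive", "f1"),
        ("u1", "male", "b1", "Positive", "f2"),
        ("u1", "male", "b1", "Positive", "f3"),
        ("f1", "female", "b1", "Positive", None),
        ("f2", "female", "b1", "Negative", None),
        ("f3", "male", "b1", "Positive", None),
        ("u2", "female", "b1", "Negative", None),  # no friends at all
    ],
)

# One output row per user; each sum counts the joined friend rows
# that match one specific value.
counts = {
    row[0]: row[1:]
    for row in conn.execute(
        """
        select u.user_id
        , sum(case when lower(f.gender) = 'male' then 1 else 0 end) as MaleFriends
        , sum(case when lower(f.gender) = 'female' then 1 else 0 end) as FemaleFriends
        , sum(case when f.review_sentiment = 'Positive' then 1 else 0 end) as FriendsPositive
        , sum(case when f.review_sentiment = 'Negative' then 1 else 0 end) as FriendsNegative
        from tags as u
        left outer join tags as f
          on u.friend_id = f.user_id and f.business_id = u.business_id
        where u.business_id = ?
        group by u.user_id, u.business_id
        """,
        ("b1",),
    )
}

print(counts["u1"])  # (MaleFriends, FemaleFriends, FriendsPositive, FriendsNegative)
print(counts["u2"])  # all zeros: NULL friend columns fall through to the else branch
```

Here u1 ends up with one male and two female friends, two positive and one negative review, while u2 is kept by the left join with every count at zero.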

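Now we just need to take the data from the subquery and make some decisions. You may want to add extra branches, for example for users with no friends, or for friends split evenly on gender or sentiment. That follows the same pattern as below, just with more values to choose from.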

select user_id
, business_id
, case when MaleFriends > FemaleFriends then 'Male' else 'Female' end as MostCommonGender
, case when FriendsPositive > FriendsNegative then 'Positive' else 'Negative' end as MostCommonSentiment
from (
    select user.user_id, user.business_id
    , sum(case when lower(friends.gender) = 'male' then 1 else 0 end) as MaleFriends
    , sum(case when lower(friends.gender) = 'female' then 1 else 0 end) as FemaleFriends
    , sum(case when friends.review_sentiment = 'Positive' then 1 else 0 end) as FriendsPositive
    , sum(case when friends.review_sentiment = 'Negative' then 1 else 0 end) as FriendsNegative
    from tags as user
    left outer join tags as friends on user.friend_id = friends.user_id
      and friends.business_id = user.business_id
    where user.business_id = <<your business id here>>
    group by user.user_id, user.business_id
) as a
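
The extra branches for ties and for users without friend data could look like the sketch below. To keep it short, the aggregated counts are supplied directly as a hypothetical inline table `a` instead of being computed from `tags`, and the labels 'Tie' and 'No friend data' are my own placeholders, not part of the answer. It runs on SQLite through Python's sqlite3 module; the case expressions themselves are plain SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "a" stands in for the aggregating subquery: one row of counts per user/business.
result = conn.execute(
    """
    with a(user_id, business_id, MaleFriends, FemaleFriends,
           FriendsPositive, FriendsNegative) as (
        values ('u1', 'b1', 1, 2, 2, 1),   -- clear majorities
               ('u2', 'b1', 0, 0, 0, 0),   -- no friends reviewed b1
               ('u3', 'b1', 2, 2, 3, 1)    -- gender split evenly
    )
    select user_id
    , business_id
    , case when MaleFriends = 0 and FemaleFriends = 0 then 'No friend data'
           when MaleFriends = FemaleFriends then 'Tie'
           when MaleFriends > FemaleFriends then 'Male'
           else 'Female' end as MostCommonGender
    , case when FriendsPositive = 0 and FriendsNegative = 0 then 'No friend data'
           when FriendsPositive = FriendsNegative then 'Tie'
           when FriendsPositive > FriendsNegative then 'Positive'
           else 'Negative' end as MostCommonSentiment
    from a
    order by user_id
    """
).fetchall()

for row in result:
    print(row)
```

The order of the branches matters: the zero check has to come before the equality check, otherwise a user with no friend data would be reported as a tie.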

That gives you the steps to follow and, hopefully, a clear explanation of how they work. Good luck!

Regarding "mysql - How do I write an SQL query with multiple inner joins?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59200109/
