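I've been struggling with MySQL joins for a while. I've started to piece more of it together, but despite reading dozens of tutorials and the MySQL manual, I still have trouble understanding them.
My situation is that I have 3 tables: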
/* BASICALLY A TABLE THAT HOLDS FAN RECORDS */
CREATE TABLE `fans` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `first_name` varchar(255) DEFAULT NULL,
  `middle_name` varchar(255) DEFAULT NULL,
  `last_name` varchar(255) DEFAULT NULL,
  `email` varchar(255) DEFAULT NULL,
  `join_date` datetime DEFAULT NULL,
  `twitter` varchar(255) DEFAULT NULL,
  `twitterCrawled` datetime DEFAULT NULL,
  `twitterImage` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `email` (`email`)
) ENGINE=MyISAM AUTO_INCREMENT=20413 DEFAULT CHARSET=latin1;

/* A TABLE OF OUR TWITTER FOLLOWERS */
CREATE TABLE `twitterFollowers` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `screenName` varchar(25) DEFAULT NULL,
  `twitterId` varchar(25) DEFAULT NULL,
  `customerId` int(11) DEFAULT NULL,
  `uniqueStr` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique` (`uniqueStr`)
) ENGINE=InnoDB AUTO_INCREMENT=13426 DEFAULT CHARSET=utf8;

/* TABLE THAT SUGGESTS A LIKELY MATCH OF A TWITTER FOLLOWER
   BASED ON THE EMAIL / SCREEN NAME COMPARISON OF THE FAN vs OUR FOLLOWERS
   IF SOMEONE (ie. a moderator) CONFIRMS OR DENIES THAT IT'S A GOOD MATCH
   THEY PUT A DATESTAMP IN `dismissed` */
CREATE TABLE `contentSuggestion` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `userId` int(11) DEFAULT NULL,
  `fanId` int(11) DEFAULT NULL,
  `twitterAccountId` int(11) DEFAULT NULL,
  `contentType` varchar(50) DEFAULT NULL,
  `contentString` varchar(255) DEFAULT NULL,
  `added` datetime DEFAULT NULL,
  `dismissed` datetime DEFAULT NULL,
  `uniqueStr` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `unstr` (`uniqueStr`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;
What I'm trying to get is:
SELECT [fan columns] WHERE fan screen name IS IN twitterfollowers AND WHERE fan screen name IS NOT IN contentSuggestion (with a datestamp in dismissed)
My attempts so far:
~33 seconds
SELECT fans.id, tf.screenName as col1, tf.twitterId as col2
FROM fans
LEFT JOIN twitterFollowers tf ON tf.screenName = fans.emailUsername
LEFT JOIN contentSuggestion cs ON cs.contentString = tf.screenName
WHERE dismissed IS NULL
GROUP BY (fans.id)
HAVING col1 != ''
~14 seconds
SELECT id, emailUsername
FROM fans
WHERE emailUsername IN (SELECT DISTINCT(screenName)
                        FROM twitterFollowers)
  AND emailUsername NOT IN (SELECT DISTINCT(contentString)
                            FROM contentSuggestion
                            WHERE dismissed IS NULL)
GROUP BY (fans.id);
9.53 seconds
SELECT fans.id, tf.screenName as col1, tf.twitterId as col2
FROM fans
LEFT JOIN twitterFollowers tf ON tf.screenName = fans.emailUsername
WHERE tf.uniqueStr NOT IN (SELECT uniqueStr
                           FROM contentSuggestion
                           WHERE dismissed IS NULL)
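I'm hoping there's a better way. I've struggled to really use JOINs beyond a single LEFT JOIN, which has already helped me speed up other queries considerably.
Thanks for any help you can give me.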
Best answer
I would go with a variation of the second approach. Instead of IN, use EXISTS. Then add the right indexes and remove the aggregation:
SELECT f.id, f.emailUsername
FROM fans f
WHERE EXISTS (SELECT 1
              FROM twitterFollowers tf
              WHERE f.emailUsername = tf.screenName
             ) AND
      NOT EXISTS (SELECT 1
                  FROM contentSuggestion cs
                  WHERE f.emailUsername = cs.contentString AND
                        cs.dismissed IS NULL
                 );
Then make sure you have the following indexes:
twitterFollowers(screenName)
contentSuggestion(contentString, dismissed)
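As a rough sketch, the two indexes named in the answer could be created as shown below. The index names (idx_tf_screenName, idx_cs_contentString_dismissed) are hypothetical and chosen only for illustration. The sketch also assumes that fans has an emailUsername column, which the queries in the question use but the posted CREATE TABLE does not show.

```sql
-- Hypothetical index names; rename them to match your own conventions.

-- Lets the EXISTS subquery look up followers by screen name.
CREATE INDEX idx_tf_screenName
    ON twitterFollowers (screenName);

-- Lets the NOT EXISTS subquery find undismissed suggestions by content string.
CREATE INDEX idx_cs_contentString_dismissed
    ON contentSuggestion (contentString, dismissed);

-- Optional check: EXPLAIN reports which indexes each subquery uses.
-- Assumes fans.emailUsername exists (it is not in the posted schema).
EXPLAIN
SELECT f.id, f.emailUsername
FROM fans f
WHERE EXISTS (SELECT 1
              FROM twitterFollowers tf
              WHERE f.emailUsername = tf.screenName)
  AND NOT EXISTS (SELECT 1
                  FROM contentSuggestion cs
                  WHERE f.emailUsername = cs.contentString
                    AND cs.dismissed IS NULL);
```

If the key column of the EXPLAIN output for the two subqueries does not list the new indexes, the optimizer is probably still scanning those tables, and the timings are unlikely to improve much.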
Some notes:
When using IN, don't use SELECT DISTINCT. I'm not 100% sure that MySQL is always smart enough to ignore the DISTINCT in the subquery (it is redundant).
EXISTS has historically been faster than IN in MySQL, although the optimizer has improved in recent versions.
If fans.id is unique (a very reasonable assumption), you don't need the final GROUP BY.

Source: Stack Overflow question "php - Double LEFT JOIN query needs optimization": https://stackoverflow.com/questions/33710061/