python - Why does the same SQL query give different output in DataGrip and when embedded in Python code?

Tags: python sql jupyter-notebook datagrip

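I get two different outputs depending on whether I run the query as plain SQL in DataGrip or in Jupyter with the SQL embedded in some Python code.

The query COUNTs rows in specific states for several groups of users.
These are the user groups in this case: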

                              ids
grupos                           
0       [160, 161, 365, 386, 471]
1                      [296, 306] 

The tables in my database look like this:

Code:

From Jupyter:

for i, ids in enumerate(res['ids']):
    cur.execute("""SELECT COUNT(swipe.eclipse_id), 
    subscriber_hashtag.hashtag_id
    FROM subscriber_hashtag
    INNER JOIN eclipse_hashtag ON eclipse_hashtag.hashtag_id = subscriber_hashtag.hashtag_id
    LEFT OUTER JOIN swipe ON subscriber_hashtag.subscriber_id = swipe.subscriber_id
    WHERE (swipe.state= 3 OR swipe.state = 6 or swipe.state=9) AND subscriber_hashtag.subscriber_id in %s
    GROUP BY subscriber_hashtag.hashtag_id
        ORDER BY COUNT(swipe.eclipse_id) DESC;""",
    (tuple(res.iloc[i]['ids']),))

n = cur.fetchall()
listado = [{"count": elem[0], "eclipse_id": elem[1]} for elem in n]

From DataGrip:

I had to run the queries separately, otherwise the results would get mixed together.

SELECT COUNT(swipe.eclipse_id), subscriber_hashtag.hashtag_id  FROM subscriber_hashtag
  INNER JOIN eclipse_hashtag ON eclipse_hashtag.hashtag_id = subscriber_hashtag.hashtag_id
  LEFT OUTER JOIN swipe ON subscriber_hashtag.subscriber_id = swipe.subscriber_id
  WHERE (swipe.state= 3 OR swipe.state = 6 or swipe.state=9) AND subscriber_hashtag.subscriber_id = 160 OR subscriber_hashtag.subscriber_id = 161 OR subscriber_hashtag.subscriber_id = 365 OR subscriber_hashtag.subscriber_id = 386 OR subscriber_hashtag.subscriber_id = 471 OR subscriber_hashtag.subscriber_id = 499
    GROUP BY subscriber_hashtag.hashtag_id
      ORDER BY COUNT(swipe.eclipse_id) DESC;

Then:

SELECT COUNT(swipe.eclipse_id), subscriber_hashtag.hashtag_id  FROM subscriber_hashtag
  INNER JOIN eclipse_hashtag ON eclipse_hashtag.hashtag_id = subscriber_hashtag.hashtag_id
  LEFT OUTER JOIN swipe ON subscriber_hashtag.subscriber_id = swipe.subscriber_id
  WHERE (swipe.state= 3 OR swipe.state = 6 or swipe.state=9) AND subscriber_hashtag.subscriber_id = 296 OR subscriber_hashtag.subscriber_id = 306
    GROUP BY subscriber_hashtag.hashtag_id
      ORDER BY COUNT(swipe.eclipse_id) DESC;

Output:

From Jupyter:

[(1500L, 996), (1185L, 592), (480L, 1214), (432L, 329), (375L, 398), (306L, 357), (300L, 473), (288L, 325), (225L, 322), (207L, 321), (207L, 1385), (195L, 1445), (180L, 1049), (108L, 334), (105L, 1183), (90L, 387), (81L, 324), (75L, 617), (72L, 379), (63L, 1331), (54L, 2546), (54L, 2545), (48L, 961), (48L, 962), (45L, 1382), (30L, 1432), (30L, 1429), (27L, 1334), (24L, 1128), (18L, 1376), (18L, 386), (18L, 1345), (18L, 1335), (9L, 1354), (9L, 1356), (9L, 1355), (9L, 1357), (9L, 1361), (9L, 1364), (9L, 1374), (9L, 1375), (9L, 1373)]

From DataGrip:

As you can see, the two outputs have nothing in common:

First query, for [160, 161, 365, 386, 471]:

3000    397
2967    321
2352    329
2233    960
2000    392
1975    685
1896    337
1536    529
637     328
553     704
240     2545
240     2546
237     652
196     758
196     573
147     483
98      584
98      450
98      448
79      2549
79      2554
79      2552
79      2553
79      2551
79      2550
58      1376
56      428
49      451
49      759
49      449
49      760
34      2580
32      325
29      2547
29      425
25      322
13      594
12      334
9       427
6       323
3       347
3       595
3       345
1       521
1       333

Second query, for [296, 306]:

6600    996
5214    592
2880    329
2112    1214
1920    325
1650    398
1500    322
1380    321
1380    1385
858     1445
792     1049
720     334
600     387
540     324
480     379
462     1183
420     1331
360     2546
360     2545
330     617
306     357
300     473
300     1382
180     1334
132     1432
132     1429
120     386
120     1335
120     1376
120     1345
60      1364
60      1374
60      1356
60      1357
60      1355
60      1361
60      1354
60      1375
60      1373
48      962
48      961
24      1128

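Best answer

At its core, this is a SQL logic problem: the WHERE clause mixes AND and OR. AND takes precedence over OR, so without parentheses the state filter applies only to the first subscriber_id comparison, and rows for the remaining IDs are matched regardless of state. In DataGrip, you need to wrap all the OR conditions in parentheses to replicate the Python version, which uses a single IN() clause. The two statements below should produce the same results:

DataGrip adjusted SQL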

SELECT COUNT(swipe.eclipse_id), subscriber_hashtag.hashtag_id  
FROM subscriber_hashtag
INNER JOIN eclipse_hashtag 
  ON eclipse_hashtag.hashtag_id = subscriber_hashtag.hashtag_id
LEFT OUTER JOIN swipe 
  ON subscriber_hashtag.subscriber_id = swipe.subscriber_id
WHERE (swipe.state= 3 OR swipe.state = 6 OR swipe.state=9) AND 
      (subscriber_hashtag.subscriber_id = 160 OR 
       subscriber_hashtag.subscriber_id = 161 OR  
       subscriber_hashtag.subscriber_id = 365 OR 
       subscriber_hashtag.subscriber_id = 386 OR 
       subscriber_hashtag.subscriber_id = 471 OR 
       subscriber_hashtag.subscriber_id = 499)
GROUP BY subscriber_hashtag.hashtag_id
ORDER BY COUNT(swipe.eclipse_id) DESC;

Python rendered SQL

SELECT COUNT(swipe.eclipse_id), subscriber_hashtag.hashtag_id  
FROM subscriber_hashtag
INNER JOIN eclipse_hashtag 
  ON eclipse_hashtag.hashtag_id = subscriber_hashtag.hashtag_id
LEFT OUTER JOIN swipe 
  ON subscriber_hashtag.subscriber_id = swipe.subscriber_id
WHERE (swipe.state= 3 OR swipe.state = 6 OR swipe.state=9) AND 
      subscriber_hashtag.subscriber_id IN (160,161,365,386,471,499)
GROUP BY subscriber_hashtag.hashtag_id
ORDER BY COUNT(swipe.eclipse_id) DESC;
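
To see the precedence effect in isolation, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The single-table schema and the sample rows are hypothetical and far simpler than the tables in the question; the sketch only assumes that AND binds more tightly than OR, which holds in standard SQL.

```python
import sqlite3

# Hypothetical miniature table; the schema and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE swipe (subscriber_id INTEGER, state INTEGER)")
conn.executemany(
    "INSERT INTO swipe VALUES (?, ?)",
    [(160, 3), (160, 1), (161, 1), (161, 6), (365, 2), (999, 3)],
)


def count_rows(where_clause):
    """Return the number of rows matching the given WHERE clause."""
    sql = "SELECT COUNT(*) FROM swipe WHERE " + where_clause
    return conn.execute(sql).fetchone()[0]


states = "(state = 3 OR state = 6 OR state = 9)"

# AND binds tighter than OR, so this is parsed as
# (states AND subscriber_id = 160) OR subscriber_id = 161 OR subscriber_id = 365
unparenthesized = count_rows(
    states
    + " AND subscriber_id = 160 OR subscriber_id = 161 OR subscriber_id = 365"
)

# Grouping the ID comparisons applies the state filter to every ID.
parenthesized = count_rows(
    states
    + " AND (subscriber_id = 160 OR subscriber_id = 161 OR subscriber_id = 365)"
)

# IN (...) is equivalent to the parenthesized OR chain.
in_list = count_rows(states + " AND subscriber_id IN (160, 161, 365)")

print(unparenthesized, parenthesized, in_list)  # prints: 4 2 2
```

The unparenthesized form also counts the rows for 161 and 365 whose state is not 3, 6 or 9, which is the same kind of inflation seen in the DataGrip results above.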

Regarding "python - Why does the same SQL query give different output in DataGrip and when embedded in Python code?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/44907599/
