SQL 查询 - 长时间运行/占用 CPU 资源

标签 sql sql-server database database-performance sqlperformance

你好,我有下面的 SQL 查询,平均需要 40 分钟才能运行,它引用的其中一个表中有超过 700 万条记录。

我已经通过数据库调优顾问运行它并应用了所有建议,我还在 sql 中的事件监视器中对其进行了评估,并且没有推荐进一步的索引等。

任何建议都很好,提前致谢

WITH CTE AS 
(
    SELECT r.Id AS ResultId,
    r.JobId,
    r.CandidateId,
    r.Email,
    CAST(0 AS BIT) AS EmailSent,
    NULL AS EmailSentDate,
    'PICKUP' AS EmailStatus,
    GETDATE() AS CreateDate,
    C.Id AS UserId,
    C.Email AS UserEmail,
    NULL AS Subject
    FROM Result R
    INNER JOIN Job J ON R.JobId = J.Id
    INNER JOIN User C ON J.UserId = C.Id
    WHERE 
    ISNULL(J.Approved, CAST(0 AS BIT)) = CAST(1 AS BIT)
    AND ISNULL(J.Closed, CAST(0 AS BIT)) = CAST(0 AS BIT)
    AND ISNULL(R.Email,'') <> '' -- has an email address
    AND ISNULL(R.EmailSent, CAST(0 AS BIT)) = CAST(0 AS BIT) -- email has not been sent
    AND R.EmailSentDate IS NULL -- email has not been sent
    AND ISNULL(R.EmailStatus,'') = '' -- email has not been sent
    AND ISNULL(R.IsEmailSubscribe, 'True') <> 'False' -- not unsubscribed
    -- not already been emailed for this job
    AND NOT EXISTS (
        SELECT SMTP.Email
        FROM SMTP_Production SMTP
        WHERE SMTP.JobId = R.JobId AND SMTP.CandidateId = R.CandidateId
    )
    -- not unsubscribed
    AND NOT EXISTS (

        SELECT u.Id FROM Unsubscribe u
        WHERE  ISNULL(u.EmailAddress, '') = ISNULL(R.Email, '')

    )
    AND NOT EXISTS (
        SELECT SMTP.Id FROM SMTP_Production SMTP
        WHERE SMTP.EmailStatus = 'PICKUP' AND SMTP.CandidateId = R.CandidateId
    )   
    AND C.Id NOT IN (
        -- list of ids
    )
    AND J.Id NOT IN (
        -- list of ids
    )
    AND J.ClientId NOT IN 
    (
        -- list of ids
    )
)
INSERT INTO smtp_production (ResultId, JobId, CandidateId, Email, EmailSent, EmailSentDate, EmailStatus, CreateDate, ConsultantId, ConsultantEmail, Subject)
OUTPUT INSERTED.ResultId,GETDATE() INTO ResultstoUpdate
SELECT 
    CTE.ResultId,
    CTE.JobId,
    CTE.CandidateId,
    CTE.Email,
    CTE.EmailSent,
    CTE.EmailSentDate,
    CTE.EmailStatus,
    CTE.CreateDate,
    CTE.UserId,
    CTE.UserEmail,
    NULL
FROM CTE
  INNER JOIN 
    (
        SELECT *, row_number() over(partition by CTE.Email, CTE.CandidateId order by CTE.EmailSentDate desc) as rn
        FROM CTE

    ) DCTE ON CTE.ResultId = DCTE.ResultId AND DCTE.rn = 1

请在下面查看我更新的查询:

WITH CTE AS 
(
    SELECT R.Id AS ResultId,
    r.JobId,
    r.CandidateId,
    R.Email,
    CAST(0 AS BIT) AS EmailSent,
    NULL AS EmailSentDate,
    'PICKUP' AS EmailStatus,
    GETDATE() AS CreateDate,
    C.Id AS UserId,
    C.Email AS UserEmail,
    NULL AS Subject
    FROM RESULTS R
    INNER JOIN JOB J ON R.JobId = J.Id
    INNER JOIN Consultant C ON J.UserId = C.Id
    WHERE 
    J.DCApproved = 1
    AND (J.Closed = 0 OR J.Closed IS NULL)
    AND (R.Email <> '' OR R.Email IS NOT NULL)
    AND (R.EmailSent = 0 OR R.EmailSent IS NULL)
    AND R.EmailSentDate IS NULL -- email has not been sent
    AND (R.EmailStatus = '' OR R.EmailStatus IS NULL)
    AND (R.IsEmailSubscribe = 'True' OR R.IsEmailSubscribe IS NULL)
    -- not already been emailed for this job
    AND NOT EXISTS (
        SELECT SMTP.Email
        FROM SMTP_Production SMTP
        WHERE SMTP.JobId = R.JobId AND SMTP.CandidateId = R.CandidateId
    )
    -- not unsubscribed
    AND NOT EXISTS (

        SELECT u.Id FROM Unsubscribe u
        WHERE (u.EmailAddress = R.Email OR (u.EmailAddress IS NULL AND R.Email IS NULL))

    )
    AND NOT EXISTS (
        SELECT SMTP.Id FROM SMTP_Production SMTP
        WHERE SMTP.EmailStatus = 'PICKUP' AND SMTP.CandidateId = R.CandidateId
    )   
    AND C.Id NOT IN (
        -- LIST OF IDS
    )
    AND J.Id NOT IN (
        -- LIST OF IDS
    )
    AND J.ClientId NOT IN 
    (
        -- LIST OF IDS
    )
)

INSERT INTO smtp_production (ResultId, JobId, CandidateId, Email, EmailSent, EmailSentDate, EmailStatus, CreateDate, UserId, UserEmail, Subject)
OUTPUT INSERTED.ResultId,GETDATE() INTO ResultstoUpdate
SELECT 
    CTE.ResultId,
    CTE.JobId,
    CTE.CandidateId,
    CTE.Email,
    CTE.EmailSent,
    CTE.EmailSentDate,
    CTE.EmailStatus,
    CTE.CreateDate,
    CTE.UserId,
    CTE.UserEmail,
    NULL
FROM CTE
  INNER JOIN 
    (
        SELECT *, row_number() over(partition by CTE.Email, CTE.CandidateId order by CTE.EmailSentDate desc) as rn
        FROM CTE

    ) DCTE ON CTE.ResultId = DCTE.ResultId AND DCTE.rn = 1


GO

最佳答案

WHEREJOIN 子句中使用 ISNULL 可能是这里的主要原因。在查询中对列使用函数会导致查询变得不可 SARGable(这意味着它不能使用表上的任何索引,因此它会扫描整个事物)。 注意;使用针对变量的函数,WHERE 通常 没问题。例如 WHERE SomeColumn = DATEADD(DAY, @n, @SomeDate)。诸如 WHERE SomeColumn = ISNULL(@Variable,0) 之类的东西具有“包罗万象的查询”的味道,因此可能会影响性能;取决于你的设置。不过,这不是当前的讨论。

对于像 ISNULL(J.Closed, CAST(0 AS BIT)) = CAST(0 AS BIT) 这样的子句,这让查询优化器非常头疼,你的查询充满了它们.您需要将它们替换为如下子句:

WHERE (J.Closed = 0 OR J.Closed IS NULL)

虽然没有什么区别,但也没有必要在那里CAST 0。 SQL Server 可以看到您正在与 bit 进行比较,因此也会将 0 解释为 1。

您还有一个带有 WHERE 子句的 EXISTS ISNULL(u.EmailAddress, '') = ISNULL(R.Email, '')。这需要变成:

WHERE (u.EmailAddress = R.Email
  OR   (u.EmailAddress IS NULL AND R.Email IS NULL))

您需要更改 WHERE 子句(CTE 和子查询)中的所有 ISNULL 用法,您应该会看到体面的性能提升。

关于SQL 查询 - 长时间运行/占用 CPU 资源,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/52850961/

相关文章:

sql - 找出列是否为 NOT NULL

MySQL INSERT INTO 使用带有 TIMESTAMP 的 SELECT

SQL Server 2008R2 SSRS报告无法连接到数据源: Cannot create a connection to data source

java - neo4j-jdbc 驱动程序最小可行片段,不确定如何解决依赖关系

c++ - 通过c++ xll的sql连接

java - 在数据库中存储一个大字符串?

sql-server - 如何在 SQL Server 中进行批量更新插入?

javascript - 如何将 javascript 对象的值输入到 sql 数据库中?

java - 如何优化从平面文件到 BD PostgreSQL 的数据导入?

sql - 查找日期列的最小值和最大值