I'm trying to create a query that splits multiple values in a column out into multiple columns, to help "de-duplicate" a data set.
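It's easiest to explain with the data below. Basically, you'll notice an interval field, INT, which is a dense rank over the ID, START, FINISH, DURA (duration) and COD columns. These intervals are duplicated because there are multiple overlapping PSSID and CSSID values. Is there a good way to dynamically split the overlapping PSSID and CSSID values into multiple columns...! OK, so what exactly do I mean...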
Sample data:
ID START FINISH DURA COD INT PSSID CSSID
A1 33.18 33.27 0.09 ST 15 N13045 NULL
A1 33.18 33.27 0.09 ST 15 N13046 NULL
A1 33.27 33.285 0.015 DU 16 N13046 NULL
A1 33.27 33.285 0.015 DU 16 NULL N20015
A1 33.27 33.285 0.015 DU 16 NULL N2001516
A1 33.27 33.285 0.015 DU 16 NULL N20033
A1 33.285 33.35 0.065 BM 17 N13046 NULL
A1 33.285 33.35 0.065 BM 17 NULL N20015
A1 33.285 33.35 0.065 BM 17 NULL N2001516
A1 33.285 33.35 0.065 BM 17 NULL N20033
A1 33.35 33.395 0.045 DM 18 N13046 NULL
A1 33.35 33.395 0.045 DM 18 NULL N20015
A1 33.35 33.395 0.045 DM 18 NULL N2001516
A1 33.35 33.395 0.045 DM 18 NULL N20033
A1 33.395 33.44 0.045 DN 19 N13046 NULL
A1 33.395 33.44 0.045 DN 19 NULL N20015
A1 33.395 33.44 0.045 DN 19 NULL N2001516
A1 33.395 33.44 0.045 DN 19 NULL N20033
A1 33.44 33.485 0.045 BM 20 N13046 NULL
A1 33.44 33.485 0.045 BM 20 NULL N2001516
A1 33.44 33.485 0.045 BM 20 NULL N20033
A1 33.44 33.485 0.045 BM 20 NULL N20034
A1 33.485 33.51 0.025 DN 21 N13046 NULL
A1 33.485 33.51 0.025 DN 21 NULL N2001516
A1 33.485 33.51 0.025 DN 21 NULL N20033
A1 33.485 33.51 0.025 DN 21 NULL N20034
A1 33.51 33.595 0.085 DB 22 N13046 NULL
A1 33.51 33.595 0.085 DB 22 NULL N2001516
A1 33.51 33.595 0.085 DB 22 NULL N20034
A1 33.595 33.665 0.07 DN 23 N13046 NULL
A1 33.595 33.665 0.07 DN 23 NULL N2001516
A1 33.595 33.665 0.07 DN 23 NULL N20034
A1 33.665 33.785 0.12 DB 24 NULL N2001516
A1 33.785 33.79 0.005 YS 25 NULL NULL
A1 33.79 33.83 0.04 BM 26 NULL NULL
Desired output:
ID START FINISH DURA COD INT PSSID1 PSSID2 CSSID1 CSSID2 CSSID3
A1 33.18 33.27 0.09 ST 15 N13046 N13045 NULL NULL NULL
A1 33.27 33.285 0.015 DU 16 N13046 NULL N20015 N2001516 N20033
A1 33.285 33.35 0.065 BM 17 N13046 NULL N20015 N2001516 N20033
A1 33.35 33.395 0.045 DM 18 N13046 NULL N20015 N2001516 N20033
A1 33.395 33.44 0.045 DN 19 N13046 NULL N20015 N2001516 N20033
A1 33.44 33.485 0.045 BM 20 N13046 NULL N20034 N2001516 N20033
A1 33.485 33.51 0.025 DN 21 N13046 NULL N20034 N2001516 N20033
A1 33.51 33.595 0.085 DB 22 N13046 NULL N20034 N2001516 NULL
A1 33.595 33.665 0.07 DN 23 N13046 NULL N20034 N2001516 NULL
A1 33.665 33.785 0.12 DB 24 NULL NULL NULL N2001516 NULL
A1 33.785 33.79 0.005 YS 25 NULL NULL NULL NULL NULL
A1 33.79 33.83 0.04 BM 26 NULL NULL NULL NULL NULL
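To make matters worse, this is only a small part of the sample data. A given interval may have more than three PSSID or CSSID values (although the upper limit should be 5), so the query needs to be dynamic to handle this.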
I'm using SQL Server 2012. The schema for the data above is below:
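-- decimal(9,3) rather than decimal(9,2): values such as 33.285 and 0.015 have three
-- decimal places and would otherwise be rounded on insert (33.285 -> 33.29)
CREATE TABLE #SampleData
    ([ID] varchar(2), [START] decimal(9,3), [FINISH] decimal(9,3), [DURA] decimal(9,3), [COD] varchar(2), [INT] int, [PSSID] varchar(6), [CSSID] varchar(8))
;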
INSERT INTO #SampleData
([ID], [START], [FINISH], [DURA], [COD], [INT], [PSSID], [CSSID])
VALUES
('A1', 33.18, 33.27, 0.09, 'ST', 15, 'N13045', NULL),
('A1', 33.18, 33.27, 0.09, 'ST', 15, 'N13046', NULL),
('A1', 33.27, 33.285, 0.015, 'DU', 16, 'N13046', NULL),
('A1', 33.27, 33.285, 0.015, 'DU', 16, NULL, 'N20015'),
('A1', 33.27, 33.285, 0.015, 'DU', 16, NULL, 'N2001516'),
('A1', 33.27, 33.285, 0.015, 'DU', 16, NULL, 'N20033'),
('A1', 33.285, 33.35, 0.065, 'BM', 17, 'N13046', NULL),
('A1', 33.285, 33.35, 0.065, 'BM', 17, NULL, 'N20015'),
('A1', 33.285, 33.35, 0.065, 'BM', 17, NULL, 'N2001516'),
('A1', 33.285, 33.35, 0.065, 'BM', 17, NULL, 'N20033'),
('A1', 33.35, 33.395, 0.045, 'DM', 18, 'N13046', NULL),
('A1', 33.35, 33.395, 0.045, 'DM', 18, NULL, 'N20015'),
('A1', 33.35, 33.395, 0.045, 'DM', 18, NULL, 'N2001516'),
('A1', 33.35, 33.395, 0.045, 'DM', 18, NULL, 'N20033'),
('A1', 33.395, 33.44, 0.045, 'DN', 19, 'N13046', NULL),
('A1', 33.395, 33.44, 0.045, 'DN', 19, NULL, 'N20015'),
('A1', 33.395, 33.44, 0.045, 'DN', 19, NULL, 'N2001516'),
('A1', 33.395, 33.44, 0.045, 'DN', 19, NULL, 'N20033'),
('A1', 33.44, 33.485, 0.045, 'BM', 20, 'N13046', NULL),
('A1', 33.44, 33.485, 0.045, 'BM', 20, NULL, 'N2001516'),
('A1', 33.44, 33.485, 0.045, 'BM', 20, NULL, 'N20033'),
('A1', 33.44, 33.485, 0.045, 'BM', 20, NULL, 'N20034'),
('A1', 33.485, 33.51, 0.025, 'DN', 21, 'N13046', NULL),
('A1', 33.485, 33.51, 0.025, 'DN', 21, NULL, 'N2001516'),
('A1', 33.485, 33.51, 0.025, 'DN', 21, NULL, 'N20033'),
('A1', 33.485, 33.51, 0.025, 'DN', 21, NULL, 'N20034'),
('A1', 33.51, 33.595, 0.085, 'DB', 22, 'N13046', NULL),
('A1', 33.51, 33.595, 0.085, 'DB', 22, NULL, 'N2001516'),
('A1', 33.51, 33.595, 0.085, 'DB', 22, NULL, 'N20034'),
('A1', 33.595, 33.665, 0.07, 'DN', 23, 'N13046', NULL),
('A1', 33.595, 33.665, 0.07, 'DN', 23, NULL, 'N2001516'),
('A1', 33.595, 33.665, 0.07, 'DN', 23, NULL, 'N20034'),
('A1', 33.665, 33.785, 0.12, 'DB', 24, NULL, 'N2001516'),
('A1', 33.785, 33.79, 0.005, 'YS', 25, NULL, NULL),
('A1', 33.79, 33.83, 0.04, 'BM', 26, NULL, NULL)
;
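Before deciding how many PSSIDn/CSSIDn columns to generate, it can help to check how many values each INT group actually carries. The following is a minimal sketch. It assumes the CREATE TABLE/INSERT script above has already been run in the same session, because it reads #SampleData. The variable names @MaxPss and @MaxCss are illustrative only and are not part of the original question.

```sql
-- Assumes #SampleData from the script above already exists in this session.
-- Distinct non-NULL PSSID / CSSID values per INT group
-- (COUNT(DISTINCT ...) ignores NULLs).
SELECT [INT]
      ,COUNT(DISTINCT [PSSID]) AS [PssCount]
      ,COUNT(DISTINCT [CSSID]) AS [CssCount]
FROM #SampleData
GROUP BY [INT]
ORDER BY [INT];

-- The widest group decides how many PSSIDn / CSSIDn columns are needed.
-- @MaxPss and @MaxCss are illustrative (hypothetical) names.
DECLARE @MaxPss int, @MaxCss int;

SELECT @MaxPss = MAX(c.[PssCount])
      ,@MaxCss = MAX(c.[CssCount])
FROM
(
    SELECT COUNT(DISTINCT [PSSID]) AS [PssCount]
          ,COUNT(DISTINCT [CSSID]) AS [CssCount]
    FROM #SampleData
    GROUP BY [INT]
) AS c;

SELECT @MaxPss AS [MaxPssColumns], @MaxCss AS [MaxCssColumns];
```

For the sample rows, this should report at most two PSSID values (INT 15) and three CSSID values (for example INT 16). That matches the width of the desired output.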
Thanks for your help!
Best answer
You have already defined the groups by creating the INT column. We can use it to build a separate PIVOT for PSS and for CSS, and then join them.
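-- PSSID values numbered within each INT group (ORDER BY ... DESC puts NULLs last),
-- then pivoted into the fixed columns PSSID1..PSSID5
SELECT *
INTO #DataSourcePSS
FROM
(
    SELECT [INT]
          ,[PSSID]
          ,CONCAT('PSSID', ROW_NUMBER() OVER (PARTITION BY [INT] ORDER BY [PSSID] DESC)) AS [RowID]
    FROM #SampleData
) DS
PIVOT
(
    MAX([PSSID]) FOR RowID IN ([PSSID1], [PSSID2], [PSSID3], [PSSID4], [PSSID5])
) PVT;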
-- Same idea for CSSID: number per INT group, then pivot into CSSID1..CSSID5
SELECT *
INTO #DataSourceCSS
FROM
(
    SELECT [INT]
          ,[CSSID]
          ,CONCAT('CSSID', ROW_NUMBER() OVER (PARTITION BY [INT] ORDER BY [CSSID] DESC)) AS [RowID]
    FROM #SampleData
) DS
PIVOT
(
    MAX([CSSID]) FOR RowID IN ([CSSID1], [CSSID2], [CSSID3], [CSSID4], [CSSID5])
) PVT;
-- One row per interval, then attach the two pivoted halves by INT
WITH DataSourceSD AS
(
    SELECT DISTINCT [ID], [START], [FINISH], [DURA], [COD], [INT]
    FROM #SampleData
)
SELECT SD.*
      ,PSS.[PSSID1], PSS.[PSSID2], PSS.[PSSID3], PSS.[PSSID4], PSS.[PSSID5]
      ,CSS.[CSSID1], CSS.[CSSID2], CSS.[CSSID3], CSS.[CSSID4], CSS.[CSSID5]
FROM DataSourceSD SD
INNER JOIN #DataSourcePSS PSS
    ON SD.[INT] = PSS.[INT]
INNER JOIN #DataSourceCSS CSS
    ON SD.[INT] = CSS.[INT]
ORDER BY SD.[INT];

-- Clean up the temporary tables
DROP TABLE #DataSourceCSS;
DROP TABLE #DataSourcePSS;
DROP TABLE #SampleData;
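Because each group can hold at most five values, I pivot on five columns. As a result, some columns may contain no values at all. If that isn't OK, you need to use a dynamic PIVOT instead.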
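The following is a minimal dynamic-PIVOT sketch. It assumes the question's #SampleData script has already run in the same session. SQL Server 2012 has no STRING_AGG, so the column lists are built with the FOR XML PATH('') + STUFF idiom, and the finished statement is run with sp_executesql. The variable names @PssCols, @CssCols and @sql are illustrative only. Apart from generating only as many PSSIDn/CSSIDn columns as the data needs, it runs the same query as the answer above, using CTEs in place of the temporary tables.

```sql
-- Sketch: assumes #SampleData from the question already exists in this session.
DECLARE @PssCols nvarchar(max), @CssCols nvarchar(max), @sql nvarchar(max);

-- One [PSSIDn] per position that actually holds a value in some INT group.
-- Numbering only non-NULL rows yields the same n as the static query,
-- because ORDER BY ... DESC puts NULLs last in SQL Server.
SELECT @PssCols = STUFF((
    SELECT N',' + QUOTENAME(CONCAT('PSSID', n))
    FROM (SELECT DISTINCT ROW_NUMBER() OVER (PARTITION BY [INT] ORDER BY [PSSID] DESC) AS n
          FROM #SampleData
          WHERE [PSSID] IS NOT NULL) AS nums
    ORDER BY n
    FOR XML PATH('')), 1, 1, N'');

SELECT @CssCols = STUFF((
    SELECT N',' + QUOTENAME(CONCAT('CSSID', n))
    FROM (SELECT DISTINCT ROW_NUMBER() OVER (PARTITION BY [INT] ORDER BY [CSSID] DESC) AS n
          FROM #SampleData
          WHERE [CSSID] IS NOT NULL) AS nums
    ORDER BY n
    FOR XML PATH('')), 1, 1, N'');

-- Keep at least one column so the PIVOT ... IN () list is never empty
-- (a NULL list would also turn the whole @sql string into NULL).
SET @PssCols = ISNULL(@PssCols, N'[PSSID1]');
SET @CssCols = ISNULL(@CssCols, N'[CSSID1]');

-- Same shape as the static answer, with the generated IN lists spliced in.
SET @sql = N'
WITH SD AS
(
    SELECT DISTINCT [ID], [START], [FINISH], [DURA], [COD], [INT]
    FROM #SampleData
),
PSS AS
(
    SELECT *
    FROM (SELECT [INT], [PSSID],
                 CONCAT(''PSSID'', ROW_NUMBER() OVER (PARTITION BY [INT] ORDER BY [PSSID] DESC)) AS [RowID]
          FROM #SampleData) DS
    PIVOT (MAX([PSSID]) FOR [RowID] IN (' + @PssCols + N')) PVT
),
CSS AS
(
    SELECT *
    FROM (SELECT [INT], [CSSID],
                 CONCAT(''CSSID'', ROW_NUMBER() OVER (PARTITION BY [INT] ORDER BY [CSSID] DESC)) AS [RowID]
          FROM #SampleData) DS
    PIVOT (MAX([CSSID]) FOR [RowID] IN (' + @CssCols + N')) PVT
)
SELECT SD.*, ' + @PssCols + N', ' + @CssCols + N'
FROM SD
INNER JOIN PSS ON SD.[INT] = PSS.[INT]
INNER JOIN CSS ON SD.[INT] = CSS.[INT]
ORDER BY SD.[INT];';

EXEC sys.sp_executesql @sql;
```

For the sample data, @PssCols should end up as [PSSID1],[PSSID2] and @CssCols as [CSSID1],[CSSID2],[CSSID3]. Local temporary tables created by the caller are visible inside sp_executesql, which is why the generated statement can read #SampleData directly.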
Regarding "sql - Split dynamic values from two columns into multiple columns - remove duplicates", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38344190/