I need to split a narrative field (free text) into multiple rows. The current format is:
Case_Reference | Narrative
```````````````|`````````````````````````````````````
XXXX/XX-123456 | [Endless_Text up to ~50k characters]
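Within the Narrative text, the individual entries (added whenever a different agent has done something on the case) each begin with the entry date followed by two spaces (i.e. 'dd/mm/yyyy  '), and the date value changes with every entry in the same field.
In other words, after looking for a better delimiter, the only thing I can use is a string in this format, so I need to identify the multiple positions in the Narrative text where the format (is "mask" a better word?) matches 'dd/mm/yyyy  '.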
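I can identify multiple occurrences of a constant string without any problem; the difficulty is identifying them when what I am looking for is a pattern:
'%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]  %'
PATINDEX will of course return the first occurrence/position, but as far as I know there is no way to "modify" it (i.e. through a created function) so that it picks up the later occurrences the way we can with CHARINDEX (because PATINDEX has no start-position parameter).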
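A common way to work around the missing start-position parameter is to hand PATINDEX only the part of the text that begins at the desired start position and then add the offset back. The following is a minimal sketch of that idea, not a definitive implementation; the variable names (@Narrative, @Pattern, @StartAt) and the sample text are made up for illustration.

```sql
-- Hypothetical sample value; two spaces follow each date, as described above.
DECLARE @Narrative NVARCHAR(MAX) = N'01/02/2000  First entry 02/03/2000  Second entry';
DECLARE @Pattern   NVARCHAR(100) = N'%[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]  %';
DECLARE @StartAt   INT = 2;  -- search from character 2 onwards, i.e. skip the match at position 1

-- PATINDEX reports a position relative to the substring it was given,
-- so the characters that were cut off are added back; 0 still means "not found".
SELECT CASE WHEN p.Pos > 0 THEN p.Pos + @StartAt - 1 ELSE 0 END AS NextPosition  -- 25 for this sample
FROM (SELECT PATINDEX(@Pattern, SUBSTRING(@Narrative, @StartAt, LEN(@Narrative)))) AS p(Pos);
```

Repeating this with @StartAt set to one past the previous hit walks through all occurrences, which is the kind of loop a recursive CTE can express.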
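To be clear, I am not looking for code that splits the text directly, because I need to manipulate each entry further; what I am after is purely the positions of the multiple occurrences of the string within the Narrative text.
Any help is greatly appreciated.
For clarity, there is no option to do this pre-import, so it has to be done on the data as landed.
The desired output is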
Case_Reference1 | 1st_Position_of_Delimiter_String
Case_Reference1 | 2nd_Position_of_Delimiter_String
Case_Reference2 | 1st_Position_of_Delimiter_String
Case_Reference2 | 2nd_Position_of_Delimiter_String
Case_Reference2 | 3rd_Position_of_Delimiter_String
Best answer
You can solve this with a recursive CTE:
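-- Sample data; every date is followed by two spaces, as described in the question
DECLARE @tbl TABLE (Case_Reference NVARCHAR(MAX),Narrative NVARCHAR(MAX));
INSERT INTO @tbl VALUES
 (N'C1',N'01/02/2000  Some text with     blanks 02/03/2000  More text 03/04/2000  An even more')
,(N'C2',N'01/02/2000  Test for C2 02/03/2000  One more for C2 03/04/2000  An even more 04/05/2000  Blah')
,(N'C3',N'01/02/2000  Test for C3 02/03/2000  One more for C3 03/04/2000  An even more')
;
WITH recCTE AS
(
    -- Anchor: the first entry starts at position 1. The search for the next date begins at
    -- position 12 (SUBSTRING), so PATINDEX + 10 is the last character before that date.
    SELECT 1 AS Step,Case_Reference,Narrative,CAST(1 AS BIGINT) AS StartsAt
          ,CASE WHEN NewPos.EndsAt>0 THEN NewPos.EndsAt+10 ELSE LEN(Narrative) END AS EndsAt -- only one entry: it runs to the end
          ,LEN(Narrative) AS MaxLen
          ,SUBSTRING(Narrative,NewPos.EndsAt+10+1,999999) AS RestString
    FROM @tbl AS tbl
    CROSS APPLY(SELECT PATINDEX('%[0-3][0-9]/[0-1][0-9]/[1-2][0-9][0-9][0-9]  %',SUBSTRING(Narrative,12,9999999))) AS NewPos(EndsAt)
    UNION ALL
    -- Recursion: RestString begins with the current entry's date; find the date after it
    SELECT r.Step+1,r.Case_Reference,r.Narrative,r.EndsAt+1
          ,CASE WHEN NewPos.EndsAt>0 THEN r.EndsAt+NewPos.EndsAt+10 ELSE r.MaxLen END -- no further date: the entry runs to the end
          ,r.MaxLen
          ,SUBSTRING(r.RestString,NewPos.EndsAt+10+1,999999)
    FROM recCTE AS r
    CROSS APPLY(SELECT PATINDEX('%[0-3][0-9]/[0-1][0-9]/[1-2][0-9][0-9][0-9]  %',SUBSTRING(r.RestString,12,99999999))) AS NewPos(EndsAt)
    WHERE r.EndsAt<r.MaxLen -- stop once the previous entry reached the end of the text
)
SELECT Step,Case_Reference,StartsAt,EndsAt
      ,SUBSTRING(Narrative,StartsAt,EndsAt-StartsAt+1) AS OutputString
FROM recCTE
ORDER BY Case_Reference,Step
OPTION (MAXRECURSION 0); -- the default limit is 100 recursion levels; 0 removes it for narratives with many entries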
Result
+------+----------------+----------+--------+---------------------------------------+
| Step | Case_Reference | StartsAt | EndsAt | OutputString                          |
+------+----------------+----------+--------+---------------------------------------+
| 1    | C1             | 1        | 38     | 01/02/2000  Some text with     blanks |
+------+----------------+----------+--------+---------------------------------------+
| 2    | C1             | 39       | 60     | 02/03/2000  More text                 |
+------+----------------+----------+--------+---------------------------------------+
| 3    | C1             | 61       | 84     | 03/04/2000  An even more              |
+------+----------------+----------+--------+---------------------------------------+
| 1    | C2             | 1        | 24     | 01/02/2000  Test for C2               |
+------+----------------+----------+--------+---------------------------------------+
| 2    | C2             | 25       | 52     | 02/03/2000  One more for C2           |
+------+----------------+----------+--------+---------------------------------------+
| 3    | C2             | 53       | 77     | 03/04/2000  An even more              |
+------+----------------+----------+--------+---------------------------------------+
| 4    | C2             | 78       | 93     | 04/05/2000  Blah                      |
+------+----------------+----------+--------+---------------------------------------+
| 1    | C3             | 1        | 24     | 01/02/2000  Test for C3               |
+------+----------------+----------+--------+---------------------------------------+
| 2    | C3             | 25       | 52     | 02/03/2000  One more for C3           |
+------+----------------+----------+--------+---------------------------------------+
| 3    | C3             | 53       | 76     | 03/04/2000  An even more              |
+------+----------------+----------+--------+---------------------------------------+
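Since the question asks only for the positions, a set-based variant without recursion might also be worth trying: test every character position against the date mask with the help of a numbers list. This is only a sketch under assumptions, not part of the answer above; the Digits and Numbers CTEs and the shortened sample rows are hypothetical, and the 100000-row range is assumed to be enough for the roughly 50k characters mentioned in the question.

```sql
-- Hypothetical, shortened sample data; two spaces follow each date.
DECLARE @tbl TABLE (Case_Reference NVARCHAR(50), Narrative NVARCHAR(MAX));
INSERT INTO @tbl VALUES
 (N'C1', N'01/02/2000  Some text 02/03/2000  More text')
,(N'C2', N'01/02/2000  Test for C2')
;
WITH Digits AS (SELECT v.d FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v(d))
,Numbers AS
(
    -- 10^5 = 100000 consecutive numbers, one per candidate character position
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM Digits AS d1 CROSS JOIN Digits AS d2 CROSS JOIN Digits AS d3
         CROSS JOIN Digits AS d4 CROSS JOIN Digits AS d5
)
SELECT t.Case_Reference, nr.n AS Position
FROM @tbl AS t
INNER JOIN Numbers AS nr ON nr.n <= LEN(t.Narrative) - 9   -- a date needs 10 characters
-- The 12 characters starting at n must be a date followed by two spaces
WHERE SUBSTRING(t.Narrative, nr.n, 12) LIKE N'[0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9]  %'
ORDER BY t.Case_Reference, nr.n;
-- For this sample: (C1, 1), (C1, 23), (C2, 1)
```

The trade-off is that every position of every narrative gets tested, so on long texts it is worth comparing against the recursive CTE, which jumps from match to match.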
Regarding sql - (SQL) Identify positions of multiple occurrences of a string format within a field, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40738970/