sql - Parsing an unstructured text file in SSIS and reading each line to get the required data

Tags: sql, ssis

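I am working with SSIS and have complex, unstructured text files. I need to build an SSIS package that parses each file and loads the required columns into a database. What is the best way to parse the text file, and how should I go about it? Can I write a script that reads each line of the file? I am also unsure whether each line of a text file can be read without writing a script.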

The required columns from the text file are DEVICEID, DATAVALUE and DATAUNITS.

Here is the text file:

    12/02/2015 09:47:44:745 SecureHARTPort version: 1.1.12.0.

    12/02/2015 09:47:44:745 Connecting and initialing Session to
    67.40.65.181 Port:5094 Tcp
    12/02/2015 09:47:44:745 Tx: Message Header: Ver: 1, MsgType: 0, MsgId: 0
    Status: 0x00
    TranId: 1, Data ByteCount: 5
    Data: 01 00 09 27 C0

    12/02/2015 09:47:44:761 Rx: Message Header: Ver: 1, MsgType: 1, MsgId: 0
    Status: 0x00
    TranId: 1, Data ByteCount: 5
    Data: 01 00 09 27 C0
    12/02/2015 09:47:44:855 Tx: Message Header: Ver: 1, MsgType: 0, MsgId: 3
    Status: 0x00
    TranId: 2, Data ByteCount: 5
    Data: 02 80 00 00 82

    12/02/2015 09:47:44:855 Rx: Message Header: Ver: 1, MsgType: 1, MsgId: 3
    Status: 0x00
    TranId: 2, Data ByteCount: 29
    Data: 06 80 00 18 00 50 FE 26 4E 05 07 05 02 0E 0C 0B 6A 64 05 04 00 01 50
    00 26 00 26 84 8E

    Rx Cmd=0, Rsp code=0x00, Device Status=0x50
    Expansion Code=254
    Expanded Device Type=9806
    # Request Preambles=5
    Universal Comand Revision Level=7
    Transmitter HART Revision Level=5
    Software Revision=2
    Hardware Revision Level / Physical Signaling Code=14
    Flags=0C
    Device ID=748132
    Minimum # Response Preambles=5
    Max # of device variables=4
    Configuration Change Counter=1
    Extended Field Device Status=50
    Manufacturer's ID=38
    Private Label Distributor=38
    Device Profile=132

    12/02/2015 09:47:44:855 Tx: Message Header: Ver: 1, MsgType: 0, MsgId: 3
    Status: 0x00
    TranId: 3, Data ByteCount: 9
    Data: 82 A6 4E 0B 6A 64 14 00 7B

    12/02/2015 09:47:44:870 Rx: Message Header: Ver: 1, MsgType: 1, MsgId: 3
    Status: 0x00
    TranId: 3, Data ByteCount: 43
    Data: 86 A6 4E 0B 6A 64 14 22 00 50 77 69 68 61 72 74 67 77 00 00 00 00 00
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0C

    Rx Cmd=20, Rsp code=0x00, Device Status=0x50
    Long Tag=wihartgw

    12/02/2015 09:47:44:870 Tx: Message Header: Ver: 1, MsgType: 0, MsgId: 3
    Status: 0x00
    TranId: 4, Data ByteCount: 9
    Data: 82 A6 4E 0B 6A 64 4A 00 25

    12/02/2015 09:47:44:886 Rx: Message Header: Ver: 1, MsgType: 1, MsgId: 3
    Status: 0x00
    TranId: 4, Data ByteCount: 19
    Data: 86 A6 4E 0B 6A 64 4A 0A 00 50 01 01 65 00 05 02 01 03 1B

    Rx Cmd=74, Rsp code=0x00, Device Status=0x50
    Max Num IO Cards=1
    Max Num Channels per IO Card=1
    Max Num Sub-Devices per Channel=101
    Num Devices Detected=5
    Max Num DR Supported=2
    Master Mode for Comm=1
    Retry Count for Sub-Device=3

    Rx Cmd=9, Rsp code=0x00, Device Status=0x50
    Extended Device Status=0
    Slot0 Var Code=246
    Slot0 Var Classification=0
    Slot0 Var Units=251
    Slot0 Var Value=4
    Slot0 Var Status=C0
    Slot1 Var Code=116
    Slot1 Var Classification=209
    Slot1 Var Units=70
    Slot1 Var Value=0

Best answer

I don't know whether this helps you, but with a T-SQL script like the one below you can first read the text line by line and then apply suitable filters:

DECLARE @YourText NVARCHAR(MAX)=
N'    12/02/2015 09:47:44:745 SecureHARTPort version: 1.1.12.0.

    12/02/2015 09:47:44:745 Connecting and initialing Session to 
    67.40.65.181 Port:5094 Tcp
    12/02/2015 09:47:44:745 Tx: Message Header: Ver: 1, MsgType: 0, MsgId: 0 
    Status: 0x00
   TranId: 1, Data ByteCount: 5
   Data: 01 00 09 27 C0 

    12/02/2015 09:47:44:761 Rx: Message Header: Ver: 1, MsgType: 1, MsgId: 0 
   Status: 0x00
  TranId: 1, Data ByteCount: 5
  Data: 01 00 09 27 C0 
  12/02/2015 09:47:44:855 Tx: Message Header: Ver: 1, MsgType: 0, MsgId: 3 
  Status: 0x00
 TranId: 2, Data ByteCount: 5
 Data: 02 80 00 00 82 

 12/02/2015 09:47:44:855 Rx: Message Header: Ver: 1, MsgType: 1, MsgId: 3 
 Status: 0x00
 TranId: 2, Data ByteCount: 29
 Data: 06 80 00 18 00 50 FE 26 4E 05 07 05 02 0E 0C 0B 6A 64 05 04 00 01 50 
 00 26 00 26 84 8E 

 Rx Cmd=0, Rsp code=0x00, Device Status=0x50
 Expansion Code=254
 Expanded Device Type=9806
 # Request Preambles=5
 Universal Comand Revision Level=7
 Transmitter HART Revision Level=5
 Software Revision=2
 Hardware Revision Level / Physical Signaling Code=14
 Flags=0C
 Device ID=748132
 Minimum # Response Preambles=5
 Max # of device variables=4
 Configuration Change Counter=1
 Extended Field Device Status=50
 Manufacturer''s ID=38
 Private Label Distributor=38
 Device Profile=132

 12/02/2015 09:47:44:855 Tx: Message Header: Ver: 1, MsgType: 0, MsgId: 3 
 Status: 0x00
 TranId: 3, Data ByteCount: 9
  Data: 82 A6 4E 0B 6A 64 14 00 7B 

  12/02/2015 09:47:44:870 Rx: Message Header: Ver: 1, MsgType: 1, MsgId: 3 
  Status: 0x00
  TranId: 3, Data ByteCount: 43
  Data: 86 A6 4E 0B 6A 64 14 22 00 50 77 69 68 61 72 74 67 77 00 00 00 00 00 
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0C 

 Rx Cmd=20, Rsp code=0x00, Device Status=0x50
 Long Tag=wihartgw

  12/02/2015 09:47:44:870 Tx: Message Header: Ver: 1, MsgType: 0, MsgId: 3 
 Status: 0x00
 TranId: 4, Data ByteCount: 9
 Data: 82 A6 4E 0B 6A 64 4A 00 25 

 12/02/2015 09:47:44:886 Rx: Message Header: Ver: 1, MsgType: 1, MsgId: 3 
 Status: 0x00
 TranId: 4, Data ByteCount: 19
  Data: 86 A6 4E 0B 6A 64 4A 0A 00 50 01 01 65 00 05 02 01 03 1B 

  Rx Cmd=74, Rsp code=0x00, Device Status=0x50
 Max Num IO Cards=1
 Max Num Channels per IO Card=1
 Max Num Sub-Devices per Channel=101
  Num Devices Detected=5
  Max Num DR Supported=2
  Master Mode for Comm=1
   Retry Count for Sub-Device=3

   Rx Cmd=9, Rsp code=0x00, Device Status=0x50
   Extended Device Status=0
   Slot0 Var Code=246
   Slot0 Var Classification=0
   Slot0 Var Units=251
   Slot0 Var Value=4
   Slot0 Var Status=C0
   Slot1 Var Code=116
  Slot1 Var Classification=209
  Slot1 Var Units=70
 Slot1 Var Value=0';

-- The query splits the text into lines at any combination of CHAR(13) and/or CHAR(10):

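 WITH LineByLine AS
 (
    -- Number each line and trim leading/trailing spaces
    SELECT  ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS LineNr
           ,LTRIM(RTRIM(x.value(N'(text())[1]',N'nvarchar(max)'))) AS Line
    FROM
    (
    -- Turn LF into CR, collapse CR pairs, mark each CR as '\nl', let FOR XML PATH('')
    -- escape characters such as < and &, then turn every marker into an </x><x> boundary
    SELECT CAST(N'<x>' + REPLACE((SELECT REPLACE(REPLACE(REPLACE(@YourText,NCHAR(10),NCHAR(13)),NCHAR(13)+NCHAR(13),NCHAR(13)),NCHAR(13),N'\nl') AS [*] FOR XML PATH('')),N'\nl',N'</x><x>')  + N'</x>' AS XML) AS Casted
    ) AS t
    -- One row per <x> element that contains text; empty elements are skipped
    CROSS APPLY Casted.nodes(N'/x[text()]') AS A(x)
 )
 SELECT LineNr,Line
 FROM LineByLine
 -- Keep only the lines that look relevant to the required columns
 WHERE CHARINDEX('Device ID=',Line)>0
    OR CHARINDEX('Data:',Line)>0
    OR CHARINDEX('unit',Line)>0;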

The result will be:

LineNr  Line
7   Data: 01 00 09 27 C0
11  Data: 01 00 09 27 C0
15  Data: 02 80 00 00 82
19  Data: 06 80 00 18 00 50 FE 26 4E 05 07 05 02 0E 0C 0B 6A 64 05 04 00 01 50
30  Device ID=748132
41  Data: 82 A6 4E 0B 6A 64 14 00 7B
45  Data: 86 A6 4E 0B 6A 64 14 22 00 50 77 69 68 61 72 74 67 77 00 00 00 00 00
52  Data: 82 A6 4E 0B 6A 64 4A 00 25
56  Data: 86 A6 4E 0B 6A 64 4A 0A 00 50 01 01 65 00 05 02 01 03 1B
69  Slot0 Var Units=251
74  Slot1 Var Units=70
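
For readers who want to check the split-and-filter logic outside SQL Server, the same steps can be sketched in Python. This is only an illustration under assumptions: the function name `filter_lines` and the shortened sample are hypothetical, and the lower-casing stands in for a case-insensitive SQL Server collation, which is an assumption about the server setup.

```python
import re

def filter_lines(text):
    """Split on any run of CR/LF, number the non-empty lines, keep the relevant ones."""
    # Any combination of CR and LF ends a line; empty pieces are dropped,
    # similar to the /x[text()] filter in the query.
    pieces = [p for p in re.split(r"[\r\n]+", text) if p != ""]
    result = []
    for line_nr, raw in enumerate(pieces, start=1):
        line = raw.strip(" ")   # like LTRIM/RTRIM, remove spaces only
        lowered = line.lower()  # assumption: case-insensitive comparison
        if "device id=" in lowered or "data:" in lowered or "unit" in lowered:
            result.append((line_nr, line))
    return result

# Shortened, hypothetical sample with mixed line endings
sample = (
    "  Data: 02 80 00 00 82 \r\n"
    "\r\n"
    " Rx Cmd=0, Rsp code=0x00, Device Status=0x50\n"
    " Device ID=748132\r"
    "   Slot0 Var Units=251\n"
    "   Slot0 Var Value=4\n"
)

for line_nr, line in filter_lines(sample):
    print(line_nr, line)
```

As in the query, the line numbers count only non-empty lines, so they will not match the line numbers shown by a text editor when the file contains blank lines.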

You did not state your expected output, nor which names in the text correspond to your columns, so this is a guess. I hope it helps.
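
Since the question names DEVICEID, DATAVALUE and DATAUNITS as the target columns, one possible next step is to pair the `Key=Value` lines into rows. The sketch below assumes, without confirmation from the question, that DEVICEID comes from `Device ID=` and that DATAVALUE and DATAUNITS come from `SlotN Var Value=` and `SlotN Var Units=`. It also assumes one device per file. The function name `extract_rows` is hypothetical; in an SSIS package the same logic could be placed in a script or done in T-SQL after staging the lines.

```python
import re

# Matches e.g. "Slot0 Var Units=251" or "Slot1 Var Value=0"
SLOT_PATTERN = re.compile(r"^Slot(\d+) Var (Units|Value)=(.*)$")

def extract_rows(text):
    """Return one dict per slot with DEVICEID, DATAVALUE, DATAUNITS (assumed mapping)."""
    device_id = None
    slots = {}  # slot number -> {"Units": ..., "Value": ...}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Device ID="):
            # Assumption: a single device per file; a later ID overwrites an earlier one
            device_id = line.split("=", 1)[1]
            continue
        match = SLOT_PATTERN.match(line)
        if match:
            slot, field, value = match.groups()
            slots.setdefault(int(slot), {})[field] = value
    return [
        {
            "DEVICEID": device_id,
            "DATAVALUE": fields.get("Value"),
            "DATAUNITS": fields.get("Units"),
        }
        for _, fields in sorted(slots.items())
    ]

# Shortened, hypothetical sample taken from the log format above
sample = """\
 Device ID=748132
   Slot0 Var Units=251
   Slot0 Var Value=4
   Slot0 Var Status=C0
  Slot1 Var Units=70
 Slot1 Var Value=0
"""

for row in extract_rows(sample):
    print(row)
```

All values are kept as strings here; converting them to numeric types, or decoding unit codes such as 251, would depend on the target table and is left out.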

Regarding "sql - Parsing an unstructured text file in SSIS and reading each line to get the required data", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44506585/
