SQL Server,将具有多行html代码的文本解析为表格

标签 sql sql-server sql-server-2008 loops parsing

我有一个代码的框架,我开始为以下目的编写代码:

use EAS
go
  declare @entityID,int
  declare @text,text
  declare @col1,varchar --col1 is between <b>col1</b>
  declare @col2,varchar   --col2 are between >col2</td>

  @entityID=entity_ID
  @text = maAttributes_Text.value
  @col1 = select SUBSTRING(@text,CHARINDEX('<b>',@text)+len('<b>'),
  charindex('</b>',@text))
  @col2 = select substring(@text, CHARINDEX('>',@text)+LEN('>'),charindex('</td>',@text))

  begin
  add to `table2` [entity_id], [caption], [value]
  @entityID, @col1, @col2
  end

我需要访问另一个表,table1,其中包含字段 entity_idcaptionvalue,以及其他一些不重要的。

entity_id | caption      | value
----------|--------------|-------
12345     |Specifications| *html snippet*

我的代码应该做的是,在 table1 中找到在字段 caption 中具有 'Specifications' 值的行,然后提取 entity_id 值,存储它,然后运行一个循环,该循环获取 html 表的所有行并将信息添加到另一个 sql 表中。

例如,运行table1的一行代码后,table2将如下所示:

table2
entity_ID|Caption|Value
---------|-------|-----
12345    |length |10
12345    |width  |10
12345    |height |10
et al.

table2 中的所有信息均来自存储在 table1value 字段中的 html 片段。

我的直觉告诉我应该同时使用循环(我这辈子都不能在 sql 中创建循环)和 STRING_SPLIT 函数。但是,我不确定如何实现它们。

完成table1的第一行后,会继续下一行,下一行,行数WHERE caption in ('specifications')

编辑:我知道我的措辞很困惑。我会尝试重新解释。

我有 table1,其中包含一个与 html 片段相关联的 ID 号。 html 片段是包含许多行和两列的表格的代码。我想要的函数将获取 html 片段并逐行解析,并将其添加到 table2,其中来自 table1 的一个 ID 号将附加到 html 的几行 table 。因此,当提取 HTML 片段的前几行时,table2 将显示为:

table2
entity_ID|Caption|Value
---------|-------|-----
12345    |length |10
12345    |width  |10
12345    |height |10  

其中 entity_ID 是唯一的 ID 号,它与 HTML 片段中的几行相关联。 Caption 取自 html 片段的 col1Value 取自 html 片段的 col2

编辑 2:
我的情况的一个简单示例:

表1

ID|Caption       |Value  
--|--------------|-----  
1 |Specifications|*html snippet*  
2 |Specifications|*html snippet2*  
etc.

代码将从 table1 中获取信息并将其放入“table2”中,如下所示:

表2

ID|SpecName|Spec    
--|--------|----  
1 |Length  |10  
1 |Width   |10  
1 |Height  |10  

SpecName 和 Spec 是 html 代码段创建的 html 表中的字段。长度、宽度、高度是代码段生成的信息。 10,是每个 SpecNames 的值。

<table  class="inline_sd_table"> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Accessories Optional</b></td>      <td class="inline_sd_even_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Accommodates Ceiling Height</b></td>      <td class="inline_sd_odd_cell">N/A</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>ADA Compliance</b></td>      <td class="inline_sd_even_cell">See Additional Information above</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Blower Air Mover Type</b></td>      <td class="inline_sd_odd_cell">Fan</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Blower Included</b></td>      <td class="inline_sd_even_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Boost Mode</b></td>      <td class="inline_sd_odd_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>CFM (Airflow)</b></td>      <td class="inline_sd_even_cell">190</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Color/Finish</b></td>      <td class="inline_sd_odd_cell">Almond</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Control Feature Filter Reminder</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Damper Included</b></td>      <td class="inline_sd_odd_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Delay Shut Off</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Digital Clock</b></td>      <td class="inline_sd_odd_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Dishwasher Safe Filters</b></td>      <td class="inline_sd_even_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Duct Connector Size</b></td>      <td class="inline_sd_odd_cell">3-1/4" x 10", 7" Round Non-Ducted</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>ENERGY STAR&reg; qualified</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Filter Included</b></td>      <td class="inline_sd_odd_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Filter Type</b></td>      <td class="inline_sd_even_cell">Aluminum</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Flue Accommodates ceilings up to xx ft</b></td>      <td class="inline_sd_odd_cell">N/A</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Heat Sentry - Temperature Sensor</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>HVI Certified</b></td>      <td class="inline_sd_odd_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Light Bulb Quantity</b></td>      <td class="inline_sd_even_cell">1</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Light Bulbs Included</b></td>      <td class="inline_sd_odd_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Lighting Function</b></td>      <td class="inline_sd_even_cell">One-level</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Lighting Type</b></td>      <td class="inline_sd_odd_cell">Incandescent</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Max Sones/100 CFM</b></td>      <td class="inline_sd_even_cell">4.1</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Minimum Height Above Cook Top</b></td>      <td class="inline_sd_odd_cell">18"</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Mounting Type</b></td>      <td class="inline_sd_even_cell">Under Cabinet</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Nightlight</b></td>      <td class="inline_sd_odd_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Non-Stick Coated</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Product Depth (inches)</b></td>      <td class="inline_sd_odd_cell">17-1/2"</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Product Height (inches)</b></td>      <td class="inline_sd_even_cell">6"</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Product Width (inches)</b></td>      <td class="inline_sd_odd_cell">42"</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Rated Amps</b></td>      <td class="inline_sd_even_cell">2.5</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Remote Control</b></td>      <td class="inline_sd_odd_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Sones Certified Horizontal Rectangular High</b></td>      <td class="inline_sd_even_cell">6.5</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Sones Certified Horizontal Rectangular Low</b></td>      <td class="inline_sd_odd_cell">N/A</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Sones Certified Vertical Rectangular High</b></td>      <td class="inline_sd_even_cell">6.5</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Sones Certified Vertical Rectangular Low</b></td>      <td class="inline_sd_odd_cell">N/A</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Sones Certified Vertical Round High</b></td>      <td class="inline_sd_even_cell">6.5</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Sones Certified Vertical Round Low</b></td>      <td class="inline_sd_odd_cell">N/A</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Switch Control Settings</b></td>      <td class="inline_sd_even_cell">2</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Switch Style</b></td>      <td class="inline_sd_odd_cell">Rocker</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Timer</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>UL Listed</b></td>      <td class="inline_sd_odd_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Voltage</b></td>      <td class="inline_sd_even_cell">120</td> </tr> </table> 

最佳答案

您的示例有点难以理解,但这有帮助吗?

我修改了一个 Parse/Split 函数来接受两个不同的参数。在这种情况下 '>''<' .

我应该补充一点,TVF 的性能非常好。结果在 0.068 秒内返回。

示例

Declare @S varchar(max) = '<table  class="inline_sd_table"> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Accessories Optional</b></td>      <td class="inline_sd_even_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Accommodates Ceiling Height</b></td>      <td class="inline_sd_odd_cell">N/A</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>ADA Compliance</b></td>      <td class="inline_sd_even_cell">See Additional Information above</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Blower Air Mover Type</b></td>      <td class="inline_sd_odd_cell">Fan</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Blower Included</b></td>      <td class="inline_sd_even_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Boost Mode</b></td>      <td class="inline_sd_odd_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>CFM (Airflow)</b></td>      <td class="inline_sd_even_cell">190</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Color/Finish</b></td>      <td class="inline_sd_odd_cell">Almond</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Control Feature Filter Reminder</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Damper Included</b></td>      <td class="inline_sd_odd_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Delay Shut Off</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Digital Clock</b></td>      <td class="inline_sd_odd_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Dishwasher Safe Filters</b></td>      <td class="inline_sd_even_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Duct Connector Size</b></td>      <td class="inline_sd_odd_cell">3-1/4" x 10", 7" Round Non-Ducted</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>ENERGY STAR&reg; qualified</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Filter Included</b></td>      <td class="inline_sd_odd_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Filter Type</b></td>      <td class="inline_sd_even_cell">Aluminum</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Flue Accommodates ceilings up to xx ft</b></td>      <td class="inline_sd_odd_cell">N/A</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Heat Sentry - Temperature Sensor</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>HVI Certified</b></td>      <td class="inline_sd_odd_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Light Bulb Quantity</b></td>      <td class="inline_sd_even_cell">1</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Light Bulbs Included</b></td>      <td class="inline_sd_odd_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Lighting Function</b></td>      <td class="inline_sd_even_cell">One-level</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Lighting Type</b></td>      <td class="inline_sd_odd_cell">Incandescent</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Max Sones/100 CFM</b></td>      <td class="inline_sd_even_cell">4.1</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Minimum Height Above Cook Top</b></td>      <td class="inline_sd_odd_cell">18"</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Mounting Type</b></td>      <td class="inline_sd_even_cell">Under Cabinet</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Nightlight</b></td>      <td class="inline_sd_odd_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Non-Stick Coated</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Product Depth (inches)</b></td>      <td class="inline_sd_odd_cell">17-1/2"</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Product Height (inches)</b></td>      <td class="inline_sd_even_cell">6"</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Product Width (inches)</b></td>      <td class="inline_sd_odd_cell">42"</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Rated Amps</b></td>      <td class="inline_sd_even_cell">2.5</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Remote Control</b></td>      <td class="inline_sd_odd_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Sones Certified Horizontal Rectangular High</b></td>      <td class="inline_sd_even_cell">6.5</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Sones Certified Horizontal Rectangular Low</b></td>      <td class="inline_sd_odd_cell">N/A</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Sones Certified Vertical Rectangular High</b></td>      <td class="inline_sd_even_cell">6.5</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Sones Certified Vertical Rectangular Low</b></td>      <td class="inline_sd_odd_cell">N/A</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Sones Certified Vertical Round High</b></td>      <td class="inline_sd_even_cell">6.5</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Sones Certified Vertical Round Low</b></td>      <td class="inline_sd_odd_cell">N/A</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Switch Control Settings</b></td>      <td class="inline_sd_even_cell">2</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>Switch Style</b></td>      <td class="inline_sd_odd_cell">Rocker</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Timer</b></td>      <td class="inline_sd_even_cell">No</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_odd_cell"><b>UL Listed</b></td>      <td class="inline_sd_odd_cell">Yes</td> </tr> <tr class="inline_sd_cell_row">      <td class="inline_sd_even_cell"><b>Voltage</b></td>      <td class="inline_sd_even_cell">120</td> </tr> </table> '

Select Item  = max(case when RetCol=1 then RetVal end)
      ,Value = max(case when RetCol=0 then RetVal end)
 From (
        Select RetGrp=(Row_Number() over (Order By RetSeq)-1)/2
              ,RetCol=Row_Number() over (Order By RetSeq) % 2 
              ,RetPos
              ,RetVal
         From  [dbo].[udf-Str-Extract](@S,'>','<')
         Where RetVal>''
      ) A
 Group By RetGrp
 Order By RetGrp

返回

enter image description here

UDF(如果有兴趣)

CREATE FUNCTION [dbo].[udf-Str-Extract] (@String varchar(max),@Delimiter1 varchar(100),@Delimiter2 varchar(100))
Returns Table 
As
Return (  

with   cte1(N)   As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
       cte2(N)   As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 N1,cte1 N2,cte1 N3,cte1 N4,cte1 N5,cte1 N6) A ),
       cte3(N)   As (Select 1 Union All Select t.N+DataLength(@Delimiter1) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter1)) = @Delimiter1),
       cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter1,@String,s.N),0)-S.N,8000) From cte3 S)

Select RetSeq = Row_Number() over (Order By N)
      ,RetPos = N
      ,RetVal = left(RetVal,charindex(@Delimiter2,RetVal)-1) 
 From  (
        Select *,RetVal = Substring(@String, N, L) 
         From  cte4
       ) A
 Where charindex(@Delimiter2,RetVal)>1

)
/*
Max Length of String 1MM characters

Declare @String varchar(max) = 'Dear [[FirstName]] [[LastName]], ...'
Select * From [dbo].[udf-Str-Extract] (@String,'[[',']]')
*/

EDIT - Illustration of Raw Results

如果你只是简单地运行

Select * from  [dbo].[udf-Str-Extract](@S,'>','<')

结果将是:

enter image description here

关于SQL Server,将具有多行html代码的文本解析为表格,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/45578134/

相关文章:

javascript - JQuery - 将逗号添加到数字输入但不提交

mysql - 将多个不同的sql合并为一个

mysql - 如何从 MSSQL 中的行中没有特定值的表中选择数据

sql - 有没有办法防止触发器被禁用?

sql - 在 SQL Server 2008 R2 中引用前一行值进行算术计算

MySQL 子选择多个条件

时间:2019-03-08 标签:c#linqwhereconditionalif

sql-server - 您可以从交易中打印消息吗?

sql - 我怎样才能加快这个索引 View 的速度?

mysql - sql查询将两个查询合并为一个空行