I have two HTML strings that differ in several small ways:
<tbody class="Expanded4" id="divisionG_area24_clubs"><!--<tr><th class='noBorderLeftRight'></th>--><th class="noBorderLeftRight" colspan="6"></th><th colspan="6"><table style="margin-bottom:auto;" width="100%"><tr><th class="noBorderTopLeft" style="width:auto;"></th><th class="Grid_top_blue Grid_Table" colspan="2">Membership</th><th class="Grid_top_blue Grid_Table" colspan="1">Goal4s</th><th class="Grid_Title_top_black grid_blue_border" colspan="6">Education</th><th class="Grid_Title_top_black grid_blue_border" colspan="2">Mem.</th><th class="Grid_Title_top_black grid_blue_border" colspan="2">Trn.</th><th class="Grid_Title_top_black grid_blue_border" colspan="2">Rn.|Lst.</th></tr><tr><th class="noBorderTopLeft" style="width:auto;"></th><th class="Grid_top_black_max Grid_Table">Base</th><th class="Grid_top_black_max Grid_Table">To Date</th><th class="Grid_top_black_max Grid_Table blue_border_right">Met</th><th class="Grid_top_black" title="Four Level 1 awards">1</th><th class="Grid_top_black" title="Two Level 2 awards">2</th><th class="Grid_top_black" title="Two more Level 2 awards">3</th><th class="Grid_top_black" title="Two Level 3 awards">1</th><th class="Grid_top_black" title="One Level 4, Level 5, or DTM award">5</th><th class="Grid_top_black" title="One more Level 4, Level 5, or DTM award">6</th><th class="Grid_top_black max22" title="4 New members">7</th><th class="Grid_top_black max22" title="4 More new members">9</th><th class="Grid_top_black max22" title="4 Officers trained first training period">9a</th><th class="Grid_top_black max22" title="4 Officers trained second training period">9b</th><th class="Grid_top_black max22" title="1 Dues-renewal on time">10a</th><th class="Grid_top_black max22" title="1 officer list on time">10b</th></tr><tr class="Grid_Top_Row club_gray" onclick="window.location.href='ClubReport.aspx?id=01448795'"><td class="Grid_Title_top5 min280 crop" title="Advanced Speakers on the Hill"> <span class="redFont">01448795</span> Advanced 
Speakers on the Hill</td><th class="Grid_Table_yellow"><span>29<span></span></span></th><td class="Grid_Table title_gray"><span>30<span></span></span></td><td class="Grid_Table x_light_gray blue_border_right"><span class="chart_table_big_numbers goalsMetBorder">2</span></td><th class="Grid_Title_goal" title="3 Level 1s needed">1</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 3s needed">0</th><th class="Grid_Title_goalAchieved" title="Achieved">1</th><th class="Grid_Title_goalAchieved" title="Achieved">1</th><th class="Grid_Title_goal" title="3 New Members needed">1</th><th class="Grid_Title_goal" title="4 New Members needed">0</th><th class="Grid_Title_goalAchieved" title="First Training Period Achieved">7</th><th class="Grid_Title_goal" title="Second Training Period 4 needed">0</th><th class="Grid_Title_goal" title="On-time dues-renewal needed">0</th><th class="Grid_Title_goalAchieved" title="On-time officer list Achieved">1</th></tr><tr class="Grid_Top_Row club_gray" onclick="window.location.href='ClubReport.aspx?id=02194262'"><td class="Grid_Title_top5 min280 crop" title="Inclusive Toastmasters"> <span class="redFont">02194262</span> Inclusivey Toastmasters</td><th class="Grid_Table_yellow"><span>21<span></span></span></th><td class="Grid_Table title_gray"><span>21<span></span></span></td><td class="Grid_Table x_light_gray blue_border_right"><span class="chart_table_big_numbers goalsMetBorder">7</span></td><th class="Grid_Title_goal" title="4 Level 1s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 3s needed">1</th><th class="Grid_Title_goal" title="1 Level 4, Level 5, or DTM needed">0</th><th class="Grid_Title_goal" title="1 more Level 4, Level 5, or DTM needed">0</th><th class="Grid_Title_goal" title="4 New Members 
needed">0</th><th class="Grid_Title_goal" title="4 New Members needed">0</th><th class="Grid_Title_goalAchieved" title="First Training Period Achieved">5</th><th class="Grid_Title_goal" title="Second Training Period 4 needed">0</th><th class="Grid_Title_goal" title="On-time dues-renewal needed">0</th><th class="Grid_Title_goalAchieved" title="On-time officer list Achieved">1</th></tr><tr class="Grid_Top_Row club_gray" onclick="window.location.href='ClubReport.aspx?id=02785335'"><td class="Grid_Title_top5 min280 crop" title="Club Toastmasters FrancoFun"> <span class="redFont">02785335</span> Club Toastmsasters FrancoFun</td><th class="Grid_Table_yellow"><span>21<span></span></span></th><td class="Grid_Table title_gray"><span>21<span></span></span></td><td class="Grid_Table x_light_gray blue_border_right"><span class="chart_table_big_numbers goalsMetBorder">1</span></td><th class="Grid_Title_goal" title="4 Level 1s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 3s needed">0</th><th class="Grid_Title_goalAchieved" title="Achieved">1</th><th class="Grid_Title_goal" title="1 more Level 4, Level 5, or DTM needed">0</th><th class="Grid_Title_goal" title="4 New Members needed">0</th><th class="Grid_Title_goal" title="4 New Members needed">0</th><th class="Grid_Title_goalAchieved" title="First Training Period Achieved">6</th><th class="Grid_Title_goal" title="Second Training Period 4 needed">0</th><th class="Grid_Title_goal" title="On-time dues-renewal needed">0</th><th class="Grid_Title_goalAchieved" title="On-time officer list Achieved">1</th></tr><tr class="Grid_Top_Row club_gray" onclick="window.location.href='ClubReport.aspx?id=04437661'"><td class="Grid_Title_top5 min280 crop" title="Feel Good Toastmasters"> <span class="redFont">04437661</span> Feel Good Toastmasters</td><th class="Grid_Table_yellow"><span>21<span></span></span></th><td 
class="Grid_Table title_gray"><span>22<span></span></span></td><td class="Grid_Table x_light_gray blue_border_right"><span class="chart_table_big_numbers goalsMetBorder">0</span></td><th class="Grid_Title_goal" title="4 Level 1s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 3s needed">0</th><th class="Grid_Title_goal" title="1 Level 4, Level 5, or DTM needed">0</th><th class="Grid_Title_goal" title="1 more Level 4, Level 5, or DTM needed">0</th><th class="Grid_Title_goal" title="3 New Members needed">1</th><th class="Grid_Title_goal" title="4 New Members needed">0</th><th class="Grid_Title_goalAchieved" title="First Training Period Achieved">6</th><th class="Grid_Title_goal" title="Second Training Period 4 needed">0</th><th class="Grid_Title_goal" title="On-time dues-renewal needed">0</th><th class="Grid_Title_goalAchieved" title="On-time officer list Achieved">1</th></tr></table></th></tbody>
and
<tbody class="Expanded4" id="divisionG_area24_clubs"><!--<tr><th class='noBorderLeftRight'></th>--><th class="noBorderLeftRight" colspan="6"></th><th colspan="6"><table style="margin-bottom:auto;" width="100%"><tr><th class="noBorderTopLeft" style="width:auto;"></th><th class="Grid_top_blue Grid_Table" colspan="2">Membership</th><th class="Grid_top_blue Grid_Table" colspan="1">Goals</th><th class="Grid_Title_top_black grid_blue_border" colspan="6">Education</th><th class="Grid_Title_top_black grid_blue_border" colspan="2">Mem.</th><th class="Grid_Title_top_black grid_blue_border" colspan="2">Trn.</th><th class="Grid_Title_top_black grid_blue_border" colspan="2">Rn.|Lst.</th></tr><tr><th class="noBorderTopLeft" style="width:auto;"></th><th class="Grid_top_black_max Grid_Table">Base</th><th class="Grid_top_black_max Grid_Table">To Date</th><th class="Grid_top_black_max Grid_Table blue_border_right">Met</th><th class="Grid_top_black" title="Four Level 1 awards">1</th><th class="Grid_top_black" title="Two Level 2 awards">2</th><th class="Grid_top_black" title="Two more Level 2 awards">3</th><th class="Grid_top_black" title="Two Level 3 awards">4</th><th class="Grid_top_black" title="One Level 4, Level 5, or DTM award">5</th><th class="Grid_top_black" title="One more Level 4, Level 5, or DTM award">6</th><th class="Grid_top_black max22" title="4 New members">7</th><th class="Grid_top_black max22" title="4 More new members">8</th><th class="Grid_top_black max22" title="4 Officers trained first training period">9a</th><th class="Grid_top_black max22" title="4 Officers trained second training period">9b</th><th class="Grid_top_black max22" title="1 Dues-renewal on time">10a</th><th class="Grid_top_black max22" title="1 officer list on time">10b</th></tr><tr class="Grid_Top_Row club_gray" onclick="window.location.href='ClubReport.aspx?id=01448795'"><td class="Grid_Title_top5 min280 crop" title="Advanced Speakers on the Hill"> <span class="redFont">01448795</span> Advanced 
Speakers on the Hill</td><th class="Grid_Table_yellow"><span>29<span></span></span></th><td class="Grid_Table title_gray"><span>30<span></span></span></td><td class="Grid_Table x_light_gray blue_border_right"><span class="chart_table_big_numbers goalsMetBorder">2</span></td><th class="Grid_Title_goal" title="3 Level 1s needed">1</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 3s needed">0</th><th class="Grid_Title_goalAchieved" title="Achieved">1</th><th class="Grid_Title_goalAchieved" title="Achieved">1</th><th class="Grid_Title_goal" title="3 New Members needed">1</th><th class="Grid_Title_goal" title="4 New Members needed">0</th><th class="Grid_Title_goalAchieved" title="First Training Period Achieved">7</th><th class="Grid_Title_goal" title="Second Training Period 4 needed">0</th><th class="Grid_Title_goal" title="On-time dues-renewal needed">0</th><th class="Grid_Title_goalAchieved" title="On-time officer list Achieved">1</th></tr><tr class="Grid_Top_Row club_gray" onclick="window.location.href='ClubReport.aspx?id=02194262'"><td class="Grid_Title_top5 min280 crop" title="Inclusive Toastmasters"> <span class="redFont">02194262</span> Inclusive Toastmasters</td><th class="Grid_Table_yellow"><span>21<span></span></span></th><td class="Grid_Table title_gray"><span>21<span></span></span></td><td class="Grid_Table x_light_gray blue_border_right"><span class="chart_table_big_numbers goalsMetBorder">0</span></td><th class="Grid_Title_goal" title="4 Level 1s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 3s needed">0</th><th class="Grid_Title_goal" title="1 Level 4, Level 5, or DTM needed">0</th><th class="Grid_Title_goal" title="1 more Level 4, Level 5, or DTM needed">0</th><th class="Grid_Title_goal" title="4 New Members 
needed">0</th><th class="Grid_Title_goal" title="4 New Members needed">0</th><th class="Grid_Title_goalAchieved" title="First Training Period Achieved">5</th><th class="Grid_Title_goal" title="Second Training Period 4 needed">0</th><th class="Grid_Title_goal" title="On-time dues-renewal needed">0</th><th class="Grid_Title_goalAchieved" title="On-time officer list Achieved">1</th></tr><tr class="Grid_Top_Row club_gray" onclick="window.location.href='ClubReport.aspx?id=02785335'"><td class="Grid_Title_top5 min280 crop" title="Club Toastmasters FrancoFun"> <span class="redFont">02785335</span> Club Toastmasters FrancoFun</td><th class="Grid_Table_yellow"><span>21<span></span></span></th><td class="Grid_Table title_gray"><span>21<span></span></span></td><td class="Grid_Table x_light_gray blue_border_right"><span class="chart_table_big_numbers goalsMetBorder">1</span></td><th class="Grid_Title_goal" title="4 Level 1s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 3s needed">0</th><th class="Grid_Title_goalAchieved" title="Achieved">1</th><th class="Grid_Title_goal" title="1 more Level 4, Level 5, or DTM needed">0</th><th class="Grid_Title_goal" title="4 New Members needed">0</th><th class="Grid_Title_goal" title="4 New Members needed">0</th><th class="Grid_Title_goalAchieved" title="First Training Period Achieved">6</th><th class="Grid_Title_goal" title="Second Training Period 4 needed">0</th><th class="Grid_Title_goal" title="On-time dues-renewal needed">0</th><th class="Grid_Title_goalAchieved" title="On-time officer list Achieved">1</th></tr><tr class="Grid_Top_Row club_gray" onclick="window.location.href='ClubReport.aspx?id=04437661'"><td class="Grid_Title_top5 min280 crop" title="Feel Good Toastmasters"> <span class="redFont">04437661</span> Feel Good Toastmasters</td><th class="Grid_Table_yellow"><span>21<span></span></span></th><td 
class="Grid_Table title_gray"><span>22<span></span></span></td><td class="Grid_Table x_light_gray blue_border_right"><span class="chart_table_big_numbers goalsMetBorder">0</span></td><th class="Grid_Title_goal" title="4 Level 1s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 3s needed">0</th><th class="Grid_Title_goal" title="1 Level 4, Level 5, or DTM needed">0</th><th class="Grid_Title_goal" title="1 more Level 4, Level 5, or DTM needed">0</th><th class="Grid_Title_goal" title="3 New Members needed">1</th><th class="Grid_Title_goal" title="4 New Members needed">0</th><th class="Grid_Title_goalAchieved" title="First Training Period Achieved">6</th><th class="Grid_Title_goal" title="Second Training Period 4 needed">0</th><th class="Grid_Title_goal" title="On-time dues-renewal needed">0</th><th class="Grid_Title_goalAchieved" title="On-time officer list Achieved">1</th></tr></table></th></tbody>
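I am trying to find the differences between the two strings. I need to return the second string with every difference highlighted using <mark> tags.
This is a bit hard to explain, so here are some examples:
If one string contains the text <span>This is a string</span> and the second has <span>Thiss is a string</span>, I want to return <span><mark>Thiss is a string</mark></span>.
If one string contains the text <p>36</p> and the second has <p>3</p>, I want to return <p><mark>3</mark></p>.
Note that the <mark> tag is inserted right after the nearest > to the left of the difference, and </mark> is inserted right before the nearest < to the right of the difference.
I am sure this is possible, but I can't seem to find an efficient way to do it. Here is what I have so far: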
skew=0
prev_i = []
highlighted_area_info = my_second_html_string
diff = difflib.ndiff(my_first_html_string, my_second_html_string)
for i,s in enumerate(diff, start=0):
    if s[0]==' ':
        continue
    else:
        if i in prev_i:
            continue
        count_right = my_second_html_string[i].find('<')
        count_left = 0
        for a, b in reversed(list(enumerate(my_second_html_string))):
            if a < i:
                if b == ">":
                    break
                else:
                    count_left += 1
        highlighted_area_info2 = highlighted_area_info[:i-count_left+skew]
        highlighted_area_info2 += highlight_beginning
        highlighted_area_info2 += highlighted_area_info[i-count_left+skew:i+count_right+skew]
        highlighted_area_info2 += highlight_end
        highlighted_area_info2 += highlighted_area_info[i+count_right+skew:]
        skew += len(highlight_beginning)+len(highlight_end)
        highlighted_area_info = highlighted_area_info2
        prev_i = list(range(i-count_left+skew, i+count_right+skew))
print(highlighted_area_info)
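Unfortunately, the <mark> and </mark> tags are inserted in the wrong places, which produces something like this: <td class="Grid_Table x_light_gray blue_border_right"><span class="chart_table_big_numbers goalsMetBorder"><mark>0</</ma<mark>rk>s</mark>pan></td>
instead of <td class="Grid_Table x_light_gray blue_border_right"><span class="chart_table_big_numbers goalsMetBorder"><mark>0</mark></span></td>, which is what I expected.
I have spent several days on this and I am still not sure what I am doing wrong, although something is clearly off. My code may also not be the most efficient way to reach my goal.
I need working code within a few days, so any help is greatly appreciated.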
Best answer
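I used print() to check the values of the variables in your code, and I found that you call ndiff(string1, string2) but it expects ndiff(list_of_lines1, list_of_lines2) - so it treats your strings as lists of characters and compares every character separately. As a result it puts a <mark> around every changed character instead of putting one <mark> around the whole word.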
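The character-by-character behaviour can be seen with a tiny input. This is only an illustrative sketch using the <p>36</p> example from the question; the variable names are made up for the demonstration:

```python
import difflib

old = "<p>36</p>"
new = "<p>3</p>"

# Passing plain strings makes ndiff iterate over them one character at a time,
# so every item describes a single character, not a word or a line.
char_diff = list(difflib.ndiff(old, new))

# Each item is a two-character prefix ("  ", "- " or "+ ") plus one character.
removed = [item[2:] for item in char_diff if item.startswith("- ")]

for item in char_diff:
    print(repr(item))
print("removed characters:", removed)
```

Because the index produced by enumerate() counts diff items (including removed characters that do not exist in the second string), it does not line up with positions in the second string, which is one reason the tags end up in the wrong places.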
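I tried to fix this with single-line lists, ndiff([string1], [string2]), and with other variations, but in the end I gave up because it made no sense. You should rather use lxml or BeautifulSoup to parse the HTML into a tree, with tags as nodes, and then compare the text in the nodes.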
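To make the tree idea concrete, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree instead of lxml or BeautifulSoup. It assumes both strings are well-formed XML with exactly the same tag structure, so only text differs; the function name highlight_text_changes is hypothetical and not part of any library:

```python
import xml.etree.ElementTree as ET


def highlight_text_changes(old_html, new_html):
    """Return new_html with changed text wrapped in <mark> (hypothetical helper).

    Assumption: both inputs are well-formed XML with identical tag structure.
    """
    old_root = ET.fromstring(old_html)
    new_root = ET.fromstring(new_html)

    # Pair up nodes in document order before the tree is modified below.
    pairs = list(zip(old_root.iter(), new_root.iter()))
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in new_root.iter() for child in parent}

    for old_node, new_node in pairs:
        # Text directly inside the tag: <a>new text</a>
        if old_node.text != new_node.text and new_node.text:
            mark = ET.Element("mark")
            mark.text = new_node.text
            new_node.text = None
            new_node.insert(0, mark)
        # Text directly after the tag: <a>...</a>new text
        if old_node.tail != new_node.tail and new_node.tail and new_node in parent_of:
            mark = ET.Element("mark")
            mark.text = new_node.tail
            new_node.tail = None
            parent = parent_of[new_node]
            parent.insert(list(parent).index(new_node) + 1, mark)

    return ET.tostring(new_root, encoding="unicode")


print(highlight_text_changes("<p>36</p>", "<p>3</p>"))
```

This sketch breaks as soon as nodes are added, removed or moved, because it pairs nodes purely by position.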
I found the module xmldiff, which uses lxml and generates a list of changes between two XML or HTML documents.
import xmldiff.main
all_changes = xmldiff.main.diff_texts(my_first_html_string, my_second_html_string)
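Every change gives an xpath, so I use lxml to find the node and replace its text with <mark>text</mark>.
xmldiff can report different kinds of changes, but I only need UpdateTextIn (when the text is inside a tag, i.e. <a>new text</a>) and UpdateTextAfter (when the text is after a tag, i.e. <a>...</a>new text).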
highlighted_tree = lxml.etree.fromstring(my_second_html_string)
for item in all_changes:
    highlighted_node = highlighted_tree.xpath(item.node)[0]
    if isinstance(item, xmldiff.actions.UpdateTextIn):
        highlighted_node.text = '' # remove
        highlighted_node.insert(0, lxml.etree.fromstring('<mark>' + item.text + '</mark>'))
    if isinstance(item, xmldiff.actions.UpdateTextAfter):
        highlighted_node.tail = '' # remove # has to be before addnext
        highlighted_node.addnext(lxml.etree.fromstring('<mark>' + item.text + '</mark>'))
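One caveat with building the <mark> element by string concatenation: if the changed text contains & or <, the concatenated string is no longer valid XML and parsing it fails. The sketch below shows the difference using the standard library's xml.etree.ElementTree so that it runs without extra packages; lxml.etree offers the same Element and .text interface, so the same idea should carry over. The sample text is invented for the demonstration:

```python
import xml.etree.ElementTree as ET

text = "R&D < 3"  # invented sample text containing XML special characters

# Concatenating raw text into markup and parsing it fails for such text.
try:
    ET.fromstring("<mark>" + text + "</mark>")
    parsed_ok = True
except ET.ParseError:
    parsed_ok = False

# Creating the element and assigning .text lets the serializer do the escaping.
mark = ET.Element("mark")
mark.text = text
safe_markup = ET.tostring(mark, encoding="unicode")

print(parsed_ok)
print(safe_markup)
```

The data in the question contains no such characters, so the concatenation works there; creating the element directly only matters for more general input.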
After that I convert the tree back to HTML:
html = lxml.etree.tostring(highlighted_tree)
print(html.decode())
Minimal working example with data:
import xmldiff.main # diff_texts
import xmldiff.actions # UpdateTextIn, UpdateTextAfter
import lxml.etree
my_first_html_string = '''<tbody class="Expanded4" id="divisionG_area24_clubs"><!--<tr><th class='noBorderLeftRight'></th>--><th class="noBorderLeftRight" colspan="6"></th><th colspan="6"><table style="margin-bottom:auto;" width="100%"><tr><th class="noBorderTopLeft" style="width:auto;"></th><th class="Grid_top_blue Grid_Table" colspan="2">Membership</th><th class="Grid_top_blue Grid_Table" colspan="1">Goal4s</th><th class="Grid_Title_top_black grid_blue_border" colspan="6">Education</th><th class="Grid_Title_top_black grid_blue_border" colspan="2">Mem.</th><th class="Grid_Title_top_black grid_blue_border" colspan="2">Trn.</th><th class="Grid_Title_top_black grid_blue_border" colspan="2">Rn.|Lst.</th></tr><tr><th class="noBorderTopLeft" style="width:auto;"></th><th class="Grid_top_black_max Grid_Table">Base</th><th class="Grid_top_black_max Grid_Table">To Date</th><th class="Grid_top_black_max Grid_Table blue_border_right">Met</th><th class="Grid_top_black" title="Four Level 1 awards">1</th><th class="Grid_top_black" title="Two Level 2 awards">2</th><th class="Grid_top_black" title="Two more Level 2 awards">3</th><th class="Grid_top_black" title="Two Level 3 awards">1</th><th class="Grid_top_black" title="One Level 4, Level 5, or DTM award">5</th><th class="Grid_top_black" title="One more Level 4, Level 5, or DTM award">6</th><th class="Grid_top_black max22" title="4 New members">7</th><th class="Grid_top_black max22" title="4 More new members">9</th><th class="Grid_top_black max22" title="4 Officers trained first training period">9a</th><th class="Grid_top_black max22" title="4 Officers trained second training period">9b</th><th class="Grid_top_black max22" title="1 Dues-renewal on time">10a</th><th class="Grid_top_black max22" title="1 officer list on time">10b</th></tr><tr class="Grid_Top_Row club_gray" onclick="window.location.href='ClubReport.aspx?id=01448795'"><td class="Grid_Title_top5 min280 crop" title="Advanced Speakers on the Hill"> <span 
class="redFont">01448795</span> Advanced Speakers on the Hill</td><th class="Grid_Table_yellow"><span>29<span></span></span></th><td class="Grid_Table title_gray"><span>30<span></span></span></td><td class="Grid_Table x_light_gray blue_border_right"><span class="chart_table_big_numbers goalsMetBorder">2</span></td><th class="Grid_Title_goal" title="3 Level 1s needed">1</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 3s needed">0</th><th class="Grid_Title_goalAchieved" title="Achieved">1</th><th class="Grid_Title_goalAchieved" title="Achieved">1</th><th class="Grid_Title_goal" title="3 New Members needed">1</th><th class="Grid_Title_goal" title="4 New Members needed">0</th><th class="Grid_Title_goalAchieved" title="First Training Period Achieved">7</th><th class="Grid_Title_goal" title="Second Training Period 4 needed">0</th><th class="Grid_Title_goal" title="On-time dues-renewal needed">0</th><th class="Grid_Title_goalAchieved" title="On-time officer list Achieved">1</th></tr><tr class="Grid_Top_Row club_gray" onclick="window.location.href='ClubReport.aspx?id=02194262'"><td class="Grid_Title_top5 min280 crop" title="Inclusive Toastmasters"> <span class="redFont">02194262</span> Inclusivey Toastmasters</td><th class="Grid_Table_yellow"><span>21<span></span></span></th><td class="Grid_Table title_gray"><span>21<span></span></span></td><td class="Grid_Table x_light_gray blue_border_right"><span class="chart_table_big_numbers goalsMetBorder">7</span></td><th class="Grid_Title_goal" title="4 Level 1s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 3s needed">1</th><th class="Grid_Title_goal" title="1 Level 4, Level 5, or DTM needed">0</th><th class="Grid_Title_goal" title="1 more Level 4, Level 5, or DTM needed">0</th><th 
class="Grid_Title_goal" title="4 New Members needed">0</th><th class="Grid_Title_goal" title="4 New Members needed">0</th><th class="Grid_Title_goalAchieved" title="First Training Period Achieved">5</th><th class="Grid_Title_goal" title="Second Training Period 4 needed">0</th><th class="Grid_Title_goal" title="On-time dues-renewal needed">0</th><th class="Grid_Title_goalAchieved" title="On-time officer list Achieved">1</th></tr><tr class="Grid_Top_Row club_gray" onclick="window.location.href='ClubReport.aspx?id=02785335'"><td class="Grid_Title_top5 min280 crop" title="Club Toastmasters FrancoFun"> <span class="redFont">02785335</span> Club Toastmsasters FrancoFun</td><th class="Grid_Table_yellow"><span>21<span></span></span></th><td class="Grid_Table title_gray"><span>21<span></span></span></td><td class="Grid_Table x_light_gray blue_border_right"><span class="chart_table_big_numbers goalsMetBorder">1</span></td><th class="Grid_Title_goal" title="4 Level 1s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 3s needed">0</th><th class="Grid_Title_goalAchieved" title="Achieved">1</th><th class="Grid_Title_goal" title="1 more Level 4, Level 5, or DTM needed">0</th><th class="Grid_Title_goal" title="4 New Members needed">0</th><th class="Grid_Title_goal" title="4 New Members needed">0</th><th class="Grid_Title_goalAchieved" title="First Training Period Achieved">6</th><th class="Grid_Title_goal" title="Second Training Period 4 needed">0</th><th class="Grid_Title_goal" title="On-time dues-renewal needed">0</th><th class="Grid_Title_goalAchieved" title="On-time officer list Achieved">1</th></tr><tr class="Grid_Top_Row club_gray" onclick="window.location.href='ClubReport.aspx?id=04437661'"><td class="Grid_Title_top5 min280 crop" title="Feel Good Toastmasters"> <span class="redFont">04437661</span> Feel Good Toastmasters</td><th 
class="Grid_Table_yellow"><span>21<span></span></span></th><td class="Grid_Table title_gray"><span>22<span></span></span></td><td class="Grid_Table x_light_gray blue_border_right"><span class="chart_table_big_numbers goalsMetBorder">0</span></td><th class="Grid_Title_goal" title="4 Level 1s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 3s needed">0</th><th class="Grid_Title_goal" title="1 Level 4, Level 5, or DTM needed">0</th><th class="Grid_Title_goal" title="1 more Level 4, Level 5, or DTM needed">0</th><th class="Grid_Title_goal" title="3 New Members needed">1</th><th class="Grid_Title_goal" title="4 New Members needed">0</th><th class="Grid_Title_goalAchieved" title="First Training Period Achieved">6</th><th class="Grid_Title_goal" title="Second Training Period 4 needed">0</th><th class="Grid_Title_goal" title="On-time dues-renewal needed">0</th><th class="Grid_Title_goalAchieved" title="On-time officer list Achieved">1</th></tr></table></th></tbody>'''
my_second_html_string = '''<tbody class="Expanded4" id="divisionG_area24_clubs"><!--<tr><th class='noBorderLeftRight'></th>--><th class="noBorderLeftRight" colspan="6"></th><th colspan="6"><table style="margin-bottom:auto;" width="100%"><tr><th class="noBorderTopLeft" style="width:auto;"></th><th class="Grid_top_blue Grid_Table" colspan="2">Membership</th><th class="Grid_top_blue Grid_Table" colspan="1">Goals</th><th class="Grid_Title_top_black grid_blue_border" colspan="6">Education</th><th class="Grid_Title_top_black grid_blue_border" colspan="2">Mem.</th><th class="Grid_Title_top_black grid_blue_border" colspan="2">Trn.</th><th class="Grid_Title_top_black grid_blue_border" colspan="2">Rn.|Lst.</th></tr><tr><th class="noBorderTopLeft" style="width:auto;"></th><th class="Grid_top_black_max Grid_Table">Base</th><th class="Grid_top_black_max Grid_Table">To Date</th><th class="Grid_top_black_max Grid_Table blue_border_right">Met</th><th class="Grid_top_black" title="Four Level 1 awards">1</th><th class="Grid_top_black" title="Two Level 2 awards">2</th><th class="Grid_top_black" title="Two more Level 2 awards">3</th><th class="Grid_top_black" title="Two Level 3 awards">4</th><th class="Grid_top_black" title="One Level 4, Level 5, or DTM award">5</th><th class="Grid_top_black" title="One more Level 4, Level 5, or DTM award">6</th><th class="Grid_top_black max22" title="4 New members">7</th><th class="Grid_top_black max22" title="4 More new members">8</th><th class="Grid_top_black max22" title="4 Officers trained first training period">9a</th><th class="Grid_top_black max22" title="4 Officers trained second training period">9b</th><th class="Grid_top_black max22" title="1 Dues-renewal on time">10a</th><th class="Grid_top_black max22" title="1 officer list on time">10b</th></tr><tr class="Grid_Top_Row club_gray" onclick="window.location.href='ClubReport.aspx?id=01448795'"><td class="Grid_Title_top5 min280 crop" title="Advanced Speakers on the Hill"> <span 
class="redFont">01448795</span> Advanced Speakers on the Hill</td><th class="Grid_Table_yellow"><span>29<span></span></span></th><td class="Grid_Table title_gray"><span>30<span></span></span></td><td class="Grid_Table x_light_gray blue_border_right"><span class="chart_table_big_numbers goalsMetBorder">2</span></td><th class="Grid_Title_goal" title="3 Level 1s needed">1</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 3s needed">0</th><th class="Grid_Title_goalAchieved" title="Achieved">1</th><th class="Grid_Title_goalAchieved" title="Achieved">1</th><th class="Grid_Title_goal" title="3 New Members needed">1</th><th class="Grid_Title_goal" title="4 New Members needed">0</th><th class="Grid_Title_goalAchieved" title="First Training Period Achieved">7</th><th class="Grid_Title_goal" title="Second Training Period 4 needed">0</th><th class="Grid_Title_goal" title="On-time dues-renewal needed">0</th><th class="Grid_Title_goalAchieved" title="On-time officer list Achieved">1</th></tr><tr class="Grid_Top_Row club_gray" onclick="window.location.href='ClubReport.aspx?id=02194262'"><td class="Grid_Title_top5 min280 crop" title="Inclusive Toastmasters"> <span class="redFont">02194262</span> Inclusive Toastmasters</td><th class="Grid_Table_yellow"><span>21<span></span></span></th><td class="Grid_Table title_gray"><span>21<span></span></span></td><td class="Grid_Table x_light_gray blue_border_right"><span class="chart_table_big_numbers goalsMetBorder">0</span></td><th class="Grid_Title_goal" title="4 Level 1s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 3s needed">0</th><th class="Grid_Title_goal" title="1 Level 4, Level 5, or DTM needed">0</th><th class="Grid_Title_goal" title="1 more Level 4, Level 5, or DTM needed">0</th><th 
class="Grid_Title_goal" title="4 New Members needed">0</th><th class="Grid_Title_goal" title="4 New Members needed">0</th><th class="Grid_Title_goalAchieved" title="First Training Period Achieved">5</th><th class="Grid_Title_goal" title="Second Training Period 4 needed">0</th><th class="Grid_Title_goal" title="On-time dues-renewal needed">0</th><th class="Grid_Title_goalAchieved" title="On-time officer list Achieved">1</th></tr><tr class="Grid_Top_Row club_gray" onclick="window.location.href='ClubReport.aspx?id=02785335'"><td class="Grid_Title_top5 min280 crop" title="Club Toastmasters FrancoFun"> <span class="redFont">02785335</span> Club Toastmasters FrancoFun</td><th class="Grid_Table_yellow"><span>21<span></span></span></th><td class="Grid_Table title_gray"><span>21<span></span></span></td><td class="Grid_Table x_light_gray blue_border_right"><span class="chart_table_big_numbers goalsMetBorder">1</span></td><th class="Grid_Title_goal" title="4 Level 1s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 3s needed">0</th><th class="Grid_Title_goalAchieved" title="Achieved">1</th><th class="Grid_Title_goal" title="1 more Level 4, Level 5, or DTM needed">0</th><th class="Grid_Title_goal" title="4 New Members needed">0</th><th class="Grid_Title_goal" title="4 New Members needed">0</th><th class="Grid_Title_goalAchieved" title="First Training Period Achieved">6</th><th class="Grid_Title_goal" title="Second Training Period 4 needed">0</th><th class="Grid_Title_goal" title="On-time dues-renewal needed">0</th><th class="Grid_Title_goalAchieved" title="On-time officer list Achieved">1</th></tr><tr class="Grid_Top_Row club_gray" onclick="window.location.href='ClubReport.aspx?id=04437661'"><td class="Grid_Title_top5 min280 crop" title="Feel Good Toastmasters"> <span class="redFont">04437661</span> Feel Good Toastmasters</td><th 
class="Grid_Table_yellow"><span>21<span></span></span></th><td class="Grid_Table title_gray"><span>22<span></span></span></td><td class="Grid_Table x_light_gray blue_border_right"><span class="chart_table_big_numbers goalsMetBorder">0</span></td><th class="Grid_Title_goal" title="4 Level 1s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 2s needed">0</th><th class="Grid_Title_goal" title="2 Level 3s needed">0</th><th class="Grid_Title_goal" title="1 Level 4, Level 5, or DTM needed">0</th><th class="Grid_Title_goal" title="1 more Level 4, Level 5, or DTM needed">0</th><th class="Grid_Title_goal" title="3 New Members needed">1</th><th class="Grid_Title_goal" title="4 New Members needed">0</th><th class="Grid_Title_goalAchieved" title="First Training Period Achieved">6</th><th class="Grid_Title_goal" title="Second Training Period 4 needed">0</th><th class="Grid_Title_goal" title="On-time dues-renewal needed">0</th><th class="Grid_Title_goalAchieved" title="On-time officer list Achieved">1</th></tr></table></th></tbody>'''
#my_first_html_string = '''<html>test1 <p>325</p><div>This</div> testA</html>'''
#my_second_html_string = '''<html>test2 <p>3</p><div>Thiss</div> testB</html>'''
all_changes = xmldiff.main.diff_texts(my_first_html_string, my_second_html_string)
#old_tree = lxml.etree.fromstring(my_first_html_string)
#new_tree = lxml.etree.fromstring(my_second_html_string)
highlighted_tree = lxml.etree.fromstring(my_second_html_string)
for item in all_changes:
    #print('item:', item)
    #print('item.xpath:', item.node)
    #print('item.text:', item.text)
    #old_node = old_tree.xpath(item.node)[0]
    #new_node = new_tree.xpath(item.node)[0]
    #print('old node:', lxml.etree.tostring(old_node))
    #print('new node:', lxml.etree.tostring(new_node))
    #print('old text and tail:', [old_node.text, old_node.tail])
    #print('new text and tail:', [new_node.text, new_node.tail])
    highlighted_node = highlighted_tree.xpath(item.node)[0]
    if isinstance(item, xmldiff.actions.UpdateTextIn):
        print('changed text:', item.text)
        highlighted_node.text = ''
        highlighted_node.insert(0, lxml.etree.fromstring('<mark style="background:red">' + item.text + '</mark>'))
    if isinstance(item, xmldiff.actions.UpdateTextAfter):
        print('changed tail:', item.text)
        highlighted_node.tail = '' # has to be removed before `addnext`
        highlighted_node.addnext(lxml.etree.fromstring('<mark style="background:red">' + item.text + '</mark>'))
    print('---')
html = lxml.etree.tostring(highlighted_tree)
html = html.decode()
print(html)
with open('output.html', 'w') as f:
    f.write(html)
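The only problem is that the old and new text can sometimes contain the same words but a different number of spaces, tabs or newlines, and that is also reported as a change. Such changes could be skipped, but that would need extra code.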
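A rough sketch of what that extra code might look like: collapse whitespace in both texts before deciding whether a change is worth highlighting. Both helper names are hypothetical, and getting hold of the old text (for example from the corresponding node of the first tree) is left to the caller:

```python
def normalize_whitespace(text):
    """Collapse runs of spaces, tabs and newlines into single spaces (hypothetical helper)."""
    return " ".join((text or "").split())


def is_real_change(old_text, new_text):
    """Return True only if the texts still differ after whitespace normalization."""
    return normalize_whitespace(old_text) != normalize_whitespace(new_text)


print(is_real_change("Advanced \nSpeakers on the Hill", "Advanced Speakers on the Hill"))
print(is_real_change("36", "3"))
```

Inside the loop over the changes, a reported change would then be skipped whenever is_real_change(old_text, item.text) is False.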
This question, "python - Highlighting the differences between two HTML strings", comes from Stack Overflow: https://stackoverflow.com/questions/63770166/