ruby - How do I remove HTML-encoded characters from a string?

Tags: ruby string html-parsing

I have a string that contains some HTML-encoded characters, and I want to remove them:

"&lt;div&gt;Hi All,&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Starting today we are initiating PoLS.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Please use the following communication protocols:&lt;br /&gt;&lt;/div&gt;&lt;div&gt;1. Task Breakup and allocation - Gravity&lt;br /&gt;&lt;/div&gt;&lt;div&gt;2. All mail communications - BC messages&lt;br /&gt;&lt;/div&gt;&lt;div&gt;3. Reports on PoC / Spikes: Writeboard&lt;br /&gt;&lt;/div&gt;&lt;div&gt;4. Non story related tasks: BC To-Do&lt;br /&gt;&lt;/div&gt;&lt;div&gt;5. All UI and HTML will communicated to you through BC.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;6. For File sharing, we'll be using Dropbox.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;7. Use Skype for lighter and generic desicussions. However, in case you need any approvals, data for later reference, etc, then please use BC. PoLS conversation has been created on skype.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;You'll have been given necessary accesses to all these portals. Please start using them judiciously.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;All the best!&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Thanks,&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Saurav&lt;br /&gt;&lt;/div&gt;"

Best answer

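There are many ways to do what you want. It may help to look at why you want to do it. Usually, when I want to remove encoded HTML, what I really want is to recover the content of the HTML. Ruby has some modules that make that easy.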

require 'cgi'
require 'nokogiri'

html = "&lt;div&gt;Hi All,&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Starting today we are initiating PoLS.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Please use the following communication protocols:&lt;br /&gt;&lt;/div&gt;&lt;div&gt;1. Task Breakup and allocation - Gravity&lt;br /&gt;&lt;/div&gt;&lt;div&gt;2. All mail communications - BC messages&lt;br /&gt;&lt;/div&gt;&lt;div&gt;3. Reports on PoC / Spikes: Writeboard&lt;br /&gt;&lt;/div&gt;&lt;div&gt;4. Non story related tasks: BC To-Do&lt;br /&gt;&lt;/div&gt;&lt;div&gt;5. All UI and HTML will communicated to you through BC.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;6. For File sharing, we'll be using Dropbox.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;7. Use Skype for lighter and generic desicussions. However, in case you need any approvals, data for later reference, etc, then please use BC. PoLS conversation has been created on skype.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;You'll have been given necessary accesses to all these portals. Please start using them judiciously.&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;All the best!&lt;/div&gt;&lt;div class=\"paragraph_break\"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Thanks,&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Saurav&lt;br /&gt;&lt;/div&gt;"

puts CGI.unescapeHTML(html)

Which outputs:

<div>Hi All,</div><div class="paragraph_break"><br /></div><div>Starting today we are initiating PoLS.</div><div class="paragraph_break"><br /></div><div>Please use the following communication protocols:<br /></div><div>1. Task Breakup and allocation - Gravity<br /></div><div>2. All mail communications - BC messages<br /></div><div>3. Reports on PoC / Spikes: Writeboard<br /></div><div>4. Non story related tasks: BC To-Do<br /></div><div>5. All UI and HTML will communicated to you through BC.<br /></div><div>6. For File sharing, we'll be using Dropbox.<br /></div><div>7. Use Skype for lighter and generic desicussions. However, in case you need any approvals, data for later reference, etc, then please use BC. PoLS conversation has been created on skype.</div><div class="paragraph_break"><br /></div><div>You'll have been given necessary accesses to all these portals. Please start using them judiciously.</div><div class="paragraph_break"><br /></div><div>All the best!</div><div class="paragraph_break"><br /></div><div>Thanks,<br /></div><div>Saurav<br /></div>

If I want to go a step further and strip the tags, retrieving all the text:

puts Nokogiri::HTML(CGI.unescapeHTML(html)).content

Will output:

Hi All,Starting today we are initiating PoLS.Please use the following communication protocols:1. Task Breakup and allocation - Gravity2. All mail communications - BC messages3. Reports on PoC / Spikes: Writeboard4. Non story related tasks: BC To-Do5. All UI and HTML will communicated to you through BC.6. For File sharing, we'll be using Dropbox.7. Use Skype for lighter and generic desicussions. However, in case you need any approvals, data for later reference, etc, then please use BC. PoLS conversation has been created on skype.You'll have been given necessary accesses to all these portals. Please start using them judiciously.All the best!Thanks,Saurav

Which is usually where I want to end up when I see that kind of string.

Ruby's CGI makes it easy to encode and decode HTML. The Nokogiri gem makes it easy to strip the tags.
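To illustrate the encoding direction as well, here is a small round trip through CGI.escapeHTML and CGI.unescapeHTML. The sample strings are hypothetical and not taken from the question. As far as I know, CGI.unescapeHTML decodes only the basic entities plus numeric references, so other named entities such as &nbsp; come back unchanged; treat that as something to verify against your Ruby version.

```ruby
require 'cgi'

# Hypothetical sample markup, not taken from the question.
raw = '<div class="note">Tom & Jerry</div>'

# Encode: <, >, & and quotes become entities.
encoded = CGI.escapeHTML(raw)
puts encoded

# Decode: the round trip should give back the original string.
decoded = CGI.unescapeHTML(encoded)
puts decoded == raw

# Numeric references are decoded too; other named entities are left alone.
numeric = CGI.unescapeHTML('&#65;&#x42;')
named = CGI.unescapeHTML('&nbsp;')
puts numeric
puts named
```

If the input may contain the wider set of named entities, a full HTML parser such as Nokogiri is probably the safer tool for decoding.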

A similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8929006/
