html - How to get the "text" of an HTML page? (WebBrowser - Delphi)

I am using WebBrowser to get the source code of HTML pages. The page source contains some text mixed with HTML tags, like this:

FONT></P><P align=center><FONT color=#ccffcc size=3>**Hello There , This is a text in our html page** </FONT></P><P align=center> </P>

The HTML tags are arbitrary, so we can't predict them. Is there a way to get only the text, separated from the HTML tags?

Best answer

You can use a TWebBrowser instance to parse the HTML code and extract the plain text.

See this example:

uses
  MSHTML,
  SHDocVw,
  ActiveX,
  Variants; // needed for VarArrayCreate

function GetPlainText(const Html: string): string;
var
  DummyWebBrowser: TWebBrowser;
  Document: IHTMLDocument2;
  DummyVar: Variant;
begin
  Result := '';
  DummyWebBrowser := TWebBrowser.Create(nil);
  try
    // Open a blank page so the browser creates an IHTMLDocument2 instance
    DummyWebBrowser.Navigate('about:blank');
    Document := DummyWebBrowser.Document as IHTMLDocument2;
    if Assigned(Document) then // make sure the document is available
    begin
      // Document.Write expects a SAFEARRAY of variants, so wrap the HTML in a one-element variant array
      DummyVar := VarArrayCreate([0, 0], varVariant);
      DummyVar[0] := Html;
      Document.Write(PSafeArray(TVarData(DummyVar).VArray)); // load the HTML into the document
      Document.Close;
      // The body's text range returns the content with the tags stripped
      Result := (Document.body as IHTMLBodyElement).createTextRange.text;
    end;
  finally
    DummyWebBrowser.Free;
  end;
end;

For more on html - How to get the "text" of an HTML page? (WebBrowser - Delphi), see the similar question on Stack Overflow: https://stackoverflow.com/questions/3666392/
