java - Reading HTML: how to skip the HEAD tag information in a web page and read the HTML line by line with BufferedReader?

Closed. This question needs details or clarity. It is not currently accepting answers.

Closed 9 years ago.

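I have a quick question that I am having trouble figuring out. I want to read an HTML file line by line, but skip the HEAD tag. So I figured I could start reading the text once I am past the HEAD tag.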

This is what I have so far:

BufferedReader reader = new BufferedReader(new InputStreamReader(socket.getInputStream()));

StringBuilder string = new StringBuilder();
String line;
while ((line = reader.readLine()) != null) {
    if (line.startsWith("<html>")) 
        string.append(line + "\n");
}

I want to keep the HTML code in memory, but without the HEAD information.

Example:

<HTML>

<HEAD>

    <TITLE>Your Title Here</TITLE>

</HEAD>

<BODY BGCOLOR="FFFFFF">

    <CENTER><IMG SRC="clouds.jpg" ALIGN="BOTTOM"> </CENTER>

    <a href="http://somegreatsite.com">Link Name</a>is a link to another nifty site

    <H1>This is a Header</H1>

    <H2>This is a Medium Header</H2>

    Send me mail at <a href="mailto:support@yourcompany.com">support@yourcompany.com</a>.

</BODY>

I want to save everything except the HEAD tag information.

Best answer

How about something like this:

boolean htmlFound = false;                        // Have we found an open html tag?
StringBuilder string = new StringBuilder();       // Back to your code...
String line;
while ((line = reader.readLine()) != null) {
  if (!htmlFound) {                               // Have we found it yet?
    if (line.toLowerCase().startsWith("<html")) { // Check if this line opens a html tag...
      htmlFound = true;                           // yes? Excellent!
    } else {
      continue;                                   // Skip over this line...
    }
  }
  System.out.println("This is each line: " + line);
  string.append(line + "\n");
}
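
Note that the loop above starts copying at the opening html tag, so the lines between HEAD and /HEAD still end up in the StringBuilder. If the goal is to drop that section as well, the same flag idea can be extended with a second boolean. The following is only a minimal sketch, not part of the original answer: the class name HeadSkipper and the methods readWithoutHead and stripHead are made up for illustration, and it assumes the opening and closing HEAD tags each start on their own line, as in the example from the question. Anything else on the same line as those tags is dropped along with them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;

// Hypothetical helper class, named for this illustration only.
final class HeadSkipper {

    // Copies every line from the reader except those from the line that
    // opens <head> through the line that contains </head>.
    // Assumption: the head tags start their own lines (possibly indented).
    static String readWithoutHead(BufferedReader reader) throws IOException {
        StringBuilder out = new StringBuilder();
        boolean inHead = false;                 // Are we inside <head>...</head>?
        String line;
        while ((line = reader.readLine()) != null) {
            String lower = line.trim().toLowerCase(Locale.ROOT);
            // Match "<head>" or "<head attr=...>", but not "<header>".
            if (!inHead && (lower.startsWith("<head>") || lower.startsWith("<head "))) {
                inHead = true;
            }
            if (!inHead) {
                out.append(line).append('\n');  // Keep lines outside the head section
            }
            if (inHead && lower.contains("</head>")) {
                inHead = false;                 // Head section ends on this line
            }
        }
        return out.toString();
    }

    // Convenience wrapper for strings, so the logic can be tried without a socket.
    static String stripHead(String html) {
        try (BufferedReader reader = new BufferedReader(new StringReader(html))) {
            return readWithoutHead(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<HTML>\n<HEAD>\n    <TITLE>Your Title Here</TITLE>\n</HEAD>\n"
                + "<BODY BGCOLOR=\"FFFFFF\">\n    <H1>This is a Header</H1>\n</BODY>\n</HTML>\n";
        System.out.print(stripHead(html));
    }
}
```

With the reader from the question, the call would be readWithoutHead(reader). A line-based check like this is fragile for real-world pages, where tags can share a line or span several lines, so a proper HTML parser is likely the safer choice when the input is not under your control.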

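Regarding "java - Reading HTML: how to skip the HEAD tag information in a web page and read the HTML line by line with BufferedReader?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20535287/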
