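I have a quick question that I'm having trouble figuring out. I want to read an HTML file line by line, but I want to skip the HEAD tag. So I figured I could start reading the text once the HEAD tag has been skipped.
So far I have written: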
BufferedReader reader = new BufferedReader(new InputStreamReader(socket.getInputStream()));
StringBuilder string = new StringBuilder();
String line;
while ((line = reader.readLine()) != null) {
    if (line.startsWith("<html>"))
        string.append(line + "\n");
}
I want to keep the HTML code in memory, but without the HEAD information.
Example:
<HTML>
<HEAD>
<TITLE>Your Title Here</TITLE>
</HEAD>
<BODY BGCOLOR="FFFFFF">
<CENTER><IMG SRC="clouds.jpg" ALIGN="BOTTOM"> </CENTER>
<a href="http://somegreatsite.com">Link Name</a>is a link to another nifty site
<H1>This is a Header</H1>
<H2>This is a Medium Header</H2>
Send me mail at <a href="mailto:support@yourcompany.com">support@yourcompany.com</a>.
</BODY>
I want to save everything except the HEAD tag information.
Best answer
How about something like this:
boolean htmlFound = false; // Have we found an open html tag?
StringBuilder string = new StringBuilder(); // Back to your code...
String line;
while ((line = reader.readLine()) != null) {
    if (!htmlFound) { // Have we found it yet?
        if (line.toLowerCase().startsWith("<html")) { // Check if this line opens an html tag...
            htmlFound = true; // yes? Excellent!
        } else {
            continue; // Skip over this line...
        }
    }
    System.out.println("This is each line: " + line);
    string.append(line + "\n");
}
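Note that the loop above drops everything before the opening html tag (for example, HTTP response headers arriving on the socket) but still keeps the HEAD block itself. If the HEAD element should be dropped as well, a second flag can track whether the reader is currently inside it. The following is only a minimal sketch, not the answer author's code: the class name HeadSkipper and the method readWithoutHead are hypothetical, and it assumes the html, head, and closing head tags each sit on their own line, as in the sample page. For arbitrary real-world markup, a proper HTML parser would be more robust than line matching.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;

public class HeadSkipper {

    // Hypothetical helper: keeps lines from the first <html ...> line onward,
    // but drops every line from <head ...> through </head>.
    // Assumption: these tags each appear on their own line.
    public static String readWithoutHead(Reader source) {
        StringBuilder kept = new StringBuilder();
        boolean htmlFound = false; // have we seen the opening html tag?
        boolean inHead = false;    // are we between <head> and </head>?
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String lower = line.trim().toLowerCase(Locale.ROOT);
                if (!htmlFound) {
                    if (!lower.startsWith("<html")) {
                        continue; // still before the document starts
                    }
                    htmlFound = true;
                }
                if (inHead) {
                    if (lower.contains("</head>")) {
                        inHead = false; // the closing line is dropped too
                    }
                    continue;
                }
                // Match "<head>" or "<head attr...>", but not "<header>".
                if (lower.startsWith("<head>") || lower.startsWith("<head ")) {
                    inHead = !lower.contains("</head>");
                    continue;
                }
                kept.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        String page = "HTTP/1.1 200 OK\n\n<HTML>\n<HEAD>\n<TITLE>Your Title Here</TITLE>\n"
                + "</HEAD>\n<BODY>\n<H1>This is a Header</H1>\n</BODY>\n</HTML>\n";
        System.out.print(readWithoutHead(new StringReader(page)));
    }
}
```

With the socket from the question, the call would look like readWithoutHead(new InputStreamReader(socket.getInputStream())). Because the helper closes the reader when it finishes, the underlying stream is closed as well.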
Regarding "java - Reading HTML: how to skip the HEAD tag information in a web page and read the HTML line by line using BufferedReader?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/20535287/