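Earlier today I posted "Finding number of occurrences on website",
but the answers I received were not as helpful as I had hoped. I am trying to make the crawler read the text of each website it finds and search that text for a given word. I came across this: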
org.jsoup.nodes.Document dom = Jsoup.parse(html);
but I don't know how to put it to use. Please help.
Crawler
public void crawlFrom(String link) { // TODO
    try {
        Connection connection = Jsoup.connect(link).userAgent(USER_AGENT);
        Document htmlDocument = connection.get();
        this.htmlDocument = htmlDocument;
        System.out.println("Received web page at " + link);
        Elements linksOnPage = htmlDocument.select("a[href]");
        System.out.println("------------------\nFound (" + linksOnPage.size() + ") links\n------------------");
        for (Element newLink : linksOnPage) {
            // Store each link as an absolute URL
            this.linkListe.add(newLink.absUrl("href"));
        }
    } catch (IOException ioe) {
        // We were not successful in our HTTP request
        System.out.println("Error in our HTTP request " + ioe);
    }
    System.out.println(linkListe);
}
Searcher
public int searchHits(String target, String aften) { // TODO
    String[] out = new String[0];
    int occurrences = 0;
    if (aften.contains(target)) {
        occurrences++;
    }
    return occurrences;
}
Best answer
I'm not quite sure what aften and target are, but here is a piece of code that counts how many times a word occurs in a text.
public int searchHits(String target, String aften) {
    // An empty target would match at every position and never advance the index
    if (target.isEmpty()) {
        return 0;
    }
    int index = 0;
    int occurrences = 0;
    while (index != -1) {
        index = aften.indexOf(target, index); // search from the current index (0 on the first pass)
        if (index != -1) {
            occurrences++; // found a match, so increment the counter
            index += target.length(); // continue searching just past this match
        }
    }
    return occurrences;
}
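The method above counts every substring match, so searching for "cat" would also count "catalog". If whole-word, case-insensitive counting is what is wanted, a minimal sketch could look like the following. The class name WordCounter and the method name countWholeWords are hypothetical and not part of the original post, and the word-boundary rule used here (no letter or digit directly before or after the match) is an assumption.

```java
// Hypothetical helper; WordCounter and countWholeWords are illustrative names only.
final class WordCounter {

    /** Counts case-insensitive occurrences of target in text that stand alone as words. */
    static int countWholeWords(String target, String text) {
        if (target == null || text == null || target.isEmpty()) {
            return 0; // nothing to search for
        }
        // Lower-case both strings so the comparison ignores case
        String needle = target.toLowerCase(java.util.Locale.ROOT);
        String haystack = text.toLowerCase(java.util.Locale.ROOT);

        int count = 0;
        int index = haystack.indexOf(needle);
        while (index != -1) {
            int end = index + needle.length();
            // Assumed boundary rule: no letter or digit directly before or after the match
            boolean startOk = index == 0
                    || !Character.isLetterOrDigit(haystack.charAt(index - 1));
            boolean endOk = end == haystack.length()
                    || !Character.isLetterOrDigit(haystack.charAt(end));
            boolean wholeWord = startOk && endOk;
            if (wholeWord) {
                count++;
            }
            // Skip past a counted word; otherwise advance by one character and retry
            index = haystack.indexOf(needle, wholeWord ? end : index + 1);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWholeWords("cat", "Cat catalog cat, CAT!")); // prints 3
    }
}
```

In the crawler from the question, the text to search could presumably be taken from the fetched page with htmlDocument.text(), which in jsoup returns the page's text without the HTML tags.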
---------------------UPDATE---------------------
Change the abstract method from int searchHits(String word);
to int searchHits(String target, String aften);
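To illustrate the signature change, here is a rough sketch of how the declaration and an implementation might fit together. The original post does not show where the abstract method lives, so the abstract class AbstractSearcher and the subclass SubstringSearcher are hypothetical names used purely for illustration.

```java
// Hypothetical names: the original post does not show the declaring type.
abstract class AbstractSearcher {
    /** Returns how many times target occurs in aften (the text to search). */
    abstract int searchHits(String target, String aften);
}

final class SubstringSearcher extends AbstractSearcher {
    @Override
    int searchHits(String target, String aften) {
        if (target.isEmpty()) {
            return 0; // an empty target would otherwise never advance the search
        }
        int occurrences = 0;
        // Jump from one match to the position just past it until none is left
        for (int i = aften.indexOf(target); i != -1;
                i = aften.indexOf(target, i + target.length())) {
            occurrences++;
        }
        return occurrences;
    }

    public static void main(String[] args) {
        AbstractSearcher searcher = new SubstringSearcher();
        System.out.println(searcher.searchHits("ab", "abcabab")); // prints 3
    }
}
```

With both parameters in the abstract declaration, callers can pass the text to search alongside the word instead of relying on state stored in the searcher.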
Regarding java - getting a web crawler to parse websites, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33641925/