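I'm new to jsoup and want to get more comfortable with extracting information from websites. I'm trying something simple: scraping a few values from eBay.
I want to get the item name, link (URL), price, and number sold for each item under "Hot this week" (as on this page: http://www.ebay.co.uk/sch/Action-Figures/246/bn_1632128/i.html),
but I'm not sure how to go about it.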
package application;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

import javax.swing.JOptionPane;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class GetHotSellers {

    public static void main(String[] args) {
        Document doc = Jsoup.parse(readURL("http://www.ebay.co.uk/sch/Action-Figures/246/bn_1632128/i.html"));
        // Matches every element whose text (descendants' text included) ends in "sold"
        Elements sold_items = doc.getElementsMatchingText("sold$");
        for (Element sold : sold_items) {
            System.out.println(sold.text());
        }
    }

    // Downloads the page at the given URL and returns its contents as one string
    public static String readURL(String url) {
        StringBuilder fileContents = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(new URL(url).openStream()))) {
            String currentLine;
            // Stop at end of stream so the literal text "null" is never appended
            while ((currentLine = reader.readLine()) != null) {
                fileContents.append(currentLine).append("\n");
            }
        } catch (Exception e) {
            JOptionPane.showMessageDialog(null, e.getMessage(), "Error Message", JOptionPane.ERROR_MESSAGE);
            e.printStackTrace();
        }
        return fileContents.toString();
    }
}
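This is what I have so far. Do I need to improve my regular expression, or should I be using a different function that better fits what I'm after?
My current output looks like this: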
2016 8PC Marvel Avengers DC Super Hero Mini Figure Set Fits Lego FROM UK £6.35 381 sold Despicable Me Minions Supervillain Jet Playset -From the Argos Shop on ebay £7.99 187 sold Avengers Marvel Titan 12" figure Spider-man Captain Iron man Wolverine Thor Toy £8.69 174 sold Imaginext Marvel DC Super Hero Squad Figures and Villains Batman Please select £1.99 129 sold Star Wars Episode The Force Awakens Electronic Chewbacca Mask IN STOCK NOW! £24.99 101 sold Jurassic World Indominus Rex Chomping Dinosaur 44cm Figure T-Rex Dino Action Toy £26.99 89 sold 12" Avengers Marvel Titan Figures Spider-Man Captain Iron Man Wolverine Thor Toy £7.45 88 sold Henry Hugglemonster Huggle House Playset. From the Official Argos Shop on ebay £7.99 87 sold
2016 8PC Marvel Avengers DC Super Hero Mini Figure Set Fits Lego FROM UK £6.35 381 sold Despicable Me Minions Supervillain Jet Playset -From the Argos Shop on ebay £7.99 187 sold Avengers Marvel Titan 12" figure Spider-man Captain Iron man Wolverine Thor Toy £8.69 174 sold Imaginext Marvel DC Super Hero Squad Figures and Villains Batman Please select £1.99 129 sold Star Wars Episode The Force Awakens Electronic Chewbacca Mask IN STOCK NOW! £24.99 101 sold Jurassic World Indominus Rex Chomping Dinosaur 44cm Figure T-Rex Dino Action Toy £26.99 89 sold 12" Avengers Marvel Titan Figures Spider-Man Captain Iron Man Wolverine Thor Toy £7.45 88 sold Henry Hugglemonster Huggle House Playset. From the Official Argos Shop on ebay £7.99 87 sold
2016 8PC Marvel Avengers DC Super Hero Mini Figure Set Fits Lego FROM UK £6.35 381 sold
2016 8PC Marvel Avengers DC Super Hero Mini Figure Set Fits Lego FROM UK £6.35 381 sold
2016 8PC Marvel Avengers DC Super Hero Mini Figure Set Fits Lego FROM UK £6.35 381 sold
381 sold
381 sold
Despicable Me Minions Supervillain Jet Playset -From the Argos Shop on ebay £7.99 187 sold
Despicable Me Minions Supervillain Jet Playset -From the Argos Shop on ebay £7.99 187 sold
Despicable Me Minions Supervillain Jet Playset -From the Argos Shop on ebay £7.99 187 sold
187 sold
187 sold
Avengers Marvel Titan 12" figure Spider-man Captain Iron man Wolverine Thor Toy £8.69 174 sold
Avengers Marvel Titan 12" figure Spider-man Captain Iron man Wolverine Thor Toy £8.69 174 sold
Avengers Marvel Titan 12" figure Spider-man Captain Iron man Wolverine Thor Toy £8.69 174 sold
174 sold
174 sold
Imaginext Marvel DC Super Hero Squad Figures and Villains Batman Please select £1.99 129 sold
Imaginext Marvel DC Super Hero Squad Figures and Villains Batman Please select £1.99 129 sold
Imaginext Marvel DC Super Hero Squad Figures and Villains Batman Please select £1.99 129 sold
129 sold
129 sold
Star Wars Episode The Force Awakens Electronic Chewbacca Mask IN STOCK NOW! £24.99 101 sold
Star Wars Episode The Force Awakens Electronic Chewbacca Mask IN STOCK NOW! £24.99 101 sold
Star Wars Episode The Force Awakens Electronic Chewbacca Mask IN STOCK NOW! £24.99 101 sold
101 sold
101 sold
Jurassic World Indominus Rex Chomping Dinosaur 44cm Figure T-Rex Dino Action Toy £26.99 89 sold
Jurassic World Indominus Rex Chomping Dinosaur 44cm Figure T-Rex Dino Action Toy £26.99 89 sold
Jurassic World Indominus Rex Chomping Dinosaur 44cm Figure T-Rex Dino Action Toy £26.99 89 sold
89 sold
89 sold
12" Avengers Marvel Titan Figures Spider-Man Captain Iron Man Wolverine Thor Toy £7.45 88 sold
12" Avengers Marvel Titan Figures Spider-Man Captain Iron Man Wolverine Thor Toy £7.45 88 sold
12" Avengers Marvel Titan Figures Spider-Man Captain Iron Man Wolverine Thor Toy £7.45 88 sold
88 sold
88 sold
Henry Hugglemonster Huggle House Playset. From the Official Argos Shop on ebay £7.99 87 sold
Henry Hugglemonster Huggle House Playset. From the Official Argos Shop on ebay £7.99 87 sold
Henry Hugglemonster Huggle House Playset. From the Official Argos Shop on ebay £7.99 87 sold
87 sold
87 sold
An example of the output I want:
Henry Hugglemonster Huggle House Playset. From the Official Argos Shop on ebay || £7.99 || 87 sold || http://link.com
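A note on the repeated lines in the current output: getElementsMatchingText tests each element's full text, descendants included, so every ancestor of an "N sold" element whose combined text also ends in "sold" is returned as well. jsoup's getElementsMatchingOwnText tests only the text directly inside each element. A minimal sketch on a made-up fragment follows; the markup and the names OwnTextDemo, matchFullText, and matchOwnText are assumptions for illustration, not eBay's real page:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class OwnTextDemo {
    // Hypothetical markup for illustration only; eBay's real HTML differs.
    static final String HTML =
        "<div class=\"tile\"><span>Toy A</span> <span>381 sold</span></div>"
      + "<div class=\"tile\"><span>Toy B</span> <span>187 sold</span></div>";

    // Texts of all elements whose full text (descendants included) matches the regex.
    static List<String> matchFullText(String html, String regex) {
        Document doc = Jsoup.parse(html);
        List<String> out = new ArrayList<>();
        for (Element e : doc.getElementsMatchingText(regex)) {
            out.add(e.text());
        }
        return out;
    }

    // Texts of elements whose own, direct text matches the regex.
    static List<String> matchOwnText(String html, String regex) {
        Document doc = Jsoup.parse(html);
        List<String> out = new ArrayList<>();
        for (Element e : doc.getElementsMatchingOwnText(regex)) {
            out.add(e.text());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("Full-text matches:");
        for (String s : matchFullText(HTML, "sold$")) {
            System.out.println("  " + s);
        }
        System.out.println("Own-text matches:");
        for (String s : matchOwnText(HTML, "sold$")) {
            System.out.println("  " + s);
        }
    }
}
```

The own-text variant returns only the two "N sold" elements, while the full-text variant also returns their containers. It still yields just the sold count, so the name, price, and link have to be read from the surrounding elements.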
Edit:
I just tried something like this, but had no luck.
for (String categoryURL : categoryLinksArray) {
    Document doc = Jsoup.parse(readURL(categoryURL));
    Elements sold_items = doc.getElementsByClass("b-block-info-container");
    for (Element sold : sold_items) {
        System.out.println("NAME: " + sold.attr("b-block-info-container__title b-block-info-container__title__ListingSummary") + "\n" +
                "PRICE: " + sold.attr("b-block-info-container__price") + "\n" +
                "SOLD/week: " + sold.attr("item_quantity__hotness") + "\n" +
                "URL: " + sold.attr("abs:href"));
        System.out.println("--------------------------------------");
    }
}
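A likely reason the attempt above prints empty values: attr() looks up an HTML attribute by name, so passing it a class name returns an empty string. The child elements have to be selected first and their text read afterwards. Below is a minimal sketch that does this once per tile, so each item's name, price, sold count, and link stay together. The HTML fragment, the assumption that the b-block-tile element is the link wrapping each item, and the names HotItemsDemo, textOf, and extract are illustrative assumptions, not eBay's actual markup:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class HotItemsDemo {
    // Hypothetical markup that reuses class names from this thread; the real page may differ.
    static final String HTML =
        "<a class=\"b-block-tile\" href=\"/itm/1\">"
      + "<span class=\"b-block-info-container__title\">Toy A</span> "
      + "<span class=\"b-block-info-container__price\">\u00a36.35</span> "
      + "<span class=\"item_quantity__hotness\">381 sold</span></a>"
      + "<a class=\"b-block-tile\" href=\"/itm/2\">"
      + "<span class=\"b-block-info-container__title\">Toy B</span> "
      + "<span class=\"b-block-info-container__price\">\u00a37.99</span> "
      + "<span class=\"item_quantity__hotness\">187 sold</span></a>";

    // Text of the first descendant matching the CSS query, or "" if there is none.
    static String textOf(Element parent, String css) {
        Element first = parent.select(css).first();
        return first == null ? "" : first.text();
    }

    // Builds one "name || price || sold || url" row per tile.
    static List<String> extract(String html, String baseUri) {
        // The base URI lets jsoup resolve relative href values to absolute URLs.
        Document doc = Jsoup.parse(html, baseUri);
        List<String> rows = new ArrayList<>();
        for (Element tile : doc.select("a.b-block-tile")) {
            rows.add(textOf(tile, ".b-block-info-container__title") + " || "
                    + textOf(tile, ".b-block-info-container__price") + " || "
                    + textOf(tile, ".item_quantity__hotness") + " || "
                    + tile.absUrl("href"));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : extract(HTML, "http://www.ebay.co.uk/")) {
            System.out.println(row);
        }
    }
}
```

Scoping every lookup to one tile keeps the fields of different items from being mixed up, which is what the desired "name || price || sold || link" layout needs.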
Best answer
I got it working, but it isn't very efficient: it's slow.
// Needs java.util.ArrayList, java.util.HashMap, the jsoup imports, and the readURL helper from the question.
public static void main(String[] args) {
    // Collect the link of every category on the all-categories page
    ArrayList<String> categoryLinksArray = new ArrayList<>();
    Document links = Jsoup.parse(readURL("http://www.ebay.co.uk/sch/allcategories/all-categories"));
    Elements item_categories = links.getElementsByClass("ch");
    for (Element category : item_categories) {
        categoryLinksArray.add(category.attr("abs:href"));
    }

    // Download each category page in turn and look for its top-sold carousel
    for (String categoryURL : categoryLinksArray) {
        Document doc = Jsoup.parse(readURL(categoryURL));
        Elements hot_items = doc
                .getElementsByClass("b-module b-module-carousel b-module-deals topSold b-display--portrait");
        for (Element item : hot_items) {
            Elements hot_items_names = item.getElementsByClass(
                    "b-block-info-container__title b-block-info-container__title__ListingSummary");
            Elements hot_items_price = item.getElementsByClass("b-block-info-container__price");
            Elements hot_items_sold = item.getElementsByClass("item_quantity__hotness");
            Elements hot_items_url = item.getElementsByClass("b-block-tile");

            // Each put replaces the previous value for the same key,
            // so only the last match found inside this module is kept
            HashMap<String, String> hs_items = new HashMap<>();
            for (Element item_name : hot_items_names) {
                hs_items.put("Name", item_name.text());
            }
            for (Element item_price : hot_items_price) {
                hs_items.put("Price", item_price.text());
            }
            for (Element item_sold : hot_items_sold) {
                hs_items.put("Sold", item_sold.text());
            }
            for (Element item_url : hot_items_url) {
                hs_items.put("URL", item_url.attr("abs:href"));
            }

            System.out.println("Name: " + hs_items.get("Name") + "\n" +
                    "Price: " + hs_items.get("Price") + "\n" +
                    "Sold: " + hs_items.get("Sold") + "\n" +
                    "URL: " + hs_items.get("URL") + "\n" +
                    "----------------------------------");
        }
    }
}
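One caveat about attr("abs:href") in the code above: Jsoup.parse(String) is given no base URI, so a relative href cannot be resolved and the call returns an empty string. It only yields a URL when the page's links are already absolute. A small sketch of the difference, using a made-up category link and the hypothetical names BaseUriDemo, firstLink, withoutBase, and withBase:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class BaseUriDemo {
    // Hypothetical category link with a relative href, for illustration only.
    static final String HTML = "<a class=\"ch\" href=\"/sch/Toys/220/i.html\">Toys</a>";

    // Reads the absolute form of the first category link's href.
    static String firstLink(Document doc) {
        return doc.select("a.ch").first().attr("abs:href");
    }

    // Parsed without a base URI: a relative href cannot be made absolute.
    static String withoutBase(String html) {
        return firstLink(Jsoup.parse(html));
    }

    // Parsed with a base URI: the relative href is resolved against it.
    static String withBase(String html, String baseUri) {
        return firstLink(Jsoup.parse(html, baseUri));
    }

    public static void main(String[] args) {
        System.out.println("without base: [" + withoutBase(HTML) + "]");
        System.out.println("with base:    [" + withBase(HTML, "http://www.ebay.co.uk/") + "]");
    }
}
```

Jsoup.connect(url).get() fetches and parses a page in one call and uses the requested URL as the base URI, so it could stand in for the hand-written readURL helper as well.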
A similar question on Stack Overflow, "java - jsoup: getting specific tags and the values related to them": https://stackoverflow.com/questions/40853610/