What is wrong with my code? I want to get a result from an HTML page and store the value in a string, or later in an array. Thanks.
09-05 16:36:41.221: I/test(22697): plan failed 1org.xml.sax.SAXParseException: attr value delimiter missing! (position:START_TAG @1:166 in java.io.StringReader@4061bc98)
09-05 16:36:41.221: I/test(22697): plan failed 1a @1:166 in java.io.StringReader@4061bc98)
09-05 16:36:41.231: W/System.err(22697): at org.apache.harmony.xml.parsers.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:151)
09-05 16:36:41.231: W/System.err(22697): at com.asiatype.boracay.CurrencyActivity$DownloadData.doInBackground(CurrencyActivity.java:194)
09-05 16:36:41.231: W/System.err(22697): at com.asiatype.boracay.CurrencyActivity$DownloadData.doInBackground(CurrencyActivity.java:1)
09-05 16:36:41.231: W/System.err(22697): at android.os.AsyncTask$2.call(AsyncTask.java:185)
09-05 16:36:41.231: W/System.err(22697): at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:306)
09-05 16:36:41.231: W/System.err(22697): at java.util.concurrent.FutureTask.run(FutureTask.java:138)
09-05 16:36:41.231: W/System.err(22697): at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1088)
09-05 16:36:41.231: W/System.err(22697): at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:581)
09-05 16:36:41.231: W/System.err(22697): at java.lang.Thread.run(Thread.java:1027)
String s, link;
String theResult = "";
link = "http://www.bsp.gov.ph/statistics/sdds/exchrate.htm";
Document doc;
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet(link);
HttpResponse response;
try {
    response = client.execute(request);
    InputStream in = response.getEntity().getContent();
    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
    StringBuilder str = new StringBuilder();
    String line = null;
    while ((line = reader.readLine()) != null) {
        str.append(line);
    }
    in.close();
    htmlSource = str.toString();
} catch (ClientProtocolException e2) {
    // TODO Auto-generated catch block
    e2.printStackTrace();
} catch (IOException e2) {
    // TODO Auto-generated catch block
    e2.printStackTrace();
}
try {
    doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().parse(new InputSource(new StringReader(htmlSource)));
    XPathExpression xpath = XPathFactory.newInstance()
            .newXPath().compile("//div/table/tbody/tr[child::td[contains(text(),\"USD\")]]/td[15]");
    htmlResult = (String) xpath.evaluate(doc, XPathConstants.STRING);
} catch (SAXException e1) {
    // TODO Auto-generated catch block
    Log.i("test", "plan failed 1" + e1);
    Log.i("test", "plan failed 1a " + htmlSource);
    Log.i("test", "plan failed 1a " + htmlResult);
    e1.printStackTrace();
} catch (IOException e1) {
    // TODO Auto-generated catch block
    Log.i("test", "plan failed 2");
    e1.printStackTrace();
} catch (ParserConfigurationException e1) {
    // TODO Auto-generated catch block
    Log.i("test", "plan failed 3");
    e1.printStackTrace();
} catch (XPathExpressionException e) {
    // TODO Auto-generated catch block
    Log.i("test", "plan failed 4");
    e.printStackTrace();
}
Best answer
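The HTML file you are using as the input source is not well-formed XML, which is why the SAXParseException is thrown: it is telling you that an XML attribute value is missing its delimiter (the surrounding quotes).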
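HTML and XML are very different. For example, HTML may have missing or mismatched closing tags and unquoted attribute values, none of which XML allows. For this reason it is strongly recommended not to try to parse HTML as XML: an XML parser cannot cope with all the inconsistencies that HTML permits.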
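To make the failure mode concrete, here is a minimal sketch using only the standard JAXP classes. The class name `HtmlAsXmlDemo`, the helper `parsesAsXml`, and the two sample strings are made up for illustration; they are not taken from the page in the question. The only difference between the two inputs is whether the attribute value is quoted.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original question's code.
class HtmlAsXmlDemo {

    /** Returns true if the markup parses as XML, false if the parser rejects it. */
    static boolean parsesAsXml(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of also printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXParseException e) {
            // Thrown for markup that is not well-formed, e.g. an unquoted attribute value.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Acceptable to an HTML parser, but not well-formed XML: border=1 has no quotes.
        String htmlStyle = "<table border=1><tr><td>USD</td></tr></table>";
        // Same markup with the attribute value quoted.
        String xmlStyle = "<table border=\"1\"><tr><td>USD</td></tr></table>";

        System.out.println(parsesAsXml(htmlStyle)); // false
        System.out.println(parsesAsXml(xmlStyle));  // true
    }
}
```

The exact exception message differs between parser implementations (the Android log above comes from a different parser than desktop Java), but both reject the unquoted attribute.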
There are several alternative ways to approach this:
- From "Reading HTML file to DOM tree using Java": use Neko to try to turn the HTML into valid XML, which would let you keep your existing XML parsing code for finding the data.
- From the same question as above: use JTidy to parse the HTML into a DOM tree and use DOM methods to find the data. See "xml dom parser in java?" for some Java DOM parsers.
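Whichever tool produces the DOM, collecting the values into an array is then ordinary XPath work. The sketch below assumes you already have a well-formed `org.w3c.dom.Document`; to stay self-contained it builds one from a small hand-written string that stands in for the cleaned-up output of Neko or JTidy. The class name `RateExtractor`, the helper names, the table layout, and the placeholder numbers are all hypothetical and do not reflect the real page.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original question's code.
class RateExtractor {

    /** Parses markup that is already well-formed (e.g. after HTML cleanup). */
    static Document parseWellFormed(String markup) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
    }

    /**
     * Returns the text of every cell in the row that has a cell containing the code.
     * Assumes the code contains no quote characters, since it is spliced into the XPath.
     */
    static String[] cellsOfRow(Document doc, String code) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expr = "//tr[td[contains(text(), '" + code + "')]]/td";
        // NODESET returns all matching cells, where STRING would return only the first.
        NodeList cells = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        String[] values = new String[cells.getLength()];
        for (int i = 0; i < cells.getLength(); i++) {
            values[i] = cells.item(i).getTextContent().trim();
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder table; the numbers are dummy values, not real exchange rates.
        String cleaned = "<html><body><table>"
                + "<tr><td>USD</td><td>1.00</td><td>2.00</td></tr>"
                + "<tr><td>JPY</td><td>3.00</td><td>4.00</td></tr>"
                + "</table></body></html>";
        Document doc = parseWellFormed(cleaned);
        for (String value : cellsOfRow(doc, "USD")) {
            System.out.println(value);
        }
    }
}
```

Requesting `XPathConstants.NODESET` is what makes storing the results in an array possible; picking a single column afterwards is just an index into that array.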
Regarding "java - How to use XPath to get a value from a specific site and store it in an array? (Android)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/12277531/