java - How to use XPath to get a value from a specific site and store it in an array? (Android)

What is wrong with my code? I want to get a result from an HTML page and store the value in a String, or later in an array. Thanks.

    09-05 16:36:41.221: I/test(22697): plan failed 1org.xml.sax.SAXParseException: attr value delimiter missing! (position:START_TAG @1:166 in java.io.StringReader@4061bc98)
    09-05 16:36:41.221: I/test(22697): plan failed 1a @1:166 in java.io.StringReader@4061bc98)
    09-05 16:36:41.231: W/System.err(22697): at org.apache.harmony.xml.parsers.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:151)
    09-05 16:36:41.231: W/System.err(22697): at com.asiatype.boracay.CurrencyActivity$DownloadData.doInBackground(CurrencyActivity.java:194)
    09-05 16:36:41.231: W/System.err(22697): at com.asiatype.boracay.CurrencyActivity$DownloadData.doInBackground(CurrencyActivity.java:1)
    09-05 16:36:41.231: W/System.err(22697): at android.os.AsyncTask$2.call(AsyncTask.java:185)
    09-05 16:36:41.231: W/System.err(22697): at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:306)
    09-05 16:36:41.231: W/System.err(22697): at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    09-05 16:36:41.231: W/System.err(22697): at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1088)
    09-05 16:36:41.231: W/System.err(22697): at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:581)
    09-05 16:36:41.231: W/System.err(22697): at java.lang.Thread.run(Thread.java:1027)

        String s, link;
        String theResult = "";
        link="http://www.bsp.gov.ph/statistics/sdds/exchrate.htm";
        Document doc;
        HttpClient client = new DefaultHttpClient();
        HttpGet request = new HttpGet(link);
        HttpResponse response;
        try {
            response = client.execute(request);
            InputStream in = response.getEntity().getContent();
            BufferedReader reader = new BufferedReader(new InputStreamReader(in));
            StringBuilder str = new StringBuilder();
            String line = null;
            while((line = reader.readLine()) != null)
            {
                str.append(line);
            }
            in.close();
            htmlSource = str.toString();
        } catch (ClientProtocolException e2) {
            // TODO Auto-generated catch block
            e2.printStackTrace();
        } catch (IOException e2) {
            // TODO Auto-generated catch block
            e2.printStackTrace();
        }


        try {
            doc = DocumentBuilderFactory.newInstance()
                      .newDocumentBuilder().parse(new InputSource(new StringReader(htmlSource)));
            XPathExpression xpath = XPathFactory.newInstance()
                      .newXPath().compile("//div/table/tbody/tr[child::td[contains(text(),\"USD\")]]/td[15]");
            htmlResult = (String) xpath.evaluate(doc, XPathConstants.STRING);
        } catch (SAXException e1) {
            // TODO Auto-generated catch block
            Log.i("test", "plan failed 1"+e1);
            Log.i("test", "plan failed 1a "+ htmlSource);
            Log.i("test", "plan failed 1a "+ htmlResult);
            e1.printStackTrace();
        } catch (IOException e1) {
            // TODO Auto-generated catch block
            Log.i("test", "plan failed 2");

            e1.printStackTrace();
        } catch (ParserConfigurationException e1) {
            // TODO Auto-generated catch block
            Log.i("test", "plan failed 3");

            e1.printStackTrace();
        } catch (XPathExpressionException e) {
            // TODO Auto-generated catch block
            Log.i("test", "plan failed 4");

            e.printStackTrace();
        }

Best answer

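The source HTML file you are using as input is not well-formed XML. That is why a SAXParseException is thrown: it is telling you that the delimiter (the quotes) around an XML attribute value is missing.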

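HTML and XML are very different. For example, HTML allows missing or mismatched end tags and unquoted attribute values, and XML allows neither. For this reason, trying to parse HTML as XML is strongly discouraged: an XML parser cannot cope with all the inconsistencies that HTML permits.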
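To see the difference concretely, here is a minimal sketch that feeds three small fragments to the same `DocumentBuilder` API used in the question: one well-formed, one with an unquoted attribute value, and one with a missing end tag. The class and method names (`HtmlVsXmlDemo`, `checkWellFormed`) are hypothetical. The exact error text depends on the parser (the Android log above says `attr value delimiter missing!`, while a desktop JDK uses its own wording), but in each failing case the exception is a `SAXParseException` whose line and column point at the markup the XML parser rejected.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class HtmlVsXmlDemo {

    /**
     * Parses the markup as XML. Returns null if it is well-formed,
     * otherwise the line, column and message of the parse error.
     */
    public static String checkWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return null;
        } catch (SAXParseException e) {
            // Position where the XML parser gave up, like the "@1:166" in the Android log.
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Well-formed: every attribute is quoted and every element is closed.
        System.out.println(checkWellFormed("<table><tr><td class=\"cur\">USD</td></tr></table>")); // prints null
        // Valid HTML, but the attribute value has no quotes, so XML rejects it.
        System.out.println(checkWellFormed("<table><tr><td class=cur>USD</td></tr></table>"));
        // HTML tolerates the missing </td>; XML does not.
        System.out.println(checkWellFormed("<table><tr><td>USD</tr></table>"));
    }
}
```

The second and third calls print a position and message instead of `null`, which is the same kind of failure the question's code hits on the real page.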

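There are several alternative ways to solve this:

From Reading HTML file to DOM tree using Java: use Neko to try to turn the HTML into valid XML, which would let you keep your existing XML parsing code for finding the data.
From the same question: use JTidy to parse the HTML into a DOM tree and use DOM methods to find the data. See xml dom parser in java? for some Java DOM parsers.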
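Once the page has been turned into a DOM (for example by one of the approaches above), the standard `javax.xml.xpath` API can return every matching node rather than a single string, which covers the "store it in an array" part of the question. This is a minimal sketch assuming the input is already well-formed; the class and method names (`XPathToArray`, `parse`, `selectTexts`) and the sample table with placeholder values are hypothetical. It uses only JAXP/DOM calls, so it does not depend on Neko or JTidy.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XPathToArray {

    /** Parses a string that is already well-formed XML (e.g. HTML cleaned up beforehand). */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    /** Evaluates an XPath that selects nodes and returns each node's trimmed text as an array element. */
    public static String[] selectTexts(Document doc, String expression) {
        try {
            // NODESET returns every match; STRING (as in the question) gives only the first match's text.
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            String[] values = new String[nodes.getLength()];
            for (int i = 0; i < nodes.getLength(); i++) {
                values[i] = nodes.item(i).getTextContent().trim();
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample table with placeholder values, not data from the real page.
        Document doc = parse("<div><table><tbody>"
                + "<tr><td>USD</td><td>12.34</td></tr>"
                + "<tr><td>EUR</td><td>56.78</td></tr>"
                + "</tbody></table></div>");

        String[] column = selectTexts(doc, "//div/table/tbody/tr/td[2]");
        System.out.println(Arrays.toString(column)); // prints [12.34, 56.78]

        // Same row-selection idea as the question's XPath: the row whose cell contains "USD".
        String[] usd = selectTexts(doc, "//div/table/tbody/tr[td[contains(text(),'USD')]]/td[2]");
        System.out.println(usd.length == 1 ? usd[0] : "no match"); // prints 12.34
    }
}
```

One thing worth checking against the real page: browsers insert a `tbody` element into the DOM they display (and therefore into XPaths copied from developer tools) even when the HTML source omits it, so a path containing `tbody` may match nothing in the raw source. Whether an HTML cleaner adds `tbody` depends on the library.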

A similar question, java - How to use XPath to get a value from a specific site and store it in an array? (Android), can be found on Stack Overflow: https://stackoverflow.com/questions/12277531/
