Hi,
I want to extract the text between the div tags:
<div class="innercontenttxt">
<p><img border="1" align="left" height="170" width="324" vspace="3" hspace="2" src="/tmdbuserfiles/ramdev-balakrishna(1).jpg" alt="ramdev aide remanded, lakrishna acharya judicial remand, ramdev aide fake passport case, baba ramdev assistant judicial custody, balakrishna sent to judicial custody, yoga guru ramdev assistant remanded, yoga guru ramdev assistant balakrishna" />
Yoga guru Ramdev's aide Balakrishna Acharya remanded to 14 days judicial custody in a fake passport on Saturday. He was arrested yesterday after he failed to appear at a Dehradun court.
<br />
<br />
Balakrishna Acharya, who is basically a Nepalese citizen,
is alleged to have submitted fake documents to procure a passport.
When he failed to appear in Dehradun court in connection with the case,
</p>
</div>
The extracted result should be:
ramdev aide alakrishna Acharya remanded to 14 days judicial custody in a fake passport on Saturday. He was arrested yesterday after he failed to appear at a Dehradun court.Balakrishna Acharya, who is basically a Nepalese citizen, is alleged to have submitted fake documents to procure a passport. When he failed to appear in Dehradun court in connection with the case, the court had issued a non-bailable warrant and subsequently arrested him yesterday.
Best answer
This question appears to be similar to another question.
Assume you have already stored the HTML source in a String variable named htmlPage.
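// Find the first opening <div ...> tag, then the '>' that ends it
int divIndex = htmlPage.indexOf("<div");
divIndex = htmlPage.indexOf(">", divIndex);
// Find the first closing </div> after the opening tag (assumes no nested divs)
int endDivIndex = htmlPage.indexOf("</div>", divIndex);
// Everything between the tags, including inner markup such as <p> and <br />
String content = htmlPage.substring(divIndex + 1, endDivIndex);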
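Note that the snippet above returns the inner HTML of the first div on the page, so tags such as <p>, <img ... /> and <br /> are still in the result. As a rough sketch of how to get closer to the plain text asked for, the version below selects the div by its class attribute, strips the remaining tags with a regular expression, and collapses whitespace. The class name DivTextExtractor and the method extractDivText are hypothetical names chosen for this illustration, and the code assumes the target div has no nested divs and that no attribute value contains a '>' character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DivTextExtractor {

    // Hypothetical helper: returns the plain text inside the first <div> whose
    // class attribute is exactly className, or null if no such div is found.
    // Assumes the div contains no nested <div> elements.
    static String extractDivText(String htmlPage, String className) {
        // Match an opening tag such as <div class="innercontenttxt">
        Pattern openTag = Pattern.compile(
                "<div\\b[^>]*\\bclass=\"" + Pattern.quote(className) + "\"[^>]*>");
        Matcher m = openTag.matcher(htmlPage);
        if (!m.find()) {
            return null; // no matching opening tag
        }
        int start = m.end();
        int end = htmlPage.indexOf("</div>", start);
        if (end < 0) {
            return null; // opening tag without a closing </div>
        }
        String innerHtml = htmlPage.substring(start, end);
        // Replace every remaining tag with a space, then collapse whitespace runs
        return innerHtml.replaceAll("<[^>]*>", " ")
                        .replaceAll("\\s+", " ")
                        .trim();
    }

    public static void main(String[] args) {
        String htmlPage = "<div class=\"innercontenttxt\"><p>First line.<br />"
                + "<br />Second line.</p></div>";
        System.out.println(extractDivText(htmlPage, "innercontenttxt"));
    }
}
```

String searching and regular expressions are fragile on real-world HTML (nested divs, single-quoted attributes, comments, entities such as &amp;). For anything beyond a quick one-off, a dedicated HTML parser library such as jsoup is likely the more robust choice.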
Regarding "java - How to extract text from a div tag in HTML in Java", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11662505/