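java - Reading a generated HTML file from a link

Tags: java html

My problem is that I can't figure out how to use Java to get the generated HTML page from a link. This is the code I'm using: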

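import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class URLReader {

    public static void main(String[] args) throws Exception {

        URL oracle = new URL("http://www.whalesonggames.com/oldforums/printthread.php?t=7495&pp=20&page=1");
        // openStream() sends a plain GET request with Java's default request headers
        BufferedReader in = new BufferedReader(new InputStreamReader(oracle.openStream()));

        // Print the response body line by line
        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}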

What I want it to print is:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en" id="vbulletin_html">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
    <base href="http://www.whalesonggames.com/oldforums/" /><!--[if IE]></base><![endif]-->
    <meta name="generator" content="vBulletin 4.2.2" />


    <link rel="stylesheet" type="text/css" href="css.php?styleid=3&amp;langid=1&amp;d=1381351020&amp;td=ltr&amp;sheet=bbcode.css,popupmenu.css,printthread.css,vbulletin.css,vbulletin-chrome.css" />




    <title> transfers</title>
    <link rel="stylesheet" type="text/css" href="css.php?styleid=3&amp;langid=1&amp;d=1381351020&amp;td=ltr&amp;sheet=additional.css" />

</head>
<body>

<div class="above_body">
<div id="header" class="floatcontainer">
<div><a name="top" href="forum.php" class="logo-image"><img src="images/misc/vbulletin4_logo.png" alt="The Infinite Black Forums - Powered by vBulletin" /></a></div>
</div>
</div>
<div class="body_wrapper">
<div id="pagetitle">
    <h1><a href="showthread.php?7495-transfers">transfers</a></h1>
    <p class="description">Printable View</p>
</div>



<ul id="postlist">
    <li class="postbit blockbody" id="post_1">
    <div class="header">
        <div class="datetime">04-10-2014, 06:59 AM</div>
        <span class="username">CaNc3r</span>
    </div>


        <div class="title">transfers</div>

    <div class="content">
        <blockquote class="restore">just wondering if we get our garrisons transfered also now? thank you.</blockquote>
    </div>
</li><li class="postbit blockbody" id="post_2">
    <div class="header">
        <div class="datetime">04-10-2014, 08:03 AM</div>
        <span class="username">replicatorz</span>
    </div>


    <div class="content">
        <blockquote class="restore">More at login says you can claim your grey corp with transfer.<br />
<br />
I am wondering what will happen now that I sold both sald corps in blue after claiming them on grey.  I suppose for now I will leave them undeployed/empty.</blockquote>
    </div>
</li><li class="postbit blockbody" id="post_3">
    <div class="header">
        <div class="datetime">04-10-2014, 08:07 AM</div>
        <span class="username">scoutsniper</span>
    </div>


    <div class="content">
        <blockquote class="restore">I'd like some clarification as well. When grey server opened GNG sent a lead at to grey to hold our spot. Since then we have tformed our red server garrison a full level. Does the mean our garrison on grey is 11 or 12?</blockquote>
    </div>
</li><li class="postbit blockbody" id="post_4">
    <div class="header">
        <div class="datetime">04-10-2014, 08:09 AM</div>
        <span class="username">CaNc3r</span>
    </div>


    <div class="content">
        <blockquote class="restore">anyone having login issues after reset?</blockquote>
    </div>
</li><li class="postbit blockbody" id="post_5">
    <div class="header">
        <div class="datetime">04-10-2014, 08:25 AM</div>
        <span class="username">replicatorz</span>
    </div>


    <div class="content">
        <blockquote class="restore">Never mind.  I reread login screen.  Question answered.</blockquote>
    </div>
</li><li class="postbit blockbody" id="post_6">
    <div class="header">
        <div class="datetime">04-10-2014, 08:50 AM</div>
        <span class="username">Ozymandias</span>
    </div>


    <div class="content">
        <blockquote class="restore">If the original Feb 10th duplicate was PURGED (entirely deleted), or if it never exited (post Feb 10th), it was re-duplicated today.<br />
<br />
If it is being used on the new server, there was no re-duplication. It has always existed there.<br />
<br />
You can type :TRANSFER to see what corporation you would transfer into.</blockquote>
    </div>
</li><li class="postbit blockbody" id="post_7">
    <div class="header">
        <div class="datetime">04-10-2014, 09:10 AM</div>
        <span class="username">Kolpo</span>
    </div>


    <div class="content">
        <blockquote class="restore">What if I tried to transfer a corp after feb 10th and it's dissapeared is there a way for me to get that back?</blockquote>
    </div>
</li><li class="postbit blockbody" id="post_8">
    <div class="header">
        <div class="datetime">04-10-2014, 09:11 AM</div>
        <span class="username">Ozymandias</span>
    </div>


    <div class="content">
        <blockquote class="restore"><a href="http://www.whalesonggames.com/forums/showthread.php?7497-Red-Blue-Green-Corporations-copied-to-Grey" target="_blank">http://www.whalesonggames.com/forums...copied-to-Grey</a></blockquote>
    </div>
</li><li class="postbit blockbody" id="post_9">
    <div class="header">
        <div class="datetime">04-10-2014, 09:12 AM</div>
        <span class="username">Ozymandias</span>
    </div>


    <div class="content">
        <blockquote class="restore"><div class="bbcode_container">
    <div class="bbcode_description">Quote:</div>
    <div class="bbcode_quote printable">
        <hr />

            <div>
                Originally Posted by <strong>Kolpo</strong>
                <a href="showthread.php?p=122005#post122005" rel="nofollow"><img class="inlineimg" src="images/buttons/viewpost.gif" alt="View Post" /></a>
            </div>
            <div class="message">What if I tried to transfer a corp after feb 10th and it's dissapeared is there a way for me to get that back?</div>

        <hr />
    </div>
</div>If it existed on the old servers still, it was duplicated today. Otherwise there's not much we can do.</blockquote>
    </div>
</li>
</ul>


</div>
<div class="below_body">
<div id="footer_time" class="footer_time">All times are GMT -7. The time now is <span class="time">07:20 PM</span>.</div>

<div id="footer_copyright" class="footer_copyright">
    <!-- Do not remove this copyright notice -->
    Powered by <a href="https://www.vbulletin.com" id="vbulletinlink">vBulletin&reg;</a> Version 4.2.2 <br />Copyright &copy; 2014 vBulletin Solutions, Inc. All rights reserved. 
    <!-- Do not remove this copyright notice -->    
</div>
<div id="footer_morecopyright" class="footer_morecopyright">
    <!-- Do not remove cronimage or your scheduled tasks will cease to function -->

    <!-- Do not remove cronimage or your scheduled tasks will cease to function -->

</div>

</div>

</body>
</html>

This is what Google Chrome shows when I go to View > Developer > View Source. However, when I run the Java code above, I get:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en" id="vbulletin_html">
<head>
    <meta charset="ISO-8859-1" />
<meta id="e_vb_meta_bburl" name="vb_meta_bburl" content="http://www.whalesonggames.com/oldforums" />
<base href="http://www.whalesonggames.com/oldforums/" />
<meta name="generator" content="vBulletin 4.2.2" />
<meta name="viewport" content="width=device-width, minimum-scale=1, maximum-scale=1">


        <meta name="keywords" content="android,infinite black,mmo,whalesong" />
        <meta name="description" content="Whalesong Games - Support, Wiki & Forums" />





    <script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.6.4/jquery.min.js"></script>

<script type="text/javascript">
<!--

    if (typeof jQuery === 'undefined') // Load jQuery Local
    {
        document.write('<script type="text/javascript" src="clientscript/jquery/jquery-1.6.4.min.js"><\/script>');
        var remotejquery = false;
    }
    else    // Load Rest of jquery remotely (where possible)
    {
        var remotejquery = true;
    }
    var SESSIONURL = "s=0f57ff6a3b879742a4f67d0cfea40613&";
    var SECURITYTOKEN = "guest";
    var IMGDIR_MISC = "images/misc";
    var IMGDIR_BUTTON = "images/buttons";
    var IMGDIR_MOBILE = "images/mobile";
    var vb_disable_ajax = parseInt("0", 10);
    var SIMPLEVERSION = "422";
    var BBURL = "http://www.whalesonggames.com/oldforums";
    var LOGGEDIN = 0 > 0 ? true : false;
    var THIS_SCRIPT = "printthread";
    var RELPATH = "printthread.php?t=7495&amp;pp=20&amp;page=1";
    var USER_STYLEID = "1";
    var MOBILE_STYLEID = "2";
    var MOBILE_STYLEID_ADV = "2";
    var USER_DEFAULT_STYLE_TYPE = "standard";
// -->
</script>
<script type="text/javascript" src="http://www.whalesonggames.com/oldforums/clientscript/vbulletin-mobile-init.js?v=422"></script>
<script type="text/javascript" src="http://www.whalesonggames.com/oldforums/clientscript/jquery/jquery.mobile-1.0.vb.js?v=422"></script>
<script type="text/javascript" src="http://www.whalesonggames.com/oldforums/clientscript/vbulletin-mobile.js?v=422"></script>





<link rel="stylesheet" href="clientscript/jquery/jquery.mobile-1.0.min.css?v=422" />


    <link rel="stylesheet" type="text/css" href="css.php?styleid=2&amp;langid=1&amp;d=1381351020&amp;td=ltr&amp;sheet=bbcode.css,editor.css,popupmenu.css,reset-fonts.css,vbulletin.css,vbulletin-chrome.css,vbulletin-formcontrols.css," />



    <title>The Infinite Black Forums</title>

</head>
<body>

<div data-role="page" data-theme="d" id="page-home">

<div id="header">
    <div id="header-left">
        <a href="forum.php?s=0f57ff6a3b879742a4f67d0cfea40613" class="logo-image" rel="external"><img src="images/mobile/vbulletin-logo.png" alt="The Infinite Black Forums - Powered by vBulletin" /></a>
    </div>
    <div id="header-right">


            <a href="mobile.php?s=0f57ff6a3b879742a4f67d0cfea40613&amp;do=login" class="headericon" rel="external"><img src="images/mobile/login.png" /></a>

        <a href="mobile.php?s=0f57ff6a3b879742a4f67d0cfea40613&amp;do=gridmenu" class="headericon"><img src="images/mobile/gridmenu.png" /></a>
        <a href="search.php?s=0f57ff6a3b879742a4f67d0cfea40613&amp;search_type=1&amp;contenttype=vBForum_Post" class="headericon" rel="external"><img src="images/mobile/search.png" /></a>&nbsp;<a href="http://www.whalesonggames.com/community/tib/leaderboards/" class="headericon"><img src="images/mobile/merch.png" /></a>
<a href="https://www.theinfiniteblack.com/blackdollars/" class="headericon"><img src="images/mobile/bd.png" /></a>
    </div>


</div>



<div id="pagetitle" class="pagetitle ui-bar-b">
    <h1 class="pagetitle">vBulletin Message</h1>
</div>

<div data-role="content">   
    <div class="ui-body ui-body-e">We are sorry, this content is not supported via the mobile style. <br /><a href="forum.php?s=0f57ff6a3b879742a4f67d0cfea40613" rel="external">Click Here to go to the Forum Homepage</a>.</div>
</div>

<div id="footer">

<ul id="footer_links">


        <li class="first"><a href="mobile.php?s=0f57ff6a3b879742a4f67d0cfea40613&amp;do=login">Log in</a></li>



    <li><a href="register.php?s=0f57ff6a3b879742a4f67d0cfea40613" rel="external">Register</a></li>


        <li><a href="forum.php?styleid=1" class="fullsitelink" rel="external">Full Site</a></li>


    <li class="last"><a href="#top" class="scrolltop" rel="external">Top</a></li>

</ul>

<div id="footer_copyright" class="shade footer_copyright">
    <!-- Do not remove this copyright notice -->
    Powered by <a href="https://www.vbulletin.com" id="vbulletinlink">vBulletin&reg;</a> Version 4.2.2 <br />Copyright &copy; 2014 vBulletin Solutions, Inc. All rights reserved. 
    <!-- Do not remove this copyright notice -->    
</div>
<div id="footer_morecopyright" class="shade footer_morecopyright">
    <!-- Do not remove cronimage or your scheduled tasks will cease to function -->
    <img src="http://www.whalesonggames.com/oldforums/cron.php?s=0f57ff6a3b879742a4f67d0cfea40613&amp;rand=1397183042" alt="" width="1" height="1" border="0" />
    <!-- Do not remove cronimage or your scheduled tasks will cease to function -->

</div>

</div>



</div><!-- data-role="page" -->


    <script type="text/javascript">
      var _gaq = _gaq || [];
        _gaq.push(['_setAccount', 'UA-36823542-1']);
        _gaq.push(['_trackPageview']);
        (function() {
            var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
            ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
            var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
          })();
    </script>


</body>
</html>

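That is not what I want. Keep in mind that I know next to nothing about web languages and how they work, but I think I've figured out that the second HTML snippet "generates" the first one when a browser loads the page. Please correct me if that's wrong. Either way, is there a way to retrieve the "final version" of the HTML before it is shown to the user in the browser?

Accepted answer

The website you are trying to open does not seem to recognize Java's default user agent. Judging by the "not supported via the mobile style" message in your output, it appears to fall back to its mobile style for clients it doesn't recognize, so the server is sending a different page rather than the browser generating one from the other.

Try adding something like this before the URL object is constructed: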

System.setProperty("http.agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0");
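
As an alternative to the JVM-wide system property, the header can be set on a single request through URLConnection.setRequestProperty. The following is a minimal sketch, not a tested fix for this particular site: the class name UserAgentReader and the helper openWithUserAgent are hypothetical names made up for illustration, and the ISO-8859-1 charset is an assumption taken from the meta tag in the desktop page shown in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class UserAgentReader {

    // Hypothetical helper: prepares a connection (nothing is sent yet) and
    // attaches the given User-Agent header to this one request only.
    static URLConnection openWithUserAgent(String address, String userAgent) {
        try {
            URLConnection connection = URI.create(address).toURL().openConnection();
            connection.setRequestProperty("User-Agent", userAgent);
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        String address = "http://www.whalesonggames.com/oldforums/printthread.php?t=7495&pp=20&page=1";
        String agent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0";

        URLConnection connection = openWithUserAgent(address, agent);

        // Assumption: the page is encoded as ISO-8859-1, as its own meta tag declares.
        // The request is actually sent when getInputStream() is called.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.ISO_8859_1))) {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                System.out.println(inputLine);
            }
        }
    }
}
```

Setting the header per connection keeps the change local to this request, whereas the http.agent system property affects HTTP connections made through URLConnection anywhere in the JVM. Whether the server then returns the desktop page still depends on how it inspects the header.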

This Q&A on "java - Reading a generated HTML file from a link" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/23002479/
