java - Jsoup returns different output from a web browser

Tags: java json string jsoup

I have this API that I need to parse.

https://data.studentedge.com.au/api/comments/getpage?page=1&sort=Oldest&url=%2Fforums%2Fdetails%2Fany-surfers-out-there

When I open it in a web browser (with or without JavaScript enabled), it returns this:

{"Items":[{"CommentBody":"<p>I ride a 5'9 and am from the mid north coast</p>\r\n\r\n","MemberName":"Jack F","AvatarUrl":"https://studentedgeapplication.blob.core.windows.net/profiles/21450f07-ddcc-4f19-8cba-296f22e84ee1.jpeg","PostDate":"2016-09-03T01:38:38+00:00","CommentId":"f1c50066-69b3-4a92-bc0c-a676001b174f","ParentId":null,"PosterId":"28936bc3-f705-45d6-8f94-a5b0004585c6","Status":"Approved","CurrentMemberComment":false,"UpvoteCount":0,"MemberHasUpvoted":false,"PageUrl":"/forums/details/any-surfers-out-there","IsModerator":false},{"CommentBody":"<p>I surf everyday on Google Chrome - SA here ;)</p>\r\n\r\n","MemberName":"Bryan A","AvatarUrl":"https://studentedgeapplication.blob.core.windows.net/profiles/02a713ee-2ca1-4029-85f8-314878386621.png","PostDate":"2016-09-09T10:36:47+00:00","CommentId":"689460a2-4b02-4ca7-851c-a67c00aee6ab","ParentId":null,"PosterId":"5192fcf7-703b-4f78-b6fd-a3a000427119","Status":"Approved","CurrentMemberComment":false,"UpvoteCount":1,"MemberHasUpvoted":false,"PageUrl":"/forums/details/any-surfers-out-there","IsModerator":false},{"CommentBody":"<p>Same... Chrome's the only thing I surf....</p>\r\n<p>My mate goes 5'10&quot; and also snowboards...</p>\r\n\r\n","MemberName":"Sandy S","AvatarUrl":"https://studentedgeapplication.blob.core.windows.net/profiles/6542e863-3f04-496d-b4aa-d6adcb16ca39.jpg","PostDate":"2016-09-09T10:51:40+00:00","CommentId":"9479d9f2-845a-48a5-8d28-a67c00b2fcd7","ParentId":"689460a2-4b02-4ca7-851c-a67c00aee6ab","PosterId":"165dc3d0-9e3d-484f-b5be-a3a100cfc691","Status":"Approved","CurrentMemberComment":false,"UpvoteCount":0,"MemberHasUpvoted":false,"PageUrl":"/forums/details/any-surfers-out-there","IsModerator":false}],"PageNumber":1,"Order":"Oldest"}

This is perfectly valid JSON. But when I use Jsoup, it returns this:

<html> <head></head> <body>  {"Items":[{"CommentBody":"  <p>I ride a 5'9 and am from the mid north coast</p>\r\n\r\n","MemberName":"Jack F","AvatarUrl":"https://studentedgeapplication.blob.core.windows.net/profiles/21450f07-ddcc-4f19-8cba-296f22e84ee1.jpeg","PostDate":"2016-09-03T01:38:38+00:00","CommentId":"f1c50066-69b3-4a92-bc0c-a676001b174f","ParentId":null,"PosterId":"28936bc3-f705-45d6-8f94-a5b0004585c6","Status":"Approved","CurrentMemberComment":false,"UpvoteCount":0,"MemberHasUpvoted":false,"PageUrl":"/forums/details/any-surfers-out-there","IsModerator":false},{"CommentBody":"  <p>I surf everyday on Google Chrome - SA here ;)</p>\r\n\r\n","MemberName":"Bryan A","AvatarUrl":"https://studentedgeapplication.blob.core.windows.net/profiles/02a713ee-2ca1-4029-85f8-314878386621.png","PostDate":"2016-09-09T10:36:47+00:00","CommentId":"689460a2-4b02-4ca7-851c-a67c00aee6ab","ParentId":null,"PosterId":"5192fcf7-703b-4f78-b6fd-a3a000427119","Status":"Approved","CurrentMemberComment":false,"UpvoteCount":1,"MemberHasUpvoted":false,"PageUrl":"/forums/details/any-surfers-out-there","IsModerator":false},{"CommentBody":"  <p>Same... Chrome's the only thing I surf....</p>\r\n  <p>My mate goes 5'10" and also snowboards...</p>\r\n\r\n","MemberName":"Sandy S","AvatarUrl":"https://studentedgeapplication.blob.core.windows.net/profiles/6542e863-3f04-496d-b4aa-d6adcb16ca39.jpg","PostDate":"2016-09-09T10:51:40+00:00","CommentId":"9479d9f2-845a-48a5-8d28-a67c00b2fcd7","ParentId":"689460a2-4b02-4ca7-851c-a67c00aee6ab","PosterId":"165dc3d0-9e3d-484f-b5be-a3a100cfc691","Status":"Approved","CurrentMemberComment":false,"UpvoteCount":0,"MemberHasUpvoted":false,"PageUrl":"/forums/details/any-surfers-out-there","IsModerator":false}],"PageNumber":1,"Order":"Oldest"} </body></html>

Jsoup code:

    Document doc = Jsoup.connect(baseUrl + keyword)
            .followRedirects(true)
            .ignoreContentType(true)
            .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:43.0) Gecko/20100101 Firefox/43.0")
            .header("Accept-Encoding", "gzip, deflate")
            .header("Accept-Language", "en-US,en;q=0.5")
            .header("Host", "data.studentedge.com.au")
            .header("Origin", "https://studentedge.com.au")
            .header("Referer", "https://studentedge.com.au/forums/details/any-surfers-out-there")
            .get();
    String result = doc.html();

Note: if I use doc.text(), it somehow breaks the JSON.

Best answer

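Use execute().body() to get the raw data. get() parses the response as an HTML document, which is why the JSON comes back wrapped in <html> and <body> tags and with entities such as &quot; decoded; execute().body() returns the response body as an unparsed string: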

    String result = Jsoup.connect(baseUrl + keyword)
            .followRedirects(true)
            .ignoreContentType(true)
            .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:43.0) Gecko/20100101 Firefox/43.0")
            .header("Accept-Encoding", "gzip, deflate")
            .header("Accept-Language", "en-US,en;q=0.5")
            .header("Host", "data.studentedge.com.au")
            .header("Origin", "https://studentedge.com.au")
            .header("Referer", "https://studentedge.com.au/forums/details/any-surfers-out-there")
            .execute().body();
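
To see why the parsed Document differs from the raw body, the sketch below runs a small JSON string through Jsoup's HTML parser offline, with no network call. The class name, the helper methods, and the sample payload are made up for illustration and are not the real API response; Jsoup.parse(String) is used here only to approximate what get() does to a response body.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParsedVersusRaw {

    // Hypothetical sample: JSON whose string value contains HTML markup and an entity.
    static final String SAMPLE_JSON =
            "{\"CommentBody\":\"<p>5'10&quot; tall</p>\",\"UpvoteCount\":0}";

    // Approximates get() followed by doc.html(): the body goes through the HTML parser,
    // so it is wrapped in <html>/<head>/<body> and re-serialized.
    static String throughHtmlParser(String body) {
        Document doc = Jsoup.parse(body);
        return doc.html();
    }

    // Approximates doc.text(): tags are dropped and entities are decoded,
    // which leaves an unescaped double quote inside the JSON string value.
    static String asText(String body) {
        return Jsoup.parse(body).text();
    }

    public static void main(String[] args) {
        System.out.println("raw:    " + SAMPLE_JSON);
        System.out.println("html(): " + throughHtmlParser(SAMPLE_JSON));
        System.out.println("text(): " + asText(SAMPLE_JSON));
    }
}
```

Neither parsed form is safe to hand to a JSON parser, which is why the raw body is the one to keep. execute() returns a Connection.Response, so statusCode() and contentType() can also be checked on it before calling body().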

Regarding java - Jsoup returns different output from a web browser, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/39555567/
