I am using a parsing library called Kanna to fetch the HTML of a web page. A stripped-down version looks like this:
<!DOCTYPE html>
<html lang="en" class="no-js not-logged-in client-root">
<head>
<meta charset="utf-8">
</head>
<body>
<script type="text/javascript">
window._sharedData = {
// Some JSON
};
</script>
<script type="text/javascript">
// Javascript code
</script>
<script type="text/javascript">
// More Javascript code
</script>
</body>
</html>
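The body contains multiple script tags. I want to access the variable named window._sharedData and extract its value (a JSON dictionary).
I tried using a regular expression, but it returns nil. Maybe my pattern is wrong?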
if let doc = try? HTML(url: mixURL, encoding: .utf8), let body = doc.body, let htmlText = body.text {
let range = NSRange(location: 0, length: htmlText.utf8.count)
let regex = try! NSRegularExpression(pattern: "/<script type=\"text/javascript\">window._sharedData = (.*)</script>/")
let s = regex.firstMatch(in: htmlText, options: [], range: range)
print(s)
}
Or is there a better way?
Best answer
Here it is:
import Foundation
import Kanna
let htmlString = "<!DOCTYPE html><html lang=\"en\" class=\"no-js not-logged-in client-root\"><head> <meta charset=\"utf-8\"></head><body> <script type=\"text/javascript\"> window._sharedData = { \"string\": \"Hello World\" }; </script> <script type=\"text/javascript\"> </script> <script type=\"text/javascript\"> </script></body></html>"
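// Parse the HTML string into a DOM; stop with a non-zero exit code if parsing fails.
guard let doc = try? HTML(html: htmlString, encoding: .utf8) else { print("Build DOM error"); exit(1) }
// Take the text of every <script> element and keep the one that assigns window._sharedData.
let body = doc.xpath("//script")
.compactMap { $0.text }
.filter { $0.contains("window._sharedData") }
// Strip the assignment prefix; this relies on the exact whitespace used in htmlString above.
.map { $0.replacingOccurrences(of: " window._sharedData = ", with: "") }
// Drop the trailing "; " so that only the JSON object remains.
.map { $0.dropLast(2) }
.first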
print("body: ", body)
// body: Optional("{ \"string\": \"Hello World\" }")
After that, check that body is not nil, and the JSON string is ready to use.
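As for why the pattern in the question finds nothing, a few things are likely at play: NSRegularExpression does not use JavaScript-style / delimiters, so the leading and trailing slashes are matched as literal characters; the text of a node contains no markup, so the literal <script ...> tag never appears in it; and . does not match line breaks by default. An NSRange also counts UTF-16 code units, not UTF-8 bytes. The following is a minimal Foundation-only sketch of a regex-based alternative, not the answer's method. The helper name extractSharedData and the sample string are hypothetical, and the lazy match assumes that the sequence "};" does not occur inside a JSON string value.

```swift
import Foundation

// Hypothetical helper (not part of Kanna): finds the object literal assigned to
// window._sharedData in raw HTML or script text and parses it as JSON.
func extractSharedData(from text: String) -> [String: Any]? {
    // No surrounding "/" delimiters. [\s\S]*? also spans line breaks, and the
    // lazy quantifier stops at the first "}" that is followed by ";".
    let pattern = #"window\._sharedData\s*=\s*(\{[\s\S]*?\})\s*;"#
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return nil }

    // Build the NSRange from the String itself so it is measured in UTF-16 units.
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    guard let match = regex.firstMatch(in: text, options: [], range: fullRange),
          let jsonRange = Range(match.range(at: 1), in: text),
          let data = String(text[jsonRange]).data(using: .utf8) else { return nil }

    // Returns nil if the captured text is not a valid JSON object.
    return (try? JSONSerialization.jsonObject(with: data)) as? [String: Any]
}

// Sample input mirroring the structure shown in the question (assumed content).
let sample = """
<script type="text/javascript">
window._sharedData = { "string": "Hello World" };
</script>
"""

if let sharedData = extractSharedData(from: sample) {
    print(sharedData["string"] as? String ?? "missing")
}
```

The same helper can be fed the script text obtained through Kanna, which avoids depending on the exact whitespace around the assignment.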
Source: the Stack Overflow question "javascript - Extracting the value of a variable in JavaScript code from HTML": https://stackoverflow.com/questions/54230881/