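I'm using a web scraper to fetch data. I now have a string containing HTML content, and I need to get the object between the <script> tags. Here is the string: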
var targetString = ` <html lang="en" class="no-js not-logged-in "> <!--<![endif]-->
<head></head>
<body class="">Body Content
<script type="text/javascript">objectName = {foo: 5};</script>
</body>
</html>`;
How can I extract objectName from this string and turn it into a usable object, so that I can reliably access foo?
Best answer
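Try this: call .match() on targetString with the RegExp /\{.*\}/ as the argument. Then call .replace() on index 0 of the array that .match() returns, using the RegExp /(\w+)(?=:)/ and the replacement string "\"$1\"", which wraps the matched key in escaped double quotes. Finally, call JSON.parse() on the string that .replace() returns.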
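A minimal sketch that runs those three steps one at a time on a shortened, single-line version of the question's string and prints what each step produces (the names html, braces, quoted and parsed are illustrative only):

```javascript
// Single-line HTML, shortened from the question's string for this example.
const html = '<body><script type="text/javascript">objectName = {foo: 5};<\/script></body>';

// Step 1: .match() with the greedy /\{.*\}/ returns an array whose index 0
// is everything from the first "{" to the last "}" on the same line.
const braces = html.match(/\{.*\}/)[0];
console.log(braces); // {foo: 5}

// Step 2: wrap the key in double quotes so the text becomes valid JSON.
// (\w+) captures the key; (?=:) requires a ":" right after it without consuming it.
const quoted = braces.replace(/(\w+)(?=:)/, '"$1"');
console.log(quoted); // {"foo": 5}

// Step 3: turn the JSON text into a real object.
const parsed = JSON.parse(quoted);
console.log(parsed.foo); // 5
```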
var targetString = ' <html lang="en" class="no-js not-logged-in "> <!--<![endif]-->'
  + '<head></head>'
  + '<body class="">Body Content'
  + '<script type="text/javascript">objectName = {foo: 5};<\/script>'
  + '</body>'
  + '</html>';
var objectName = JSON.parse(
  targetString
    // match a left brace "{",
    // followed by any characters except line terminators,
    // 0 or more times (greedy, so up to the last "}" on the line),
    // followed by a right brace "}"
    .match(/\{.*\}/)[0]
    // match one or more word characters ([A-Za-z0-9_])
    // immediately followed by ":" (lookahead, not consumed),
    // and replace them with the captured key
    // wrapped in double quotes, e.g. foo -> "foo"
    // (no g flag, so only the first key is quoted)
    .replace(/(\w+)(?=:)/, "\"$1\"")
);
console.log(objectName);
// document.write() is only available in a browser
document.write(JSON.stringify(objectName, null, 2));
document.write("<br>" + objectName.foo);
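Because the .replace() above has no g flag, only the first key gets quoted, so an object such as {foo: 5, bar: 7} would make JSON.parse() throw. Below is a minimal sketch of a more targeted variant. It assumes that the object is assigned as name = {...} inside a <script> element, that it has no nested braces, that name contains only letters, digits and underscores, and that the values are already valid JSON with no ", key:" patterns inside string values. extractObjectLiteral is a hypothetical helper introduced here, not part of the original answer.

```javascript
// Hypothetical helper (not from the original answer). Assumptions:
// - the object is assigned as `name = {...}` inside a <script> element,
// - it contains no nested braces,
// - `name` consists only of letters, digits and underscores,
// - every value is already valid JSON (numbers, "double-quoted" strings, true/false/null).
function extractObjectLiteral(html, name) {
  // Capture the text of each <script>...</script>; [\s\S] also matches newlines.
  const scriptRe = /<script[^>]*>([\s\S]*?)<\/script>/gi;
  // `name`, optional whitespace, "=", optional whitespace, then one {...} block.
  const assignRe = new RegExp("\\b" + name + "\\s*=\\s*(\\{[^{}]*\\})");

  for (const [, body] of html.matchAll(scriptRe)) {
    const found = body.match(assignRe);
    if (!found) continue;
    // Quote every bare key that follows "{" or "," (g flag, so all keys, not just the first).
    const json = found[1].replace(/([{,]\s*)([A-Za-z_$][\w$]*)\s*:/g, '$1"$2":');
    return JSON.parse(json);
  }
  return null; // no matching assignment in any <script>
}

// Usage: a multi-line page whose object has two keys.
const page = `<html><head></head><body>
<script type="text/javascript">
  objectName = {foo: 5, bar: 7};
<\/script>
</body></html>`;

const obj = extractObjectLiteral(page, "objectName");
console.log(obj.foo, obj.bar); // 5 7
```

Regular expressions cannot match nested braces or tell a key apart from similar text inside a string value; if the scraped objects are more complex than this, parsing the script with an actual JavaScript parser is the safer route.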
A similar question, "javascript - How to filter out the object between <script> tags", can be found on Stack Overflow: https://stackoverflow.com/questions/30411166/