javascript - Scraping a script tag in HTML using JSON and BS4

Tags: javascript python json web-scraping beautifulsoup

I want to scrape this link from the page's source code: https://secure.ewaypayments.com/sharedpage/sharedpayment?AccessCode=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx==

I am currently using json and bs4 in Python.

Full page source: https://pastebin.com/iU5c9GBF

<div class="Actions">
                <input class="action" type="submit" id="submit-button" value="Place Order" title="Place Order" onclick="return showModal()" disabled="disabled" />
              <input type="hidden" id="EWAY_TransactionID" name="EWAY_TransactionID" value="" />
              <script src="https://secure.ewaypayments.com/scripts/eCrypt.js"> </script>
              <script type="text/javascript">
                var eWAYConfig = {
                  sharedPaymentUrl: "https://secure.ewaypayments.com/sharedpage/sharedpayment?AccessCode=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=="
                };
                function showModal()
                {
                  // verify captcha

                  // show modal
                  return eCrypt.showModalPayment(eWAYConfig, resultCallback);
                }
                function resultCallback(result, transactionID, errors) {
                  if (result == "Complete") {
                    document.getElementById("EWAY_TransactionID").value = transactionID;
                    document.getElementById("Form_PaymentForm").submit();
                    //Please wait until we process your order, James at 9/10/2017
                    document.getElementById("overlay").style.display = "block";
                  }
                  else if (errors != "")
                  {
                    alert("There was a problem completing the payment: " + errors);
                  }
                }
              </script>

Code I have tried so far:

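import requests
from bs4 import BeautifulSoup as bs

s = requests.session()
orderurl = s.get('https://www.supplystore.com.au/shop/checkout/submit.aspx')
soup = bs(orderurl.text, 'html.parser')
# Fails: find() returns a single Tag, so [1] does not pick the second <script>
find = soup.find("div", {"class": "Actions"}).find("script")[1]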

Best answer

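BeautifulSoup cannot parse the JavaScript inside a script tag, but once you have the script's text you can pull the value out with the re module (data is your HTML code):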

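import re
from bs4 import BeautifulSoup

# data holds the page's HTML source
soup = BeautifulSoup(data, 'lxml')
# The second <script> inside .Actions is the inline one that defines eWAYConfig
txt = soup.select('.Actions script')[1].text

# Capture the quoted value that follows sharedPaymentUrl:
print(re.search(r'sharedPaymentUrl:\s*"(.*?)"', txt)[1])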

Prints:

https://secure.ewaypayments.com/sharedpage/sharedpayment?AccessCode=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx==
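
Since the question mentions json, here is a minimal sketch of an alternative that turns the eWAYConfig object literal into JSON and loads it as a dict, instead of matching one key with a regex. It is not part of the original answer. The helper name extract_eway_config is hypothetical, the sample script text is shortened from the question with a placeholder AccessCode, and the key-quoting regex assumes a simple flat object like the one shown; it would likely break on nested values or on strings containing "{" or ",".

```python
import json
import re

# Sample script text, shortened from the question's inline <script> block.
# The AccessCode value is a placeholder, not a real code.
script_text = '''
var eWAYConfig = {
  sharedPaymentUrl: "https://secure.ewaypayments.com/sharedpage/sharedpayment?AccessCode=xxxx=="
};
function showModal() { return eCrypt.showModalPayment(eWAYConfig, resultCallback); }
'''


def extract_eway_config(js_source):
    """Hypothetical helper: return the eWAYConfig object as a dict, or None."""
    # Grab the object literal assigned to eWAYConfig, up to the first "};".
    match = re.search(r'var\s+eWAYConfig\s*=\s*(\{.*?\})\s*;', js_source, re.DOTALL)
    if match is None:
        return None
    literal = match.group(1)
    # JSON requires quoted keys, so quote bare identifiers that follow "{" or ",".
    quoted = re.sub(r'([{,]\s*)([A-Za-z_]\w*)(\s*:)', r'\1"\2"\3', literal)
    return json.loads(quoted)


config = extract_eway_config(script_text)
if config is not None:
    print(config["sharedPaymentUrl"])
```

In the answer's code, txt would take the place of script_text. Returning None when the pattern is missing avoids the TypeError that indexing a failed re.search result would raise.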

For "javascript - Scraping a script tag in HTML using JSON and BS4", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56611864/
