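python - Scraping an element of a string inside a <script> tag in Python

Tags: python html web-scraping beautifulsoup lxml

I'm trying to check the stock of size Small on this PAGE (which is 0), specifically by retrieving the Small variant's inventory from this data: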

<script>
(function($) { 
  var variantImages = {},
    thumbnails,
    variant,
    variantImage;





       variant = {"id":18116649221,"title":"XS","option1":"XS","option2":null,"option3":null,"sku":"BGT16073100","requires_shipping":true,"taxable":true,"featured_image":null,"available":true,"name":"Iron Lords T-Shirt - XS","public_title":"XS","options":["XS"],"price":2499,"weight":136,"compare_at_price":null,"inventory_quantity":16,"inventory_management":"shopify","inventory_policy":"deny","barcode":""};
       if ( typeof variant.featured_image !== 'undefined' && variant.featured_image !== null ) {
         variantImage =  variant.featured_image.src.split('?')[0].replace('http:','');
         variantImages[variantImage] = variantImages[variantImage] || {};



           if (typeof variantImages[variantImage]["option-0"] === 'undefined') {
             variantImages[variantImage]["option-0"] = "XS";
           }
           else {
             var oldValue = variantImages[variantImage]["option-0"];
             if ( oldValue !== null && oldValue !== "XS" )  {
               variantImages[variantImage]["option-0"] = null;
             }
           }

       }










       variant = {"id":18116649285,"title":"Small","option1":"Small","option2":null,"option3":null,"sku":"BGT16073110","requires_shipping":true,"taxable":true,"featured_image":null,"available":false,"name":"Iron Lords T-Shirt - Small","public_title":"Small","options":["Small"],"price":2499,"weight":159,"compare_at_price":null,"inventory_quantity":0,"inventory_management":"shopify","inventory_policy":"deny","barcode":""};
       if ( typeof variant.featured_image !== 'undefined' && variant.featured_image !== null ) {
         variantImage =  variant.featured_image.src.split('?')[0].replace('http:','');
         variantImages[variantImage] = variantImages[variantImage] || {};



           if (typeof variantImages[variantImage]["option-0"] === 'undefined') {
             variantImages[variantImage]["option-0"] = "Small";
           }
           else {
             var oldValue = variantImages[variantImage]["option-0"];
             if ( oldValue !== null && oldValue !== "Small" )  {
               variantImages[variantImage]["option-0"] = null;
             }
           }

       }

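How can I tell Python to find the <script> tag and then the specific "inventory_quantity":0, so that it returns the stock for the Small size?

Best answer

You can find it with a regular expression: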

import re

s = 'some sample text in which "inventory_quantity":0 appears'
# Raw string so that \d reaches the regex engine unchanged.
occurrences = re.findall(r'"inventory_quantity":(\d+)', s)
print(occurrences[0])  # prints 0
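Note that taking the first match returns the first quantity in the text, which in the script from the question belongs to the XS variant (16), not Small. Below is a minimal sketch of one way to target a specific size. It assumes that each variant object contains no nested braces between `"title"` and `"inventory_quantity"`, and that `"title"` comes first, as in the snippet above. The helper name `stock_for_title` and the shortened sample string are made up for illustration.

```python
import re

# Shortened sample in the same shape as the script from the question.
script_text = '''
variant = {"id":18116649221,"title":"XS","available":true,"inventory_quantity":16,"barcode":""};
variant = {"id":18116649285,"title":"Small","available":false,"inventory_quantity":0,"barcode":""};
'''

def stock_for_title(text, title):
    """Return inventory_quantity for the variant with the given title, or None."""
    # [^{}]*? keeps the match inside one {...} object, so the quantity
    # cannot be picked up from a neighbouring variant.
    pattern = r'"title":"%s"[^{}]*?"inventory_quantity":(-?\d+)' % re.escape(title)
    match = re.search(pattern, text)
    return int(match.group(1)) if match else None

print(stock_for_title(script_text, "Small"))
```

If a variant carried a nested object (for example a non-null `featured_image`) between the two keys, this pattern would not match and the function would return `None`, so parsing the whole object is the more robust route.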

Edit: I assume you can get the full content of <script>...</script> into a variable t (using lxml, xml.etree, beautifulsoup, or simply re).

Before starting, let's define some variables:
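For completeness, here is one possible way to get the script text into `t`. This sketch uses the standard library's `html.parser` rather than any of the libraries listed above, purely to stay dependency-free. It assumes the page HTML has already been downloaded into a string. The class name `ScriptCollector` and the sample `html` are hypothetical.

```python
from html.parser import HTMLParser

class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so append.
        if self._in_script:
            self.scripts[-1] += data

# Stand-in for the downloaded page source.
html = '''<html><body>
<script>var unrelated = 1;</script>
<script>
  variant = {"title":"Small","inventory_quantity":0};
</script>
</body></html>'''

collector = ScriptCollector()
collector.feed(html)
collector.close()

# Keep the first script that mentions the data we are after.
t = next((s for s in collector.scripts if "inventory_quantity" in s), None)
print(t)
```

Filtering on a substring such as `inventory_quantity` is a simple way to pick the right block when the page contains many `<script>` tags.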

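# Aliases so that eval() can resolve the JSON literals as Python names.
true = True
false = False  # needed too: the Small variant has "available":false
null = None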

Then use a regular expression to find the dictionary as text and convert it to a dict with eval:

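import re

# One match per variant line; each match is the object literal as text.
r = re.findall(r'variant = (\{.*\});', t)

if r:
    # findall returns a list, so pass one element to eval, not the list.
    variant = eval(r[0])  # first variant in the script (XS here)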
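Since each `variant = {...};` literal in the question is valid JSON, `json.loads` is a possible alternative to `eval`. It needs no `true`/`null` aliases and does not execute text taken from a web page. This is a sketch, not the answer's own method, and it assumes every variant literal sits on a single line ending in `};` as in the question. The sample `t` is shortened.

```python
import json
import re

t = '''
variant = {"id":18116649221,"title":"XS","available":true,"featured_image":null,"inventory_quantity":16};
if ( typeof variant.featured_image !== 'undefined' ) { }
variant = {"id":18116649285,"title":"Small","available":false,"featured_image":null,"inventory_quantity":0};
'''

# One match per "variant = {...};" line; "." does not cross newlines by default.
variants = [json.loads(m) for m in re.findall(r'variant = (\{.*\});', t)]

# Map each size title to its stock level.
stock = {v["title"]: v["inventory_quantity"] for v in variants}
print(stock["Small"])
```

Parsing every variant and indexing by title answers the original question directly: the Small entry is looked up by name rather than by its position in the script.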

This is what you get:

>>> variant
{'available': True,
 'barcode': '',
 'compare_at_price': None,
 'featured_image': None,
 'id': 18116649221,
 'inventory_management': 'shopify',
 'inventory_policy': 'deny',
 'inventory_quantity': 16,
 'name': 'Iron Lords T-Shirt - XS',
 'option1': 'XS',
 'option2': None,
 'option3': None,
 'options': ['XS'],
 'price': 2499,
 'public_title': 'XS',
 'requires_shipping': True,
 'sku': 'BGT16073100',
 'taxable': True,
 'title': 'XS',
 'weight': 136}

Now you can easily get any information you need.

Regarding python - Scraping an element of a string inside a <script> tag in Python, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/41417058/
