python - Scrapy with query strings and variables

Tags: python, scrapy, query-string

I'm trying to get better with Scrapy, but I'm facing a new kind of problem involving a query string and variables.

1) The query string seems to require two inputs (storeInRadiusQuery and cache). Here are the request headers with the API URL.

2) When I go to the Params tab, I see 2 query strings grouped in JSON format. This JSON has 3 keys (operationName, query and variables).

In other Scrapy projects the query was easier to format, but here I don't know how to handle the variables.
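
The JSON described in point 2 is the usual shape of a GraphQL POST body. Below is a minimal sketch of assembling it with the standard library, assuming the server accepts the three keys in a single JSON object; `build_graphql_body` is a hypothetical helper, not part of Scrapy. The important detail is that `variables` stays a nested dict, so it is encoded once, as a JSON object, rather than as a pre-serialized string.

```python
import json


def build_graphql_body(operation_name, query, variables):
    """Hypothetical helper: return the JSON-encoded POST body as a str.

    "variables" is kept as a dict so json.dumps encodes it as a nested
    JSON object, not as an escaped string.
    """
    payload = {
        "operationName": operation_name,
        "query": query,
        "variables": variables,
    }
    return json.dumps(payload)
```

The returned string can be passed as `body=` to a Scrapy request sent with `method="POST"` and a `Content-Type: application/json` header.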

I tried Scrapy's FormRequest approach, without success:

data = {
        "operationName":"storeInRadiusQuery",
        "variables":{"currentLocation":"50.4376478855132,2.82123986359978","service":[],"storeChain":[],"deliveryTypes":[],"date":[],"__typename":"storeLocatorFilters"},
        "query":"query storeInRadiusQuery($currentLocation: String!, $service: [String], $storeChain: [String], $deliveryTypes: [String], $date: [String]) {\n  viewer {\n    storesInRadius(currentLocation: $currentLocation, services: $service, storeChaine: $storeChain, deliveryTypes: $deliveryTypes, date: $date, radius: 20, isStoreLocator: true) {\n      source {\n        ...StoresMapStoreItemType\n        ...StoreLocatorList\n        store_location\n        sort\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n}\n\nfragment StoreLocatorList on StoreItemType {\n  store_id\n  store_name\n  street\n  zip_code\n  city\n  seo_url\n  day_0\n  day_0_morning_open_time\n  day_0_morning_close_time\n  day_0_afternoon_open_time\n  day_0_afternoon_close_time\n  day_1\n  day_1_morning_open_time\n  day_1_morning_close_time\n  day_1_afternoon_open_time\n  day_1_afternoon_close_time\n  day_2\n  day_2_morning_open_time\n  day_2_morning_close_time\n  day_2_afternoon_open_time\n  day_2_afternoon_close_time\n  day_3\n  day_3_morning_open_time\n  day_3_morning_close_time\n  day_3_afternoon_open_time\n  day_3_afternoon_close_time\n  day_4\n  day_4_morning_open_time\n  day_4_morning_close_time\n  day_4_afternoon_open_time\n  day_4_afternoon_close_time\n  day_5\n  day_5_morning_open_time\n  day_5_morning_close_time\n  day_5_afternoon_open_time\n  day_5_afternoon_close_time\n  day_6\n  day_6_morning_open_time\n  day_6_morning_close_time\n  day_6_afternoon_open_time\n  day_6_afternoon_close_time\n  __typename\n}\n\nfragment StoresMapStoreItemType on StoreItemType {\n  store_id\n  store_name\n  store_location\n  zip_code\n  street\n  city\n  seo_url\n  __typename\n}\n"}

url = "https://www.monoprix.fr/api/graphql?storeInRadiusQuery&cache"

yield scrapy.FormRequest(url,
                         method='POST',
                         body=json.dumps(data),
                         headers={'Content-Type': 'application/json'},
                         callback=self.parse)
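
Retyping the long query by hand is error-prone. One alternative, sketched here under the assumption that the browser's Network tab shows the payload as a single JSON object (not a batched list), is to paste that raw payload text into the spider, parse it with `json.loads`, and override only the variables you need. `payload_with_variables` is a hypothetical name.

```python
import json


def payload_with_variables(raw_payload, **overrides):
    """Parse a GraphQL payload copied verbatim from the browser and return it
    with the given entries of "variables" replaced or added."""
    payload = json.loads(raw_payload)  # JSON "\n" escapes become real newlines
    if not isinstance(payload, dict):
        # Batched requests arrive as a JSON list; this sketch handles one query.
        raise ValueError("expected a single JSON object, not a batch")
    variables = dict(payload.get("variables") or {})
    variables.update(overrides)
    payload["variables"] = variables
    return payload
```

Paste the payload into a raw string such as `RAW = r'''{...}'''` so its `\n` escapes reach `json.loads` unchanged; in a normal string Python turns them into literal line breaks inside the JSON strings, which `json.loads` rejects. `json.dumps(payload_with_variables(RAW, currentLocation="48.85,2.35"))` then yields a body for a POST request.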

I have read this post about handling query strings, but I don't know where to put the query string dictionary correctly.
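
In the URL from point 1, `storeInRadiusQuery` and `cache` are query-string parameters with no value, so there is no dictionary to fill in; the URL can be used exactly as copied. A short sketch with `urllib.parse` showing how such value-less parameters parse and how to rebuild them. Both helper names are hypothetical, and whether the server actually requires these flags is not known from the question.

```python
from urllib.parse import parse_qsl, urlsplit


def query_params(url):
    """Return the URL's query parameters as (name, value) pairs, keeping
    names that have no "=value" part (their value becomes "")."""
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


def url_with_flags(base, *flags):
    """Append value-less query parameters, e.g. ?storeInRadiusQuery&cache."""
    return f"{base}?{'&'.join(flags)}" if flags else base
```

Note that `urlencode({"cache": ""})` produces `cache=`, which is a different string from the bare `cache` flag, so joining the names directly keeps the URL identical to the one the browser sends.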

What I want is to change the current location and radius parameters to get the list of stores.
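
In the question's query, `radius: 20` is written as a literal argument rather than a GraphQL variable, so changing `variables` alone cannot change it. A minimal sketch, assuming the query text contains such a literal and that the server accepts other integer values (neither is confirmed), which sets `currentLocation` to `"lat,lon"` and rewrites the literal with a regular expression. `set_location_and_radius` is hypothetical.

```python
import re


def set_location_and_radius(payload, lat, lon, radius):
    """Return a new payload dict with currentLocation set to "lat,lon" and the
    literal radius argument in the query text replaced.

    Assumes the query contains a literal like "radius: 20", as in the
    question; the input payload is not modified.
    """
    query, count = re.subn(r"radius:\s*\d+", f"radius: {int(radius)}", payload["query"])
    if count == 0:
        raise ValueError("no literal radius argument found in the query")
    variables = dict(payload.get("variables") or {})
    variables["currentLocation"] = f"{lat},{lon}"
    return {**payload, "query": query, "variables": variables}
```

A cleaner alternative would be to declare a `$radius` variable in the query, but the argument's type on the server is unknown, so that would need to be verified first.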

If you have any ideas... thanks!

Best answer

The following link shows how to replicate a GraphQL request correctly: https://scrapfly.io/blog/web-scraping-graphql-with-python/
Doing this in Scrapy is similar to what is shown there.

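import json

import scrapy

# This runs inside a spider method such as start_requests() or a callback;
# url is the GraphQL endpoint, e.g. "https://www.monoprix.fr/api/graphql".

# Copy the query from the browser's developer tools and paste it between
# the triple quotes, writing the escaped \n sequences as real line breaks.
query = """
query ... { ... }
"""

json_data = {
    "query": query,
    "variables": {
        "variable1": "abc",
        "variable2": "abc",
        "variable3": "abc",
    },
}

yield scrapy.Request(
    url=url,
    method="POST",
    body=json.dumps(json_data),
    headers={"Content-Type": "application/json"},
)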
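
Once the POST succeeds, the callback receives JSON. A sketch of extracting store fields, assuming the usual GraphQL envelope `{"data": ...}` and that `storesInRadius` is a list of objects with a `source` key, mirroring the query in the question; the real response shape should be checked in the browser first. `extract_stores` is a hypothetical helper.

```python
import json


def extract_stores(response_text):
    """Yield one dict per store from a storeInRadiusQuery response.

    Assumes {"data": {"viewer": {"storesInRadius": [{"source": {...}}]}}},
    mirroring the question's query; adjust the keys if the response differs.
    """
    payload = json.loads(response_text)
    if payload.get("errors"):
        # GraphQL servers usually report query problems in an "errors" list.
        raise RuntimeError(f"GraphQL errors: {payload['errors']}")
    viewer = (payload.get("data") or {}).get("viewer") or {}
    for hit in viewer.get("storesInRadius") or []:
        source = hit.get("source") or {}
        yield {
            "store_id": source.get("store_id"),
            "store_name": source.get("store_name"),
            "city": source.get("city"),
            "zip_code": source.get("zip_code"),
        }
```

In the spider, `def parse(self, response): yield from extract_stores(response.text)` would emit one item per store.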

Regarding "python - Scrapy with query strings and variables", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/61248872/
