python - I'm getting a KeyError in scraped JSON

Tags: python json beautifulsoup keyerror

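I scraped some JSON from a website. When I try to iterate over it, I get a KeyError, but I'm not sure why. The loop stays within the length of the JSON. Any ideas what is going on?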

import requests
from bs4 import BeautifulSoup
import json
import pandas as pd

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0"
}
url = "https://employment.ucsd.edu/jobs?page_size=250&page_number=1&keyword=clinical%20lab%20scientist&location_city" \
      "=Remote&location_city=San%20Diego&location_city=Encinitas&location_city=Murrieta&location_city=La%20Jolla" \
      "&location_city=Not%20Specified&location_city=Vista&sort_by=score&sort_order=DESC "
request = requests.get(url, headers=headers)
response = BeautifulSoup(request.text, "html.parser")
all_data = response.find_all("script", {"type": "application/ld+json"})
df = pd.DataFrame(columns=("Title", "Department", "Salary Range", "Appointment Percent", "URL"))

for data in all_data:
    jsn = json.loads(data.string)
    jsn_length = len(jsn['itemListElement'])
    # print(json.dumps(jsn, indent=4))
    n = 0
    while n < jsn_length:
        # print(jsn['itemListElement'][n])
        print(n)
        df['URL'] = jsn['itemListElement'][n]
        n += 1

Edit: here is the traceback:

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2022.1\plugins\python\helpers\pydev\pydevd.py", line 1491, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm 2022.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/Will/PycharmProjects/UCSD_JOB_SCRAPE/main.py", line 19, in <module>
    jsn_length = len(jsn['itemListElement'])
KeyError: 'itemListElement'

Best answer

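It looks like element number 250 of the JSON you are scraping really has no itemListElement key. It is an Organization object rather than an item list: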

{
  "@context": "https://schema.org",
  "@type": "Organization",
  "url": "https://health.ucsd.edu/",
  "logo": "https://dy5f5j6i37p1a.cloudfront.net/company/logos/157272/original/b228c5f9007911ecb905ed1c0f90d00e.png",
  "name": "UC San Diego "
}
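This also explains why staying within the loop bounds does not help: the exception is a KeyError raised by the dictionary lookup jsn['itemListElement'] on the jsn_length line, before the while loop even starts. It is not an IndexError from the loop index. A minimal sketch reproducing it offline, using a trimmed copy of the Organization object above in place of the scraped block:

```python
import json

# Trimmed copy of the Organization object above, hard-coded instead of scraped.
org_json = """{
  "@context": "https://schema.org",
  "@type": "Organization",
  "url": "https://health.ucsd.edu/",
  "name": "UC San Diego "
}"""

jsn = json.loads(org_json)

# Square-bracket lookup raises KeyError when the key is missing...
try:
    jsn["itemListElement"]
    lookup_failed = False
except KeyError as exc:
    lookup_failed = True
    print("KeyError:", exc)  # prints: KeyError: 'itemListElement'

# ...while dict.get returns a default instead (None unless one is given).
print(jsn.get("itemListElement"))      # prints: None
print(jsn.get("itemListElement", []))  # prints: []
```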

The safest approach is probably to check for the key explicitly. For example:

for data in all_data:
    jsn = json.loads(data.string)
    if jsn.get('itemListElement') is None:
        print('No itemListElement in the JSON. The JSON is\n' + data.string)
    else:
        jsn_length = len(jsn['itemListElement'])
        n = 0
        while n < jsn_length:
            # print(jsn['itemListElement'][n])
            print(n)
            df['URL'] = jsn['itemListElement'][n]
            n += 1
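A slightly more compact variant of the same check, sketched below, uses a for loop and dict.get with an empty-list default, so blocks without itemListElement simply contribute nothing. This is a sketch under assumptions: collect_list_items is a hypothetical helper name, the sample blocks stand in for [s.string for s in all_data], and the ListItem shape with a url key is a guess about the page, not something shown in the question.

```python
import json


def collect_list_items(ld_json_strings):
    """Hypothetical helper: gather itemListElement entries from ld+json blocks."""
    items = []
    for raw in ld_json_strings:
        jsn = json.loads(raw)
        # A JSON-LD block can be a top-level array rather than an object; skip it.
        if not isinstance(jsn, dict):
            continue
        # Blocks without the key (e.g. the Organization object) add nothing.
        items.extend(jsn.get("itemListElement", []))
    return items


# Hypothetical stand-ins for the scraped blocks; the ListItem fields are assumed.
sample_blocks = [
    json.dumps({
        "@type": "ItemList",
        "itemListElement": [
            {"@type": "ListItem", "position": 1, "url": "https://example.org/job/1"},
            {"@type": "ListItem", "position": 2, "url": "https://example.org/job/2"},
        ],
    }),
    json.dumps({"@type": "Organization", "name": "UC San Diego "}),
]

items = collect_list_items(sample_blocks)
urls = [item.get("url") for item in items]
for url in urls:
    print(url)
```

With the URLs in a list, pd.DataFrame({"URL": urls}) builds the frame in one step. Note that df['URL'] = ... inside the loop, as in both snippets above, assigns the entire column on every iteration instead of appending a row.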

Original question on Stack Overflow (python - I'm getting a KeyError in scraped JSON): https://stackoverflow.com/questions/74924762/
