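I scraped a JSON from a website. When I try to iterate over the JSON, I get a KeyError, but I'm not sure why. The loop stays within the length of the JSON. Any ideas about what is going on?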
import requests
from bs4 import BeautifulSoup
import json
import pandas as pd
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0"
}
url = "https://employment.ucsd.edu/jobs?page_size=250&page_number=1&keyword=clinical%20lab%20scientist&location_city" \
      "=Remote&location_city=San%20Diego&location_city=Encinitas&location_city=Murrieta&location_city=La%20Jolla" \
      "&location_city=Not%20Specified&location_city=Vista&sort_by=score&sort_order=DESC "
request = requests.get(url, headers=headers)
response = BeautifulSoup(request.text, "html.parser")
all_data = response.find_all("script", {"type": "application/ld+json"})
df = pd.DataFrame(columns=("Title", "Department", "Salary Range", "Appointment Percent", "URL"))
for data in all_data:
    jsn = json.loads(data.string)
    jsn_length = len(jsn['itemListElement'])
    # print(json.dumps(jsn, indent=4))
    n = 0
    while n < jsn_length:
        # print(jsn['itemListElement'][n])
        print(n)
        df['URL'] = jsn['itemListElement'][n]
        n += 1
Edit: here is the traceback:
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2022.1\plugins\python\helpers\pydev\pydevd.py", line 1491, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm 2022.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/Will/PycharmProjects/UCSD_JOB_SCRAPE/main.py", line 19, in <module>
    jsn_length = len(jsn['itemListElement'])
KeyError: 'itemListElement'
Best answer
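It does look like element number 250 of the JSON you are referencing has no itemListElement key. It is a schema.org Organization object, not a list of items: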
{
    "@context": "https://schema.org",
    "@type": "Organization",
    "url": "https://health.ucsd.edu/",
    "logo": "https://dy5f5j6i37p1a.cloudfront.net/company/logos/157272/original/b228c5f9007911ecb905ed1c0f90d00e.png",
    "name": "UC San Diego "
}
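The loop bound is not the problem: the KeyError is raised by the key lookup jsn['itemListElement'] itself, before the while loop ever runs. A minimal offline reproduction, assuming the page yields two ld+json blocks (a hypothetical, abbreviated ItemList and the Organization block above), shows the difference between indexing a missing key and calling dict.get:

```python
import json

# Hypothetical stand-ins for the <script type="application/ld+json"> contents:
# an abbreviated ItemList and the Organization block shown above.
blocks = [
    '{"@type": "ItemList", "itemListElement": [{"@type": "ListItem", "position": 1}]}',
    '{"@context": "https://schema.org", "@type": "Organization", "name": "UC San Diego "}',
]

results = []
for raw in blocks:
    jsn = json.loads(raw)
    # .get() returns None instead of raising when the key is absent
    print(jsn["@type"], jsn.get("itemListElement") is None)
    try:
        # Indexing a missing key raises KeyError, as in the traceback
        results.append(len(jsn["itemListElement"]))
    except KeyError as exc:
        results.append(f"KeyError: {exc}")

print(results)  # [1, "KeyError: 'itemListElement'"]
```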
The safest approach is probably to check for it explicitly. For example:
for data in all_data:
    jsn = json.loads(data.string)
    if jsn.get('itemListElement') is None:
        print('No itemListElement in the JSON. The JSON is\n' + data.string)
    else:
        jsn_length = len(jsn['itemListElement'])
        n = 0
        while n < jsn_length:
            # print(jsn['itemListElement'][n])
            print(n)
            df['URL'] = jsn['itemListElement'][n]
            n += 1
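One more thing to watch for in both loops: df['URL'] = ... replaces the entire URL column on every pass, so the values do not accumulate row by row. A common alternative is to skip blocks without itemListElement, collect one record per entry, and build the DataFrame once at the end. This is a sketch, not the answer's code: collect_job_rows is a hypothetical helper, the sample data and example.org URLs are made up, and the "url" field inside each entry is an assumption about this site's markup, so check it against the real JSON (for example with the commented-out json.dumps(jsn, indent=4)). It uses only the standard library; the resulting rows can be passed to pd.DataFrame(rows).

```python
import json

def collect_job_rows(script_texts):
    """Hypothetical helper: one record per itemListElement entry.

    Assumption: each entry is a dict that may carry a "url" key.
    """
    rows = []
    for raw in script_texts:
        jsn = json.loads(raw)
        # Skip blocks (e.g. the Organization one) that have no item list
        if "itemListElement" not in jsn:
            continue
        # Iterate over the entries directly instead of using an index counter
        for entry in jsn["itemListElement"]:
            rows.append({"URL": entry.get("url")})
    return rows

# Hypothetical sample input mirroring the two kinds of blocks on the page
sample = [
    '{"@type": "ItemList", "itemListElement": ['
    '{"@type": "ListItem", "position": 1, "url": "https://example.org/job/1"},'
    '{"@type": "ListItem", "position": 2, "url": "https://example.org/job/2"}]}',
    '{"@type": "Organization", "name": "UC San Diego "}',
]

rows = collect_job_rows(sample)
print(rows)
# With pandas installed: df = pd.DataFrame(rows, columns=["URL"])
```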
Regarding "python - I'm getting a KeyError in scraped JSON", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/74924762/