我从 Lobbyview 收到了 json 格式的响应。我尝试将其放入数据框中以仅访问某些变量,但没有成功。如何以可导出到 .dta 的格式仅访问某些变量,例如 id 和委员会?这是我尝试过的代码。
import requests, json
query = {"naics": "424430"}
results = requests.post('https://www.lobbyview.org/public/api/reports',
data = json.dumps(query))
print(results.json())
import pandas as pd
b = pd.DataFrame(results.json())
_id = data["_id"]
committee = data["_source"]["specific_issues"][0]["bills_by_algo"][0]["committees"]
对 json 的观察如下所示:
"_score": 4.421936,
"_type": "object",
"_id": "5EZUMbQp3hGKH8Uq2Vxuke",
"_source":
{
"issue_codes": ["CPT"],
"received": 1214320148,
"client_name": "INTELLECTUAL PROPERTY OWNERS ASSOCIATION",
"amount": 240000,
"client":
{
"legal_name": "INTELLECTUAL PROPERTY OWNERS ASSOCIATION",
"name": "INTELLECTUAL PROPERTY OWNERS ASSOCIATION",
"naics": null,
"gvkey": null,
"ticker": "Unlisted",
"id": null,
"bvdid": "US131283992L"},
"specific_issues": [
{
"text": "H.R. 34, H.R. 1908, H.R. 2336, H.R. 3093 S. 522, S. 681, S. 1145, S. 1745",
"bills_by_algo": [
{
"titles": ["To amend title 35, United States Code, to provide for patent reform.", "Patent Reform Act of 2007", "Patent Reform Act of 2007", "Patent Reform Act of 2007"],
"top_terms": ["Commerce", "Administrative fees"],
"sponsor":
{
"firstname": "Howard",
"district": 28,
"title": "rep",
"id": 400025
},
"committees": ["House Judiciary"],
"introduced": 1176868800,
"type": "HR", "id": "110_HR1908"},
{
"titles": ["To amend title 35, United States Code, relating to the funding of the United States Patent and Trademark Office."],
"top_terms": ["Commerce", "Administrative fees"],
"sponsor":
{
"firstname": "Howard",
"district": 28,
"title": "rep",
"id": 400025
},
"committees": ["House Judiciary"],
"introduced": 1179288000,
"type": "HR",
"id": "110_HR2336"
}],
"gov_entities": ["U.S. House of Representatives", "Patent and Trademark Office (USPTO)", "U.S. Senate", "UNDETERMINED", "U.S. Trade Representative (USTR)"],
"lobbyists": ["Valente, Thomas Silvio", "Wamsley, Herbert C"],
"year": 2007,
"issue": "CPT",
"id": "S4nijtRn9Q5NACAmbqFjvZ"}],
"year": 2007,
"is_latest_amendment": true,
"type": "MID-YEAR AMENDMENT",
"id": "1466CDCD-BA3D-41CE-B7A1-F9566573611A",
"alternate_name": "INTELLECTUAL PROPERTY OWNERS ASSOCIATION"
},
"_index": "collapsed"}```
最佳答案
由于您指定的数据在 JSON 响应中嵌套得相当深,因此您必须循环遍历它并将其临时保存到列表中。为了更好地理解响应数据,我建议您使用一些工具来查看 JSON 结构,例如 online JSON-Viewer 。并非 JSON 中的每个条目都包含必要的数据,因此我尝试通过 try
和 except
捕获错误。为了确保 id
和 committees
正确匹配,我选择将它们作为小字典添加到列表中。然后可以轻松地将这个列表读入 Pandas 中。保存到 .dta
需要您将 committees
列内的列表转换为字符串,而您可能还想另存为 .csv
以获取更通用的格式。
import requests, json
import pandas as pd
query = {"naics": "424430"}
results = requests.post(
"https://www.lobbyview.org/public/api/reports", data=json.dumps(query)
)
json_response = results.json()["result"]
# to save the JSON response
# with open("data.json", "w") as outfile:
# json.dump(results.json()["result"], outfile)
resulting_data = []
# loop through the response
for data in json_response:
# try to find entries with specific issues, bills_by_algo and committees
try:
# loop through the special issues
for special_issue in data["specific_issues"]:
_id = special_issue["id"]
# loop through the bills_by_algo's
for x in special_issue["bills_by_algo"]:
# append the id and committees in a dict
resulting_data.append(({"id": _id, "committees": x["committees"]}))
except KeyError as e:
print(e, "not found in entry.")
continue
# create a DataFrame
df = pd.DataFrame(resulting_data)
# export of list objects in the column is not supported by .dta, therefore we convert
# to strings with ";" as delimiter
df["committees"] = ["; ".join(map(str, l)) for l in df["committees"]]
print(df)
df.to_stata("result.dta")
结果
id committees
0 D8BxG5664FFb8AVc6KTphJ House Judiciary
1 D8BxG5664FFb8AVc6KTphJ Senate Judiciary
2 8XQE5wu3mU7qvVPDpUWaGP House Agriculture
3 8XQE5wu3mU7qvVPDpUWaGP Senate Agriculture, Nutrition, and Forestry
4 kzZRLAHdMK4YCUQtQAdCPY House Agriculture
.. ... ...
406 ZxXooeLGVAKec9W2i32hL5 House Agriculture
407 ZxXooeLGVAKec9W2i32hL5 Senate Agriculture, Nutrition, and Forestry; H...
408 ZxXooeLGVAKec9W2i32hL5 House Appropriations; Senate Appropriations
409 ahmmafKLfRP8wZay9o8GRf House Agriculture
410 ahmmafKLfRP8wZay9o8GRf Senate Agriculture, Nutrition, and Forestry
[411 rows x 2 columns]
关于python - 如何使用Python从Json文件的嵌套列表中提取对象?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/59979053/