python - Reading a 4.8 GB JSON file in Python

import json

with open("reverseURL.json") as file:
    file2 = json.load(file)

eagle = file2["eagle"]

sky = file2["sky"]

eagleAndSky = set(eagle).intersection(sky)

print(eagleAndSky.pop())

print(eagleAndSky.pop())

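I tried running this code on a 4.8 GB JSON file, but every time I run it my computer freezes, and I don't know what to do. The JSON file uses the tags found in photos as keys, and the value for each key is the list of URLs of the images that carry that tag. The program works on the JSON files I built from the test and validation sets because those are small, but on the training-set JSON file, which is 4.8 GB, it freezes my computer.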

Best answer

The simplest answer is to get more RAM. Get enough to hold the parsed JSON and your two sets, and your algorithm will be fast again.

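If buying more RAM isn't possible, you'll need to design an algorithm that is less memory hungry. As a first step, consider using a streaming JSON parser such as ijson. This lets you keep in memory only the pieces of the file you care about. Assuming eagle and sky contain a lot of duplicates, this step alone may reduce memory usage enough to make things fast again. Here is some code to illustrate; you'll need to run pip install ijson before running it: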

from ijson import items

eagle = set()
sky = set()
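# ijson reads bytes, so open the file in binary mode
with open("reverseURL.json", "rb") as file:
    # "eagle.item" yields the URLs one at a time instead of the whole list
    for url in items(file, "eagle.item"):
        eagle.add(url)
    # Read the file again
    file.seek(0)
    for url in items(file, "sky.item"):
        sky.add(url)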

eagleAndSky = eagle.intersection(sky)
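
If holding two deduplicated sets is still too much, the second one does not have to be built at all. The following is a minimal sketch, assuming the URLs of the first tag fit in memory: it turns the first stream into a set and filters the second stream against it, so only one full set plus the intersection is kept. The helper name `streaming_intersection` is hypothetical and not part of ijson, and the sample URLs are made up.

```python
from typing import Hashable, Iterable, Set


def streaming_intersection(
    first: Iterable[Hashable], second: Iterable[Hashable]
) -> Set[Hashable]:
    """Return the items present in both iterables.

    Only `first` is fully materialised as a set; `second` is consumed
    one item at a time, so it can be an arbitrarily long stream.
    """
    seen = set(first)
    common = set()
    for item in second:
        if item in seen:
            common.add(item)
    return common


# Small in-memory stand-ins for the two URL streams (made-up example data).
eagle_urls = iter(["a.jpg", "b.jpg", "b.jpg", "c.jpg"])
sky_urls = iter(["b.jpg", "c.jpg", "d.jpg"])
print(sorted(streaming_intersection(eagle_urls, sky_urls)))
```

With the real file, each argument could be an ijson `items(...)` generator reading from its own binary file handle, with the smaller of the two tags passed as `first`.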

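If parsing the JSON as a stream with ijson doesn't reduce memory usage enough, you'll have to store your temporary state on disk. Python's sqlite3 module is a good fit for this kind of work. You can create a temporary file database with an eagle table and a sky table, insert all the data into each table, add a unique index to remove duplicates (and to speed up the query in the next step), then join the tables to get your intersection. Here is an example: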

import os
import sqlite3
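from tempfile import mkstemp
from ijson import items

# mkstemp creates the file safely (mktemp is deprecated and race-prone);
# SQLite treats the resulting empty file as an empty database
fd, db_path = mkstemp(suffix=".sqlite3")
os.close(fd)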
conn = sqlite3.connect(db_path)
c = conn.cursor()
c.execute("create table eagle (foo text unique)")
c.execute("create table sky (foo text unique)")
conn.commit()

# ijson reads bytes, so open the file in binary mode
with open("reverseURL.json", "rb") as file:
    for o in items(file, "eagle.item"):
        try:
            # parameters must be a sequence, hence the one-element tuple
            c.execute("insert into eagle (foo) values(?)", (o,))
        except sqlite3.IntegrityError:
            pass  # this is expected on duplicates
    file.seek(0)
    for o in items(file, "sky.item"):
        try:
            c.execute("insert into sky (foo) values(?)", (o,))
        except sqlite3.IntegrityError:
            pass  # this is expected on duplicates

conn.commit()

resp = c.execute("select sky.foo from eagle join sky on eagle.foo = sky.foo")
for foo, in resp:
    print(foo)

conn.close()
os.unlink(db_path)
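
The same approach can be written more compactly by letting SQLite handle both the duplicates and the set operation. This is a sketch rather than the answer's original code: it assumes `insert or ignore` in place of catching `IntegrityError`, and `intersect` in place of the join. The function name `intersect_on_disk` and the column name `url` are illustrative, and the two arguments can be any iterables of strings, such as ijson generators.

```python
import os
import sqlite3
import tempfile
from typing import Iterable, List


def intersect_on_disk(eagle: Iterable[str], sky: Iterable[str]) -> List[str]:
    """Intersect two streams of strings using a temporary SQLite file."""
    fd, db_path = tempfile.mkstemp(suffix=".sqlite3")
    os.close(fd)  # sqlite3 opens the path itself
    conn = sqlite3.connect(db_path)
    try:
        # The primary key gives each table a unique index on url.
        conn.execute("create table eagle (url text primary key)")
        conn.execute("create table sky (url text primary key)")
        # executemany accepts any iterator of parameter tuples;
        # "or ignore" silently skips rows that violate the unique index.
        conn.executemany(
            "insert or ignore into eagle (url) values (?)", ((u,) for u in eagle)
        )
        conn.executemany(
            "insert or ignore into sky (url) values (?)", ((u,) for u in sky)
        )
        conn.commit()
        rows = conn.execute("select url from eagle intersect select url from sky")
        return [url for (url,) in rows]
    finally:
        conn.close()
        os.unlink(db_path)


# Made-up sample data standing in for the two URL streams.
print(sorted(intersect_on_disk(["a.jpg", "b.jpg", "b.jpg"], ["b.jpg", "c.jpg"])))
```

The result is returned as a list, which assumes the intersection itself fits in memory; for a very large intersection, iterating over the cursor before closing the connection would avoid that.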

Regarding python - Reading a 4.8 GB JSON file in Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46045020/
