python - How do I reformat this json for database import?

Tags: python mongodb python-3.x pymongo

I have these "json" files that I want to insert into my mongodb database.

One example is: http://s.live.ksmobile.net/cheetahlive/de/ff/15201023827214369775/15201023827214369775.json

The problem is that it is formatted like this:

   { "channelType":"TEMPGROUP", ... } # line 1
   { "channelType":"TEMPGROUP", ... } # line 2

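So instead of the file being inserted into the database as 1 document, each line is inserted as 1 entry. What should have been 3 documents from 3 "json" files ended up as 1189 documents in the database.

How can I insert the entire contents of a ".json" file into one document?

My code is: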

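import json
import urllib.parse
import urllib.request

import requests

# `messages` is a pymongo collection (liveme.messages in the tracebacks below); its setup is not shown.
replay_url = "http://live.ksmobile.net/live/getreplayvideos?"
userid = 969730808384462848
url2 = replay_url + urllib.parse.urlencode({'userid': userid}) + '&page_size=1000'
raw_replay_data = requests.get(url2).json()

for i in raw_replay_data['data']['video_info']:
    url3 = i['msgfile']
    raw_message_data = urllib.request.urlopen(url3)
    # Each line of the file is parsed and inserted as its own document.
    for line in raw_message_data:
        json_data = json.loads(line)
        messages.insert_one(json_data)
        print(json_data)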

Update with more information in response to the answer

messages.insert(json_data) gives this error:

Traceback (most recent call last):
  File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/collection.py", line 633, in _insert
    blk.execute(concern, session=session)
  File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/bulk.py", line 432, in execute
    return self.execute_command(generator, write_concern, session)
  File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/bulk.py", line 329, in execute_command
    raise BulkWriteError(full_result)
pymongo.errors.BulkWriteError: batch op errors occurred

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/import_messages_dev.py", line 43, in <module>
    messages.insert(json_data)
  File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/collection.py", line 2941, in insert
    check_keys, manipulate, write_concern)
  File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/collection.py", line 635, in _insert
    _raise_last_error(bwe.details)
  File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/helpers.py", line 220, in _raise_last_error
    _raise_last_write_error(write_errors)
  File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/helpers.py", line 188, in _raise_last_write_error
    raise DuplicateKeyError(error.get("errmsg"), 11000, error)
pymongo.errors.DuplicateKeyError: E11000 duplicate key error index: liveme.messages.$_id_ dup key: { : ObjectId('5aa2fc6f5d60126499060949') }

messages.insert_one(json_data) gives me this error:

Traceback (most recent call last):
  File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/import_messages_dev.py", line 43, in <module>
    messages.insert_one(json_data)
  File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/collection.py", line 676, in insert_one
    common.validate_is_document_type("document", document)
  File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/common.py", line 434, in validate_is_document_type
    "collections.MutableMapping" % (option,))
TypeError: document must be an instance of dict, bson.son.SON, bson.raw_bson.RawBSONDocument, or a type that inherits from collections.MutableMapping

messages.insert_many(json_data) gives me this error:

Traceback (most recent call last):
  File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/import_messages_dev.py", line 43, in <module>
    messages.insert_many(json_data)
  File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/collection.py", line 742, in insert_many
    blk.execute(self.write_concern.document, session=session)
  File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/bulk.py", line 432, in execute
    return self.execute_command(generator, write_concern, session)
  File "/media/anon/06bcf743-8b4d-409f-addc-520fc4e19299/PycharmProjects/LiveMe/venv1/lib/python3.6/site-packages/pymongo/bulk.py", line 329, in execute_command
    raise BulkWriteError(full_result)
pymongo.errors.BulkWriteError: batch op errors occurred

Both messages.insert and messages.insert_many insert 1 row and then throw the error.
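One plausible reading of the "1 row, then an error" behaviour, assuming json_data was a list that grows on every pass and the insert call sat inside the loop: PyMongo adds a missing _id to each dict in place, so the second call resends the first dict with the _id it already received. The sketch below uses FakeCollection, a hypothetical in-memory stand-in rather than PyMongo itself, only to show that sequence of events.

```python
class FakeCollection:
    """Hypothetical in-memory stand-in, not PyMongo. It mimics two behaviours:
    a missing _id is added to the dict in place, and a repeated _id is rejected."""

    def __init__(self):
        self.docs = {}
        self._next_id = 0

    def insert_many(self, documents):
        for doc in documents:
            if "_id" not in doc:
                self._next_id += 1
                doc["_id"] = self._next_id  # mutates the caller's dict
            if doc["_id"] in self.docs:
                raise ValueError("duplicate key: %r" % doc["_id"])
            self.docs[doc["_id"]] = dict(doc)


messages = FakeCollection()
json_data = []
error = None
for n in (1, 2):
    json_data.append({"n": n})
    try:
        # Inserting the growing list on every pass resends earlier dicts.
        messages.insert_many(json_data)
    except ValueError as exc:
        error = exc
        break

print(len(messages.docs), error)
```

The first pass stores one document; the second pass fails on that same document before the new one is reached. The TypeError from insert_one has a separate cause: it was handed a list, and it only accepts a single mapping.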

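Best answer

These files clearly do not contain properly formatted json. Instead, each line holds a separate object.

To turn them into valid json, you probably want a list of objects, i.e.: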

[{ "channelType":"TEMPGROUP", ... },
 { "channelType":"TEMPGROUP", ... }]

You can achieve this as follows:

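for i in raw_replay_data['data']['video_info']:
    url3 = i['msgfile']
    raw_message_data = urllib.request.urlopen(url3)
    # Collect every line of this file as one parsed object.
    json_data = []
    for line in raw_message_data:
        json_data.append(json.loads(line))
    # insert_one() needs a dict, not a list, so wrap the list in one document.
    # Insert once per file, after the inner loop. The field names are illustrative.
    messages.insert_one({'msgfile': url3, 'messages': json_data})
    print(json_data)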
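The parsing step can also be pulled out into a small helper that is testable without a database. This is a minimal sketch, not the answerer's code: lines_to_document and the field names "source" and "messages" are made up for illustration, and the sample bytes stand in for the lines of a downloaded file.

```python
import json


def lines_to_document(lines, source):
    """Parse newline-delimited JSON and wrap the result in one dict."""
    parsed = []
    for line in lines:
        # Skip blank lines, such as a trailing newline at the end of the file.
        if not line.strip():
            continue
        parsed.append(json.loads(line))
    # "source" and "messages" are illustrative field names (assumptions).
    return {"source": source, "messages": parsed}


# Stand-in for the response body: one JSON object per line, as bytes.
sample = [
    b'{"channelType": "TEMPGROUP", "n": 1}\n',
    b'{"channelType": "TEMPGROUP", "n": 2}\n',
    b'\n',
]
doc = lines_to_document(sample, "example.json")
print(len(doc["messages"]))
```

In the loop above this would become messages.insert_one(lines_to_document(raw_message_data, url3)), called once per file. Keep in mind that MongoDB limits a single document to 16 MB, so a very large message file may not fit into one document.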

For python - How do I reformat this json for database import?, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/49198774/
