node.js - Can readPreference be set for MongoDB's filemd5 command?

Tags: node.js mongodb meteor database-replication gridfs

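I have a file storage service built in Node/Meteor that uses GridFS and is replicated across several containers. What I'm currently trying to find out is whether this code is actually aware of read/write consistency: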

db.command({
  filemd5: someFileId,
  root: 'fs'
}, function callback(err, results) {
  ...
})

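I upload files in chunks and run this command after all the chunks have been merged into a single file. I have a feeling it is reading from a secondary member (I have several files whose md5 value is that of an empty file: d41d8cd98f00b204e9800998ecf8427e). Is there any documentation on this, or some other setting?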
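
The value d41d8cd98f00b204e9800998ecf8427e is the MD5 of zero bytes, so those files were hashed at a moment when no file data was visible to the command. A quick check with Node's built-in crypto module (`md5Hex` is just an illustrative helper name):

```javascript
const crypto = require('crypto');

// MD5 hex digest of a Buffer or string, using Node's built-in crypto module.
function md5Hex(data) {
  return crypto.createHash('md5').update(data).digest('hex');
}

// Hashing zero bytes yields the value seen on the suspicious files.
const EMPTY_MD5 = md5Hex(Buffer.alloc(0));
console.log(EMPTY_MD5); // d41d8cd98f00b204e9800998ecf8427e
```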

These two parameters are the only options described in the documentation: https://docs.mongodb.com/manual/reference/command/filemd5/

Update
The exact code that merges the chunks lives in a third-party package:

         cursor = files.find(
            {
               'metadata._Resumable.resumableIdentifier': file.metadata._Resumable.resumableIdentifier
               length:
                  $ne: 0
            },
            {
               fields:
                  length: 1
                  metadata: 1
               sort:
                  'metadata._Resumable.resumableChunkNumber': 1
            }
         )

https://github.com/vsivsi/meteor-file-collection/blob/master/src/resumable_server.coffee#L26
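
In plain terms, the query above keeps the non-empty partial uploads that share one resumableIdentifier and orders them by chunk number. A minimal in-memory sketch of that selection; `selectParts` and the sample data are hypothetical, and the document shape is only assumed from the field paths used in the query:

```javascript
// Hypothetical helper mirroring the find() selector and sort from the
// CoffeeScript above. Each doc is assumed to have the shape
// { length, metadata: { _Resumable: { resumableIdentifier, resumableChunkNumber } } }.
function selectParts(docs, resumableIdentifier) {
  return docs
    // selector: same resumableIdentifier and length != 0 (the $ne: 0 clause)
    .filter((d) =>
      d.metadata._Resumable.resumableIdentifier === resumableIdentifier &&
      d.length !== 0)
    // sort: ascending resumableChunkNumber
    .sort((a, b) =>
      a.metadata._Resumable.resumableChunkNumber -
      b.metadata._Resumable.resumableChunkNumber);
}

// Example data: part 3 of upload "a" is still empty, "b" is another upload.
const part = (id, n, length) =>
  ({ length, metadata: { _Resumable: { resumableIdentifier: id, resumableChunkNumber: n } } });
const docs = [part('a', 2, 10), part('b', 1, 5), part('a', 1, 10), part('a', 3, 0)];

console.log(selectParts(docs, 'a').map((d) => d.metadata._Resumable.resumableChunkNumber)); // [ 1, 2 ]
```

If such a query is served by a secondary that has not yet replicated the most recent parts, the list simply comes back shorter, with no error raised.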

Then, on lines 111-119, it first runs filemd5 and then runs an update on the file:

                @db.command md5Command, (err, results) ->
                   if err
                      lock.releaseLock()
                      return callback err
                   # Update the size and md5 to the file data
                   files.update { _id: fileId }, { $set: { length: file.metadata._Resumable.resumableTotalSize, md5: results.md5 }},
                      (err, res) =>
                         lock.releaseLock()
                         callback err

https://github.com/vsivsi/meteor-file-collection/blob/master/src/resumable_server.coffee#L111-L119
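
Conceptually, filemd5 hashes the data of every chunks document belonging to the file, in ascending `n` order, and reports numChunks and md5. The in-memory sketch below imitates that; `filemd5Like` is a hypothetical name, and the chunk shape { files_id, n, data } follows the GridFS chunks layout. If the read sees no chunks for the file, the result is the empty-input MD5, which is one plausible way the values from the question could arise.

```javascript
const crypto = require('crypto');

// Hypothetical in-memory analogue of the filemd5 command: hash the `data`
// of every chunk belonging to `fileId`, in ascending `n` order.
function filemd5Like(chunks, fileId) {
  const own = chunks
    .filter((c) => c.files_id === fileId)
    .sort((a, b) => a.n - b.n);
  const hash = crypto.createHash('md5');
  for (const c of own) hash.update(c.data); // feed each chunk's bytes in order
  return { numChunks: own.length, md5: hash.digest('hex') };
}

// Two chunks stored out of order; together they hold "foo\n".
const chunks = [
  { files_id: 'f1', n: 1, data: Buffer.from('o\n') },
  { files_id: 'f1', n: 0, data: Buffer.from('fo') },
];
console.log(filemd5Like(chunks, 'f1')); // { numChunks: 2, md5: 'd3b07384d113edec49eaa6238ad5ff00' }
console.log(filemd5Like([], 'f1'));     // { numChunks: 0, md5: 'd41d8cd98f00b204e9800998ecf8427e' }
```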

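Right after the last chunk is written, cursor = files.find() kicks off together with all of the merging, so if the read preference is secondaryPreferred, might those chunks not be there yet? Should that code be refactored to use only the primary?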
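
With the MongoDB Node.js driver, the read preference is passed to Db#command through its options argument rather than inside the command document, so the filemd5 call can be pinned to the primary explicitly. A minimal sketch, assuming a driver version in which db.command(command, options) returns a Promise when no callback is given; `runFilemd5OnPrimary` is a hypothetical helper and `fakeDb` only stands in for a real connection:

```javascript
// Hypothetical helper: run filemd5 with an explicit primary read preference.
// Assumes `db` is a MongoDB Node.js driver Db whose command(cmd, options)
// returns a Promise when no callback is passed.
async function runFilemd5OnPrimary(db, fileId, root = 'fs') {
  const result = await db.command(
    { filemd5: fileId, root },
    { readPreference: 'primary' } // read from the primary, never a secondary
  );
  return result.md5;
}

// Stub standing in for a real Db; it records what it was called with.
const calls = [];
const fakeDb = {
  command: async (cmd, opts) => {
    calls.push({ cmd, opts });
    return { numChunks: 0, md5: 'd41d8cd98f00b204e9800998ecf8427e' };
  },
};
```

With a callback-style driver, the same options object goes in the second position, before the callback.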

Best answer

GridFS creates two collections: files and chunks.

A typical files entry looks like this:

{
    "_id" : ObjectId("58cfbc8b6900bb31c7b1b8d9"),
    "length" : 4,
    "chunkSize" : 261120,
    "uploadDate" : ISODate("2017-03-20T11:27:07.812Z"),
    "md5" : "d3b07384d113edec49eaa6238ad5ff00",
    "filename" : "foo.txt"
}

The filemd5 admin command should simply return the md5 field of the corresponding files document (along with the number of chunks).

files.md5
An MD5 hash of the complete file returned by the filemd5 command. This value has the String type.

source: GridFS docs

It should represent the hash of the complete file, or at least the hash that was saved originally.

What is the ‘md5’ field of a files collection document and how is it used?
‘md5’ holds an MD5 checksum that is computed from the original contents of a user file. Historically, GridFS did not use acknowledged writes, so this checksum was necessary to ensure that writes went through properly. With acknowledged writes, the MD5 checksum is still useful to ensure that files in GridFS have not been corrupted. A third party directly accessing the 'files' and ‘chunks’ collections under GridFS could, inadvertently or maliciously, make changes to documents that would make them unusable by GridFS. Comparing the MD5 in the files collection document to a re-computed MD5 allows detecting such errors and corruption. However, drivers now assume that the stored file is not corrupted, and applications that want to use the MD5 value to check for corruption must do so themselves.

source: GridFS spec

If the file is written without going through the driver's mongoc_gridfs_file_save (for example, via streaming), the md5 field is not updated.

Actually, reading further in the spec:

Why store the MD5 checksum instead of creating the hash as-needed? The MD5 checksum must be computed when a file is initially uploaded to GridFS, as this is the only time we are guaranteed to have the entire uncorrupted file. Computing it on-the-fly as a file is read from GridFS would ensure that our reads were successful, but guarantees nothing about the state of the file in the system. A successful check against the stored MD5 checksum guarantees that the stored file matches the original and no corruption has occurred.

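This is what we do. Only mongoc_gridfs_file_save computes the file's md5 sum and stores it. Any other entry point (such as streams) expects the user to have created all the supported mongoc_gridfs_file_opt_t and to have computed the md5 correctly.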

Source: JIRA issue

Original question on Stack Overflow: https://stackoverflow.com/questions/42900479/
