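I have a file storage service built in Node/Meteor that uses GridFS and is replicated across multiple containers. What I am currently trying to find out is whether this code is actually aware of read/write consistency: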
db.command({
  filemd5: someFileId,
  root: 'fs'
}, function callback(err, results) {
  ...
})
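I upload the file in chunks and run the command after all the chunks have been merged into one file. I have a feeling that it is hitting a secondary member (I get a few md5 values that are those of an empty file: d41d8cd98f00b204e9800998ecf8427e). Is there any documentation or other setting for this?
These 2 parameters are the only options described in the documentation: https://docs.mongodb.com/manual/reference/command/filemd5/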
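As a quick check on the value quoted above, the following minimal sketch uses Node's built-in crypto module to hash zero bytes of input. It only confirms that the digest corresponds to empty content; it says nothing about why filemd5 saw no data.

```javascript
const crypto = require('crypto');

// MD5 over zero bytes of input, i.e. the digest of an empty file.
const emptyMd5 = crypto.createHash('md5').update('').digest('hex');

console.log(emptyMd5); // d41d8cd98f00b204e9800998ecf8427e
```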
Update
The exact code that merges the chunks lives in a 3rd party package:
cursor = files.find(
  {
    'metadata._Resumable.resumableIdentifier': file.metadata._Resumable.resumableIdentifier
    length:
      $ne: 0
  },
  {
    fields:
      length: 1
      metadata: 1
    sort:
      'metadata._Resumable.resumableChunkNumber': 1
  }
)
https://github.com/vsivsi/meteor-file-collection/blob/master/src/resumable_server.coffee#L26
Then lines 111-119, which first run filemd5 and then run an update on the file:
@db.command md5Command, (err, results) ->
  if err
    lock.releaseLock()
    return callback err
  # Update the size and md5 to the file data
  files.update { _id: fileId }, { $set: { length: file.metadata._Resumable.resumableTotalSize, md5: results.md5 }},
    (err, res) =>
      lock.releaseLock()
      callback err
https://github.com/vsivsi/meteor-file-collection/blob/master/src/resumable_server.coffee#L111-L119
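After the last chunk is written, cursor = files.find() is kicked off together with all the merging work, so if the read preference is secondaryPreferred, might the chunks not be there yet? Should that code be refactored to use the primary only?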
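To make the "primary only" idea concrete, here is a minimal sketch of passing a read preference along with the command. It assumes `db` is a connected Db handle from the Node.js MongoDB driver and that `command` is called without a callback so that it returns a promise; `runFileMd5OnPrimary` is a hypothetical helper name, not part of the package above. Whether this removes the empty-file hashes is not confirmed by anything in this thread.

```javascript
// Sketch only: `db` is assumed to be a connected Db handle from the
// Node.js MongoDB driver; the helper name is hypothetical.
function runFileMd5OnPrimary(db, fileId, root = 'fs') {
  // `filemd5` has to stay the first key so the server treats it as the
  // command name. The options object asks the driver to route the
  // command to the primary rather than to a secondary.
  return db.command({ filemd5: fileId, root }, { readPreference: 'primary' });
}
```

It would be called as runFileMd5OnPrimary(db, someFileId).then(results => ...), with results.md5 read the same way as in the callback version.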
Best answer
GridFS creates 2 collections: files and chunks.
A typical files entry looks like the following:
{
"_id" : ObjectId("58cfbc8b6900bb31c7b1b8d9"),
"length" : 4,
"chunkSize" : 261120,
"uploadDate" : ISODate("2017-03-20T11:27:07.812Z"),
"md5" : "d3b07384d113edec49eaa6238ad5ff00",
"filename" : "foo.txt"
}
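The filemd5 admin command should simply return the md5 field of the relevant file document (and the number of chunks).
files.md5
An MD5 hash of the complete file returned by the filemd5 command. This value has the String type.
Source: GridFS docs
It should represent the hash of the complete file, or at least of the file as it was originally saved.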
What is the ‘md5’ field of a files collection document and how is it used?
‘md5’ holds an MD5 checksum that is computed from the original contents of a user file. Historically, GridFS did not use acknowledged writes, so this checksum was necessary to ensure that writes went through properly. With acknowledged writes, the MD5 checksum is still useful to ensure that files in GridFS have not been corrupted. A third party directly accessing the 'files' and ‘chunks’ collections under GridFS could, inadvertently or maliciously, make changes to documents that would make them unusable by GridFS. Comparing the MD5 in the files collection document to a re-computed MD5 allows detecting such errors and corruption. However, drivers now assume that the stored file is not corrupted, and applications that want to use the MD5 value to check for corruption must do so themselves.
Source: GridFS spec
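Since the spec leaves the corruption check to the application, a rough sketch of such a check follows. It assumes the chunk documents have already been fetched and that each `data` field is a plain Node Buffer (the driver may hand back a Binary wrapper instead); `md5OfChunks` and `matchesStoredMd5` are hypothetical helper names.

```javascript
const crypto = require('crypto');

// Hypothetical helper: recompute the MD5 over chunk documents shaped like
// fs.chunks entries ({ n, data }), hashing them in chunk-number order.
function md5OfChunks(chunks) {
  const hash = crypto.createHash('md5');
  [...chunks]
    .sort((a, b) => a.n - b.n)
    .forEach((chunk) => hash.update(chunk.data)); // assumes data is a Buffer
  return hash.digest('hex');
}

// Compare the recomputed digest with the md5 stored in the files document.
function matchesStoredMd5(fileDoc, chunks) {
  return fileDoc.md5 === md5OfChunks(chunks);
}
```

With no chunks at all, md5OfChunks returns the empty-input digest, which is the same value reported in the question.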
If the file is updated in a way that does not go through the driver's mongoc_gridfs_file_save (e.g., streaming), the md5 field will not be updated.
Actually, further reading the spec:
Why store the MD5 checksum instead of creating the hash as-needed? The MD5 checksum must be computed when a file is initially uploaded to GridFS, as this is the only time we are guaranteed to have the entire uncorrupted file. Computing it on-the-fly as a file is read from GridFS would ensure that our reads were successful, but guarantees nothing about the state of the file in the system. A successful check against the stored MD5 checksum guarantees that the stored file matches the original and no corruption has occurred.
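This is what we are doing. Only mongoc_gridfs_file_save computes the md5 sum of the file and stores it. Any other entry point (e.g. streaming) expects the user to have created all the supporting mongoc_gridfs_file_opt_t and to have computed the md5 correctly.
Source: JIRA issue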
Regarding "node.js - Is MongoDB's filemd5 able to set a readPreference", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/42900479/