elasticsearch - Updating a nested field in millions of documents

I'm using a bulk update with a script to update a nested field, but it is very slow:

POST index/type/_bulk
{"update":{"_id":"1"}}
{"script":{"inline":"ctx._source.nestedfield.add(params.nestedfield)","params":{"nestedfield":{"field1":"1","field2":"2"}}}}
{"update":{"_id":"2"}}
{"script":{"inline":"ctx._source.nestedfield.add(params.nestedfield)","params":{"nestedfield":{"field1":"3","field2":"4"}}}}

 ... [many more, split across several batches]

Do you know another approach that could be faster?

It seems possible to store the script to avoid repeating it in every update, but I couldn't find a way to keep the parameters "dynamic".
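Stored scripts do accept `params` at call time: only the script source is stored, and each update still supplies its own values. A minimal sketch, assuming an Elasticsearch 6.x-style cluster where the script is registered with `PUT _scripts/<id>` and referenced by `"id"` (the key may differ on older releases), and using a hypothetical script id `add-nestedfield`. It builds the stored-script definition and a bulk NDJSON body in which every update points at the stored script instead of repeating the source.

```python
import json

# Hypothetical id for the stored script; any name works.
SCRIPT_ID = "add-nestedfield"

# Body for PUT _scripts/add-nestedfield: the script source is stored once.
stored_script = {
    "script": {
        "lang": "painless",
        "source": "ctx._source.nestedfield.add(params.nestedfield)",
    }
}


def bulk_body(updates):
    """Build an NDJSON _bulk body from (doc_id, nested_object) pairs.

    Each update references the stored script by id and passes its own
    params, so the per-document values stay dynamic.
    """
    lines = []
    for doc_id, nested in updates:
        lines.append(json.dumps({"update": {"_id": doc_id}}))
        lines.append(json.dumps(
            {"script": {"id": SCRIPT_ID, "params": {"nestedfield": nested}}}
        ))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string is what would be sent to `POST index/type/_bulk` with the `application/x-ndjson` content type; the request body no longer carries the script source.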

Best answer

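As with any performance-optimization question, there is no single answer, because there are many possible causes of poor performance.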

In your case you are issuing bulk update requests. When an update is performed, the document is actually being re-indexed:

... to update a document is to retrieve it, change it, and then reindex the whole document.

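So it makes sense to look at the indexing performance tuning tips. In your case, the first things I would consider are choosing the right bulk size, sending bulk requests from multiple threads, and increasing or disabling the index refresh interval.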
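Two of those tips can be sketched as follows, assuming the update actions are already available as (action, script) dictionary pairs. The generator splits them into bulk bodies of a fixed number of documents, and the two settings bodies are meant for `PUT index/_settings` before and after the load. The batch size of 1000 is only a starting point to measure against, not a recommended value.

```python
import json
from itertools import islice


def chunked_bulk_bodies(pairs, docs_per_request=1000):
    """Yield NDJSON _bulk bodies with at most docs_per_request updates each.

    `pairs` is an iterable of (action_dict, script_dict) tuples.
    """
    it = iter(pairs)
    while True:
        batch = list(islice(it, docs_per_request))
        if not batch:
            return
        lines = []
        for action, script in batch:
            lines.append(json.dumps(action))
            lines.append(json.dumps(script))
        yield "\n".join(lines) + "\n"


# PUT index/_settings before the load: "-1" disables periodic refreshes.
disable_refresh = {"index": {"refresh_interval": "-1"}}
# PUT index/_settings after the load: back to the default of 1s.
restore_refresh = {"index": {"refresh_interval": "1s"}}
```

If the index had a custom refresh interval, restore that value instead of `1s`.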

You could also consider an off-the-shelf client that supports parallel bulk requests, such as the Python elasticsearch client.
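With the Python client, the updates can be expressed as action dictionaries and handed to `elasticsearch.helpers.parallel_bulk`, which sends the chunks from a thread pool. A minimal sketch of the action generator, assuming an index named `index` and the hypothetical stored script id `add-nestedfield`; the `_type` key is only added for clusters that still use mapping types.

```python
def update_actions(updates, index="index", doc_type=None,
                   script_id="add-nestedfield"):
    """Yield scripted update actions in the dict format used by elasticsearch.helpers.

    `updates` is an iterable of (doc_id, nested_object) pairs.
    """
    for doc_id, nested in updates:
        action = {
            "_op_type": "update",
            "_index": index,
            "_id": doc_id,
            "script": {"id": script_id, "params": {"nestedfield": nested}},
        }
        if doc_type is not None:
            # Only needed on versions that still use mapping types.
            action["_type"] = doc_type
        yield action
```

It would be consumed with something like `for ok, info in parallel_bulk(es, update_actions(pairs), thread_count=4, chunk_size=500): ...`. `parallel_bulk` is lazy, so nothing is sent until its result is iterated; the thread count and chunk size here are placeholders to tune.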

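It is also a good idea to monitor Elasticsearch performance metrics to find out where the bottleneck is and whether your tuning brings real gains. Here is an overview blog post on Elasticsearch performance metrics.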
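One metric worth watching while comparing tuning changes is the indexing rate. A minimal sketch, assuming two snapshots of the `GET _nodes/stats/indices` response taken `seconds` apart: it sums `indices.indexing.index_total` across all nodes and returns the difference per second. Because an update re-indexes the document, updates should show up in these counters, though they are counted per shard copy, so replicas may inflate the number.

```python
def total_indexed(node_stats):
    """Sum indices.indexing.index_total over every node in a _nodes/stats response."""
    return sum(
        node["indices"]["indexing"]["index_total"]
        for node in node_stats["nodes"].values()
    )


def indexing_rate(before, after, seconds):
    """Index operations per second between two _nodes/stats snapshots."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (total_indexed(after) - total_indexed(before)) / seconds
```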

Original question on Stack Overflow (elasticsearch - Updating a nested field in millions of documents): https://stackoverflow.com/questions/46813530/