google-cloud-storage - Presto on preemptible GCE instances

Tags: google-cloud-storage presto orc google-compute-engine

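I am running an instance group of 20 preemptible GCE instances to read ORC files stored on Google Cloud Storage. The data is partitioned by hour, and each hour is roughly 2 GB.

What instance type should I use?

How much RAM should the JVM be given?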

I am using an autoscaling configuration with an 80% CPU target and a 10-minute cooldown. Is there a configuration better suited to Presto?

Is there a solution for servers shutting down due to lack of resources?

Partial answers would also be appreciated.

Best answer

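As of PrestoDB version 0.199, there is no Google Cloud Storage connector for Presto, so it is not possible to query GCS data.

Regarding hardware requirements, I will quote the Teradata documentation: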

Memory

You should allocate a minimum of 16GB of RAM per node for Presto. But recommend 64GB for most production workloads.

Network Bandwidth

It is recommended to have 10 Gigabit Ethernet between all the nodes in the cluster.

Other Recommendations

Presto can be installed on any normally configured Hadoop cluster. YARN should be configured to account for resources dedicated to Presto. For example, if a node has 64GB of RAM, perhaps you would normally allocate 60GB to YARN. If you install Presto on that node and give Presto 32GB of RAM, then you should subtract 32GB from the 60GB and let YARN only allocate 28GB per node. An optimized configuration might choose to have separate Presto and Hadoop nodes. The optimized configuration allows you to give more memory to Presto, and thus perform larger join queries, for example.
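The YARN adjustment in the quoted passage is simple subtraction, and a small sketch may make it easier to apply to other node sizes. The function name `yarn_memory_after_presto` and its parameters are hypothetical, introduced only for illustration; they are not part of Presto or YARN.

```python
def yarn_memory_after_presto(yarn_gb: int, presto_gb: int) -> int:
    """Return the memory (in GB) YARN may still allocate on a node
    once Presto's share has been reserved.

    Hypothetical helper: it only restates the arithmetic from the
    quoted documentation and does not call any Presto or YARN API.
    """
    if yarn_gb < 0 or presto_gb < 0:
        raise ValueError("memory sizes must be non-negative")
    if presto_gb > yarn_gb:
        raise ValueError("Presto share exceeds the memory budgeted for YARN")
    return yarn_gb - presto_gb


# Figures from the quoted example: a 64 GB node where 60 GB would
# normally go to YARN and 32 GB is now given to Presto.
remaining = yarn_memory_after_presto(yarn_gb=60, presto_gb=32)
print(remaining)  # 28
```

On a dedicated Presto node, as in the "optimized configuration" the quote mentions, there is no YARN share to subtract from, so this calculation does not apply.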

Source: the original question on Stack Overflow: https://stackoverflow.com/questions/44619179/
