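I am running an instance group of 20 preemptible GCE instances to read ORC files stored on Google Cloud Storage. The data is partitioned by hour, and each hour holds roughly 2GB.
Partial answers would also be appreciated.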
Best answer
As of version 0.199, PrestoDB has no Google Cloud Storage connector, so it cannot query data stored in GCS.
As for the hardware requirements, I will quote the Teradata doc here.
Memory
You should allocate a minimum of 16GB of RAM per node for Presto. But recommend 64GB for most production workloads.
Network Bandwidth
It is recommended to have 10 Gigabit Ethernet between all the nodes in the cluster.
Other Recommendations
Presto can be installed on any normally configured Hadoop cluster. YARN should be configured to account for resources dedicated to Presto. For example, if a node has 64GB of RAM, perhaps you would normally allocate 60GB to YARN. If you install Presto on that node and give Presto 32GB of RAM, then you should subtract 32GB from the 60GB and let YARN only allocate 28GB per node. An optimized configuration might choose to have separate Presto and Hadoop nodes. The optimized configuration allows you to give more memory to Presto, and thus perform larger join queries, for example.
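The subtraction in the quoted YARN example can be written out as a small sketch. The function name `yarn_memory_after_presto` is hypothetical and only for illustration; the figures (60GB normally given to YARN, 32GB for Presto) are the ones from the quote. The conversion to megabytes is included on the assumption that the result would be fed into a YARN setting expressed in MB, such as `yarn.nodemanager.resource.memory-mb`.

```python
def yarn_memory_after_presto(yarn_gb: int, presto_gb: int) -> int:
    """Return the memory (in GB) left for YARN once Presto's share is subtracted.

    Hypothetical helper for illustration; not part of Presto or Hadoop.
    """
    if presto_gb > yarn_gb:
        raise ValueError("Presto allocation exceeds the memory budgeted for YARN")
    return yarn_gb - presto_gb


# Figures from the quoted example: a 64GB node where YARN would normally
# get 60GB, and Presto is given 32GB.
remaining_gb = yarn_memory_after_presto(yarn_gb=60, presto_gb=32)
remaining_mb = remaining_gb * 1024  # YARN memory settings are usually given in MB

print(remaining_gb)  # 28
print(remaining_mb)  # 28672
```

On dedicated Presto nodes, which the quote describes as the optimized configuration, no such subtraction is needed because YARN does not share the machine.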
For more on google-cloud-storage and Presto on preemptible GCE instances, see a similar question on Stack Overflow: https://stackoverflow.com/questions/44619179/