google-cloud-storage – Presto on preemptible GCE instances

I am running an instance group of 20 preemptible GCE instances to read ORC files on Google Storage. The data is partitioned by hour, with about 2GB per hour.

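> What type of instance should I use?
> How much RAM should the JVM use?
> I am using autoscaling with 80% CPU and a 10-minute cooldown; is there a more suitable configuration for Presto?
> Is there a solution for servers being shut down due to lack of resources?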

Partial answers are also appreciated.

Best answer: As of PrestoDB version 0.199, Presto has no Google Cloud Storage connector, so it cannot query data on GCS.

As for hardware requirements, I quote the Teradata doc here.

Memory

You should allocate a minimum of 16GB of RAM per node for Presto. But
recommend 64GB for most production workloads.

Network Bandwidth

It is recommended to have 10 Gigabit Ethernet between all the nodes in
the cluster.

Other Recommendations

Presto can be installed on any normally configured Hadoop cluster.
YARN should be configured to account for resources dedicated to
Presto. For example, if a node has 64GB of RAM, perhaps you would
normally allocate 60GB to YARN. If you install Presto on that node and
give Presto 32GB of RAM, then you should subtract 32GB from the 60GB
and let YARN only allocate 28GB per node. An optimized configuration
might choose to have separate Presto and Hadoop nodes. The optimized
configuration allows you to give more memory to Presto, and thus
perform larger join queries, for example.
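
The YARN arithmetic in the quoted passage can be checked with a few lines of Python. This is only a sketch of the subtraction described above; the function name `yarn_memory_after_presto` is hypothetical and is not part of Presto or YARN.

```python
def yarn_memory_after_presto(yarn_gb: int, presto_gb: int) -> int:
    """Return the memory (in GB) YARN may still allocate on a node
    after part of its former share is dedicated to Presto.

    Hypothetical helper for illustration; not a Presto or YARN API.
    """
    if presto_gb > yarn_gb:
        raise ValueError("Presto allocation exceeds the memory previously given to YARN")
    return yarn_gb - presto_gb


# Numbers from the quoted example: a 64GB node where YARN normally
# gets 60GB and Presto is given 32GB.
print(yarn_memory_after_presto(60, 32))  # prints 28
```

On a dedicated Presto node (the "optimized configuration" in the quote) there is no YARN share to subtract from, so this calculation does not apply.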
