hadoop - Why are so many spark-warehouse folders created?

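I installed hadoop 2.8.1 on Ubuntu and then installed spark-2.2.0-bin-hadoop2.7 on top of it. I used spark-shell and created tables, then used beeline and created tables again. I noticed that three different folders named spark-warehouse had been created: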

1. spark-2.2.0-bin-hadoop2.7/spark-warehouse

2. spark-2.2.0-bin-hadoop2.7/bin/spark-warehouse

3. spark-2.2.0-bin-hadoop2.7/sbin/spark-warehouse

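What exactly is spark-warehouse, and why is it created multiple times? Sometimes my spark-shell and beeline show different databases and tables, and sometimes they show the same ones. I don't understand what is happening.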

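Also, I have not installed Hive, yet I can still use beeline, and I can access the databases from a Java program. How did Hive end up on my machine? Please help. I am new to Spark and installed it by following online tutorials.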

Below is the Java code I use to connect to Apache Spark over JDBC:

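    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    // The class name is hypothetical; the original snippet omitted the enclosing class.
    public class SparkJdbcClient {
        private static String driverName = "org.apache.hive.jdbc.HiveDriver";

        public static void main(String[] args) throws SQLException {
            try {
                // Load the Hive JDBC driver; exit if it is not on the classpath.
                Class.forName(driverName);
            } catch (ClassNotFoundException e) {
                e.printStackTrace();
                System.exit(1);
            }
            // try-with-resources closes the connection and statement automatically.
            try (Connection con = DriverManager.getConnection("jdbc:hive2://10.171.0.117:10000/default", "", "");
                 Statement stmt = con.createStatement()) {
                // The rest of the original code was not included in the question.
            }
        }
    }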

Best answer

What exactly is spark-warehouse and why is it created many times?

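Unless configured otherwise, Spark creates an internal Derby database named metastore_db, along with a derby.log file, to hold table metadata. The spark-warehouse directory is where the data of managed tables is stored. It looks like you have not changed either setting.

This is the default behavior, as pointed out in the documentation: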

When not configured by the hive-site.xml, the context automatically creates metastore_db in the current directory and creates a directory configured by spark.sql.warehouse.dir, which defaults to the directory spark-warehouse in the current directory that the Spark application is started

Sometimes my spark-shell and beeline show different databases and tables, and sometimes they show the same

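You started those commands from different folders, and each folder received its own metastore_db and spark-warehouse. What you see is therefore limited to the current working directory: two sessions started from the same folder share databases and tables, while sessions started from different folders do not.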
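To illustrate why the launch directory matters, here is a minimal sketch in plain Java. It does not use Spark at all; it only mimics how a relative directory name such as the default spark-warehouse is interpreted relative to wherever a process is started. The install location /opt/spark-2.2.0-bin-hadoop2.7, the absolute path /data/spark-warehouse, and the helper resolveFrom are hypothetical names chosen for the example, and Unix-style paths are assumed.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class WarehousePathDemo {
    // Resolve a directory name against a working directory, the way a
    // relative path is interpreted by a process started in that directory.
    // An absolute dirName is returned unchanged (after normalization).
    static Path resolveFrom(String workingDir, String dirName) {
        return Paths.get(workingDir).resolve(dirName).normalize();
    }

    public static void main(String[] args) {
        String home = "/opt/spark-2.2.0-bin-hadoop2.7"; // hypothetical install location
        String[] launchDirs = { home, home + "/bin", home + "/sbin" };

        // A relative name yields a different folder for every launch directory.
        for (String dir : launchDirs) {
            System.out.println(resolveFrom(dir, "spark-warehouse"));
        }

        // An absolute path (hypothetical) is the same from every launch directory.
        for (String dir : launchDirs) {
            System.out.println(resolveFrom(dir, "/data/spark-warehouse"));
        }
    }
}
```

By the same logic, pointing spark.sql.warehouse.dir at an absolute path should keep table data in one place regardless of where Spark is started. The metastore_db folder would presumably still be created per directory unless the metastore location is configured as well.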

I used beeline and created tables... How did Hive come to be on my machine?

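It didn't. You are probably connecting either to the Spark Thrift Server, which is fully compatible with the HiveServer2 protocol and uses the Derby database mentioned above, or to an actual HiveServer2 instance running at 10.171.0.117.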

In any case, a JDBC connection is not needed here. You can use the SparkSession.sql function directly.

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/45819568/