java - 使用 Spark SQL 连接 cassandra 中的两个表 - 错误 : missing EOF

标签 java cassandra apache-spark-sql spark-cassandra-connector

我在我的计算机上安装了 Cassandra 和 Spark 以及 SparkSQL。 Spark SQL支持JOIN关键字

https://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/spark/sparkSqlSupportedSyntax.html

Supported syntax of Spark SQL The following syntax defines a SELECT query.

SELECT [DISTINCT] [column names]|[wildcard] FROM [kesypace name.]table name [JOIN clause table name ON join condition] [WHERE condition] [GROUP BY column name] [HAVING conditions] [ORDER BY column names [ASC | DSC]]

我有以下代码

SparkConf conf = new SparkConf().setAppName("My application").setMaster("local");
conf.set("spark.cassandra.connection.host", "localhost");
JavaSparkContext sc = new JavaSparkContext(conf);
CassandraConnector connector = CassandraConnector.apply(sc.getConf());
Session session = connector.openSession();

ResultSet results;
String sql ="";


BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(System.in));
sql = "SELECT * from siem.report JOIN siem.netstat on siem.report.REPORTUUID = siem.netstat.NETSTATREPORTUUID ALLOW FILTERING;";
results = session.execute(sql);

我收到以下错误

Caused by: com.datastax.driver.core.exceptions.SyntaxError: line 1:25 missing EOF at ',' (SELECT * from siem.report[,] siem...) 11:14 AM com.datastax.driver.core.exceptions.SyntaxError: line 1:25 missing EOF at ',' (SELECT * from siem.report[,] siem...) at com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:58) at com.datastax.driver.core.exceptions.SyntaxError.copy(SyntaxError.java:24) at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37) at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245) at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:63) at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:39) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at com.datastax.spark.connector.cql.SessionProxy.invoke(SessionProxy.scala:33) at com.sun.proxy.$Proxy59.execute(Unknown Source) at com.ge.predix.rmd.siem.boot.PersistenceTest.test_QuerySparkOnReport_GIACOMO_LogDao(PersistenceTest.java:178) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:73) at org.springframework.test.context.junit4.statements

也尝试过

SELECT * from siem.report JOIN siem.netstat on report.REPORTUUID = netstat.NETSTATREPORTUUID ALLOW FILTERING

也尝试过

SELECT * from siem.report R JOIN siem.netstat N on R.REPORTUUID = N.NETSTATREPORTUUID ALLOW FILTERING

有人可以帮助我吗?我真的在使用 SparkSQL 或 CQL?

更新

我试过了

public void test_JOIN_on_Cassandra () {

        SparkConf conf = new SparkConf().setAppName("My application").setMaster("local");
        conf.set("spark.cassandra.connection.host", "localhost");
        JavaSparkContext sc = new JavaSparkContext(conf);


        SQLContext sqlContext = new SQLContext(sc);
        try {
            //QueryExecution test1 = sqlContext.executeSql("SELECT * from siem.report");
            //QueryExecution test2 = sqlContext.executeSql("SELECT * from siem.report JOIN siem.netstat on report.REPORTUUID = netstat.NETSTATREPORTUUID");
            QueryExecution test3 = sqlContext.executeSql("SELECT * from siem.report JOIN siem.netstat on siem.report.REPORTUUID = siem.netstat.NETSTATREPORTUUID");

        } catch (Exception e) {
            e.printStackTrace();
        }

       // SchemaRDD results = sc.sql("SELECT * from siem.report JOIN siem.netstat on siem.report.REPORTUUID = siem.netstat.NETSTATREPORTUUID");

}

我明白了

== Parsed Logical Plan == 'Project [unresolvedalias()] +- 'Join Inner, Some(('siem.report.REPORTUUID = 'siem.netstat.NETSTATREPORTUUID)) :- 'UnresolvedRelation siem.report, None +- 'UnresolvedRelation siem.netstat, None == Analyzed Logical Plan == org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to toAttribute on unresolved object, tree: unresolvedalias() 'Project [unresolvedalias(*)] +- 'Join Inner, Some(('siem.report.REPORTUUID = 'siem.netstat.NETSTATREPORTUUID)) :- 'UnresolvedRelation siem.report, None +- 'UnresolvedRelation siem.netstat, None == Optimized Logical Plan == org.apache.spark.sql.AnalysisException: Table not found: siem.report; == Physical Plan == org.apache.spark.sql.AnalysisException: Table not found: siem.report;

最佳答案

看起来您在这里混合了几个导致错误的概念。您正在创建的 session 将打开到 Cassandra 的直接线路,这意味着它将接受 CQL 而不是 SQL。如果你想运行 SQL,你可以做一个小改变

SparkConf conf = new SparkConf().setAppName("My application").setMaster("local");
conf.set("spark.cassandra.connection.host", "localhost");
JavaSparkContext sc = new JavaSparkContext(conf);

SchemaRDD results = sparkContext.sql("SELECT * from siem.report JOIN siem.netstat on siem.report.REPORTUUID = siem.netstat.NETSTATREPORTUUID");

您从 Spark 上下文调用 SparkSQL,而不是直接连接到 Cassandra。更多信息请参见:http://docs.datastax.com/en/latest-dse/datastax_enterprise/spark/sparkSqlJava.html

关于java - 使用 Spark SQL 连接 cassandra 中的两个表 - 错误 : missing EOF,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/36304591/

相关文章:

performance - Cassandra 添加行与添加列性能

python - Impala 查询在 Pyspark 中返回错误结果

Scala和Spark UDF功能

java - 如何监控长时间运行作业的 REST 端点

java - 如何通过 PowerMockito 模拟多个静态类

java - 创建 JSP 布局模板的最佳方法是什么?

date - to_date 在格式 yyyyww 上给出空值(202001 和 202053)

java - 关闭 SSH session 时,进程在服务器上被杀死

database - Cassandra 中的频繁更新表

cassandra - 如何在 Cassandra 中将 Unix 10 位纪元时间存储和查询为人类可读的内容?