google-cloud-dataflow - Apache Beam - org.apache.beam.sdk.util.UserCodeException : java. sql.SQLException:无法创建 PoolableConnectionFactory(不支持方法)

标签 google-cloud-dataflow apache-beam apache-beam-io

我正在尝试使用 Apache beam-dataflow 连接到安装在云实例中的配置单元实例。当我运行它时,出现以下异常。当我使用 Apache Beam 访问此数据库时,就会发生这种情况。我见过很多与 apache beam 或 google dataflow 无关的相关问题。

(c9ec8fdbe9d1719a): java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.sql.SQLException: Cannot create PoolableConnectionFactory (Method not supported)
at com.google.cloud.dataflow.worker.runners.worker.MapTaskExecutorFactory$3.typedApply(MapTaskExecutorFactory.java:289)
at com.google.cloud.dataflow.worker.runners.worker.MapTaskExecutorFactory$3.typedApply(MapTaskExecutorFactory.java:261)
at com.google.cloud.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:55)
at com.google.cloud.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:43)
at com.google.cloud.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:78)
at com.google.cloud.dataflow.worker.runners.worker.MapTaskExecutorFactory.create(MapTaskExecutorFactory.java:152)
at com.google.cloud.dataflow.worker.runners.worker.DataflowWorker.doWork(DataflowWorker.java:272)
at com.google.cloud.dataflow.worker.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:244)
at com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:125)
at com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:105)
at com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:92)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Caused by: org.apache.beam.sdk.util.UserCodeException: java.sql.SQLException: Cannot create PoolableConnectionFactory (Method not supported)
at org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:36)
at org.apache.beam.sdk.io.jdbc.JdbcIO$Read$ReadFn$auxiliary$8CR0LcYI.invokeSetup(Unknown Source)
at com.google.cloud.dataflow.worker.runners.worker.DoFnInstanceManagers$ConcurrentQueueInstanceManager.deserializeCopy(DoFnInstanceManagers.java:65)
at com.google.cloud.dataflow.worker.runners.worker.DoFnInstanceManagers$ConcurrentQueueInstanceManager.peek(DoFnInstanceManagers.java:47)
at com.google.cloud.dataflow.worker.runners.worker.UserParDoFnFactory.create(UserParDoFnFactory.java:100)
at com.google.cloud.dataflow.worker.runners.worker.DefaultParDoFnFactory.create(DefaultParDoFnFactory.java:70)
at com.google.cloud.dataflow.worker.runners.worker.MapTaskExecutorFactory.createParDoOperation(MapTaskExecutorFactory.java:365)
at com.google.cloud.dataflow.worker.runners.worker.MapTaskExecutorFactory$3.typedApply(MapTaskExecutorFactory.java:278)
... 14 more
        Caused by: java.sql.SQLException: Cannot create PoolableConnectionFactory (Method not supported)
at org.apache.commons.dbcp2.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:2294)
at org.apache.commons.dbcp2.BasicDataSource.createDataSource(BasicDataSource.java:2039)
at org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:1533)
at org.apache.beam.sdk.io.jdbc.JdbcIO$Read$ReadFn.setup(JdbcIO.java:377)
Caused by: java.sql.SQLException: Method not supported
at org.apache.hive.jdbc.HiveConnection.isValid(HiveConnection.java:898)
at org.apache.commons.dbcp2.DelegatingConnection.isValid(DelegatingConnection.java:918)
at org.apache.commons.dbcp2.PoolableConnection.validate(PoolableConnection.java:283)
at org.apache.commons.dbcp2.PoolableConnectionFactory.validateConnection(PoolableConnectionFactory.java:357)
at org.apache.commons.dbcp2.BasicDataSource.validateConnectionFactory(BasicDataSource.java:2307)
at org.apache.commons.dbcp2.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:2290)
at org.apache.commons.dbcp2.BasicDataSource.createDataSource(BasicDataSource.java:2039)
at org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:1533)
at org.apache.beam.sdk.io.jdbc.JdbcIO$Read$ReadFn.setup(JdbcIO.java:377)
at org.apache.beam.sdk.io.jdbc.JdbcIO$Read$ReadFn$auxiliary$8CR0LcYI.invokeSetup(Unknown Source)
at com.google.cloud.dataflow.worker.runners.worker.DoFnInstanceManagers$ConcurrentQueueInstanceManager.deserializeCopy(DoFnInstanceManagers.java:65)
at com.google.cloud.dataflow.worker.runners.worker.DoFnInstanceManagers$ConcurrentQueueInstanceManager.peek(DoFnInstanceManagers.java:47)
at com.google.cloud.dataflow.worker.runners.worker.UserParDoFnFactory.create(UserParDoFnFactory.java:100)
at com.google.cloud.dataflow.worker.runners.worker.DefaultParDoFnFactory.create(DefaultParDoFnFactory.java:70)
at com.google.cloud.dataflow.worker.runners.worker.MapTaskExecutorFactory.createParDoOperation(MapTaskExecutorFactory.java:365)
at com.google.cloud.dataflow.worker.runners.worker.MapTaskExecutorFactory$3.typedApply(MapTaskExecutorFactory.java:278)
at com.google.cloud.dataflow.worker.runners.worker.MapTaskExecutorFactory$3.typedApply(MapTaskExecutorFactory.java:261)
at com.google.cloud.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:55)
at com.google.cloud.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:43)
at com.google.cloud.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:78)
at com.google.cloud.dataflow.worker.runners.worker.MapTaskExecutorFactory.create(MapTaskExecutorFactory.java:152)
at com.google.cloud.dataflow.worker.runners.worker.DataflowWorker.doWork(DataflowWorker.java:272)
at com.google.cloud.dataflow.worker.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:244)
at com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:125)
at com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:105)
at com.google.cloud.dataflow.worker.runners.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:92)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

使用相同的连接字符串和驱动程序文件,我可以使用普通的 java-jdbc 程序连接到这个实例。

这个问题困扰了我一段时间,我找不到解决办法。任何人都可以对此提出任何想法吗?

请查看下面连接到配置单元的代码片段:

PCollection<Customer> collection = dataflowPipeline.apply(JdbcIO.<Customer>read()
            .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration
                    .create("org.apache.hive.jdbc.HiveDriver", "jdbc:hive2://<external IP of computer instance>:10000/dbtest")
                    .withUsername("username").withPassword("password"))     
            .withQuery(
                    "select c_customer_id,c_first_name,c_last_name,c_preferred_cust_flag,c_birth_day,from dbtest.customer")     
            .withRowMapper(new JdbcIO.RowMapper<Customer>() {
                @Override
                public Customer mapRow(ResultSet resultSet) throws Exception {
                    // TODO Auto-generated method stub
                    Customer customer = new Customer();
                    customer.setC_customer_id(resultSet.getString("c_customer_id"));
                    customer.setC_first_name(resultSet.getString("c_first_name"));
                    customer.setC_last_name(resultSet.getString("c_last_name"));
                    customer.setC_preferred_cust_flag(resultSet.getString("c_preferred_cust_flag"));
                    customer.setC_birth_day(resultSet.getInt("c_birth_day"));
                    return customer;
                }
            }).withCoder(AvroCoder.of(Customer.class)));

最佳答案

Apache DBCP BasicDataSource 使用方法 isValid 来验证连接,这不是由旧版本的 Hive JDBC 驱动程序实现的 - 参见 JDBC to hive connection fails on invalid operation isValid() .

但是在Hive 2.1.0之后的版本中实现了该方法。 https://github.com/apache/hive/commit/2d2ab0942482a6ce1523dd9dd0f4094865e93b28 .

能否使用较新版本的 Hive JDBC 驱动程序?

关于google-cloud-dataflow - Apache Beam - org.apache.beam.sdk.util.UserCodeException : java. sql.SQLException:无法创建 PoolableConnectionFactory(不支持方法),我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/44901372/

相关文章:

java - 使用 Java SDK 检查数据流作业(异步)状态

java - Apache beam Dataflow SDK 错误示例

java - Eclipse插件创建项目失败 "Google Cloud Tools for Eclipse"

java - 使用 DataFlow 从多个 PubSub 主题流式传输到 BigQuery 时,消息陷入 GBP 状态?

google-cloud-dataflow - 缩放具有阻塞网络调用的 ParDo 转换

python - WriteToText 仅写入临时文件

Cassandra IO : Inserting date field doesn't work

java - GCP Dataflow 刷新您的凭据时出现问题

java - Apache Beam 数据流中的外部 API 调用

java - Apache Beam DataflowRunner 无法写入 AWS S3