r - Connecting to S3 Athena from R via JDBC

Tags: r jdbc amazon-s3

I am trying to connect to Amazon Athena over JDBC. In R, using the RJDBC library, I have the following:

download.file('https://s3.amazonaws.com/athena-downloads/drivers/AthenaJDBC41-1.0.0.jar','AthenaJDBC41-1.0.0.jar' )

jdbcDriver <- JDBC(driverClass="com.amazonaws.athena.jdbc.AthenaDriver", 'AthenaJDBC41-1.0.0.jar',
                identifier.quote="'")

I then run this with my credentials:

jdbcConnection <- dbConnect(jdbcDriver, 'jdbc:awsathena://athena.us-east-1.amazonaws.com:443/',
              "s3_staging_dir URL", "s3://testbucket/",
              "USERNAME"," USERKEY","PASSWORD","PASSWORDKEY" )

But I keep getting this error:

Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1],  : 
  java.sql.SQLException: property s3_staging_dir must be set

I tried setting s3_staging_dir in the connection call, but it does not work.

Any guidance would be much appreciated.

Best answer

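library(RJDBC)

# Download the Athena JDBC driver once and reuse the local copy.
# mode = "wb" keeps the jar intact on Windows.
URL <- 'https://s3.amazonaws.com/athena-downloads/drivers/AthenaJDBC41-1.0.0.jar'
fil <- basename(URL)
if (!file.exists(fil)) download.file(URL, fil, mode = "wb")

drv <- JDBC(driverClass="com.amazonaws.athena.jdbc.AthenaDriver", fil, identifier.quote="'")

# Pass s3_staging_dir, user and password as *named* arguments: dbConnect()
# forwards named arguments to the driver as connection properties, whereas
# unnamed strings are matched by position and s3_staging_dir is never set.
con <- dbConnect(drv, 'jdbc:awsathena://athena.us-east-1.amazonaws.com:443/',
                 s3_staging_dir="s3://yourbucket",
                 user=Sys.getenv("ATHENA_USER"),
                 password=Sys.getenv("ATHENA_PASSWORD"))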


dbListTables(con)
## [1] "elb_logs"

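Put your access key and secret in ~/.Renviron (in obviously named environment variables, such as the ATHENA_USER and ATHENA_PASSWORD read above), restart R, and try the code above, using a bucket you have access to.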
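
As a minimal sketch of that setup, assuming the variable names ATHENA_USER and ATHENA_PASSWORD from the dbConnect() call: the comments show what the two ~/.Renviron lines might look like (placeholder values), and check_athena_env() is a hypothetical helper, not part of RJDBC, that stops with a readable message when a variable is unset.

```r
# ~/.Renviron would contain two lines like these (placeholder values):
#   ATHENA_USER=your-access-key-id
#   ATHENA_PASSWORD=your-secret-access-key

# Hypothetical helper: fail early if a credential variable is unset,
# rather than letting the JDBC driver raise a less obvious error.
check_athena_env <- function(vars = c("ATHENA_USER", "ATHENA_PASSWORD")) {
  values <- Sys.getenv(vars, unset = "")
  missing <- vars[values == ""]
  if (length(missing) > 0) {
    stop("Unset environment variable(s): ", paste(missing, collapse = ", "),
         ". Add them to ~/.Renviron and restart R.", call. = FALSE)
  }
  invisible(TRUE)
}
```

Calling check_athena_env() just before dbConnect() confirms that R picked up the file after the restart.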

library(dplyr)  # provides %>% and glimpse()

dbGetQuery(con, "SELECT * FROM sampledb.elb_logs LIMIT 10") %>% 
    glimpse()
## Observations: 10
## Variables: 16
## $ timestamp             <chr> "2014-09-27T00:00:25.424956Z", "2014-09-27T00:00:56.439218Z", "2014-09-27T00:01:27.441734Z", "2014-09-27T00:01:58.366715Z", "2014-09-27T00:02:29.446363Z", "2014-09-2...
## $ elbname               <chr> "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo", "lb-demo"
## $ requestip             <chr> "241.230.198.83", "252.26.60.51", "250.244.20.109", "247.59.58.167", "254.64.224.54", "245.195.140.77", "245.195.140.77", "243.71.49.173", "240.139.5.14", "251.192.4...
## $ requestport           <dbl> 27026, 27026, 27026, 27026, 27026, 27026, 27026, 27026, 27026, 27026
## $ backendip             <chr> "251.192.40.76", "249.89.116.3", "251.111.156.171", "251.139.91.156", "251.111.156.171", "254.64.224.54", "254.64.224.54", "250.244.20.109", "247.65.176.249", "250.2...
## $ backendport           <dbl> 443, 8888, 8888, 8888, 8000, 8888, 8888, 8888, 8888, 8888
## $ requestprocessingtime <dbl> 9.1e-05, 9.4e-05, 8.4e-05, 9.7e-05, 9.1e-05, 9.3e-05, 9.4e-05, 8.3e-05, 9.0e-05, 9.0e-05
## $ backendprocessingtime <dbl> 0.046598, 0.038973, 0.047054, 0.039845, 0.061461, 0.037791, 0.047035, 0.048792, 0.045724, 0.029918
## $ clientresponsetime    <dbl> 4.9e-05, 4.7e-05, 4.9e-05, 4.9e-05, 4.0e-05, 7.7e-05, 7.5e-05, 7.3e-05, 4.0e-05, 6.7e-05
## $ elbresponsecode       <chr> "200", "200", "200", "200", "200", "200", "200", "200", "200", "200"
## $ backendresponsecode   <chr> "200", "200", "200", "200", "200", "400", "400", "200", "200", "200"
## $ receivedbytes         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
## $ sentbytes             <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
## $ requestverb           <chr> "GET", "GET", "GET", "GET", "GET", "GET", "GET", "GET", "GET", "GET"
## $ url                   <chr> "http://www.abcxyz.com:80/jobbrowser/?format=json&state=running&user=20g578y", "http://www.abcxyz.com:80/jobbrowser/?format=json&state=running&user=20g578y", "http:/...
## $ protocol              <chr> "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1", "HTTP/1.1"

Regarding "r - Connecting to S3 Athena from R via JDBC", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/40980767/
