hadoop - Error while fetching data from a table in Presto (HIVE_CURSOR_ERROR)

Tags: hadoop hive presto

We are running Prestodb (0.69) and the client on a single-node server, using the Hive catalog. The table is in ORC format and contains 350,000,000 rows.

When running the query `select column1 from ORC_Table1 where column2=123456789`, we get a HIVE_CURSOR_ERROR. The data type of column2 is `int`. Below is the error stack:

    "failures" : [ { 
      "type" : "com.facebook.presto.spi.PrestoException", 
      "message" : "Read past end of RLE integer from compressed stream Stream for column 2 kind DATA position: 477741 length: 477741 range: 0 offset: 478409 limit: 478409 range 0 = 0 to 477741 uncompressed: 212681 to 212681", 
      "cause" : { 
        "type" : "java.io.EOFException", 
        "message" : "Read past end of RLE integer from compressed stream Stream for column 2 kind DATA position: 477741 length: 477741 range: 0 offset: 478409 limit: 478409 range 0 = 0 to 477741 uncompressed: 212681 to 212681", 
        "suppressed" : [ ], 
        "stack" : [ "org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:46)", "org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:287)", "org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:473)", "org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1157)", "org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2196)", "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:106)", "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:57)", "com.facebook.presto.hive.GenericHiveRecordCursor.advanceNextPosition(GenericHiveRecordCursor.java:241)", "ScanFilterAndProjectOperator_11.filterAndProjectRowOriented(Unknown Source)", "com.facebook.presto.operator.AbstractScanFilterAndProjectOperator.getOutput(AbstractScanFilterAndProjectOperator.java:177)", "com.facebook.presto.operator.Driver.process(Driver.java:329)", "com.facebook.presto.operator.Driver.processFor(Driver.java:271)", "com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:674)", "com.facebook.presto.execution.TaskExecutor$PrioritizedSplitRunner.process(TaskExecutor.java:443)", "com.facebook.presto.execution.TaskExecutor$Runner.run(TaskExecutor.java:577)", "java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)", "java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)", "java.lang.Thread.run(Thread.java:745)" ] 
      }, 
      "suppressed" : [ ], 
      "stack" : [ "com.facebook.presto.hive.GenericHiveRecordCursor.advanceNextPosition(GenericHiveRecordCursor.java:257)", "ScanFilterAndProjectOperator_11.filterAndProjectRowOriented(Unknown Source)", "com.facebook.presto.operator.AbstractScanFilterAndProjectOperator.getOutput(AbstractScanFilterAndProjectOperator.java:177)", "com.facebook.presto.operator.Driver.process(Driver.java:329)", "com.facebook.presto.operator.Driver.processFor(Driver.java:271)", "com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:674)", "com.facebook.presto.execution.TaskExecutor$PrioritizedSplitRunner.process(TaskExecutor.java:443)", "com.facebook.presto.execution.TaskExecutor$Runner.run(TaskExecutor.java:577)", "java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)", "java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)", "java.lang.Thread.run(Thread.java:745)" ], 
      "errorCode" : { 
        "code" : 16777217, 
        "name" : "HIVE_CURSOR_ERROR" 
      } 
    } ] 

The same query runs fine on a table with only a few rows. Can anyone help me resolve this?

Below is config.properties:

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
task.max-memory=1GB
discovery-server.enabled=true
discovery.uri=http://172.168.1.99:8080

Best Answer

Can Hive itself read this table? If it can, this is most likely a bug that was fixed in a newer version of the Hive libraries than the one Presto uses, and you will need to wait until Presto upgrades to a more recent Hive release. If Hive cannot read the table either, then the file is corrupt or the bug still exists in the ORC reader.
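The answer's first diagnostic step can be sketched as a pair of queries run directly in the Hive CLI rather than through Presto. The table and column names come from the question; the `COUNT` query is an assumption added here to force a scan of every stripe of the suspect column:

    -- Run in the Hive CLI (not Presto): does Hive's own ORC reader
    -- succeed on the exact query that fails in Presto?
    SELECT column1 FROM ORC_Table1 WHERE column2 = 123456789;

    -- Force a full scan of the suspect column. If this also throws an
    -- EOFException, the ORC data itself is likely corrupt; if it succeeds,
    -- the problem is more likely in the ORC reader version Presto bundles.
    SELECT COUNT(column2) FROM ORC_Table1;

If Hive reads the table cleanly, the comparison points at the Presto-side reader; if both fail, the file (or a specific stripe of it) is the more likely culprit.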

For "hadoop - Error while fetching data from a table in Presto (HIVE_CURSOR_ERROR)", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/25909382/
