java - How do I read a Hadoop file with Apache Beam?

I am trying to read a file on a Hadoop server (not a local one) using Apache Beam. The question is: how can I do that? I have read something about the Hadoop I/O Format with Beam:

https://beam.apache.org/documentation/io/built-in/hadoop/

I don't really understand this part:

Configuration myHadoopConfiguration = new Configuration(false);
THIS --> // Set Hadoop InputFormat, key and value class in configuration <-- THIS
myHadoopConfiguration.setClass("mapreduce.job.inputformat.class",
    InputFormatClass,
    InputFormat.class);
myHadoopConfiguration.setClass("key.class", InputFormatKeyClass, Object.class);
myHadoopConfiguration.setClass("value.class", InputFormatValueClass, Object.class);

How do I set this format? Do I need to create a class? If I copy and paste this code, it doesn't work. Thanks

Best answer

The standard default input format is TextInputFormat, which extends FileInputFormat<LongWritable, Text>.

Its keys are Long values (import org.apache.hadoop.io.LongWritable) that give the byte offset of each line within the file.

Its values are Text values (import org.apache.hadoop.io.Text) that hold the individual lines.
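To make the key/value shape concrete, here is a minimal sketch in plain Java (no Hadoop dependency) that mimics the pairs described above: each record's key is the byte offset at which a line starts, and its value is the line text. The class and method names (OffsetLineSketch, offsetsAndLines) are hypothetical, and the sketch assumes UTF-8 content with "\n" line endings; the real TextInputFormat also deals with input splits and other line terminators.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: this is not Hadoop code.
final class OffsetLineSketch {

    // Returns (byte offset of line start -> line text), in file order.
    // Assumes UTF-8 text whose lines end with '\n'.
    static Map<Long, String> offsetsAndLines(String content) {
        Map<Long, String> records = new LinkedHashMap<>();
        if (content.isEmpty()) {
            return records; // no lines, no records
        }
        long offset = 0;
        // split("\n") drops the empty string after a trailing newline
        for (String line : content.split("\n")) {
            records.put(offset, line);
            // advance past the line's bytes plus one byte for '\n'
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(offsetsAndLines("first\nsecond\nthird\n"));
    }
}
```

For the input "first\nsecond\nthird\n" the records start at offsets 0, 6, and 13, which is the role LongWritable plays as the key while Text carries the line.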

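The code does not work because InputFormatClass, InputFormatKeyClass, and InputFormatValueClass are not actual variables. They are placeholders for the classes you want to use, for example TextInputFormat.class, LongWritable.class, and Text.class.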
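As a sketch of what filling in those placeholders could look like, the snippet from the question can be written with TextInputFormat and its key and value classes. This assumes the Hadoop client libraries are on the classpath. The class name HadoopConfigSketch, the helper buildConfiguration, and the HDFS URL are hypothetical, and setting the input path through the "mapreduce.input.fileinputformat.inputdir" property is an assumption about how you point FileInputFormat at your file, not something the question or answer states.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Hypothetical sketch: concrete classes in place of the placeholders.
public class HadoopConfigSketch {

    static Configuration buildConfiguration(String inputPath) {
        Configuration conf = new Configuration(false);
        // InputFormatClass -> TextInputFormat (reads text files line by line)
        conf.setClass("mapreduce.job.inputformat.class",
                TextInputFormat.class, InputFormat.class);
        // InputFormatKeyClass -> LongWritable (byte offset of each line)
        conf.setClass("key.class", LongWritable.class, Object.class);
        // InputFormatValueClass -> Text (the line itself)
        conf.setClass("value.class", Text.class, Object.class);
        // Assumption: input location for FileInputFormat-based formats
        conf.set("mapreduce.input.fileinputformat.inputdir", inputPath);
        return conf;
    }

    public static void main(String[] args) {
        // Hypothetical HDFS URL; replace host, port, and path with your own.
        Configuration conf =
                buildConfiguration("hdfs://namenode.example.com:8020/data/input.txt");
        System.out.println(conf.get("mapreduce.job.inputformat.class"));
    }
}
```

The resulting Configuration is what the Beam Hadoop input format connector from the linked documentation expects in its withConfiguration(...) call, with LongWritable and Text as the key and value type parameters.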

Regarding "java - How do I read a Hadoop file with Apache Beam?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50025325/
