spring - How do I run Hadoop as part of the test suite of a Spring application?

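I would like to set up a simple "Hello, World!" to learn how to use basic Hadoop functionality, such as storing and reading files with HDFS.
Is it possible to:

Run an embedded Hadoop as part of my application?

Run an embedded Hadoop in my tests?

I would like to build a minimal Spring Boot setup for this. What is the minimal Spring configuration it requires? There are plenty of examples showing how to read and write files with HDFS, but I still cannot work out which Spring configuration I need. It is hard to tell which libraries are really required, because the Spring Hadoop examples appear to be outdated. Any help would be much appreciated.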

Best answer

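You can easily use the Hadoop FileSystem API against any local POSIX file system, without a Hadoop cluster.
The Hadoop API is very generic and provides many concrete implementations for different storage systems (HDFS, S3, Azure Data Lake Store, and so on).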
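As a minimal sketch of the point above, the following writes a file through the Hadoop FileSystem API and reads it back using only the local file system implementation. The class name LocalFsHello, the method roundTrip, and the scratch directory are hypothetical names chosen for illustration; the sketch assumes the Hadoop client libraries are on the classpath.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class, not part of the original answer.
public final class LocalFsHello {

    private LocalFsHello() {
    }

    // Writes the text to a file on the local disk via the Hadoop API
    // and returns what is read back from that file.
    public static String roundTrip(final String text) throws IOException {
        // Scratch directory on the local disk (an assumption of this sketch).
        final File scratch = Files.createTempDirectory("hadoop-local").toFile();

        // getLocal returns the local (file://) implementation; no cluster is contacted.
        final FileSystem fs = FileSystem.getLocal(new Configuration());
        final Path file = new Path(scratch.getAbsolutePath(), "hello.txt");

        final byte[] written = text.getBytes(StandardCharsets.UTF_8);
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write(written);
        }

        // Size the buffer from the file status, then read the whole file.
        final byte[] read = new byte[(int) fs.getFileStatus(file).getLen()];
        try (FSDataInputStream in = fs.open(file)) {
            in.readFully(read);
        }
        return new String(read, StandardCharsets.UTF_8);
    }
}
```

The same create and open calls work unchanged when the FileSystem instance comes from a real cluster or a mini cluster instead of FileSystem.getLocal.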
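You can embed HDFS in your application (that is, run the NameNode and DataNodes in a single JVM process), but this is only suitable for testing.
You can start a Hadoop MiniCluster from the command line (CLI MiniCluster), or use the MiniDFSCluster class, found in the hadoop-minicluster package, through the Java API in your unit tests.
You can start the mini cluster with Spring by giving it its own configuration class and using that class as the @ContextConfiguration of your unit tests.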

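import java.io.IOException;
import java.util.UUID;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.rules.TemporaryFolder;
import org.springframework.beans.factory.config.BeanDefinition;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Primary;
import org.springframework.context.annotation.Scope;

// Spring's @Configuration is fully qualified because the simple name
// Configuration refers to org.apache.hadoop.conf.Configuration here.
@org.springframework.context.annotation.Configuration
public class MiniClusterConfiguration {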

    @Bean(name = "temp-folder", initMethod = "create", destroyMethod = "delete")
    public TemporaryFolder temporaryFolder() {
        return new TemporaryFolder();
    }

    @Bean
    public Configuration configuration(final TemporaryFolder temporaryFolder) {
        final Configuration conf = new Configuration();
        conf.set(
            MiniDFSCluster.HDFS_MINIDFS_BASEDIR,
            temporaryFolder.getRoot().getAbsolutePath()
        );
        return conf;
    }

    @Bean(destroyMethod = "shutdown")
    public MiniDFSCluster cluster(final Configuration conf) throws IOException {
        final MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
            .clusterId(String.valueOf(this.hashCode()))
            .build();
        cluster.waitClusterUp();
        return cluster;
    }

    @Bean
    public FileSystem fileSystem(final MiniDFSCluster cluster) throws IOException {
        return cluster.getFileSystem();
    }

    @Bean
    @Primary
    @Scope(BeanDefinition.SCOPE_PROTOTYPE)
    public Path temp(final FileSystem fs) throws IOException {
        final Path path = new Path("/tmp", UUID.randomUUID().toString());
        fs.mkdirs(path);
        return path;
    }
}

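You then inject the FileSystem and a temporary Path into your tests. As mentioned above, from the API's point of view it makes no difference whether you are talking to a real cluster, a mini cluster, or the local file system. Note that starting the cluster has a cost. Spring's test framework caches the application context across tests that share the same configuration, so the cluster normally starts only once. Annotating your tests with @DirtiesContext(classMode = DirtiesContext.ClassMode.AFTER_EACH_TEST_METHOD) discards the context after every test method and therefore restarts the cluster each time, so use it only when every test needs a freshly started cluster.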
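A test that uses this configuration might look like the sketch below. It assumes JUnit 4 with Spring's SpringRunner (the configuration above already relies on JUnit 4's TemporaryFolder) and that MiniClusterConfiguration is on the test classpath; the class name HdfsHelloWorldTest and the file name hello.txt are hypothetical.

```java
import static org.junit.Assert.assertEquals;

import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringRunner;

// Hypothetical test class; it depends on MiniClusterConfiguration shown above.
@RunWith(SpringRunner.class)
@ContextConfiguration(classes = MiniClusterConfiguration.class)
public class HdfsHelloWorldTest {

    // The FileSystem bean backed by the mini cluster.
    @Autowired
    private FileSystem fs;

    // The prototype-scoped temporary directory created inside HDFS.
    @Autowired
    private Path tempDir;

    @Test
    public void writesAndReadsBack() throws Exception {
        final Path file = new Path(tempDir, "hello.txt");
        final byte[] expected = "Hello, World!".getBytes(StandardCharsets.UTF_8);

        // Write the bytes to HDFS.
        try (FSDataOutputStream out = fs.create(file)) {
            out.write(expected);
        }

        // Read exactly as many bytes back as were written.
        final byte[] actual = new byte[expected.length];
        try (FSDataInputStream in = fs.open(file)) {
            in.readFully(actual);
        }

        assertEquals("Hello, World!", new String(actual, StandardCharsets.UTF_8));
    }
}
```

Because the test only touches the FileSystem and Path types, swapping the configuration class for one that provides a local or remote FileSystem bean should not require changes to the test body.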
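If you want this code to run on Windows, you will need a compatibility layer called winutils (it makes it possible to access the Windows file system in a POSIX-like way).
You have to point the HADOOP_HOME environment variable at it and, depending on the version, load its shared library: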

String HADOOP_HOME = System.getenv("HADOOP_HOME");
System.setProperty("hadoop.home.dir", HADOOP_HOME);
System.setProperty("hadoop.tmp.dir", System.getProperty("java.io.tmpdir"));
final String lib = String.format("%s/lib/hadoop.dll", HADOOP_HOME);
System.load(lib);
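The snippet above throws a NullPointerException when HADOOP_HOME is not set, and System.load fails when the DLL is missing. A more defensive variant could look like the following sketch, which uses only the Java standard library. The class name WinutilsLoader and its methods are hypothetical, and the lib/hadoop.dll location is taken from the snippet above; it may differ depending on the Hadoop version.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper class, not part of Hadoop or winutils.
public final class WinutilsLoader {

    private WinutilsLoader() {
    }

    // Returns the expected location of hadoop.dll under the given home,
    // or empty when the home directory is not set.
    public static Optional<Path> nativeLibrary(final String hadoopHome) {
        if (hadoopHome == null || hadoopHome.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(Paths.get(hadoopHome, "lib", "hadoop.dll"));
    }

    // Loads hadoop.dll only on Windows and only when the file exists.
    // Returns true when the library was loaded, false otherwise.
    public static boolean loadIfPresent(final String hadoopHome) {
        final boolean windows = System.getProperty("os.name", "")
            .toLowerCase(Locale.ROOT)
            .startsWith("windows");
        final Optional<Path> lib = nativeLibrary(hadoopHome);
        if (!windows || !lib.isPresent() || !Files.isRegularFile(lib.get())) {
            return false;
        }
        System.setProperty("hadoop.home.dir", hadoopHome);
        System.setProperty("hadoop.tmp.dir", System.getProperty("java.io.tmpdir"));
        System.load(lib.get().toAbsolutePath().toString());
        return true;
    }
}
```

Calling WinutilsLoader.loadIfPresent(System.getenv("HADOOP_HOME")) once before the mini cluster starts keeps the same tests runnable on non-Windows machines, where the call simply returns false.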

Regarding "spring - How do I run Hadoop as part of the test suite of a Spring application?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62541894/
