java - Apache Beam Maven 依赖错误

标签 java maven google-cloud-dataflow apache-beam google-cloud-pubsub

我正在尝试使用 Java 中的 Apache Beam 作为某种数据管道。我已经编写了一个简单的类,该类源自 Google Pubsub 并接收到 Google Bigquery,但我无法让它为我的生活构建。我正在使用 Maven 进行构建,并添加了我能找到的每个 Beam 包,但我仍然收到“未找到类文件”错误。

具体:

[ERROR] /X:/Work/pipeline/backup-pipeline/src/main/java/PassthroughPipeline.java:[28,16] cannot access org.apache.beam.sdk.options.GcpOptions
  class file for org.apache.beam.sdk.options.GcpOptions not found
[ERROR] /X:/Work/pipeline/backup-pipeline/src/main/java/PassthroughPipeline.java:[29,16] cannot access org.apache.beam.sdk.options.BigQueryOptions
  class file for org.apache.beam.sdk.options.BigQueryOptions not found
[ERROR] /X:/Work/pipeline/backup-pipeline/src/main/java/PassthroughPipeline.java:[31,16] cannot access org.apache.beam.sdk.options.GcsOptions
  class file for org.apache.beam.sdk.options.GcsOptions not found

有谁知道我需要添加哪些软件包来解决这些问题?不幸的是,谷歌没有提供任何帮助。

我拥有的 POM 文件基于 Apache 为 Wordcount 提供的示例 POM,但添加了额外的依赖项。以下是我放入其中的依赖项。如果需要,我可以提供完整的文件,但它相当单一。

<dependencies>
                <dependency>
                    <groupId>org.apache.beam</groupId>
                    <artifactId>beam-runners-apex</artifactId>
                    <version>${beam.version}</version>
                    <scope>runtime</scope>
                </dependency>
                <!--
                  Apex depends on httpclient version 4.3.5, project has a transitive dependency to httpclient 4.0.1 from
                  google-http-client. Apex dependency version being specified explicitly so that it gets picked up. This
                  can be removed when the project no longer has a dependency on a different httpclient version.
                -->
                <dependency>
                    <groupId>org.apache.httpcomponents</groupId>
                    <artifactId>httpclient</artifactId>
                    <version>4.3.5</version>
                    <scope>runtime</scope>
                    <exclusions>
                        <exclusion>
                            <groupId>commons-codec</groupId>
                            <artifactId>commons-codec</artifactId>
                        </exclusion>
                    </exclusions>
                </dependency>
            </dependencies>
        </profile>

        <profile>
            <id>dataflow-runner</id>
            <!-- Makes the DataflowRunner available when running a pipeline. -->
            <dependencies>
                <dependency>
                    <groupId>org.apache.beam</groupId>
                    <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
                    <version>${beam.version}</version>
                    <scope>runtime</scope>
                </dependency>
            </dependencies>
        </profile>

        <profile>
            <id>flink-runner</id>
            <!-- Makes the FlinkRunner available when running a pipeline. -->
            <dependencies>
                <dependency>
                    <groupId>org.apache.beam</groupId>
                    <artifactId>beam-runners-flink_2.10</artifactId>
                    <version>${beam.version}</version>
                    <scope>runtime</scope>
                </dependency>
            </dependencies>
        </profile>

        <profile>
            <id>spark-runner</id>
            <!-- Makes the SparkRunner available when running a pipeline. Additionally,
                 overrides some Spark dependencies to Beam-compatible versions. -->
            <dependencies>
                <dependency>
                    <groupId>org.apache.beam</groupId>
                    <artifactId>beam-runners-spark</artifactId>
                    <version>${beam.version}</version>
                    <scope>runtime</scope>
                </dependency>
                <dependency>
                    <groupId>org.apache.beam</groupId>
                    <artifactId>beam-sdks-java-io-hadoop-file-system</artifactId>
                    <version>${beam.version}</version>
                    <scope>runtime</scope>
                </dependency>
                <dependency>
                    <groupId>org.apache.spark</groupId>
                    <artifactId>spark-streaming_2.10</artifactId>
                    <version>${spark.version}</version>
                    <scope>runtime</scope>
                    <exclusions>
                        <exclusion>
                            <groupId>org.slf4j</groupId>
                            <artifactId>jul-to-slf4j</artifactId>
                        </exclusion>
                    </exclusions>
                </dependency>
                <dependency>
                    <groupId>com.fasterxml.jackson.module</groupId>
                    <artifactId>jackson-module-scala_2.10</artifactId>
                    <version>${jackson.version}</version>
                    <scope>runtime</scope>
                </dependency>
            </dependencies>
        </profile>
    </profiles>

    <dependencies>
        <!-- Adds a dependency on the Beam SDK. -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-java-core</artifactId>
            <version>2.2.0</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-io-google-cloud-platform -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
            <version>2.2.0</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-common-fn-api -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-common-fn-api</artifactId>
            <version>2.2.0</version>
            <scope>test</scope>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-extensions-google-cloud-platform-core -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-java-extensions-google-cloud-platform-core</artifactId>
            <version>2.2.0</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-io-common -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-java-io-common</artifactId>
            <version>2.2.0</version>
            <scope>test</scope>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-runners-parent -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-runners-parent</artifactId>
            <version>2.2.0</version>
            <type>pom</type>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-runners-gcp-parent -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-runners-gcp-parent</artifactId>
            <version>2.2.0</version>
            <type>pom</type>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-extensions-parent -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-java-extensions-parent</artifactId>
            <version>2.2.0</version>
            <type>pom</type>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-parent -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-parent</artifactId>
            <version>2.2.0</version>
            <type>pom</type>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-common-parent -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-common-parent</artifactId>
            <version>2.2.0</version>
            <type>pom</type>
        </dependency>

        <!-- https://mvnrepository.com/artifact/com.google.cloud.dataflow/google-cloud-dataflow-java-sdk-parent -->
        <dependency>
            <groupId>com.google.cloud.dataflow</groupId>
            <artifactId>google-cloud-dataflow-java-sdk-parent</artifactId>
            <version>2.2.0</version>
            <type>pom</type>
        </dependency>


        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-runners-reference -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-runners-reference</artifactId>
            <version>2.2.0</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-parent -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-java-parent</artifactId>
            <version>2.2.0</version>
            <type>pom</type>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-build-tools -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-java-build-tools</artifactId>
            <version>2.2.0</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-runners-direct-java -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-runners-direct-java</artifactId>
            <version>2.2.0</version>
            <scope>test</scope>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-runners-core-construction-java -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-runners-core-construction-java</artifactId>
            <version>2.2.0</version>
        </dependency>

        <dependency>
            <groupId>com.google.cloud.dataflow</groupId>
            <artifactId>google-cloud-dataflow-java-sdk-all</artifactId>
            <version>[2.1.0, 2.99)</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-common-runner-api -->
        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-common-runner-api</artifactId>
            <version>2.2.0</version>
        </dependency>


        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
            <version>0.4.0</version>
        </dependency>

        <dependency>
            <groupId>com.google.api-client</groupId>
            <artifactId>google-api-client</artifactId>
            <version>${google-clients.version}</version>
            <exclusions>
                <!-- Exclude an old version of guava that is being pulled
                     in by a transitive dependency of google-api-client -->
                <exclusion>
                    <groupId>com.google.guava</groupId>
                    <artifactId>guava-jdk5</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>com.google.apis</groupId>
            <artifactId>google-api-services-bigquery</artifactId>
            <version>${bigquery.version}</version>
            <exclusions>
                <!-- Exclude an old version of guava that is being pulled
                     in by a transitive dependency of google-api-client -->
                <exclusion>
                    <groupId>com.google.guava</groupId>
                    <artifactId>guava-jdk5</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>com.google.http-client</groupId>
            <artifactId>google-http-client</artifactId>
            <version>${google-clients.version}</version>
            <exclusions>
                <!-- Exclude an old version of guava that is being pulled
                     in by a transitive dependency of google-api-client -->
                <exclusion>
                    <groupId>com.google.guava</groupId>
                    <artifactId>guava-jdk5</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>com.google.apis</groupId>
            <artifactId>google-api-services-pubsub</artifactId>
            <version>${pubsub.version}</version>
            <exclusions>
                <!-- Exclude an old version of guava that is being pulled
                     in by a transitive dependency of google-api-client -->
                <exclusion>
                    <groupId>com.google.guava</groupId>
                    <artifactId>guava-jdk5</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>joda-time</groupId>
            <artifactId>joda-time</artifactId>
            <version>${joda.version}</version>
        </dependency>

        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>${guava.version}</version>
        </dependency>

        <!-- Add slf4j API frontend binding with JUL backend -->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>${slf4j.version}</version>
        </dependency>

        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-jdk14</artifactId>
            <version>${slf4j.version}</version>
            <!-- When loaded at runtime this will wire up slf4j to the JUL backend -->
            <scope>runtime</scope>
        </dependency>

        <!-- Hamcrest and JUnit are required dependencies of PAssert,
             which is used in the main code of DebuggingWordCount example. -->
        <dependency>
            <groupId>org.hamcrest</groupId>
            <artifactId>hamcrest-all</artifactId>
            <version>${hamcrest.version}</version>
        </dependency>

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>${junit.version}</version>
        </dependency>

    </dependencies>

最佳答案

这些类:

org.apache.beam.sdk.options.GcpOptions
org.apache.beam.sdk.options.GcsOptions
org.apache.beam.sdk.options.BigQueryOptions

...都在 an earlier version 中 Apache 光束。

考虑到 pom.xml 中的依赖项(具体来说,对 Apache Beam v2.2.0 的依赖项),正确导入是:

org.apache.beam.sdk.extensions.gcp.options.GcpOptions 
org.apache.beam.sdk.extensions.gcp.options.GcsOptions 
org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions

关于java - Apache Beam Maven 依赖错误,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/48748790/

相关文章:

java.lang.OutOfMemoryError : Java heap space when stitching 13k . png 图像在一起

java - 用限定符第谷替换快照

scala - 无法访问 Scala 中相对路径中的文件以获取测试资源

google-cloud-platform - 如何在 Java 的数据流作业中指定将在每个数据流 VM worker 上执行的启动脚本

java - 与 jersey json REST 客户端共享对象

java - 检查是否已加载 dll 库? ( java )

google-cloud-dataflow - Dataflow 是否使用 Google Cloud Storage 的 gzip 转码?

java - Dataflow 大型侧输入中的 Apache Beam

java - 带有处理功能的 Skype API 或 Arduino

java - 解决依赖的重复版本