java - Strange error when running a PigUnit sample test

Tags: java apache-pig

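I'm trying to set up Pig unit tests and am working through the documentation they provide. It looks somewhat outdated, so I switched to the svn trunk. The first strange thing is that it actually needs more libraries than just PigUnit, pig and hadoop-commons to work (I had to add hadoop-hdfs, hadoop-mapreduce-client-core and hadoop-mapreduce-client-jobclient). I'm not sure it's a good idea to put all of these into my dependency manager, but that's not the main problem. This is the test I'm trying to run: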

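 import java.io.IOException;

 import org.apache.pig.pigunit.PigTest;
 import org.apache.pig.tools.parameters.ParseException;
 import org.junit.Test;

 @Test
 public void testNtoN() throws ParseException, IOException {
     String[] args = {
             "n=3",
             "reducers=1",
             "input=top_queries_input_data.txt",
             "output=top_3_queries",
     };
     // "script dir" is the question's placeholder for the path to the .pig script file
     PigTest test = new PigTest("script dir", args);

     String[] output = {
             "(yahoo,25)",
             "(facebook,15)",
             "(twitter,7)",
     };

     test.assertOutput("queries_limit", output);
 }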

This is the actual script:

 data =
     LOAD '$input'
     AS (query:CHARARRAY, count:INT);

 queries_group = 
     GROUP data 
     BY query
     PARALLEL $reducers;

 queries_sum = 
     FOREACH queries_group 
     GENERATE 
         group AS query, 
         SUM(data.count) AS count;

 queries_ordered = 
     ORDER queries_sum 
     BY count DESC
     PARALLEL $reducers;

 queries_limit = LIMIT queries_ordered $n;

 STORE queries_limit INTO '$output';
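To double-check the expected tuples handed to `assertOutput`, it can help to model what the script computes in plain Java. The sketch below is an illustration only and is not part of PigUnit. It rests on three assumptions:

* The input file uses PigStorage's default tab delimiter, with lines like `yahoo<TAB>10`.
* The names `TopQueriesModel` and `topN` are hypothetical.
* Ties in the summed count are broken alphabetically here. Pig's `ORDER` makes no such guarantee.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical plain-Java model of the Pig script (not a PigUnit or Pig API).
final class TopQueriesModel {

    // lines: tab-separated "query<TAB>count" rows, as PigStorage reads them by default.
    static List<String> topN(List<String> lines, int n) {
        // LOAD ... AS (query, count), GROUP BY query, SUM(data.count).
        // Pig's SUM over an INT column yields a LONG, hence summingLong.
        Map<String, Long> sums = lines.stream()
                .map(line -> line.split("\t"))
                .collect(Collectors.groupingBy(
                        (String[] fields) -> fields[0],
                        TreeMap::new, // alphabetical key order, so ties stay deterministic
                        Collectors.summingLong((String[] fields) -> Long.parseLong(fields[1].trim()))));

        // ORDER BY count DESC (stable sort), LIMIT $n, rendered like PigUnit tuples "(query,count)".
        return sums.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(n)
                .map(e -> "(" + e.getKey() + "," + e.getValue() + ")")
                .collect(Collectors.toList());
    }
}
```

Feeding it made-up rows such as `yahoo\t10`, `yahoo\t15`, `facebook\t15` and `twitter\t7` with `n = 3` gives `[(yahoo,25), (facebook,15), (twitter,7)]`, the same shape as the `output` array in the test.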

This is the stack trace:

 STORE queries_limit INTO 'top_3_queries';
 --> none

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias queries_limit

 at org.apache.pig.PigServer.openIterator(PigServer.java:1019)
 at org.apache.pig.pigunit.PigTest.getAliasFromCache(PigTest.java:224)
 at org.apache.pig.pigunit.PigTest.getActualResults(PigTest.java:319)
 at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:409)
 at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:400)
 at BlaUnitTest.testBla(BlaUnitTest.java:24)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
 at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
 at org.mockito.internal.runners.JUnit45AndHigherRunnerImpl.run(JUnit45AndHigherRunnerImpl.java:37)
 at org.mockito.runners.MockitoJUnitRunner.run(MockitoJUnitRunner.java:62)
 at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
 at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:117)
 at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:42)
 at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:262)
 at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:84)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: java.io.IOException: Couldn't retrieve job.
 at org.apache.pig.PigServer.store(PigServer.java:1083)
 at org.apache.pig.PigServer.openIterator(PigServer.java:994)
 ... 34 more

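I tried debugging it to see what actually happens. The failure occurs when it tries to build the query plan and obtain the ExecJob, but I couldn't figure out why. I even tried simplifying the script, removing everything except the code that loads and stores the data. The result is the same.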

Best answer

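I managed to solve this. The problem was that I had included some dependencies on the classpath that apparently interfered with correct execution. The only dependencies needed are hadoop-core (I use hadoop-aws because I'm using it with AWS), hadoop-client, pig and pigunit. Now everything runs fine.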
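The answer does not say which of the extra jars caused the failure. One possible cause is the same Hadoop classes being shipped by mismatched artifacts, though that is an assumption and the answer does not confirm it. If so, a quick check is to ask the class loader for every classpath location that provides a given class. `ClassLoader.getResources` is the standard JDK method. The class `ClasspathCheck` and the list of class names are hypothetical, chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic: report every classpath entry that contains a given class.
// Seeing the same class in more than one jar hints at (but does not prove) a conflict.
final class ClasspathCheck {

    static List<URL> locationsOf(String className, ClassLoader loader) {
        // A class named a.b.C is stored as the resource "a/b/C.class".
        String resource = className.replace('.', '/') + ".class";
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        // Example classes from the Pig and Hadoop jars discussed above; adjust to your build.
        String[] classes = {
            "org.apache.pig.PigServer",
            "org.apache.pig.pigunit.PigTest",
            "org.apache.hadoop.conf.Configuration",
            "org.apache.hadoop.mapreduce.Job",
        };
        for (String name : classes) {
            List<URL> found = locationsOf(name, loader);
            System.out.println(name + ": " + found.size() + " location(s)");
            for (URL url : found) {
                System.out.println("  " + url);
            }
        }
    }
}
```

Run it with the same classpath as the failing test. Classes reported in more than one jar point to the dependencies worth removing first.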

Original question on Stack Overflow (java - Strange error when running a PigUnit sample test): https://stackoverflow.com/questions/41179844/
