MapReduce on Cassandra

Tags: mapreduce cassandra

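I am developing a simple MapReduce program that reads data from a Cassandra column family, but I am running into the error below. Any pointers on how to proceed would be greatly appreciated. Thanks in advance!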

Cassandra version : 1.0.3
Hadoop version : 0.20.2
HADOOP_CLASSPATH has: apache-cassandra-1.0.3.jar, libthrift-0.6.jar, commons-lang-2.4.jar and guava-10.0.1.jar
What works : Hadoop MR word count example, Reads from Cassandra column family using cassandra-cli, Thrift and Hector

Error:
11/12/01 20:05:23 INFO mapred.JobClient: Running job: job_201112010835_0009
11/12/01 20:05:24 INFO mapred.JobClient:  map 0% reduce 0%
11/12/01 20:05:33 INFO mapred.JobClient: Task Id : attempt_201112010835_0009_m_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException: com.google.common.collect.AbstractIterator
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
        at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.initialize(ColumnFamilyRecordReader.java:158)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

Best answer

Did you add the Cassandra libraries to the classpath on all of your task trackers? From the wiki page http://wiki.apache.org/cassandra/HadoopSupport :

One configuration note on getting the task trackers to be able to perform queries over
Cassandra: you'll want to update your HADOOP_CLASSPATH in your <hadoop>/conf/hadoop-env.sh 
to include the Cassandra lib libraries. For example you'll want to do something like 
this in the hadoop-env.sh on each of your task trackers:


export HADOOP_CLASSPATH=/opt/cassandra/lib/*:$HADOOP_CLASSPATH

The path in that example should of course be replaced with the actual location of the Cassandra libraries on your system.
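
As a rough sketch of the hadoop-env.sh change quoted above, the library directory can be expanded into an explicit jar list instead of relying on the `*` wildcard. The helper name `build_classpath` and the `CASSANDRA_LIB` variable are hypothetical (neither is part of Hadoop or Cassandra), and /opt/cassandra/lib is only the assumed location from the wiki example:

```shell
# Hypothetical helper (not part of Hadoop or Cassandra): print every .jar in
# the given directory as one colon-separated classpath string.
# The body runs in a subshell so its variables do not leak into hadoop-env.sh.
build_classpath() (
    dir=$1
    result=""
    for jar in "$dir"/*.jar; do
        # Skip the unexpanded pattern when the directory contains no jars.
        [ -e "$jar" ] || continue
        result="${result:+$result:}$jar"
    done
    printf '%s\n' "$result"
)

# Assumed install location, taken from the wiki example; adjust per host.
CASSANDRA_LIB=${CASSANDRA_LIB:-/opt/cassandra/lib}

extra=$(build_classpath "$CASSANDRA_LIB")
if [ -n "$extra" ]; then
    # Prepend the Cassandra jars, keeping whatever was already configured.
    export HADOOP_CLASSPATH="$extra${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"
fi
```

With an explicit list, running `echo "$HADOOP_CLASSPATH" | tr ':' '\n' | grep -i guava` on a task tracker is one way to check that a Guava jar is actually present, since the missing class com.google.common.collect.AbstractIterator belongs to Guava.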

A similar question about MapReduce on Cassandra can be found on Stack Overflow: https://stackoverflow.com/questions/8347825/
