I want to accomplish the following task:
I ran a MapReduce application (such as WordCount) from Eclipse on the master node, and I would like to observe how the worker nodes operate, also using Eclipse, since I know the workflow of a local MapReduce job differs from that of a fully distributed one.
Is there any way to achieve this?
Best answer
You can run the tasks locally; see How to Debug Map/Reduce Programs:
Start by getting everything running (likely on a small input) in the local runner. You do this by setting your job tracker to "local" in your config. The local runner can run under the debugger and runs on your development machine.
A very quick and easy way to set this config variable is to include the following line just before you run the job:
conf.set("mapred.job.tracker", "local");
You may also want to do this to make the input and output files be in the local file system rather than in the Hadoop distributed file system (HDFS):
conf.set("fs.default.name", "local");
You can also set these configuration parameters in hadoop-site.xml. The configuration files hadoop-default.xml, mapred-default.xml and hadoop-site.xml should appear somewhere in your program's class path when the program runs.
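Equivalently, the two local-runner settings above can live in hadoop-site.xml instead of the driver code. A minimal sketch, using the classic pre-YARN property names from the quoted wiki page (on newer Hadoop versions the corresponding keys are mapreduce.framework.name and fs.defaultFS):

```xml
<configuration>
  <!-- Run map/reduce tasks in-process with the local job runner -->
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>
  </property>
  <!-- Read input and write output on the local file system instead of HDFS -->
  <property>
    <name>fs.default.name</name>
    <value>local</value>
  </property>
</configuration>
```

With these in place, the whole job runs inside a single JVM on your development machine, so ordinary Eclipse breakpoints in your mapper and reducer just work.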
If you want to debug tasks on a real cluster, you must add debug options to the Java command line that launches the task (for example
-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000
) and then attach Eclipse remotely to the waiting Java process. For instance, you can set mapred.map.child.java.opts
. There are several examples of how to do this, and the specifics vary: once you understand that the goal is to pass the
-agentlib:...
argument on the Java command line to enable the remote debugger so that Eclipse has something to attach to, the exact implementation details become irrelevant. I would avoid modifying hadoop-env.sh, though. AFAIK Cloudera has a VM image that ships with a preconfigured Eclipse for local M/R task development; see How-to: Use Eclipse with MapReduce in Cloudera's QuickStart VM.
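Concretely, the per-task debug flag can be set from the driver before submitting the job. A minimal, hypothetical fragment, assuming the classic pre-YARN property name mapred.map.child.java.opts mentioned above and a Configuration object named conf as in the earlier snippets:

```java
// Hypothetical driver fragment: pass remote-debug options to every map task JVM.
// With suspend=y, each map task blocks at startup until a debugger attaches,
// and only one JVM per node can bind port 8000 - so limit the job to a single
// map task while debugging.
conf.set("mapred.map.child.java.opts",
    "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000");
```

Then, in Eclipse, create a "Remote Java Application" debug configuration pointing at the worker node's hostname and port 8000, and attach once the task JVM is started and waiting.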
About debugging - how to debug worker nodes from the master node in MapReduce using Eclipse: we found a similar question on Stack Overflow: https://stackoverflow.com/questions/19135628/