hadoop - Hadoop cluster: Hadoop streaming map tasks run on only one host, not on the slaves

Tags: hadoop mapreduce hdfs hadoop-streaming

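I have a Hadoop cluster of three machines:

  • one master (ResourceManager, NameNode, SecondaryNameNode)
  • two slaves (DataNode, NodeManager)

I use Hadoop streaming to run a C++ program that works as follows:

  • It takes as input a text file containing the names of videos stored in HDFS: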

    input.txt:
    video0001.avi
    
    Video0002.avi
    

    ...
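  • The mapper reads each line (as the key), copies the video with that name from HDFS to the local disk of the slave, and runs OpenCV and FFmpeg on it. It then moves on to the second video and does the same.
  • The mapper returns the video name as the key and some parameters of the video as the value.

The program is installed on every machine in the cluster. The cluster is set up correctly, and I can copy files to it.

When I run the program on a single node it works fine, but when I run it on the three-machine cluster it runs only on the master and does not use the slaves.

I run this command on the master: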

    hadoop jar /usr/local/lib/hadoop-2.7.3/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar -input /user/root/input -output /user/root/output -mapper signature -file signature
    •   16/12/20 02:43:51 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
    •   16/12/20 02:43:51 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    •   16/12/20 02:43:51 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    •   16/12/20 02:43:52 INFO mapred.FileInputFormat: Total input paths to process : 1
    •   16/12/20 02:43:52 INFO mapreduce.JobSubmitter: number of splits:1
    •   16/12/20 02:43:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local815523916_0001
    •   16/12/20 02:43:54 INFO mapred.LocalDistributedCacheManager: Localized file:/home/master/Desktop/Extract_signature/Prog/signature as file:/app/hadoop/tmp/mapred/local/1482230633565/signature
    •   16/12/20 02:43:54 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
    •   16/12/20 02:43:54 INFO mapreduce.Job: Running job: job_local815523916_0001
    •   16/12/20 02:43:54 INFO mapred.LocalJobRunner: OutputCommitter set in config null
    •   16/12/20 02:43:54 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
    •   16/12/20 02:43:54 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    •   16/12/20 02:43:55 INFO mapred.LocalJobRunner: Waiting for map tasks
    •   16/12/20 02:43:55 INFO mapred.LocalJobRunner: Starting task: attempt_local815523916_0001_m_000000_0
    •   16/12/20 02:43:55 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    •   16/12/20 02:43:55 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    •   16/12/20 02:43:55 INFO mapred.MapTask: Processing split: hdfs://Hadoop:54310/user/root/input/input.txt:0+33
    •   16/12/20 02:43:55 INFO mapred.MapTask: numReduceTasks: 1
    •   16/12/20 02:43:55 INFO mapreduce.Job: Job job_local815523916_0001 running in uber mode : false
    •   16/12/20 02:43:55 INFO mapreduce.Job:  map 0% reduce 0%
    •   16/12/20 02:44:48 INFO mapred.LocalJobRunner: hdfs://Hadoop:54310/user/root/input/input.txt:0+33 > map
    •   16/12/20 02:44:48 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    •   16/12/20 02:44:48 INFO streaming.PipeMapRed: PipeMapRed exec [/home/master/Desktop/Extract_signature/Prog/./signature]
    •   16/12/20 02:44:48 INFO Configuration.deprecation: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
    •   16/12/20 02:44:48 INFO Configuration.deprecation: map.input.start is deprecated. Instead, use mapreduce.map.input.start
    •   16/12/20 02:44:48 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
    •   16/12/20 02:44:48 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
    •   16/12/20 02:44:48 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
    •   16/12/20 02:44:48 INFO Configuration.deprecation: mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
    •   16/12/20 02:44:48 INFO Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
    •   16/12/20 02:44:48 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
    •   16/12/20 02:44:48 INFO Configuration.deprecation: map.input.length is deprecated. Instead, use mapreduce.map.input.length
    •   16/12/20 02:44:48 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
    •   16/12/20 02:44:48 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
    •   16/12/20 02:44:48 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
    •   16/12/20 02:44:49 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:1=1/1 [rec/s] out:0=0/1 [rec/s]
    
    •   16/12/20 02:44:54 INFO mapred.LocalJobRunner: hdfs://Hadoop:54310/user/root/input/input.txt:0+33 > map
    •   16/12/20 02:44:54 INFO mapreduce.Job:  map 67% reduce 0%
    
    •   There were 11 warnings (use warnings() to see them)
    •   16/12/20 02:47:48 INFO streaming.PipeMapRed: Records R/W=2/2
    •   16/12/20 02:47:48 INFO streaming.PipeMapRed: MRErrorThread done
    •   16/12/20 02:47:48 INFO streaming.PipeMapRed: mapRedFinished
    •   16/12/20 02:47:48 INFO mapred.LocalJobRunner: Records R/W=2/1 > map
    •   16/12/20 02:47:48 INFO mapred.MapTask: Starting flush of map output
    •   16/12/20 02:47:48 INFO mapred.MapTask: Spilling map output
    •   16/12/20 02:47:48 INFO mapred.MapTask: bufstart = 0; bufend = 40; bufvoid = 104857600
    •   16/12/20 02:47:48 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214392(104857568); length = 5/6553600
    •   16/12/20 02:47:48 INFO mapred.MapTask: Finished spill 0
    •   16/12/20 02:47:48 INFO mapred.Task: Task:attempt_local1256877917_0001_m_000000_0 is done. And is in the process of committing
    •   16/12/20 02:47:48 INFO mapred.LocalJobRunner: Records R/W=2/2
    •   16/12/20 02:47:48 INFO mapred.Task: Task 'attempt_local1256877917_0001_m_000000_0' done.
    •   16/12/20 02:47:48 INFO mapred.LocalJobRunner: Finishing task: attempt_local1256877917_0001_m_000000_0
    •   16/12/20 02:47:48 INFO mapred.LocalJobRunner: map task executor complete.
    •   16/12/20 02:47:48 INFO mapred.LocalJobRunner: Waiting for reduce tasks
    •   16/12/20 02:47:48 INFO mapred.LocalJobRunner: Starting task: attempt_local1256877917_0001_r_000000_0
    •   16/12/20 02:47:48 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
    •   16/12/20 02:47:48 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
    •   16/12/20 02:47:49 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@71589312
    •   16/12/20 02:47:49 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
    •   16/12/20 02:47:49 INFO reduce.EventFetcher: attempt_local1256877917_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
    •   16/12/20 02:47:49 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1256877917_0001_m_000000_0 decomp: 46 len: 50 to MEMORY
    •   16/12/20 02:47:49 INFO reduce.InMemoryMapOutput: Read 46 bytes from map-output for attempt_local1256877917_0001_m_000000_0
    •   16/12/20 02:47:49 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 46, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->46
    •   16/12/20 02:47:49 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
    •   16/12/20 02:47:49 INFO mapred.LocalJobRunner: 1 / 1 copied.
    •   16/12/20 02:47:49 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
    •   16/12/20 02:47:49 INFO mapred.Merger: Merging 1 sorted segments
    •   16/12/20 02:47:49 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 25 bytes
    •   16/12/20 02:47:49 INFO reduce.MergeManagerImpl: Merged 1 segments, 46 bytes to disk to satisfy reduce memory limit
    •   16/12/20 02:47:49 INFO reduce.MergeManagerImpl: Merging 1 files, 50 bytes from disk
    •   16/12/20 02:47:49 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
    •   16/12/20 02:47:49 INFO mapred.Merger: Merging 1 sorted segments
    •   16/12/20 02:47:49 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 25 bytes
    •   16/12/20 02:47:49 INFO mapred.LocalJobRunner: 1 / 1 copied.
    •   16/12/20 02:47:49 INFO mapred.Task: Task:attempt_local1256877917_0001_r_000000_0 is done. And is in the process of committing
    •   16/12/20 02:47:49 INFO mapred.LocalJobRunner: 1 / 1 copied.
    •   16/12/20 02:47:49 INFO mapred.Task: Task attempt_local1256877917_0001_r_000000_0 is allowed to commit now
    •   16/12/20 02:47:49 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1256877917_0001_r_000000_0' to hdfs://Hadoop:54310/user/root/output/_temporary/0/task_local1256877917_0001_r_000000
    
    •   16/12/20 02:47:49 INFO mapred.Task: Task 'attempt_local1256877917_0001_r_000000_0' done.
    •   16/12/20 02:47:49 INFO mapred.LocalJobRunner: Finishing task: attempt_local1256877917_0001_r_000000_0
    •   16/12/20 02:47:49 INFO mapred.LocalJobRunner: reduce task executor complete.
    •   16/12/20 02:47:49 INFO mapreduce.Job:  map 100% reduce 100%
    •   16/12/20 02:47:49 INFO mapreduce.Job: Job job_local1256877917_0001 completed successfully
    •   16/12/20 02:47:50 INFO mapreduce.Job: Counters: 35
    
    •   16/12/20 02:47:50 INFO streaming.StreamJob: Output directory: /user/root/output
    

    Best answer

    According to your log:

    •   16/12/20 02:43:52 INFO mapred.FileInputFormat: Total input paths to process : 1
    •   16/12/20 02:43:52 INFO mapreduce.JobSubmitter: number of splits:1
    

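    Hadoop is treating the whole input file as a single split, so only one map task is created.

    Try using NLineInputFormat to divide the input lines among mappers on several machines.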

    This question and its answer come from Stack Overflow: https://stackoverflow.com/questions/41231298/
