Hadoop MapReduce Internals: Core Classes

Original
2017/04/26 22:35

 

Job:
    Lets the user configure a job, submit it, control its execution, and query its state.
    The setXXX methods must be called before submit.
    
    The job submitter's view of the Job. 

    It allows the user to configure the job, submit it, control its execution, and query the state. The set methods only work until the job is submitted, afterwards they will throw an IllegalStateException. 

    Normally the user creates the application, describes various facets of the job via Job, then submits the job and monitors its progress.
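
    The "set methods only work until the job is submitted" rule can be pictured as a small state guard. The sketch below is a simplified plain-Java model, not Hadoop's source: the class names SketchJob and SketchJobDemo are hypothetical, and only the DEFINE/RUNNING split and the IllegalStateException are taken from the description above.

```java
// Simplified model of the "setters only before submit" rule.
// SketchJob is a hypothetical stand-in, not org.apache.hadoop.mapreduce.Job.
class SketchJob {
    enum JobState { DEFINE, RUNNING }

    private JobState state = JobState.DEFINE;
    private String jobName = "";

    // Setters are only legal while the job is still being defined.
    void setJobName(String name) {
        ensureState(JobState.DEFINE);
        this.jobName = name;
    }

    String getJobName() {
        return jobName;
    }

    JobState getState() {
        return state;
    }

    // Submitting moves the job out of DEFINE; a second submit is rejected too.
    void submit() {
        ensureState(JobState.DEFINE);
        state = JobState.RUNNING;
    }

    // Central guard: every state-sensitive method checks here first.
    private void ensureState(JobState expected) {
        if (state != expected) {
            throw new IllegalStateException(
                "Job in state " + state + " instead of " + expected);
        }
    }
}

class SketchJobDemo {
    public static void main(String[] args) {
        SketchJob job = new SketchJob();
        job.setJobName("myjob");        // fine: still in DEFINE
        job.submit();                   // DEFINE -> RUNNING
        try {
            job.setJobName("renamed");  // too late: already RUNNING
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

    Routing every setter through one guard method keeps the rule in a single place, which is why a late call fails loudly instead of being silently ignored.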

    Here is an example of how to submit a job:

         // Create a new Job
         Job job = Job.getInstance();
         job.setJarByClass(MyJob.class);
         
         // Specify various job-specific parameters     
         job.setJobName("myjob");
         
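         // Job has no setInputPath/setOutputPath methods; paths are set through
         // the static helpers in org.apache.hadoop.mapreduce.lib.input.FileInputFormat
         // and org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.
         FileInputFormat.addInputPath(job, new Path("in"));
         FileOutputFormat.setOutputPath(job, new Path("out"));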
         
         job.setMapperClass(MyJob.MyMapper.class);
         job.setReducerClass(MyJob.MyReducer.class);

         // Submit the job, then poll for progress until the job is complete
         job.waitForCompletion(true);

         
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
    
    Manages all resources in the cluster.
    The ResourceManager is the main class that is a set of components. "I am the ResourceManager. All your resources belong to us..."

org.apache.hadoop.yarn.server.nodemanager.NodeManager
    ContainerManagerImpl
    
org.apache.hadoop.mapreduce.v2.app.MRAppMaster
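    The MapReduce application master; a state machine.
    The state machine is encapsulated in the implementation of the Job interface, and all state changes go through that interface.
    Every event ultimately results in a state change.
    State transitions are event-driven.
    Components send and receive events; events are the carrier between them.
    Events are dispatched by a central dispatch mechanism.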
    The Map-Reduce Application Master. The state machine is encapsulated in the implementation of the Job interface. All state changes happen via the Job interface. Each event results in a Finite State Transition in Job. MR AppMaster is a composition of loosely coupled services. The services interact with each other via events. The components resemble the Actor model. Each component acts on the events it receives and sends out events to other components. This keeps it highly concurrent with no or minimal synchronization needs. The events are dispatched by a central Dispatch mechanism. All components register with the Dispatcher. Information is shared across components using AppContext.
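
    The event-driven design described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the application master's code: every name here (SketchDispatcher, SketchJobComponent, SketchTaskComponent, the event types and states) is hypothetical, and the dispatcher is deliberately single-threaded so the run is deterministic, whereas the real dispatcher is described as highly concurrent.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical event types and job states, far fewer than a real system has.
enum SketchEventType { JOB_START, TASK_LAUNCH, TASK_DONE }

enum SketchJobState { NEW, RUNNING, SUCCEEDED }

// Single-threaded stand-in for the central dispatcher: components register
// a handler per event type and talk to each other only by posting events.
class SketchDispatcher {
    private final Map<SketchEventType, List<Consumer<SketchEventType>>> handlers =
        new EnumMap<>(SketchEventType.class);
    private final Queue<SketchEventType> queue = new ArrayDeque<>();

    void register(SketchEventType type, Consumer<SketchEventType> handler) {
        handlers.computeIfAbsent(type, k -> new ArrayList<>()).add(handler);
    }

    void post(SketchEventType event) {
        queue.add(event);
    }

    // Deliver queued events in FIFO order, including ones posted by handlers.
    void drain() {
        SketchEventType event;
        while ((event = queue.poll()) != null) {
            for (Consumer<SketchEventType> h
                    : handlers.getOrDefault(event, Collections.emptyList())) {
                h.accept(event);
            }
        }
    }
}

// The job is a small finite state machine: only events change its state.
class SketchJobComponent {
    private final SketchDispatcher dispatcher;
    private SketchJobState state = SketchJobState.NEW;
    private int pendingTasks;

    SketchJobComponent(SketchDispatcher dispatcher, int taskCount) {
        this.dispatcher = dispatcher;
        this.pendingTasks = taskCount;
        dispatcher.register(SketchEventType.JOB_START, this::handle);
        dispatcher.register(SketchEventType.TASK_DONE, this::handle);
    }

    SketchJobState getState() {
        return state;
    }

    private void handle(SketchEventType event) {
        if (event == SketchEventType.JOB_START && state == SketchJobState.NEW) {
            state = SketchJobState.RUNNING;              // NEW -> RUNNING
            for (int i = 0; i < pendingTasks; i++) {
                dispatcher.post(SketchEventType.TASK_LAUNCH);
            }
        } else if (event == SketchEventType.TASK_DONE
                && state == SketchJobState.RUNNING) {
            if (--pendingTasks == 0) {
                state = SketchJobState.SUCCEEDED;        // RUNNING -> SUCCEEDED
            }
        }
    }
}

// The task side never touches the job directly; it only answers with events.
class SketchTaskComponent {
    private int launched = 0;

    SketchTaskComponent(SketchDispatcher dispatcher) {
        dispatcher.register(SketchEventType.TASK_LAUNCH, e -> {
            launched++;
            dispatcher.post(SketchEventType.TASK_DONE);
        });
    }

    int getLaunched() {
        return launched;
    }
}

class SketchDispatcherDemo {
    public static void main(String[] args) {
        SketchDispatcher dispatcher = new SketchDispatcher();
        SketchJobComponent job = new SketchJobComponent(dispatcher, 3);
        SketchTaskComponent tasks = new SketchTaskComponent(dispatcher);
        dispatcher.post(SketchEventType.JOB_START);
        dispatcher.drain();
        System.out.println(job.getState() + " after " + tasks.getLaunched() + " tasks");
    }
}
```

    Because the two components share nothing but the dispatcher, neither needs a lock on the other's state; that is the property the description credits for the low synchronization cost.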

org.apache.hadoop.mapred.YarnChild
    The main() for MapReduce task processes.
    The main process of an MR task; responsible for launching the task.

org.apache.hadoop.mapreduce.v2.app.job.impl.MapTaskImpl extends TaskImpl 
    Encapsulates a map task.

org.apache.hadoop.mapreduce.v2.app.job.impl.ReduceTaskImpl  extends TaskImpl 
    Encapsulates a reduce task.
