YARN原论文笔记摘抄

原文:Apache Hadoop YARN: Yet Another Resource Negotiator (2013) pdf, acm

Overview

YARN was designed to address:

  1. tight coupling of a specific programming model with the resource management infrastructure, forcing developers to abuse the MapReduce programming model;
  2. centralized handling of jobs’ control flow, which resulted in endless scalability concerns for the scheduler.

It achieves: decoupling the programming model from the resource management infrastructure, and delegates many scheduling functions (e.g., task fault tolerance) to per-application components.

Guiding Questions

  • What is the relationship between the Resource Manager, Application Master, and Node Manager?
  • How does YARN achieve each of its requirement goals?

Requirements

  • Scalability
  • Multi-tenancy
  • Serviceability
  • Locality awareness
  • High cluster utilization
  • Reliability, availability
  • Secure and auditable operation
  • Support for programming model diversity
  • Flexible resource model
  • Backward compatibility

Architecture

Resource Manager

Basically maps available resources from Node Managers via heartbeats and leases them out to Application Masters as containers. Manages resources.

  • 1 Resource Manager per cluster:
    • track resource usage and node liveliness via heartbeats received from Node Managers
    • enforce allocation invariants
    • arbitrate contention
    • allocates leases (i.e. containers) to applications to run on particular nodes.
  • can also revoke leases or request resources back from an Application Master

Application Master

A job coordinator. Requests resources from the Resource Manager and executes the job on the containers it gets. Basically it will adjust execution according to resource availability and the nature of the job itself.

  • Application Master (1 per job) responsible for:
    • requesting resources from RM by submitting 1+ ResourceRequests
    • generating physical plan from resources received
    • coordinating execution around faults
    • resource management
    • manage flow of execution (mappers -> reducers etc.)
    • handling skew, faults, optimizations ** (what kind of optimization is done?)
    • needs to manage CPU, RAM, disk on multiple nodes
  • sends heartbeats to RM to update record of its demand.
  • if a task fails, the AM is responsible for requesting more resources.

Node Manager

It authenticates container leases, manages containers’ dependencies,
monitors their execution, and provides a set of services to containers.

  • Worker daemon
  • monitors resource availability
  • reports faults
  • container lifecycle management
  • validates lease authenticity

To launch the container, the NM copies all the necessary dependencies– data files, executables, tarballs– to local storage.

YARN framework/application writers

YARN is responsible for:

  1. Submitting the application by passing a CLC for the ApplicationMaster to the RM.
  2. When RM starts the AM, it should register with the RM and periodically advertise its liveness and requirements over the heartbeat protocol
  3. Once the RM allocates a container, AM can construct a CLC to launch the container on the corresponding NM. It may also monitor the status of the running container and stop it when the resource should be reclaimed. Monitoring the progress of work done inside the container is strictly the AM’s responsibility.
  4. Once the AM is done with its work, it should unregister from the RM and exit cleanly.

Fault tolerance and availability

中文词汇

  • Resource Manager: 资源管理器,管理整个系统的资源分配。
  • Application Master: 应用程序管理器
  • Node Manager: 节点管理器
  • Scheduler: 调度器

Terminology

  • Container launch context: map of env variables, dependencies stored in remotely accessible storage, security tokens, payloads for NM services, and the command necessary to create the process.
  • ResourceRequest: a request from the AM to the RM containing:
    • number of containers
    • resources per container (e.g. RAM, CPU)
    • locality preferences
    • priority of requests within application

扩展阅读

http://dl.acm.org/citation.cfm?id=3026195
http://www.jianshu.com/p/c21c3a47dde2
http://www.jianshu.com/p/7151ba6c5e03

点赞