My data lab

What is YARN (Yet Another Resource Negotiator)?

A framework for job scheduling and cluster resource management.

Core components of YARN

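Resource Manager (RM)
- The master server of a YARN cluster. It runs on a single server, or on two servers when configured for redundancy (high availability)
- Manages the resources of the entire cluster
- Receives requests from other platforms that want to use the YARN cluster's resources, and allocates (schedules) resources for them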
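To make the RM's scheduling role concrete, here is a minimal toy sketch: a first-fit allocator that places container requests (memory in MB plus vcores) on the first node with enough free capacity. The `Node` and `Request` classes, the node names, and the capacities are hypothetical. The real RM delegates this decision to a pluggable scheduler (FIFO, Capacity Scheduler, or Fair Scheduler) that also handles queues, data locality, and priorities, all of which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical view of one NodeManager's free capacity, as tracked by the RM."""
    name: str
    free_mb: int
    free_vcores: int


@dataclass
class Request:
    """Hypothetical container request: memory in MB plus virtual cores."""
    request_id: str
    memory_mb: int
    vcores: int


def allocate(nodes: List[Node], requests: List[Request]) -> Dict[str, Optional[str]]:
    """First-fit: place each request, in arrival order, on the first node that fits.

    Returns {request_id: node name}, or None for requests no node can hold.
    Free capacity on the chosen node is reduced in place.
    """
    placement: Dict[str, Optional[str]] = {}
    for req in requests:
        placement[req.request_id] = None
        for node in nodes:
            if node.free_mb >= req.memory_mb and node.free_vcores >= req.vcores:
                node.free_mb -= req.memory_mb
                node.free_vcores -= req.vcores
                placement[req.request_id] = node.name
                break
    return placement


if __name__ == "__main__":
    # Three hypothetical workers, each offering 3072 MB and 2 vcores to YARN.
    cluster = [Node(f"worker{i}", free_mb=3072, free_vcores=2) for i in (1, 2, 3)]
    requests = [Request(f"c{i}", memory_mb=1024, vcores=1) for i in range(7)]
    print(allocate(cluster, requests))
```

With 2 vcores per node only six 1-vcore requests fit, so `c6` stays unplaced (`None`) even though every node still has 1024 MB free: here vcores, not memory, are the binding constraint.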
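Node Manager (NM)
- The worker server of a YARN cluster. It runs on every server except the Resource Manager
- Forks the Containers that run the programs users request, and monitors those Containers
- Watches for Container failures and for Containers using more resources than they requested (a Container that uses more than it requested is killed)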
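The NM's monitoring duty can be sketched as below, assuming a simplified model in which the NM periodically samples each container's physical and virtual memory. A container that has exited is reported as failed; a running container is marked for killing if it exceeds its requested memory (physical limit) or the request times a ratio (virtual limit). The 2.1 default mirrors YARN's `yarn.nodemanager.vmem-pmem-ratio`, but the `ContainerSample` class and `check_containers` function are hypothetical and are not YARN's actual monitoring code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ContainerSample:
    """Hypothetical snapshot of one container, as an NM monitor might sample it."""
    container_id: str
    requested_mb: int   # memory the container asked for
    pmem_mb: int        # physical memory currently in use
    vmem_mb: int        # virtual memory currently in use
    alive: bool = True  # False once the container process has exited


def check_containers(samples: List[ContainerSample],
                     vmem_pmem_ratio: float = 2.1) -> Tuple[List[str], List[str]]:
    """Split samples into (failed, to_kill).

    failed:  containers whose process has already exited.
    to_kill: running containers over their physical limit (requested_mb)
             or their virtual limit (requested_mb * vmem_pmem_ratio).
    """
    failed: List[str] = []
    to_kill: List[str] = []
    for c in samples:
        if not c.alive:
            failed.append(c.container_id)
        elif (c.pmem_mb > c.requested_mb
              or c.vmem_mb > c.requested_mb * vmem_pmem_ratio):
            to_kill.append(c.container_id)
    return failed, to_kill


if __name__ == "__main__":
    samples = [
        ContainerSample("c1", 1024, pmem_mb=900, vmem_mb=1500),   # within limits
        ContainerSample("c2", 1024, pmem_mb=1100, vmem_mb=1200),  # over physical limit
        ContainerSample("c3", 1024, pmem_mb=800, vmem_mb=2500),   # over 1024 * 2.1 virtual limit
        ContainerSample("c4", 1024, pmem_mb=0, vmem_mb=0, alive=False),
    ]
    print(check_containers(samples))  # (['c4'], ['c2', 'c3'])
```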
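Application Master (AM)

- Negotiates with the RM to obtain the resources on the Hadoop cluster that its application needs.
- Works with the NMs to run its application, and periodically monitors the results.
- Periodically reports its application's execution status to the RM.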
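The AM's three duties (negotiate with the RM, run work through the NMs, report progress) can be sketched as a single loop. Everything here is a hypothetical in-memory stand-in: `ToyResourceManager` and `ToyNodeManager` are not YARN classes, and a real Java AM would instead use the client libraries in `org.apache.hadoop.yarn.client.api` (for example `AMRMClient` and `NMClient`), which talk to the RM and NMs over RPC.

```python
from typing import Callable, List


class ToyResourceManager:
    """Hypothetical RM stand-in: grants at most `capacity` containers at once."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.in_use = 0
        self.progress_reports: List[float] = []

    def allocate(self, wanted: int) -> int:
        """Grant as many containers as free capacity allows (possibly 0)."""
        granted = min(wanted, self.capacity - self.in_use)
        self.in_use += granted
        return granted

    def release(self, count: int) -> None:
        self.in_use -= count

    def report_progress(self, progress: float) -> None:
        self.progress_reports.append(progress)


class ToyNodeManager:
    """Hypothetical NM stand-in: 'launches' a container by calling the task."""

    def launch(self, task: Callable[[], int]) -> int:
        return task()


def run_application(tasks: List[Callable[[], int]],
                    rm: ToyResourceManager, nm: ToyNodeManager) -> List[int]:
    """AM loop: negotiate containers, run tasks on them, report progress."""
    results: List[int] = []
    pending = list(tasks)
    while pending:
        granted = rm.allocate(len(pending))             # 1. negotiate with the RM
        if granted == 0:
            raise RuntimeError("no containers available")
        batch, pending = pending[:granted], pending[granted:]
        for task in batch:                              # 2. run tasks via the NM
            results.append(nm.launch(task))
        rm.release(granted)                             # give the containers back
        rm.report_progress(len(results) / len(tasks))   # 3. report status to the RM
    return results


if __name__ == "__main__":
    rm = ToyResourceManager(capacity=2)
    tasks = [lambda i=i: i * i for i in range(5)]
    print(run_application(tasks, rm, ToyNodeManager()))  # [0, 1, 4, 9, 16]
    print(rm.progress_reports)                            # [0.4, 0.8, 1.0]
```

With a capacity of two containers, the five tasks run in batches of 2, 2, and 1, so the AM reports progress three times.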

DEV system configuration

4 EPC VMs (each: CPU 2 cores, memory 4 GB, HDD 80 GB)
Hadoop 3.0 (2017-12-13 GA)
- Key feature: Erasure Coding in HDFS
Algorithm implemented: k-means
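The post does not show its k-means code. As a hedged sketch of how k-means maps onto MapReduce: the mapper assigns each point to its nearest centroid and emits (cluster id, point), and the reducer averages the points of each cluster to produce the new centroid; a driver repeats one such job per iteration until the centroids stop moving. The version below runs both phases in memory in plain Python, and all function names are illustrative, not the author's implementation.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])))


def map_phase(points: Iterable[Point], centroids: Sequence[Point]) -> List[Tuple[int, Point]]:
    """Mapper: emit (cluster id, point) for every input point."""
    return [(nearest(p, centroids), p) for p in points]


def reduce_phase(pairs: Iterable[Tuple[int, Point]]) -> Dict[int, Point]:
    """Reducer: average the points grouped under each cluster id."""
    groups: Dict[int, List[Point]] = defaultdict(list)
    for cid, p in pairs:
        groups[cid].append(p)
    return {cid: tuple(sum(coord) / len(ps) for coord in zip(*ps))
            for cid, ps in groups.items()}


def kmeans(points: Sequence[Point], centroids: Sequence[Point],
           max_iter: int = 20, tol: float = 1e-9) -> List[Point]:
    """Driver: run one map/reduce round per iteration until centroids settle."""
    current = list(centroids)
    for _ in range(max_iter):
        new = reduce_phase(map_phase(points, current))
        # A cluster that received no points keeps its previous centroid.
        updated = [new.get(i, c) for i, c in enumerate(current)]
        shift = max(math.dist(a, b) for a, b in zip(current, updated))
        current = updated
        if shift <= tol:
            break
    return current


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans(pts, [(0.0, 0.0), (10.0, 10.0)]))  # [(0.0, 0.5), (10.0, 10.5)]
```

On a real cluster, the mapper would also need the current centroids (for example, shipped to each task as a small side file), and the driver would read the reducer output back before launching the next iteration.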

Installation information

Software	Version	Install path
Java	JDK 8u161	/usr/local/java
Hadoop	3.0.0	/usr/local/hadoop

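Hadoop configuration files

File name	Format	Description
hadoop-env.sh	Bash script	Environment variables used by the scripts that run Hadoop
mapred-env.sh	Bash script	Environment variables used by the scripts that run MapReduce (overrides hadoop-env.sh)
yarn-env.sh	Bash script	Environment variables used by the scripts that run YARN (overrides hadoop-env.sh)
core-site.xml	Hadoop configuration XML	Configuration settings for Hadoop Core, such as I/O settings common to HDFS, MapReduce, and YARN
hdfs-site.xml	Hadoop configuration XML	Configuration settings for the HDFS daemons: the namenode, secondary namenode, and datanodes
mapred-site.xml	Hadoop configuration XML	Configuration settings for the MapReduce daemons, such as the job history server
yarn-site.xml	Hadoop configuration XML	Configuration settings for the YARN daemons: the resource manager, web app proxy server, and node managers
workers (formerly slaves)	Plain text	List of machines that run a datanode and a node manager
hadoop-metrics2.properties	Java properties	Properties that control how metrics are published
log4j.properties	Java properties	Properties for the system log files, the namenode audit log, and the task logs of the task JVM processes
hadoop-policy.xml	Hadoop configuration XML	Configuration settings for the access control lists used when running Hadoop in secure mode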
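The `*-site.xml` files listed above share one format: a `<configuration>` root containing `<property>` elements, each with a `<name>` and a `<value>`. Below is a minimal Python sketch for reading them, plus the one-hostname-per-line `workers` file. The sample host name `namenode`, port 9000, and `hadoop.tmp.dir` path are hypothetical example values, not this cluster's settings. Hadoop's own `Configuration` class additionally handles details such as `${...}` variable expansion and `<final>` flags, which this sketch skips.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical core-site.xml contents; host name, port and path are examples only.
SAMPLE_CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
"""


def load_site_xml(text: str) -> Dict[str, str]:
    """Return {name: value} for each <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(text)
    conf: Dict[str, str] = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            # findtext returns None for a missing <value>; treat that as empty.
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def load_workers(text: str) -> List[str]:
    """Return the host names in a workers file (one per line, blank lines skipped)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    print(load_site_xml(SAMPLE_CORE_SITE))
    print(load_workers("worker1\nworker2\nworker3\n"))
```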

mapred-default.xml properties (excerpt)

name	value	description
mapreduce.jobtracker.jobhistory.location		If the job tracker is static, the history files are stored in this single well-known place. If no value is set here, by default, it is in the local file system at ${hadoop.log.dir}/history.
mapreduce.jobtracker.jobhistory.task.numberprogresssplits	12	Every task attempt progresses from 0.0 to 1.0 [unless it fails or is killed]. We record, for each task attempt, certain statistics over each twelfth of the progress range. You can change the number of intervals we divide the entire range of progress into by setting this property. Higher values give more precision to the recorded data, but cost more memory in the job tracker at runtime. Each increment in this attribute costs 16 bytes per running task.
mapreduce.job.userhistorylocation		The user can specify a location to store the history files of a particular job. If nothing is specified, the logs are stored in the output directory. The files are stored in "_logs/history/" in the directory. The user can stop logging by giving the value "none".
mapreduce.jobtracker.jobhistory.completed.location		The completed job history files are stored at this single well known location. If nothing is specified, the files are stored at ${mapreduce.jobtracker.jobhistory.location}/done.
mapreduce.job.committer.setup.cleanup.needed	true	true if the job needs job setup and job cleanup; false otherwise.
mapreduce.task.io.sort.factor	10	The number of streams to merge at once while sorting files. This determines the number of open file handles.
mapreduce.task.io.sort.mb	100	The total amount of buffer memory to use while sorting files, in megabytes. By default, gives each merge stream 1MB, which should minimize seeks.
mapreduce.map.sort.spill.percent	0.80	The soft limit in the serialization buffer. Once reached, a thread will begin to spill the contents to disk in the background. Note that collection will not block if this threshold is exceeded while a spill is already in progress, so spills may be larger than this threshold when it is set to less than 0.5.
mapreduce.jobtracker.address	local	The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.
mapreduce.local.clientfactory.class.name	org.apache.hadoop.mapred.LocalClientFactory	This is the client factory that is responsible for creating the local job runner client.
mapreduce.jobtracker.http.address	0.0.0.0:50030	The job tracker http server address and port the server will listen on. If the port is 0 then the server will start on a free port.
mapreduce.jobtracker.handler.count	10	The number of server threads for the JobTracker. This should be roughly 4% of the number of tasktracker nodes.
mapreduce.tasktracker.report.address	127.0.0.1:0	The interface and port that task tracker server listens on. Since it is only connected to by the tasks, it uses the local interface. EXPERT ONLY. Should only be changed if your host does not have the loopback interface.
mapreduce.cluster.local.dir	${hadoop.tmp.dir}/mapred/local	The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk i/o. Directories that do not exist are ignored.
mapreduce.jobtracker.system.dir	${hadoop.tmp.dir}/mapred/system	The directory where MapReduce stores control files.
mapreduce.jobtracker.staging.root.dir	${hadoop.tmp.dir}/mapred/staging	The root of the staging area for users' job files. In practice, this should be the directory where users' home directories are located (usually /user).
mapreduce.cluster.temp.dir	${hadoop.tmp.dir}/mapred/temp	A shared directory for temporary files.
mapreduce.tasktracker.local.dir.minspacestart	0	If the space in mapreduce.cluster.local.dir drops under this, do not ask for more tasks. Value in bytes.
mapreduce.tasktracker.local.dir.minspacekill	0	If the space in mapreduce.cluster.local.dir drops under this, do not ask for more tasks until all the current ones have finished and cleaned up. Also, to save the rest of the tasks we have running, kill one of them to clean up some space. Start with the reduce tasks, then go with the ones that have finished the least. Value in bytes.
mapreduce.jobtracker.expire.trackers.interval	600000	Expert: The time interval, in milliseconds, after which a tasktracker is declared 'lost' if it doesn't send heartbeats.
mapreduce.tasktracker.instrumentation	org.apache.hadoop.mapred.TaskTrackerMetricsInst	Expert: The instrumentation class to associate with each TaskTracker.
mapreduce.tasktracker.resourcecalculatorplugin		Name of the class whose instance will be used to query resource information on the tasktracker. The class must be an instance of org.apache.hadoop.util.ResourceCalculatorPlugin. If the value is null, the tasktracker attempts to use a class appropriate to the platform. Currently, the only platform supported is Linux.
mapreduce.tasktracker.taskmemorymanager.monitoringinterval	5000	The interval, in milliseconds, for which the tasktracker waits between two cycles of monitoring its tasks' memory usage. Used only if tasks' memory management is enabled via mapred.tasktracker.tasks.maxmemory.
mapreduce.tasktracker.tasks.sleeptimebeforesigkill	5000	The time, in milliseconds, the tasktracker waits before sending a SIGKILL to a task after it has been sent a SIGTERM. This is currently not used on Windows, where tasks are just sent a SIGTERM.
mapreduce.job.maps	2	The default number of map tasks per job. Ignored when mapreduce.jobtracker.address is "local".
mapreduce.job.reduces	1	The default number of reduce tasks per job. Typically set to 99% of the cluster's reduce capacity, so that if a node fails the reduces can still be executed in a single wave. Ignored when mapreduce.jobtracker.address is "local".
mapreduce.jobtracker.restart.recover	false	"true" to enable (job) recovery upon restart, "false" to start afresh
mapreduce.jobtracker.jobhistory.block.size	3145728	The block size of the job history file. Since job recovery uses the job history, it's important to dump the job history to disk as soon as possible. Note that this is an expert-level parameter. The default value is set to 3 MB.
mapreduce.jobtracker.taskscheduler	org.apache.hadoop.mapred.JobQueueTaskScheduler	The class responsible for scheduling the tasks.
mapreduce.job.running.map.limit	0	The maximum number of simultaneous map tasks per job. There is no limit if this value is 0 or negative.
mapreduce.job.running.reduce.limit	0	The maximum number of simultaneous reduce tasks per job. There is no limit if this value is 0 or negative.
mapreduce.job.reducer.preempt.delay.sec	0	The threshold in terms of seconds after which an unsatisfied mapper request triggers reducer preemption to free space. Default 0 implies that the reduces should be preempted immediately after allocation if there is currently no room for newly allocated mappers.
mapreduce.job.max.split.locations	10	The max number of block locations to store for each split for locality calculation.
mapreduce.job.split.metainfo.maxsize	10000000	The maximum permissible size of the split metainfo file. The JobTracker won't attempt to read split metainfo files bigger than the configured value. No limits if set to -1.
mapreduce.jobtracker.taskscheduler.maxrunningtasks.perjob		The maximum number of running tasks for a job before it gets preempted. No limits if undefined.
mapreduce.map.maxattempts	4	Expert: The maximum number of attempts per map task. In other words, the framework will try to execute a map task this many times before giving up on it.
mapreduce.reduce.maxattempts	4	Expert: The maximum number of attempts per reduce task. In other words, the framework will try to execute a reduce task this many times before giving up on it.
mapreduce.reduce.shuffle.fetch.retry.enabled	${yarn.nodemanager.recovery.enabled}	Set to enable fetch retry during host restart.
mapreduce.reduce.shuffle.fetch.retry.interval-ms	1000	The interval, in milliseconds, after which the fetcher retries a fetch when a non-fatal failure occurs, for example because of an NM restart.
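As a worked example of the map-side sort settings in the table above (`mapreduce.task.io.sort.mb`, `mapreduce.map.sort.spill.percent`, `mapreduce.task.io.sort.factor`): with the defaults, a background spill starts once the 100 MB buffer is 80% full, i.e. at 80 MB. The Python sketch below turns that into a rough spill-count and merge-pass estimate. It deliberately ignores the per-record metadata kept in the same buffer and Hadoop's actual merge scheduling, so treat the numbers as approximations; the function names are illustrative only.

```python
import math


def spill_threshold_mb(io_sort_mb: int = 100, spill_percent: float = 0.80) -> float:
    """Buffer fill level (MB) at which a background spill starts."""
    return io_sort_mb * spill_percent


def estimated_spills(map_output_mb: float, io_sort_mb: int = 100,
                     spill_percent: float = 0.80) -> int:
    """Rough number of spill files for one map task.

    Simplification: ignores per-record metadata stored in the same buffer and
    the fact that collection keeps filling the buffer while a spill runs.
    """
    if map_output_mb <= 0:
        return 0
    return math.ceil(map_output_mb / spill_threshold_mb(io_sort_mb, spill_percent))


def merge_rounds(num_spills: int, sort_factor: int = 10) -> int:
    """Rough number of merge passes, merging at most `sort_factor` streams at a time."""
    rounds = 0
    while num_spills > 1:
        num_spills = math.ceil(num_spills / sort_factor)
        rounds += 1
    return rounds


if __name__ == "__main__":
    threshold = spill_threshold_mb()   # 80.0 MB with the defaults
    spills = estimated_spills(250)     # ceil(250 / 80) = 4
    print(threshold, spills, merge_rounds(spills))
```

For example, a map task that emits about 250 MB of output would spill roughly 4 times, and with a sort factor of 10 those 4 spill files can be merged in a single pass; 25 spills would need about 2 passes.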