Powered by GitBook

Hadoop Fundamental

HDFS

Block size 设置大点减小seek time

Namenodes 不持久化保存 block 位置，在启动的时候从datanode获取

Namenode 单点问题

namenode 异步原子写状态到本地或远程存储上
secondary namenode

Block cache

Managed by namenode
Configurable on a per-file basis

Namenode federation

Multi namespace multi namenodes

HDFS HA

NFS filer
QJM

Failover and fencing

failover controller (default zookeeper)
fencing QJM only allow one namenode to write to the edit log at one time
client dns map

Read/Write

YARN

ResourceManager
ApplicationMaster
Resource Model
- Resource name
- Amount of memory
- CPUs
- Eventually resources such as disk/network I/O, GPUs, and more

Hadoop I/O

Data Integrity

CRC-32 循环检测

Compression

Native lib CodecPool

Serialization

Writable Interface

MapReduce

MRUnit
Web UI
Debug (Remote)
Log
Tuning a Job, Profile
Workflow

How work

Job Submission
Job initialization
Ubertask for small application
Task assignment
Task execution
Progress and Status Updates

FileInputFormat splits max(minimumSize, min(maximumsize, blockSize) small files CombineFileInputFormat

日志收集系统

results matching ""

No results matching ""