HDFS is similar to GFS
Goal: solve how to store data
Metadata:
1 File Info: file name, file size, create time
2 Index: block 1,2,3…n
The Index is used to find the position of each block of data.
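A minimal sketch of this metadata layout, to make the two parts concrete. The class name `FileMetadata`, the `locate` method, and the choice to store each block position as an integer offset are assumptions for illustration, not part of GFS or HDFS.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileMetadata:
    # 1 File Info
    file_name: str
    file_size: int        # in bytes
    create_time: float    # Unix timestamp (assumed representation)
    # 2 Index: entry i-1 holds the position of block i (assumed: a byte offset)
    index: List[int] = field(default_factory=list)

    def locate(self, block_no: int) -> int:
        """Return the stored position of a block; blocks are numbered from 1."""
        return self.index[block_no - 1]


meta = FileMetadata("a.txt", 3072, 1700000000.0, index=[4096, 8192, 1024])
print(meta.locate(2))  # 8192
```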
How to save a big file?
Save it as chunks instead of blocks
1 Chunk = 64 MB; 1 Block = 1024 Bytes
Pros
less metadata;
less network traffic
Cons
wastes space if the file is not big enough
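The trade-off can be checked with a little arithmetic. This sketch uses the sizes from the notes; the helper names `index_entries` and `wasted_bytes` are made up here, and the waste figure assumes every chunk is allocated at its full 64 MB even when only partly used.

```python
BLOCK_SIZE = 1024               # 1 Block = 1024 Bytes
CHUNK_SIZE = 64 * 1024 * 1024   # 1 Chunk = 64 MB


def index_entries(file_size: int, unit_size: int) -> int:
    """Index entries needed when the file is split into units of unit_size."""
    return (file_size + unit_size - 1) // unit_size  # integer ceiling division


def wasted_bytes(file_size: int, unit_size: int) -> int:
    """Unused bytes in the last unit, assuming units are allocated at full size."""
    return index_entries(file_size, unit_size) * unit_size - file_size


one_gb = 1024 ** 3
print(index_entries(one_gb, BLOCK_SIZE))   # 1048576 entries with blocks
print(index_entries(one_gb, CHUNK_SIZE))   # 16 entries with chunks

one_mb = 1024 ** 2
print(wasted_bytes(one_mb, CHUNK_SIZE))    # 66060288 bytes (63 MB) for a 1 MB file
```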
How to save a huge file?
Master server + Many chunk servers
Master server: saves metadata and the Index
Chunk server: saves metadata and chunks
Master server manages all chunk servers
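A rough sketch of how a read could work with this split: the master only answers "which chunk, on which server", and the data itself comes from the chunk server. The classes `Master` and `ChunkServer`, their methods, and the chunk ids are hypothetical and much simpler than the real systems.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB


class ChunkServer:
    def __init__(self, name):
        self.name = name
        self.chunks = {}  # chunk_id -> bytes

    def read(self, chunk_id):
        return self.chunks[chunk_id]


class Master:
    def __init__(self):
        # Index: file name -> list of (chunk_id, ChunkServer), in file order
        self.index = {}

    def locate(self, file_name, offset):
        """Map a byte offset in a file to the chunk and server that hold it."""
        chunk_no = offset // CHUNK_SIZE
        return self.index[file_name][chunk_no]


cs1, cs2 = ChunkServer("cs1"), ChunkServer("cs2")
cs1.chunks["c0"] = b"first chunk"
cs2.chunks["c1"] = b"second chunk"

master = Master()
master.index["big.log"] = [("c0", cs1), ("c1", cs2)]

# Offset 70 MB falls into the second chunk (chunk number 1).
chunk_id, server = master.locate("big.log", offset=70 * 1024 * 1024)
print(server.name, server.read(chunk_id))  # cs2 b'second chunk'
```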
How to detect data errors?
Verify checksum when reading data
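A small sketch of checksum verification on read. CRC32 from Python's `zlib` is an assumed choice of checksum, and `store` / `read_verified` are hypothetical helper names.

```python
import zlib


def store(data: bytes):
    """Return the data together with the checksum computed at write time."""
    return data, zlib.crc32(data)


def read_verified(data: bytes, checksum: int) -> bytes:
    """Recompute the checksum on read and fail if it no longer matches."""
    if zlib.crc32(data) != checksum:
        raise IOError("checksum mismatch: chunk is corrupted")
    return data


data, checksum = store(b"hello chunk")
print(read_verified(data, checksum))   # b'hello chunk'

corrupted = b"hellO chunk"             # one byte changed on disk
try:
    read_verified(corrupted, checksum)
except IOError as err:
    print(err)                         # checksum mismatch: chunk is corrupted
```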
How to handle data errors?
Replicas: 3
How to restore a chunk?
Ask the master for help: it knows where the other replicas are, so the chunk can be copied from a healthy one.
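One way the restore step could look, as a sketch only. `Master.healthy_replica` and `ChunkServer.repair` are hypothetical names, and CRC32 is again an assumed checksum.

```python
import zlib


class ChunkServer:
    def __init__(self):
        self.chunks = {}  # chunk_id -> (data, checksum)

    def put(self, chunk_id, data):
        self.chunks[chunk_id] = (data, zlib.crc32(data))

    def is_valid(self, chunk_id):
        data, checksum = self.chunks[chunk_id]
        return zlib.crc32(data) == checksum

    def repair(self, chunk_id, master):
        """Ask the master for a healthy replica and copy the chunk from it."""
        source = master.healthy_replica(chunk_id, exclude=self)
        if source is None:
            return False  # no good copy left
        data, _ = source.chunks[chunk_id]
        self.put(chunk_id, data)
        return True


class Master:
    def __init__(self):
        self.replicas = {}  # chunk_id -> list of chunk servers holding a copy

    def healthy_replica(self, chunk_id, exclude):
        for server in self.replicas[chunk_id]:
            if server is not exclude and server.is_valid(chunk_id):
                return server
        return None


servers = [ChunkServer() for _ in range(3)]  # Replicas: 3
master = Master()
master.replicas["c0"] = servers
for s in servers:
    s.put("c0", b"payload")

bad = servers[0]
bad.chunks["c0"] = (b"pay1oad", bad.chunks["c0"][1])  # simulate corruption
print(bad.is_valid("c0"))        # False
print(bad.repair("c0", master))  # True
print(bad.is_valid("c0"))        # True
```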
How to detect that a Chunk Server (CS) is down?
Heartbeat
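A sketch of heartbeat-based detection: a server that has not been heard from within a timeout is treated as down. The class `HeartbeatMonitor` and the 10-second timeout are assumptions; timestamps are passed in explicitly so the example is deterministic.

```python
class HeartbeatMonitor:
    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence before a server counts as down
        self.last_seen = {}      # server name -> time of last heartbeat

    def heartbeat(self, server, now):
        self.last_seen[server] = now

    def dead_servers(self, now):
        """Servers whose last heartbeat is older than the timeout."""
        return sorted(s for s, t in self.last_seen.items()
                      if now - t > self.timeout)


monitor = HeartbeatMonitor(timeout=10)
monitor.heartbeat("cs1", now=0)
monitor.heartbeat("cs2", now=0)
monitor.heartbeat("cs1", now=8)      # cs1 keeps reporting, cs2 goes silent

print(monitor.dead_servers(now=12))  # ['cs2']
```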
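How to restore chunks after a CS goes down?
If a CS is down, delete its entries from the Index and add its chunks to the repair procedure, which restores the data from the remaining replicas.
Repair priority is based on the number of remaining replicas: the fewer replicas a chunk has left, the sooner it is repaired.
If all replicas are lost, the data is lost forever.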
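The repair procedure above can be sketched with a priority queue keyed on the number of remaining replicas. The function `repair_order`, the shape of `replica_map`, and the target of 3 replicas as a parameter are illustrative assumptions.

```python
import heapq


def repair_order(replica_map, dead_server, target=3):
    """Drop dead_server from the index and return (chunks to repair, lost chunks).

    replica_map: chunk_id -> set of server names; it is modified in place.
    Chunks with fewer remaining replicas come first in the repair list.
    """
    queue, lost = [], []
    for chunk_id, servers in replica_map.items():
        servers.discard(dead_server)          # delete the index entry for the dead CS
        if not servers:
            lost.append(chunk_id)             # all replicas gone: nothing to copy from
        elif len(servers) < target:
            heapq.heappush(queue, (len(servers), chunk_id))
    order = [heapq.heappop(queue)[1] for _ in range(len(queue))]
    return order, lost


replica_map = {
    "a": {"cs1", "cs2", "cs3"},
    "b": {"cs1", "cs2"},
    "c": {"cs1"},
    "d": {"cs2", "cs3", "cs4"},
}
order, lost = repair_order(replica_map, dead_server="cs1")
print(order)  # ['b', 'a']  (b has 1 replica left, a has 2)
print(lost)   # ['c']
```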
How to avoid hot spots (hot data)?
Replicate a hot chunk into more replicas;
Place the replicas on Chunk Servers with more free space and bandwidth.
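A sketch of picking where extra replicas of a hot chunk could go. The function `pick_servers` and the ranking rule (most free space first, ties broken by free bandwidth) are assumptions made for illustration; the notes do not specify how servers are ranked.

```python
def pick_servers(servers, current, extra):
    """Choose `extra` servers for new replicas of a hot chunk.

    servers: name -> (free_space_gb, free_bandwidth_mbps)
    current: names of servers that already hold the chunk
    """
    candidates = [name for name in servers if name not in current]
    # Assumed ranking: larger (free space, bandwidth) tuples come first.
    candidates.sort(key=lambda name: servers[name], reverse=True)
    return candidates[:extra]


servers = {
    "cs1": (100, 50),
    "cs2": (500, 200),
    "cs3": (500, 800),
    "cs4": (50, 900),
}
print(pick_servers(servers, current={"cs1"}, extra=2))  # ['cs3', 'cs2']
```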