Google File System

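HDFS (Hadoop Distributed File System) is similar to GFS; its design was modeled on GFS.
Both solve the problem of how to save data across many machines.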

Metadata:
1 File info: file name, file size, creation time
2 Index: block 1, 2, 3, …, n (where each block is stored)

The index gives the position of each piece of a file's data, so a byte offset in the file can be mapped to the block that holds it.
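A minimal sketch of the two parts of the metadata, assuming the 1 KB block size used later in these notes. The class name `FileMetadata`, the `locate` method, and the integer "disk positions" stored in the index are hypothetical, chosen only to show how the index turns a file offset into a data position.

```python
from dataclasses import dataclass, field
from datetime import datetime

BLOCK_SIZE = 1024  # bytes; matches "1 Block = 1024 Bytes" in these notes


@dataclass
class FileMetadata:
    # Part 1: file info
    name: str
    size: int
    created: datetime
    # Part 2: index; index[i] is the (hypothetical) disk position of block i
    index: list[int] = field(default_factory=list)

    def locate(self, offset: int) -> tuple[int, int]:
        """Return (position of the block holding `offset`, offset inside that block)."""
        if not 0 <= offset < self.size:
            raise ValueError("offset is outside the file")
        block_no, within = divmod(offset, BLOCK_SIZE)
        return self.index[block_no], within
```

For a 3000-byte file stored in blocks at positions 7, 42 and 13, `locate(2500)` falls in block 2 (since 2500 = 2 × 1024 + 452) and returns `(13, 452)`.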

How to save a big file?

Save it as chunks instead of blocks.
1 chunk = 64 MB; 1 block = 1024 bytes (1 KB)
Pros
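Less metadata: one 64 MB chunk covers as much data as 65,536 blocks of 1 KB, so the index needs far fewer entries;
Less network traffic: clients need fewer metadata lookups to read or write a file.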
Cons
Wastes space if a file is much smaller than a chunk (the rest of the chunk goes unused).
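The trade-off can be checked with a little arithmetic. This sketch uses the sizes from the notes and assumes, as the "wasted space" con does, that the last block or chunk of a file is allocated in full; the function names are hypothetical.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

BLOCK_SIZE = 1 * KB   # 1 block = 1024 bytes
CHUNK_SIZE = 64 * MB  # 1 chunk = 64 MB


def index_entries(file_size: int, unit: int) -> int:
    """One index entry per block/chunk, rounding up (integer ceiling division)."""
    return (file_size + unit - 1) // unit


def wasted_bytes(file_size: int, unit: int) -> int:
    """Unused space in the last, partially filled unit (assumes full allocation)."""
    used = file_size % unit
    return 0 if used == 0 else unit - used


# Pro: a 1 GB file needs 1,048,576 block entries but only 16 chunk entries.
print(index_entries(GB, BLOCK_SIZE), index_entries(GB, CHUNK_SIZE))
# Con: a 1 MB file wastes 63 MB of its only chunk, but nothing with 1 KB blocks.
print(wasted_bytes(1 * MB, CHUNK_SIZE) // MB, wasted_bytes(1 * MB, BLOCK_SIZE))
```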

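How to save a huge file?

One master server + many chunk servers
Master server: saves the metadata and the index (which chunk servers hold each chunk)
Chunk server: saves the chunks themselves plus per-chunk metadata (such as checksums)

The master server manages all chunk servers.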
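A minimal sketch of the split between the master and the chunk servers. The string chunk handles, the `lookup` method, and the dictionary layout are hypothetical simplifications; the point is that the master holds only metadata and answers "which chunk, and where", while chunk servers hold the data.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB


@dataclass
class ChunkServer:
    name: str
    chunks: dict[str, bytes] = field(default_factory=dict)  # chunk handle -> chunk data


@dataclass
class Master:
    # Index: file name -> ordered list of chunk handles
    files: dict[str, list[str]] = field(default_factory=dict)
    # Chunk handle -> names of the chunk servers holding a replica
    locations: dict[str, list[str]] = field(default_factory=dict)

    def lookup(self, filename: str, offset: int) -> tuple[str, list[str]]:
        """Translate (file, byte offset) into (chunk handle, chunk servers holding it)."""
        handle = self.files[filename][offset // CHUNK_SIZE]
        return handle, self.locations[handle]
```

A client would call `lookup` once per chunk and then talk to one of the returned chunk servers directly, so file data never flows through the master.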

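How to detect data errors?

Save a checksum with the data when it is written, and verify the checksum when reading the data; a mismatch means the data is corrupted.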
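A minimal sketch of checksum verification, assuming CRC32 from Python's standard `zlib` module as the checksum; the helper names are hypothetical.

```python
import zlib


def store_chunk(data: bytes) -> tuple[bytes, int]:
    """Save the data together with a CRC32 checksum computed at write time."""
    return data, zlib.crc32(data)


def verify(data: bytes, checksum: int) -> bool:
    """Recompute the checksum on read and compare it with the saved one."""
    return zlib.crc32(data) == checksum


def read_chunk(data: bytes, checksum: int) -> bytes:
    """Return the data only if it passes the checksum check."""
    if not verify(data, checksum):
        raise IOError("checksum mismatch: chunk is corrupted")
    return data
```

CRC32 is guaranteed to catch any single flipped bit, which is the typical disk-corruption case it is meant for; it is not a defense against deliberate tampering.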

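How to handle data errors?

Replicas: keep 3 copies of each chunk on different chunk servers, so a corrupted copy can be replaced from a healthy one.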
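A minimal sketch of choosing where the 3 replicas go. The placement rule (pick the distinct chunk servers with the most free space) is a hypothetical policy for illustration; the essential property is only that the replicas land on different servers.

```python
REPLICAS = 3


def place_replicas(free_space: dict[str, int], n: int = REPLICAS) -> list[str]:
    """Pick n distinct chunk servers for a new chunk, preferring the most free space."""
    if len(free_space) < n:
        raise ValueError(f"need at least {n} chunk servers, have {len(free_space)}")
    ranked = sorted(free_space, key=lambda cs: free_space[cs], reverse=True)
    return ranked[:n]
```

With free space `{"cs1": 10, "cs2": 50, "cs3": 30, "cs4": 40}`, the chunk goes to `cs2`, `cs4` and `cs3`.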

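How to restore a corrupted chunk?

Ask the master for help: it knows which other chunk servers hold replicas of the chunk, so the corrupted copy can be restored from a healthy replica.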
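A minimal sketch of master-assisted restore, assuming each stored copy carries a CRC32 checksum. The in-memory `storage` dictionary standing in for the chunk servers' disks and the function names are hypothetical.

```python
import zlib

# storage[server][handle] = (data, crc32 checksum): hypothetical stand-in for disks
Storage = dict[str, dict[str, tuple[bytes, int]]]


def is_intact(copy: tuple[bytes, int]) -> bool:
    data, checksum = copy
    return zlib.crc32(data) == checksum


def restore_chunk(handle: str, bad_server: str,
                  locations: dict[str, list[str]], storage: Storage) -> str:
    """Copy a healthy replica of `handle` over the corrupted copy on `bad_server`.

    `locations` is the master's index (handle -> servers holding a replica).
    Returns the name of the server the healthy copy came from.
    """
    for server in locations[handle]:
        if server == bad_server:
            continue
        copy = storage[server][handle]
        if is_intact(copy):
            storage[bad_server][handle] = copy  # overwrite the corrupted copy
            return server
    raise IOError(f"no healthy replica of {handle}: data is lost")
```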

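How to detect that a chunk server is down?

Heartbeat: each chunk server periodically sends a heartbeat message to the master; if the master stops receiving heartbeats from a chunk server, it treats that server as down.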
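A minimal sketch of heartbeat tracking on the master side. The 30-second default timeout and the class name are hypothetical; the `now` parameter exists only so the logic can be tested with fixed times instead of the real clock.

```python
import time
from typing import Optional


class HeartbeatMonitor:
    """Master-side record of when each chunk server last sent a heartbeat."""

    def __init__(self, timeout: float = 30.0):
        self.timeout = timeout  # hypothetical: seconds of silence before "down"
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, server: str, now: Optional[float] = None) -> None:
        """Record a heartbeat from `server`."""
        self.last_seen[server] = time.monotonic() if now is None else now

    def dead_servers(self, now: Optional[float] = None) -> list[str]:
        """Servers whose last heartbeat is older than the timeout."""
        now = time.monotonic() if now is None else now
        return [s for s, t in self.last_seen.items() if now - t > self.timeout]
```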

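How to restore chunks after a chunk server (CS) goes down?

If a CS goes down, the master deletes that CS from the index entries of the chunks it held and adds those chunks to a repair procedure, which re-creates the missing copies from the remaining replicas.
Repair priority is based on the number of remaining replicas: chunks with the fewest surviving replicas are repaired first.

If all replicas are lost, the data is lost forever.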
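A minimal sketch of the repair procedure using a priority queue from Python's `heapq`, keyed by the number of surviving replicas. The data layout and function names are hypothetical; chunks left with zero replicas are reported as lost, matching the last point above.

```python
import heapq


def handle_server_down(dead: str, locations: dict[str, list[str]]) -> list[tuple[int, str]]:
    """Remove `dead` from the index and queue its chunks for repair."""
    queue: list[tuple[int, str]] = []
    for handle, servers in locations.items():
        if dead in servers:
            servers.remove(dead)                          # delete the dead CS from the index
            heapq.heappush(queue, (len(servers), handle))  # fewer replicas -> higher priority
    return queue


def repair_order(queue: list[tuple[int, str]]) -> tuple[list[str], list[str]]:
    """Pop chunks by priority; those with no surviving replica cannot be repaired."""
    to_repair, lost = [], []
    while queue:
        remaining, handle = heapq.heappop(queue)
        (lost if remaining == 0 else to_repair).append(handle)
    return to_repair, lost
```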

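How to avoid hot spots (hot data that is read very frequently)?

Replicate a hot chunk into more replicas, so its reads are spread across more chunk servers;
Put the new replicas on chunk servers with more free space and bandwidth.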
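A minimal sketch of the hot-spot rule. Both the sizing rule (enough replicas that each serves at most `capacity_per_replica` reads per second, never fewer than 3) and the ranking of servers by spare bandwidth, then free space, are hypothetical policies chosen for illustration.

```python
import math

BASE_REPLICAS = 3


def replicas_needed(reads_per_sec: float, capacity_per_replica: float) -> int:
    """Hypothetical rule: scale replicas with load, never below the base count."""
    return max(BASE_REPLICAS, math.ceil(reads_per_sec / capacity_per_replica))


def add_hot_replicas(holders: list[str], servers: dict[str, tuple[int, int]],
                     reads_per_sec: float, capacity_per_replica: float) -> list[str]:
    """Return extra chunk servers to copy a hot chunk to.

    servers: name -> (spare bandwidth, free space); higher is better for both.
    """
    extra = replicas_needed(reads_per_sec, capacity_per_replica) - len(holders)
    candidates = [s for s in servers if s not in holders]
    candidates.sort(key=lambda s: servers[s], reverse=True)  # bandwidth, then space
    return candidates[:max(extra, 0)]
```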

How to read data?

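The client asks the master which chunk servers hold the chunk covering the requested offset, then reads the data directly from one of those chunk servers, verifying the checksum.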
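A minimal sketch of a read path, assuming the master's index maps each file to a list of (chunk handle, replica servers) and each chunk server stores (data, CRC32) per chunk. The names and layout are hypothetical, reads are limited to a single chunk, and if a replica fails its checksum the client simply tries the next one.

```python
import zlib

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB


def gfs_read(filename: str, offset: int, length: int,
             master_index: dict[str, list[tuple[str, list[str]]]],
             chunk_servers: dict[str, dict[str, tuple[bytes, int]]]) -> bytes:
    """Read `length` bytes at `offset` (within one chunk)."""
    # 1. Ask the master: which chunk holds this offset, and where are its replicas?
    handle, replicas = master_index[filename][offset // CHUNK_SIZE]
    start = offset % CHUNK_SIZE
    # 2. Read from a chunk server; 3. verify the checksum, falling back on mismatch.
    for server in replicas:
        data, checksum = chunk_servers[server][handle]
        if zlib.crc32(data) == checksum:
            return data[start:start + length]
    raise IOError(f"all replicas of {handle} failed the checksum")
```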

How to write data?

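The client asks the master which chunk servers hold the chunk's replicas, then sends the data to all of them; the write succeeds only when every replica has stored it.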
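A minimal sketch of a write path. It is deliberately simplified: `replicas` stands for the list the master returned, and the sketch omits how real GFS orders concurrent writes through a primary replica. Each replica saves the data with a CRC32 checksum, and the write counts as successful only if every replica acknowledges it.

```python
import zlib


class ChunkServer:
    """Hypothetical in-memory chunk server."""

    def __init__(self, name: str, alive: bool = True):
        self.name = name
        self.alive = alive
        self.chunks: dict[str, tuple[bytes, int]] = {}  # handle -> (data, crc32)

    def write(self, handle: str, data: bytes) -> bool:
        """Store data plus checksum; return False (no ack) if this server is down."""
        if not self.alive:
            return False
        self.chunks[handle] = (data, zlib.crc32(data))
        return True


def gfs_write(handle: str, data: bytes, replicas: list[ChunkServer]) -> bool:
    """Send the data to every replica; succeed only if all of them acknowledge."""
    acks = [server.write(handle, data) for server in replicas]  # tries every replica
    return all(acks)
```

On failure the replicas may disagree (some stored the data, some did not), so the client is expected to retry the write.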