Categories of NoSQL

NoSQL database

RDBMS features: ACID
A: Atomicity
C: Consistency
I: Isolation
D: Durability

NoSQL is used to store super large scale data. These data doesn’t need a fix pattern, just scale out.

Distributed system CAP theorem: it cann’t be satisfied three of them at the same time.
Consistency: all nodes has same data at the same time
Availability: guarantee response no matter success or fail
Partition tolerance: system keep operating even part information lost.
Imgur

Categories
Column-based:
-HBase, Cassandra;
-Easy to compress data; fast-search for a certain column.

Document-based:
-MongoDB, CouchDB
-Storage Format is similar to JSON(K/V Pair), Content is document format.

Key-Value:
-Redis, Memcache
-Search value through key no matter the type of value

Column-oriented

Traditional relational database is row-oriented. Each row has a row id and every field store in one table.

When searching in row-oriented database, each row will be traversed, no matter a certain column data is needed or not. If you search a person whose birth month is September. Search row by row to target data.

Index the target column can promote search speed, but index each column cause extra overload and databse will traverse all columns to search data.

Column-oriented database will stores each column seperately. It’s very fast to search when the data amount is small. Search column by column to targetdata.

This design looks like adding index for each row. It sacrifices space and writing more index to speed up searching ability. Index maps row ID to data, but column-based database maps data to rowID. Like the picture above, it is easy to search who like archery. This is an inverted index.

When to use row-oriented and when to use column-oriented? Column-based database is easy to add one new column, but is difficult to add one row. Therefore, row-oriented database is bettern than column-based on transaction. Because it achieves real-time update. It is commonly used in marketing business.

Column-oriented database has advantage of data analysis, like sum of values. Batch processing can adopt column-oriented database at night, and it supports quick search and MapReduce aggregation.

Compared to RDBMS
If you want to get all infomation of one user, column-based database costs more;
If you want to get one piece of information of one user, row-based database costs more.

Row-based is faster than Column-based on Updating a target row;
Column-based is faster than Row-based on Searching part of information of users;

Row-based database: OLTP
Column-based database: OLAP

K/V stores

This is the easiest store in NoSQL databases.

the complex k/v paire is like:

Document-oriented stores

Store based on K/V pair, but structure more complex.

MongoDB

Written in C++
Remains some friendly properties of SQL.(Query, index)
Support secondary index
Master-slave replication
Queries support javascript expressions
Better update-in-place than CouchDB
Sharding built-in
Uses memory mapped files for data storage
After crash, it needs to repair tables
Reads faster than wirtes

Use: need dynamic search. Dedine indexes first, no need map/reduce.

Example: Adapts all situations of MySQL and PostgreSQL

Cassandra

Written in Java
Querying by column
Writes are much faster than reads
Column-based databse.
Weak consistency; High availability; High scalability

Use: adapts to implement in write action more than read action

Example: Logging system, Banking system, Finance system.

HBase

Written in Java
Billion row level and Million column level
Map/Reduce with Hadoop
Store in HDFS
No single point of failure
Strong consistency; Low availability; High scalability

Use: Big table, random and real-time read/write action

Example: Facebook message database

Redis

Written in c/c++
Blazing fast
Disk-backed in-memory database
Master-slave replication
Store as key/value format
Support set, list, hash
Has transactions
Value can be set to expire
Publish/Subscribe and Watch

Use: if data is not huge and changes frequently, store in Redis. Fast read/write data

Example: Stock price analysis, Real-time data processing