NoSQL databases

Two styles of distributing data:

1 Sharding: distributes different data across multiple servers, so each server is the single source for a subset of the data.
2 Replication: copies the same data across multiple servers, so each piece of data can be found in multiple places.
It comes in two forms:
Master-slave replication: one node is the authoritative copy and handles all writes; the slaves synchronize with the master and may serve reads.
Peer-to-peer replication: any node accepts writes, and the nodes coordinate to synchronize their copies of the data.
Master-slave replication reduces the chance of update conflicts, whereas peer-to-peer replication avoids funneling every write through a single server, which would be a single point of failure.
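The two styles can be sketched in a few lines of Python. This is a toy model, not how any particular database implements them: `Node`, `shard_for` and `MasterSlaveGroup` are hypothetical names, sharding here is hash-based, and the master copies each write to its slaves synchronously for simplicity.

```python
import hashlib


class Node:
    """A hypothetical in-memory server holding key/value pairs."""

    def __init__(self, name):
        self.name = name
        self.data = {}


def shard_for(key, shards):
    """Sharding: each key lives on exactly one shard, chosen by hashing the key.

    md5 is used instead of Python's hash() because hash() of a str is
    randomized per process, so it would not give a stable placement.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:4], "big") % len(shards)]


class MasterSlaveGroup:
    """Replication: the master takes every write, the slaves hold copies and serve reads."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves

    def write(self, key, value):
        self.master.data[key] = value
        # Synchronous copy for simplicity; real systems often replicate asynchronously.
        for slave in self.slaves:
            slave.data[key] = value

    def read(self, key, replica=0):
        return self.slaves[replica].data.get(key)


shards = [Node("shard-a"), Node("shard-b"), Node("shard-c")]
for user in ["alice", "bob", "carol", "dave"]:
    shard_for(user, shards).data[user] = {"name": user}

group = MasterSlaveGroup(Node("master"), [Node("slave-1"), Node("slave-2")])
group.write("cart:42", ["book"])
print(group.read("cart:42", replica=1))  # ['book']
```

Each user record ends up on one shard only, while the cart is present on the master and on both slaves.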

CAP theorem

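Consistency: every read returns the most recent write (or an error)
Availability: every request to a non-failing node receives a response
Partition tolerance: the system keeps working even when the network drops or delays messages between nodes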

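Because network partitions cannot be ruled out, a distributed database has to trade consistency against availability when a partition occurs.
Many distributed databases settle for “eventual consistency” in order to get high availability: replicas may briefly disagree, but they converge once updates have propagated.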
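A minimal sketch of eventual consistency, assuming a hypothetical store (`Replica`, `EventuallyConsistentStore`) where a write is acknowledged by one replica immediately and copied to the others later. Conflicts are resolved by last-write-wins on a logical timestamp, which is only one of several possible policies.

```python
class Replica:
    """A hypothetical replica storing key -> (timestamp, value)."""

    def __init__(self):
        self.store = {}

    def write(self, key, value, ts):
        current = self.store.get(key)
        # Last-write-wins: keep whichever value carries the newest timestamp.
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


class EventuallyConsistentStore:
    """Accepts a write on a single replica and propagates it to the rest later."""

    def __init__(self, n):
        self.replicas = [Replica() for _ in range(n)]
        self.pending = []  # (key, value, ts) updates not yet propagated
        self.clock = 0     # logical clock used as the write timestamp

    def write(self, key, value, at=0):
        self.clock += 1
        # Acknowledged right away: available even if other replicas are unreachable.
        self.replicas[at].write(key, value, self.clock)
        self.pending.append((key, value, self.clock))

    def sync(self):
        """Propagate queued updates to every replica (e.g. once a partition heals)."""
        for key, value, ts in self.pending:
            for replica in self.replicas:
                replica.write(key, value, ts)
        self.pending.clear()


store = EventuallyConsistentStore(3)
store.write("stock", 10, at=0)
print(store.replicas[2].read("stock"))  # None: replica 2 has not seen the write yet
store.sync()
print(store.replicas[2].read("stock"))  # 10: all replicas have converged
```

Between the write and `sync()`, a reader of replica 2 sees stale data; that window is the price paid for never blocking the write.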

Why choose NoSQL Databases

Broad reasons:
1 To improve programmer productivity by using a database that better matches an application’s needs
2 To improve data access performance via some combination of handling larger data volumes, reducing latency, and improving throughput

General guidelines:
1 Key-value databases are generally useful for storing session information, user profiles, preferences, and shopping cart data. We would avoid them when we need to query by the data (the values) rather than by key, when the stored data has relationships, or when we need to operate on multiple keys at the same time.
2 Document databases are generally useful for content management systems, blogging platforms, web analytics, real-time analytics, and e-commerce applications. We would avoid them for systems that need complex transactions spanning multiple operations, or queries against varying aggregate structures.
3 Column family databases are generally useful for content management systems, blogging platforms, maintaining counters, expiring usage (data with a time-to-live), and heavy write volumes such as log aggregation. We would avoid them for systems in early development or whose query patterns are still changing.
4 Graph databases are very well suited to problem spaces with connected data, such as social networks, spatial data, routing information for goods and money, and recommendation engines.
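To see why "query by the data" is a poor fit for a key-value store, here is a toy sketch with a hypothetical `KeyValueStore` class: lookups by key are direct, but because the store treats values as opaque, finding entries by their contents means visiting every entry.

```python
class KeyValueStore:
    """A hypothetical minimal key-value store; values are opaque to the store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        # Direct lookup: the only access path the store really optimizes.
        return self._data.get(key)

    def scan_values(self, predicate):
        """Query by data: nothing indexes the values, so every entry is visited."""
        return [k for k, v in self._data.items() if predicate(v)]


kv = KeyValueStore()
kv.put("session:abc", {"user": "wei", "cart": ["book", "pen"]})
kv.put("session:def", {"user": "qian", "cart": []})

print(kv.get("session:abc")["cart"])                 # ['book', 'pen']
print(kv.scan_values(lambda v: "pen" in v["cart"]))  # ['session:abc']
```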

DB-1 MongoDB

1 Document oriented

The document is the basic unit of data storage in a MongoDB database
Documents map nicely to programming language data types: a PHP array, Java Bean, Python dictionary, or JavaScript object can be imported directly into a document.
Embedded documents reduce the need for joins
Dynamic schema makes change easier
JSON-style, stored as BSON
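A small illustration of these points in plain Python, with hypothetical field values: a dict maps directly onto a document, an embedded document replaces what would be a joined table in an RDBMS, and two documents in the same "collection" (here just a list) can carry different fields.

```python
import json

# A hypothetical "user" document: a plain Python dict maps directly onto it.
user = {
    "Last name": "Wang",
    "First name": "Wei",
    # Embedded document: the address lives inside the user, so no join is needed.
    "address": {"city": "Beijing"},
}

# Dynamic schema: another document in the same collection with different fields.
other = {"Last name": "Mao", "First name": "Qian", "email": "qian@example.com"}

users = [user, other]  # stand-in for a collection

# Reading from the embedded document is plain nested access.
print(users[0]["address"]["city"])  # Beijing
# The JSON-style text form of a document.
print(json.dumps(other))
```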

RDBMS          MongoDB
Table          Collection
Column         Field (key)
Value          Value
Record/Row     Document/Object
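The mapping can be made concrete with the standard-library `sqlite3` module: rows of a (hypothetical) `users` table become dicts, i.e. documents whose keys are the column names. This only illustrates the correspondence; it is not a migration tool.

```python
import sqlite3

# Hypothetical table "users" with two columns.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be read by column name
conn.execute("CREATE TABLE users (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("Wei", "Wang"), ("Qian", "Mao")],
)

# Table -> collection, row -> document, column -> field (key), value -> value.
collection = [dict(row) for row in conn.execute("SELECT * FROM users ORDER BY rowid")]
print(collection)
conn.close()
```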

2 JSON

JSON: JavaScript Object Notation
a syntax for storing and exchanging data
an easier-to-use alternative to XML
more human-friendly
smaller data size (less verbose than XML)
easier to read
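A short sketch of JSON as an exchange format using Python's `json` module: a record is serialized to text and parsed back unchanged. The XML string is a hand-written equivalent (an assumption for comparison, not a canonical mapping) that shows where XML's extra size comes from: every value needs an opening and a closing tag.

```python
import json

record = {"first": "Wei", "last": "Wang", "tags": ["admin", "dev"]}

# Serialize for storage or exchange (compact separators, no extra spaces), then parse back.
text = json.dumps(record, separators=(",", ":"))
print(json.loads(text) == record)  # True

# A hand-written XML equivalent of the same record.
xml = (
    "<record><first>Wei</first><last>Wang</last>"
    "<tags><tag>admin</tag><tag>dev</tag></tags></record>"
)
print(len(text), len(xml))
```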

BSON
Binary JSON used in MongoDB
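Compared to JSON, BSON is designed to be efficient in storage space and fast to scan, and it adds types such as dates and binary data; in some cases, though, it is not much smaller than JSON.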

Data is converted to BSON when stored and converted back to JSON when read.
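To show what "binary JSON" looks like on the wire, here is a hand-rolled encoder for a small subset of the BSON spec (UTF-8 strings, 32-bit integers, embedded documents). It is a teaching sketch; real code should use the `bson` package that ships with PyMongo (for example `bson.encode`). The length prefixes are what let a reader skip over a field without parsing it.

```python
import struct


def encode_cstring(s):
    """BSON cstring: UTF-8 bytes followed by a NUL terminator."""
    return s.encode("utf-8") + b"\x00"


def encode_document(doc):
    """Encode a dict of str / int / nested dict values (a subset of BSON)."""
    body = b""
    for key, value in doc.items():
        if isinstance(value, bool):  # bool is a subclass of int in Python
            raise TypeError("booleans are not handled in this sketch")
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # 0x02 = string: int32 length (including the trailing NUL), then the bytes.
            body += b"\x02" + encode_cstring(key) + struct.pack("<i", len(data)) + data
        elif isinstance(value, int):
            # 0x10 = 32-bit signed little-endian integer (value must fit in int32).
            body += b"\x10" + encode_cstring(key) + struct.pack("<i", value)
        elif isinstance(value, dict):
            # 0x03 = embedded document, encoded recursively.
            body += b"\x03" + encode_cstring(key) + encode_document(value)
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    # Total length counts the 4-byte length prefix and the final NUL byte.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


encoded = encode_document({"hello": "world"})
print(encoded)  # b'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
```

The output matches the `{"hello": "world"}` example given in the BSON specification: 22 (0x16) bytes in total.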

3 High Performance

Written in C++;
Use of memory-mapped files (in the original MMAPv1 storage engine; WiredTiger has been the default engine since MongoDB 3.2);
Serialization in BSON for fast parsing;
Indexes can include keys from embedded documents
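A sketch of what an index on an embedded key such as `address.city` does conceptually: resolve the dotted path in every document and map each value to the documents that hold it. This is a simplification (MongoDB's indexes are B-trees, not hash maps); in PyMongo the corresponding call would be `db.users.create_index("address.city")`. `get_path` and `build_index` are hypothetical helpers.

```python
from collections import defaultdict


def get_path(doc, path):
    """Resolve a dotted path such as "address.city" inside nested dicts."""
    for part in path.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None
        doc = doc[part]
    return doc


def build_index(collection, path):
    """Map each value found at `path` to the positions of the documents holding it."""
    index = defaultdict(list)
    for pos, doc in enumerate(collection):
        index[get_path(doc, path)].append(pos)
    return index


users = [
    {"name": "Wei", "address": {"city": "Beijing"}},
    {"name": "Qian", "address": {"city": "Shanghai"}},
    {"name": "Lei", "address": {"city": "Beijing"}},
]
city_index = build_index(users, "address.city")
# Answer "who lives in Beijing?" without scanning every document.
print([users[i]["name"] for i in city_index["Beijing"]])  # ['Wei', 'Lei']
```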

4 High Scalability

Vertical scaling (a bigger server) vs. horizontal scaling (more servers)
MongoDB supports horizontal scaling through sharding
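MongoDB shards a collection by a shard key: key ranges (chunks) are assigned to shards, and the `mongos` router sends each query only to the shards that own the relevant chunks. The sketch below models just that routing lookup for range-based sharding; the chunk boundaries, shard names and `zip` shard key are hypothetical, and MongoDB also offers hashed sharding, which is not shown.

```python
import bisect

# Hypothetical chunks on shard key "zip": chunk i covers
# [chunk_lower_bounds[i], chunk_lower_bounds[i + 1]) and lives on chunk_owner[i].
# Zip codes are fixed-width strings, so string order matches numeric order.
chunk_lower_bounds = ["", "20000", "50000", "80000"]
chunk_owner = ["shard-0", "shard-1", "shard-2", "shard-0"]


def route(zip_code):
    """Return the shard owning the chunk that contains this shard-key value."""
    i = bisect.bisect_right(chunk_lower_bounds, zip_code) - 1
    return chunk_owner[i]


def route_range(low, high):
    """Shards a range query on [low, high) must visit: only the overlapping chunks."""
    first = bisect.bisect_right(chunk_lower_bounds, low) - 1
    last = bisect.bisect_left(chunk_lower_bounds, high) - 1
    return sorted(set(chunk_owner[first:last + 1]))


print(route("10001"), route("60000"), route_range("25000", "60000"))
```

A query on the shard key touches one shard; a range query touches only the shards whose chunks overlap the range, which is why the choice of shard key matters for scaling.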

Structures

A MongoDB instance may have zero or more databases
A database may have zero or more collections
can be thought of as the relation (table) in a SQL database
A collection may have zero or more Documents
Docs in the same collection don’t even need to have the same fields
Docs correspond to the records (rows) in an RDBMS
A document may have one or more fields
MongoDB indexes are much like their RDBMS counterparts
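The hierarchy can be pictured as nested containers. This in-memory model is a hypothetical illustration (the `insert` and `find` helpers are not PyMongo): an instance holds databases, a database holds collections, and a collection holds documents that need not share fields. `find` matches documents whose fields equal every key/value in the filter.

```python
from collections import defaultdict

# Hypothetical model: instance -> databases -> collections -> list of documents.
instance = defaultdict(lambda: defaultdict(list))


def insert(db_name, coll_name, doc):
    # Databases and collections come into existence on the first insert.
    instance[db_name][coll_name].append(doc)


def find(db_name, coll_name, filter_doc):
    """Return documents whose fields equal every key/value pair in filter_doc."""
    return [
        d for d in instance[db_name][coll_name]
        if all(d.get(k) == v for k, v in filter_doc.items())
    ]


insert("real_estate_smart_view", "users", {"Last name": "Wang", "First name": "Wei"})
# Same collection, different fields: no fixed schema.
insert("real_estate_smart_view", "users", {"Last name": "Mao", "age": 30})
print(find("real_estate_smart_view", "users", {"Last name": "Mao"}))
```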

Pros

Simple queries;
Often faster than a SQL database for workloads that read and write whole documents, since no joins are needed;
Easier and faster integration of data

Cons

Not well suited for systems with heavy, complex transactions

PyMongo

Super easy to use;
Thread safe;
Built-in connection pool when using MongoClient;
No need to explicitly close a connection (the pool manages connections; client.close() is available when needed)

Steps
1 Create a client connected to the MongoDB instance on localhost, port 27017 (MongoDB's default port)

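from pymongo import MongoClient

# Connect to the MongoDB instance on localhost at port 27017.
client = MongoClient('localhost:27017')
# Attribute access selects the database; it is created lazily on the first write.
db = client.real_estate_smart_view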

real_estate_smart_view: database name

2 Insert a new document

db.users.insert_one({"Last name": "Wang", "First name": "Wei"})  # insert() was removed in PyMongo 4

3 Query a document

for doc in db.users.find({"Last name": "Mao"}):  # find() returns a cursor over the matches
    print(doc)

4 Update an existing document

db.users.update_one(
    {"Last name": "Mao"},             # filter: the first matching document is updated
    {"$set": {"First name": "Qian"}}  # update: set "First name" to "Qian"
)

$set is an update operator that sets a field to the provided value, creating the field if it does not exist; the document's other fields are left untouched.
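A sketch of `$set` semantics on a plain dict, assuming a hypothetical `apply_set` helper (this is not how the server implements it): only the named fields change, missing fields are created, and dotted paths reach into embedded documents, creating them if needed. It assumes any intermediate value on a dotted path is a dict.

```python
import copy


def apply_set(doc, fields):
    """Return a copy of doc with a $set-style update applied."""
    result = copy.deepcopy(doc)
    for path, value in fields.items():
        target = result
        *parents, last = path.split(".")
        # Walk (or create) embedded documents along a dotted path like "address.city".
        for part in parents:
            target = target.setdefault(part, {})
        target[last] = value
    return result


before = {"Last name": "Mao", "First name": "Lan"}
after = apply_set(before, {"First name": "Qian", "address.city": "Beijing"})
print(after)  # {'Last name': 'Mao', 'First name': 'Qian', 'address': {'city': 'Beijing'}}
```

Without `$set`, passing a plain document as the update would replace the whole document, which is why PyMongo's `update_one` requires update operators and `replace_one` exists separately.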