NoSQL databases
Two styles of distributing data:
1 Sharding: it distributes different data across multiple servers, so each server acts as the single source for a subset of data.
2 Replication: it copies data across multiple servers, so each bit of data can be found in multiple places.
Replication takes two forms:
Master-Slave replication: it makes one node the authoritative copy that handles writes while slaves synchronize with the master and may handle reads.
Peer-to-Peer replication: it allows writes to any node; the nodes coordinate to synchronize their copies of the data.
Master-slave replication reduces the chance of update conflicts, but peer-to-peer replication avoids loading all writes onto a single server, which would create a single point of failure.
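As a minimal sketch of the two styles, the toy code below routes each key to exactly one shard by hashing it, and copies every write to all servers for replication. The server names, the helper functions (shard_for, sharded_write, replicated_write) and the in-memory dictionaries are illustrative assumptions, not part of any real database.

```python
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical server names


def shard_for(key: str) -> str:
    """Sharding: pick exactly one server for a key using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


def sharded_write(stores: dict, key: str, value) -> None:
    """Store the value only on the shard that owns the key."""
    stores[shard_for(key)][key] = value


def replicated_write(stores: dict, key: str, value) -> None:
    """Replication: store the same value on every server."""
    for store in stores.values():
        store[key] = value


sharded = {name: {} for name in SHARDS}
replicated = {name: {} for name in SHARDS}

for user in ["alice", "bob", "carol", "dave"]:
    sharded_write(sharded, user, {"name": user})
    replicated_write(replicated, user, {"name": user})

# Each key lives on exactly one shard, but on every replica.
print({name: sorted(store) for name, store in sharded.items()})
print({name: sorted(store) for name, store in replicated.items()})
```

With sharding, losing one server loses access to its subset of keys; with replication, any surviving server can still answer for every key.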
CAP theorem
Consistency
Availability
Partition tolerance
When a network partition occurs, a distributed database must trade off consistency against availability.
Most distributed databases are “eventually consistent” in order to achieve high availability.
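Eventual consistency can be illustrated with a toy model, assuming a master that acknowledges a write before its replicas have applied it. The Replica and Master classes and the sync step are hypothetical names for this sketch only; real databases use far more elaborate protocols.

```python
class Replica:
    """A node holding a copy of the data plus a queue of pending updates."""

    def __init__(self):
        self.data = {}
        self.pending = []  # updates received but not yet applied

    def read(self, key):
        return self.data.get(key)

    def sync(self):
        """Apply all pending updates, in the order they were received."""
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()


class Master(Replica):
    """Accepts writes immediately and forwards them to replicas lazily."""

    def __init__(self, replicas):
        super().__init__()
        self.replicas = replicas

    def write(self, key, value):
        self.data[key] = value           # the write succeeds right away
        for replica in self.replicas:    # replicas only queue the update
            replica.pending.append((key, value))


replica = Replica()
master = Master([replica])

master.write("cart", ["book"])
stale = replica.read("cart")   # the replica has not synchronized yet
replica.sync()
fresh = replica.read("cart")   # the replica has caught up with the master

print(stale, fresh)
```

The replica stays available for reads the whole time, at the price of briefly returning out-of-date data.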
Why choose NoSQL Databases
Broad reasons:
1 To improve programmer productivity by using a database that better matches an application’s needs
2 To improve data access performance via some combination of handling larger data volumes, reducing latency, and improving throughput
General guidelines:
1 Key-value databases are generally useful for storing session information, user profiles, preferences, and shopping cart data. We would avoid using key-value databases when we need to query by data, have relationships between the data being stored, or need to operate on multiple keys at the same time.
2 Document databases are generally useful for content management systems, blogging platforms, web analytics, real-time analytics, and e-commerce applications. We would avoid using document databases for systems that need complex transactions spanning multiple operations or queries against varying aggregate structures.
3 Column family databases are generally useful for content management systems, blogging platforms, maintaining counters, expiring usage, and heavy write volume such as log aggregation. We would avoid using column family databases for systems that are in early development or whose query patterns are still changing.
4 Graph databases are very well suited to problem spaces where we have connected data, such as social networks, spatial data, routing information for goods and money, and recommendation engines.
DB-1 MongoDB
1 Document oriented
The document is the unit of storing data in a MongoDB database
Documents map nicely to programming language data types: a PHP array, Java Bean, Python dictionary, or JavaScript object can be imported directly into a document.
Embedded documents reduce the need for joins
Dynamic schema makes change easier
JSON-style, stored as BSON
RDBMS          MongoDB
Table          Collection
Column         Key
Value          Value
Record/Row     Document/Object
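The document model can be sketched with plain Python dictionaries. The address and preferences fields and their values are made-up illustration data; only the name fields come from the examples used later in these notes.

```python
# Two documents that could live in the same collection (hypothetical data).
user_a = {
    "Last name": "Wang",
    "First name": "Wei",
    # Embedded document: the address travels with the user, so no join is needed.
    "address": {"city": "Beijing", "zip": "100000"},
}
user_b = {
    "Last name": "Mao",
    "First name": "Qian",
    # Dynamic schema: this document has a field the other one lacks.
    "preferences": ["email", "sms"],
}

users = [user_a, user_b]

# Reading the embedded field is a plain nested lookup.
city = user_a["address"]["city"]

# Fields shared by every document versus fields present in only some.
common = set(user_a) & set(user_b)
optional = set(user_a) ^ set(user_b)
print(city, sorted(common), sorted(optional))
```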
2 JSON
JSON: JavaScript Object Notation
a syntax for storing and exchanging data
an easier-to-use alternative to XML
more human friendly
smaller data size
easier to read
BSON
Binary JSON used in MongoDB
Compared to JSON, BSON is designed to be efficient both in storage space and scan-speed.
Data is converted to BSON when it is stored and converted back to JSON when it is read.
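A small sketch of the JSON points above, using only the Python standard library: it round-trips a record through JSON text and builds the same record as XML to compare sizes. The record is hypothetical, and BSON encoding is not shown because it needs the bson package that ships with PyMongo.

```python
import json
import xml.etree.ElementTree as ET

user = {"first_name": "Wei", "last_name": "Wang"}  # hypothetical record

# JSON: serialize to text and parse back into an equal Python dictionary.
json_text = json.dumps(user)
restored = json.loads(json_text)

# XML: the same two fields need an opening and a closing tag each.
root = ET.Element("user")
for field, value in user.items():
    ET.SubElement(root, field).text = value
xml_text = ET.tostring(root, encoding="unicode")

print(json_text, len(json_text))
print(xml_text, len(xml_text))
```

For this record the JSON text is shorter than the XML text because field names are written once rather than in paired tags.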
3 High Performance
Written in C++;
Use of memory mapped files;
Serialization in BSON for fast parsing;
Indexes can include keys from embedded documents
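To make the idea of indexing a key inside an embedded document concrete, here is a pure-Python sketch that resolves a dotted path such as address.city and builds a lookup table from it. The get_path and build_index helpers and the sample data are assumptions for illustration; this is not how MongoDB implements its indexes.

```python
from collections import defaultdict


def get_path(doc: dict, path: str):
    """Resolve a dotted path such as 'address.city' inside nested dicts."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def build_index(docs: list, path: str) -> dict:
    """Map each value found at `path` to the positions of matching docs."""
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        value = get_path(doc, path)
        if value is not None:
            index[value].append(position)
    return dict(index)


docs = [
    {"name": "Wei", "address": {"city": "Beijing"}},
    {"name": "Qian", "address": {"city": "Shanghai"}},
    {"name": "Li", "address": {"city": "Beijing"}},
    {"name": "Chen"},  # no address: skipped by the index
]

city_index = build_index(docs, "address.city")
print(city_index)
```

A query for a city can then consult city_index instead of scanning every document.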
High Scalability
Vertical scaling vs. horizontal scaling
MongoDB supports horizontal scaling through sharding
Structures
A MongoDB instance may have zero or more databases
A database may have zero or more collections
A collection can be thought of as the relation (table) in a SQL database
A collection may have zero or more documents
Documents in the same collection don’t even need to have the same fields
Documents correspond to the records (rows) in an RDBMS
A document may have one or more fields
MongoDB indexes are much like their RDBMS counterparts
Pros
Simple queries;
Often faster than a SQL database for simple lookups that read or write a whole document
Easier and faster integration of data
Cons
Not well suited to systems with heavy, complex transactions
PyMongo
Super easy to use;
Thread safe;
Built-in connection pool when using MongoClient;
No need to explicitly close a connection
Steps
1 Create a client connecting to the MongoDB instance on port 27017
from pymongo import MongoClient
client = MongoClient('localhost:27017')
db = client.real_estate_smart_view
real_estate_smart_view is the database name.
2 Insert a new document
db.users.insert_one({"Last name": "Wang", "First name": "Wei"})
3 Query a document
db.users.find({"Last name": "Mao"})
find returns a cursor over the matching documents; use find_one to get a single document.
4 Update an existing document
db.users.update_one(
    {"Last name": "Mao"},
    {"$set": {"First name": "Qian"}}
)
$set is an update operator that sets a field to the provided value.
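The effect of $set on a document can be sketched in pure Python, without a running server. The apply_set helper is a hypothetical function written for this note, not part of PyMongo, and it handles only top-level fields, whereas the real operator also accepts dotted paths into embedded documents.

```python
import copy


def apply_set(document: dict, update: dict) -> dict:
    """Return a copy of `document` with the fields under "$set" overwritten."""
    result = copy.deepcopy(document)
    for field, value in update.get("$set", {}).items():
        result[field] = value  # replaces an existing field or adds a new one
    return result


before = {"Last name": "Mao", "First name": "Unknown"}
after = apply_set(before, {"$set": {"First name": "Qian"}})

print(before)
print(after)
```

Only the named field changes; every other field of the document is left as it was.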