Xinxin Tang

A technical blog



ELK

Posted on 2018-05-22

ELK is short for Elasticsearch, Logstash, and Kibana.

Elasticsearch

A real-time distributed search and analytics engine that can be used for full-text search, structured search, and analysis. Full-text search is built on Apache Lucene, which is written in Java.

Features of Elasticsearch

  1. Real-time analysis
  2. Distributed, real-time document storage in which every field is indexed
  3. Document-oriented: all objects are stored as documents
  4. High availability and easy scaling, with support for clusters, sharding, and replicas
  5. Friendly APIs with JSON support
  6. Near-zero configuration with automatic sharding
  7. RESTful style
  8. Built on Lucene, with storage added on top
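
The idea behind "every field is indexed" can be illustrated with a tiny inverted index, which maps each term to the documents that contain it. This is only a sketch: the class `InvertedIndexSketch` and its methods are hypothetical names, not Elasticsearch or Lucene APIs, and tokenization is reduced to lowercasing and splitting on non-alphanumeric characters.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not an Elasticsearch or Lucene API.
class InvertedIndexSketch {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Tokenize the text naively and record the document id under each term.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Look up the ids of the documents that contain the given term.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Disk full on node A");
        index.add(2, "Node B restarted");
        System.out.println(index.search("node")); // prints [1, 2]
    }
}
```

A lookup touches only the posting list of the queried term instead of scanning every document, which is what makes full-text search fast.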

    Clusters

    Sharding and replicas

Logstash

A real-time engine for collecting, filtering, and transforming data, written in JRuby.

Features of Logstash

  1. Can read data from almost any source
  2. Integrates with various external applications
  3. Supports elastic scaling

Main components

  1. Shipper: sends log data
  2. Broker: collects and buffers data, typically using Redis
  3. Indexer: writes the data

Kibana

An open-source tool written in JavaScript and released under the Apache license, which provides visualization for Elasticsearch through a web interface.

It presents the data of the other two components to users over HTTP.

When to use?

We can analyze logs directly with commands such as grep and awk, but this is inconvenient at large scale, because we need to run a script on each machine and then combine all the results, which is inefficient. With ELK, the logs are collected in one place, so they can be searched from a single machine and accessed through a web application.

Advantages of ELK

  1. Developers can view logs in detail from the web
  2. Helps collect logs from every system, even when the systems are widely distributed

Deployment

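Basic pipeline: Logstash -> Elasticsearch -> Kibana (L-E-K).

Add Redis between Logstash and Elasticsearch as a buffer, which reduces the pressure on the server.

Replace Redis with Kafka as the volume of transactions keeps increasing.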


RabbitMQ

Posted on 2018-05-21 | Edited on 2018-05-22

RabbitMQ is a message-oriented middleware. Middleware of this kind is mostly used for decoupling, so that producers do not need to know who the consumers are.

When producers generate a huge amount of data, consumers may not be able to consume it at a matching rate. A middleware layer stores the data for later use, which decouples the two sides from each other.
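
The buffering idea can be sketched in-process with a bounded queue standing in for the broker. This is an assumption-laden illustration, not the RabbitMQ client API: `BufferSketch`, `run`, and the capacity of 100 are hypothetical choices made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical in-process stand-in for a message broker; not the RabbitMQ client API.
class BufferSketch {
    // The producer publishes `count` messages and a separate consumer thread
    // drains them. Neither side refers to the other, only to the queue.
    static List<String> run(int count) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);
        List<String> consumed = new ArrayList<>();

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    consumed.add(queue.take()); // blocks until a message is available
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (int i = 0; i < count; i++) {
                queue.put("msg-" + i); // blocks only when the buffer is full
            }
            consumer.join(); // wait until everything has been consumed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while publishing", e);
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(run(3)); // prints [msg-0, msg-1, msg-2]
    }
}
```

The queue absorbs bursts from the producer, and the consumer reads at its own pace, which is the role the middleware plays between real services.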

Features of RabbitMQ

Exchange

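  1. Receives messages and routes them to the target queues
    direct: routes a message to the queues whose binding key matches its routingKey exactly
    topic: routes messages by pattern-matching rules on the routingKey
    headers: routes messages by matching message header attributes instead of the routingKey
    fanout: routes messages to all bound queues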
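
The matching rules of the direct, topic, and fanout exchange types can be sketched as plain functions. This is a simulation under stated assumptions, not the RabbitMQ client API: `ExchangeSketch` and its methods are hypothetical, and the topic rule follows the usual AMQP convention in which `*` matches exactly one dot-separated word and `#` matches zero or more words.

```java
// Hypothetical simulation of exchange routing rules; not the RabbitMQ client API.
class ExchangeSketch {
    // direct: the binding key must equal the routing key exactly.
    static boolean directMatches(String bindingKey, String routingKey) {
        return bindingKey.equals(routingKey);
    }

    // fanout: every bound queue receives the message; keys are ignored.
    static boolean fanoutMatches(String bindingKey, String routingKey) {
        return true;
    }

    // topic: keys are dot-separated words; '*' matches exactly one word,
    // '#' matches zero or more words.
    static boolean topicMatches(String pattern, String routingKey) {
        return match(pattern.split("\\."), 0, routingKey.split("\\."), 0);
    }

    private static boolean match(String[] p, int i, String[] k, int j) {
        if (i == p.length) {
            return j == k.length; // pattern used up: key must be used up too
        }
        if (p[i].equals("#")) {
            // let '#' absorb 0, 1, 2, ... of the remaining words
            for (int next = j; next <= k.length; next++) {
                if (match(p, i + 1, k, next)) {
                    return true;
                }
            }
            return false;
        }
        if (j == k.length) {
            return false; // pattern still expects a word, but the key has none left
        }
        return (p[i].equals("*") || p[i].equals(k[j])) && match(p, i + 1, k, j + 1);
    }

    public static void main(String[] args) {
        System.out.println(directMatches("error", "error"));          // prints true
        System.out.println(topicMatches("logs.*", "logs.error"));     // prints true
        System.out.println(topicMatches("logs.*", "logs.error.db"));  // prints false
        System.out.println(topicMatches("logs.#", "logs.error.db"));  // prints true
    }
}
```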


Spark

Posted on 2018-05-21 | Edited on 2018-06-12

Spark is a large-scale data computing engine written in Scala.

Features of Apache Spark

  1. Speed: when data fits in memory, Spark can in theory be up to 100 times faster than Hadoop MapReduce
  2. In-memory computing engine: provides a cache mechanism to support repeated iterations and shared data retrieval, reducing the cost of data I/O
  3. DAG engine: reduces the cost of HDFS reads and writes during computation
  4. Uses a multi-threaded pool to reduce the cost of task startup
  5. Easy to use:
    over 80 high-level operators
    supports Java, Python, Scala, and R
    the amount of code is 2 to 5 times smaller than the equivalent MapReduce code
  6. Generality: provides many libraries, such as Spark SQL, MLlib, GraphX, and Spark Streaming
  7. Supports multiple resource schedulers: Hadoop YARN, Apache Mesos, and the standalone cluster scheduler

Resilient Distributed Datasets (RDD): Key Points

  1. A read-only collection distributed across the cluster, made up of partitions
  2. Stored on disk or in memory
  3. Operations are divided into transformations and actions
  4. Automatically rebuilt if a partition is lost on failure

Features of Transformation and Action

  1. Different signatures: a transformation maps RDD[X] -> RDD[Y]; an action maps RDD[X] -> Z (Z is typically a plain value or a collection such as an array)
  2. Lazy evaluation:
    A transformation only records the relation between RDDs
    An action triggers the execution of the program
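
The split between lazy transformations and eager actions can be illustrated by analogy with `java.util.stream`, where intermediate operations are recorded and only a terminal operation runs them. This is an analogy, not the Spark API; `LazySketch` and `countCalls` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Analogy only: java.util.stream, not the Spark API.
class LazySketch {
    // Returns {calls before the action, calls after the action, computed sum}.
    static int[] countCalls() {
        AtomicInteger calls = new AtomicInteger();

        // "Transformation": map() only records what to do; nothing runs yet.
        Stream<Integer> doubled = Stream.of(1, 2, 3).map(x -> {
            calls.incrementAndGet();
            return x * 2;
        });
        int before = calls.get();

        // "Action": the terminal operation triggers the actual computation.
        int sum = doubled.mapToInt(Integer::intValue).sum();
        int after = calls.get();

        return new int[] {before, after, sum};
    }

    public static void main(String[] args) {
        int[] r = countCalls();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints 0 3 12
    }
}
```

The mapping function is not called until the terminal operation asks for a result, which mirrors how an RDD transformation does no work until an action is invoked.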

Spark Streaming

Splits streaming data into small batches and processes each one as batch data. Keeping the batches small keeps latency low.
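
A minimal sketch of the batching step is shown below. It rests on a simplifying assumption: batches are cut by record count, whereas Spark Streaming cuts them by time interval. `MicroBatchSketch` and `toBatches` are hypothetical names, not Spark APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of micro-batching; not the Spark Streaming API.
class MicroBatchSketch {
    // Split an incoming sequence of records into batches of at most batchSize.
    static <T> List<List<T>> toBatches(List<T> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < records.size(); start += batchSize) {
            int end = Math.min(start + batchSize, records.size());
            batches.add(new ArrayList<>(records.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Each inner list would then be handed to the ordinary batch engine.
        System.out.println(toBatches(Arrays.asList(1, 2, 3, 4, 5), 2)); // prints [[1, 2], [3, 4], [5]]
    }
}
```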

Difference between MR and Spark


MR:

  1. Only provides Map and Reduce operations
  2. Low processing efficiency: poor at iterative computation, interactive processing, and log analysis
  3. The intermediate results of Map must be written to disk, and the output of Reduce must be written to HDFS
  4. High cost of task scheduling and startup
  5. Cannot make full use of memory
  6. Both Map and Reduce need to sort data

Applications of Spark

  1. Log collection and search
  2. User click prediction
  3. Common friends
  4. ETL tasks with Spark SQL and DAG


Language C++

Posted on 2018-05-21 | Edited on 2018-06-01

C++ is a general-purpose, middle-level programming language with both high-level and low-level capabilities, and it is one of the most popular commercial programming languages.

C++ is a multi-paradigm programming language that supports object-oriented programming(OOP).

Uses of C++

C++ is used by programmers to create computer software. It is used to create general systems software, drivers for various computer devices, software for servers and software for specific applications and also widely used in the creation of video games.

C++ is mostly used to write device driver programs, system software, and applications that depend on direct hardware manipulation under real-time constraints.

OOP and C++

C++ supports OOP through the four major principles of object-oriented design (OOD):

Abstraction
Encapsulation
Inheritance
Polymorphism

Features of OO C++

  1. The main focus remains on data rather than procedures
  2. Object-oriented programs are segmented into parts called objects
  3. Data structures are designed to categorize the objects
  4. Data members and functions are tied together in a data structure
  5. Data can be hidden from external functions by using access specifiers
  6. Objects can communicate with each other through functions
  7. New data and functions can easily be added anywhere in a program whenever required
  8. It follows a bottom-up approach

Standard Libraries in C++

C++ core language
C++ standard library
STL (Standard Template Library)


Big data-SQL

Posted on 2018-05-21 | Edited on 2018-05-30

SQL: Structured Query Language

Purpose of Database

Searching and accessing of data

Advantages of Using Database

  1. A database minimizes data redundancy to a great extent
  2. A database can control data inconsistency to a large extent
  3. Data can be shared through a database
  4. A database enforces standards
  5. A database can ensure data security
  6. Data integrity can be managed with a database


MySQL

Posted on 2018-05-21 | Edited on 2018-06-08

What is MySQL?

  1. MySQL is a database system used for developing web-based software applications
  2. MySQL is used for both small and large applications
  3. It is a relational database management system
  4. It is fast, reliable, flexible, and easy to use
  5. It supports standard SQL
  6. It is free to download and use
  7. It is presently developed, distributed, and supported by Oracle
  8. It is written in C and C++

Main features of MySQL

  1. The MySQL server design is multi-layered, with independent modules
  2. It is fully multithreaded using kernel threads, and it can use multiple CPUs if they are available
  3. It provides transactional and non-transactional storage engines
  4. It has a very fast thread-based memory allocation system
  5. It supports in-memory heap tables
  6. It handles large databases
  7. The MySQL server works in client/server or embedded systems
  8. It works on many different platforms

SQL Commands

DDL

DDL is short for Data Definition Language, which deals with database schemas and descriptions.

CREATE: creates a database and its objects, such as tables, indexes, views, stored procedures, functions, and triggers
ALTER: alters the structure of an existing database
DROP: deletes objects from the database
TRUNCATE: removes all records from a table and releases the space allocated for them
COMMENT: adds comments to the data dictionary
RENAME: renames an object

DML

DML is short for Data Manipulation Language, which deals with data manipulation and includes the most common SQL statements, such as SELECT, INSERT, UPDATE, and DELETE.

SELECT: retrieves data from a database
INSERT: inserts data into a table
UPDATE: updates existing data in a table
MERGE: UPSERT operation (insert or update)
CALL: calls a stored procedure (in Oracle, a PL/SQL or Java subprogram)
DELETE: deletes records from a table (all records if no condition is given)
EXPLAIN PLAN: shows the data access path chosen for a statement
LOCK TABLE: concurrency control

DCL

DCL is short for Data Control Language, which includes commands such as GRANT and is mostly concerned with rights, permissions, and other controls of the database system.

GRANT: gives users access privileges to the database
REVOKE: withdraws access privileges that were given with the GRANT command

TCL

TCL is short for Transaction Control Language, which deals with transactions within a database.

COMMIT: commits a transaction
ROLLBACK: rolls back a transaction if an error occurs
SAVEPOINT: sets a point within a transaction to which it can later be rolled back
SET TRANSACTION: specifies the characteristics of the transaction

Replication

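Asynchronous replication based on the binary log (binlog).
Log formats:
Statement: logs the SQL statements; smallest size
Row: logs the changed row data as events; largest size, not directly readable
Mixed: logs statements by default and switches to row format for statements that are unsafe to log as statements

Master: replication can be filtered by database
Slave: replication can be filtered by database and by table

Replication modes:
Replication based on binary log positions
Replication based on transactions identified by GTIDs

Semi-synchronous replication: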


Language-Python

Posted on 2018-05-21 | Edited on 2018-05-30

Python is a general-purpose object-oriented programming language with high-level programming capabilities, which includes features of C and Java.

Why Python?

  1. Interpreted language: Python is processed at runtime by Python Interpreter.
  2. Object-oriented language: It supports object-oriented features and techniques of programming
  3. Interactive programming language: Users can interact with the python interpreter directly for writing programs
  4. Easy language
  5. Straightforward syntax
  6. Easy to read
  7. Portable
  8. Extendable
  9. Scalable

What can we do with Python?

Create web and desktop applications

Data types

Numbers

int
float
long (Python 2 only; Python 3 has a single int type)
complex

Sequences

Strings
Bytes/Byte array
Lists
Tuples

Boolean

Sets

Dictionaries

Language-Java

Posted on 2018-05-20 | Edited on 2018-05-30

Java is an object-oriented programming language with its runtime environment. It is a combination of features of C and C++ with some essential additional concepts. Java is well suited for both standalone and web application development and is designed to provide solutions to most of the problems faced by users of the internet era.

What is Java?

  1. An object-oriented programming language
  2. Java combines features of C and C++: it takes its syntax from C and its OOP features from C++.
  3. Java code that runs on one platform does not need to be recompiled to run on another platform; this is called write once, run anywhere.
  4. The Java Virtual Machine (JVM) executes Java code, but the JVM itself is written in platform-specific languages such as C, C++, and assembly. Because it is not written in Java, the JVM itself is platform dependent. The Java interpreter is part of the JVM.

Where is Java being used?

  1. JSP: JavaServer Pages is used to create dynamic web pages, similar to PHP and ASP.
  2. Applets: Applets are another type of Java program; they run in web browsers, always as part of a web document.
  3. J2EE: A platform-independent environment made up of a set of protocols and APIs, used by various organizations to transfer data between each other.
  4. JavaBeans: A set of reusable software components that can easily be used to create new and advanced applications.
  5. Mobile: Many types of games and applications are written in Java.

Types of Java Applications:

  1. Web Application
  2. Standalone Application
  3. Enterprise Application
  4. Mobile Application

Features of Java:

  1. Object-oriented
  2. Platform independent: Write once, Runs anywhere
  3. Simple: Easy to understand
  4. Secure: It provides a wide range of protection against viruses and malicious programs
  5. Portable
  6. Robust
  7. Multi-threaded
  8. Distributed

Structures of Java programs:

  1. Documentation Section: Write comments here. Comments are beneficial for developers because they help them understand the code.
  2. Package Statement: You can create a package with any name. A package is a named group of classes.

    package package_name;
  3. Import Statements: If you want to use a class from another package, you can do so by importing it into your program.

    import calc.add;
  4. Interface Statements: An interface is like a class that contains a group of method declarations. This section is optional and can be used when programmers want to implement multiple inheritance within a program.

  5. Class Definition: A Java program may contain several class definitions.
  6. Main Method Class: Every standalone Java program requires a main method as the starting point of the program.

JVM, JDK, JRE

JVM: Java Virtual Machine
JDK: Java Development Kit(includes JRE plus tools for developing, debugging and monitoring Java applications)
JRE: Java Runtime Environment is used as a package that gives an environment to run the Java program on machines.

Access Control Modifier

default: visible only inside the same package
public: visible everywhere
protected: visible inside the same package and in all subclasses
private: visible only within the class itself
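
The four access levels can be seen side by side in a small sketch. The classes `Account` and `AccessSketch` and their fields are hypothetical names chosen for the example; both classes are assumed to live in the same package.

```java
// Hypothetical classes for illustration; both are in the same package.
class Account {
    public String owner = "alice";      // visible everywhere
    protected String branch = "north";  // same package and subclasses
    String note = "default access";     // same package only
    private int balance = 100;          // only inside Account

    // Private state is exposed through a public method.
    public int getBalance() {
        return balance;
    }
}

class AccessSketch {
    public static void main(String[] args) {
        Account a = new Account();
        System.out.println(a.owner);        // allowed: public
        System.out.println(a.branch);       // allowed: same package
        System.out.println(a.note);         // allowed: same package
        System.out.println(a.getBalance()); // allowed: public accessor
        // System.out.println(a.balance);   // would not compile: private
    }
}
```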

Non-Access Modifier

final: modifier that finalizes the implementation of classes, methods, and variables, meaning they cannot be changed
static: modifier for creating class-level methods and variables
abstract: modifier for creating abstract classes and methods
synchronized and volatile: modifiers used with threads

Data type

Integer:
byte
short
int
long

Floating-point numbers:
float
double

Characters:
char

Conditional:
boolean

Number (wrapper classes)

Integer
Short
Byte
Float
Long
Double

Character

char[]
Character

OOP

Major Objectives of OOP

  1. Emphasis is on the data rather than the procedures
  2. Methods that operate on data are tied together in a data structure
  3. Programs are divided into small instances called objects
  4. Data remains hidden and cannot be accessed by external functions
  5. Objects may communicate with each other through methods
  6. The programs follow the bottom-up approach

Basic terms and features

  1. Classes and Objects
  2. Data abstraction
  3. Data encapsulation
  4. Inheritance
  5. Polymorphism
  6. Dynamic binding
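
The terms above can be tied together in one short sketch. The shape classes and `OopSketch.totalArea` are hypothetical names used only for illustration.

```java
// Hypothetical shapes example covering the terms listed above.
abstract class Shape {                  // abstraction: callers only see area()
    private final String name;          // encapsulation: state is hidden

    Shape(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    abstract double area();
}

class Square extends Shape {            // inheritance: Square is a Shape
    private final double side;

    Square(double side) {
        super("square");
        this.side = side;
    }

    @Override
    double area() {
        return side * side;
    }
}

class Circle extends Shape {
    private final double radius;

    Circle(double radius) {
        super("circle");
        this.radius = radius;
    }

    @Override
    double area() {
        return Math.PI * radius * radius;
    }
}

class OopSketch {
    // Polymorphism and dynamic binding: the area() that runs is chosen
    // at run time from the actual class of each object.
    static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = {new Square(2), new Square(3)};
        System.out.println(totalArea(shapes)); // prints 13.0
    }
}
```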

Advantages of OOP

  1. Code can be recycled and reused
  2. Data is wrapped up into a single unit
  3. Work in a project is easy to partition based on objects
  4. Software complexity can be handled and managed easily
  5. Use of inheritance can eliminate redundant code in a program

Constructors

A constructor is treated as a special member function because its name is the same as the class name. Java constructors are invoked when objects of the class are created.

Characteristics

  1. An interface cannot have a constructor
  2. Constructors can be private, for example to prevent other classes from creating instances
  3. A constructor cannot be abstract, static, final, native, strictfp, or synchronized
  4. A constructor can be overloaded
  5. Constructors cannot return a value
  6. Constructors don’t have a return type, not even void
  7. An abstract class can have a constructor
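
Two of these points, overloading and constructors in abstract classes, can be shown in a short sketch. `Base` and `Derived` are hypothetical names for illustration.

```java
// Hypothetical classes for illustration.
abstract class Base {
    final String label;

    // An abstract class can declare a constructor; it runs when a
    // subclass instance is created and the subclass calls super(...).
    Base(String label) {
        this.label = label;
    }
}

class Derived extends Base {
    // Overloaded constructors: same name, different parameter lists.
    // Neither declares a return type, not even void.
    Derived() {
        this("derived"); // delegates to the one-argument constructor
    }

    Derived(String label) {
        super(label);
    }

    public static void main(String[] args) {
        System.out.println(new Derived().label);         // prints derived
        System.out.println(new Derived("custom").label); // prints custom
    }
}
```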

Two types of Constructor

  1. Default

    // No constructor is declared, so the compiler supplies a default
    // no-argument constructor.
    class Clerk {
        int roll = 101;
        String grade = "Manager";

        void display() {
            System.out.println(roll + " " + grade);
        }

        public static void main(String[] args) {
            Clerk c1 = new Clerk(); // invokes the default constructor
            Clerk c2 = new Clerk();
            c1.display();
            c2.display();
        }
    }
  2. Parameterized

    class ParamC {
        // Constructor taking two parameters.
        ParamC(int a, int b) {
            System.out.print("Parameterized Constructor");
            System.out.println(" having Two parameters");
        }

        // Overloaded constructor taking three parameters.
        ParamC(int a, int b, int c) {
            System.out.print("Parameterized Constructor");
            System.out.println(" having Three parameters");
        }

        public static void main(String[] args) {
            // The argument count selects which constructor runs.
            ParamC pc1 = new ParamC(12, 12);
            ParamC pc2 = new ParamC(1, 2, 13);
        }
    }


software architecture pattern

Posted on 2018-05-17

Commonly used software architectural patterns.

Layered pattern

Presentation layer (UI layer)
Application layer (Service layer)
Business logic layer (Domain layer)
Data access layer (Persistence layer)

Usage:
Desktop applications
E-commerce web applications

Client-server pattern

Two main components: one server and many clients.

Usage:
Emails;
Shared files;
Bank services.

Master-slaves pattern

Master device & Slave devices.

Usage:
Database replication: the master database is regarded as the authoritative data source, and the slave databases are kept consistent with it.
Peripherals connected to a bus in a computer system (master and slave devices).

Pipe-filter pattern

Broker pattern

Usage:
Middleware: Apache ActiveMQ, Apache Kafka, RabbitMQ

Peer-to-peer pattern

Event-bus pattern

MVC-Model View Controller pattern

Blackboard pattern

Interpreter pattern


String

Posted on 2018-05-13

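Commonly used functions of java.lang.String:

1 char charAt(int index)
Returns the char value at the specified index.
2 int compareTo(Object o)
Compares this string with another object.
3 int compareTo(String anotherString)
Compares two strings lexicographically.
4 int compareToIgnoreCase(String str)
Compares two strings lexicographically, ignoring case.
5 String concat(String str)
Appends the specified string to the end of this string.
6 boolean contentEquals(StringBuffer sb)
Returns true if and only if this string contains the same sequence of characters as the specified StringBuffer.
7 static String copyValueOf(char[] data)
Returns a String that represents the character sequence in the specified array.
8 static String copyValueOf(char[] data, int offset, int count)
Returns a String that represents the character sequence in the specified range of the array.
9 boolean endsWith(String suffix)
Tests whether this string ends with the specified suffix.
10 boolean equals(Object anObject)
Compares this string with the specified object.
11 boolean equalsIgnoreCase(String anotherString)
Compares this String with another String, ignoring case.
12 byte[] getBytes()
Encodes this String into a sequence of bytes using the platform's default charset and stores the result in a new byte array.
13 byte[] getBytes(String charsetName)
Encodes this String into a sequence of bytes using the named charset and stores the result in a new byte array.
14 void getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin)
Copies characters from this string into the destination character array.
15 int hashCode()
Returns the hash code of this string.
16 int indexOf(int ch)
Returns the index of the first occurrence of the specified character in this string.
17 int indexOf(int ch, int fromIndex)
Returns the index of the first occurrence of the specified character in this string, starting the search at the specified index.
18 int indexOf(String str)
Returns the index of the first occurrence of the specified substring in this string.
19 int indexOf(String str, int fromIndex)
Returns the index of the first occurrence of the specified substring in this string, starting at the specified index.
20 String intern()
Returns a canonical representation of the string object.
21 int lastIndexOf(int ch)
Returns the index of the last occurrence of the specified character in this string.
22 int lastIndexOf(int ch, int fromIndex)
Returns the index of the last occurrence of the specified character in this string, searching backward from the specified index.
23 int lastIndexOf(String str)
Returns the index of the rightmost occurrence of the specified substring in this string.
24 int lastIndexOf(String str, int fromIndex)
Returns the index of the last occurrence of the specified substring in this string, searching backward from the specified index.
25 int length()
Returns the length of this string.
26 boolean matches(String regex)
Tells whether this string matches the given regular expression.
27 boolean regionMatches(boolean ignoreCase, int toffset, String other, int ooffset, int len)
Tests whether two string regions are equal.
28 boolean regionMatches(int toffset, String other, int ooffset, int len)
Tests whether two string regions are equal.
29 String replace(char oldChar, char newChar)
Returns a new string obtained by replacing every occurrence of oldChar in this string with newChar.
30 String replaceAll(String regex, String replacement)
Replaces every substring of this string that matches the given regular expression with the given replacement.
31 String replaceFirst(String regex, String replacement)
Replaces the first substring of this string that matches the given regular expression with the given replacement.
32 String[] split(String regex)
Splits this string around matches of the given regular expression.
33 String[] split(String regex, int limit)
Splits this string around matches of the given regular expression, with a limit on the number of resulting pieces.
34 boolean startsWith(String prefix)
Tests whether this string starts with the specified prefix.
35 boolean startsWith(String prefix, int toffset)
Tests whether the substring of this string beginning at the specified index starts with the specified prefix.
36 CharSequence subSequence(int beginIndex, int endIndex)
Returns a new character sequence that is a subsequence of this sequence.
37 String substring(int beginIndex)
Returns a new string that is a substring of this string.
38 String substring(int beginIndex, int endIndex)
Returns a new string that is a substring of this string.
39 char[] toCharArray()
Converts this string to a new character array.
40 String toLowerCase()
Converts all characters in this String to lowercase using the rules of the default locale.
41 String toLowerCase(Locale locale)
Converts all characters in this String to lowercase using the rules of the given Locale.
42 String toString()
Returns this object itself (it is already a string).
43 String toUpperCase()
Converts all characters in this String to uppercase using the rules of the default locale.
44 String toUpperCase(Locale locale)
Converts all characters in this String to uppercase using the rules of the given Locale.
45 String trim()
Returns a copy of the string with leading and trailing whitespace removed.
46 static String valueOf(primitive data type x)
Returns the string representation of the argument x of the given data type.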
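
A few of the methods listed above are shown in use below. The class `StringSketch` and the helper `secondPart` are hypothetical names introduced for this example; every call on `String` is a standard `java.lang.String` method.

```java
import java.util.Arrays;

// Hypothetical demo class; all String calls are standard java.lang.String methods.
class StringSketch {
    // Trim the input, then return the text after the first ", "
    // (or the whole trimmed string when there is no ", ").
    static String secondPart(String raw) {
        String t = raw.trim();
        int comma = t.indexOf(", ");
        return comma < 0 ? t : t.substring(comma + 2);
    }

    public static void main(String[] args) {
        String t = "  Hello, World  ".trim();
        System.out.println(t.charAt(0));                          // prints H
        System.out.println(t.indexOf("World"));                   // prints 7
        System.out.println(t.replace('l', 'L'));                  // prints HeLLo, WorLd
        System.out.println(Arrays.toString("a,b,c".split(","))); // prints [a, b, c]
        System.out.println("abc".compareTo("abd"));               // prints -1
        System.out.println("Hello".equalsIgnoreCase("HELLO"));    // prints true
        System.out.println(secondPart("  Hello, World  "));       // prints World
    }
}
```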
