ML-NumPy Tutorial

NumPy is one of the most popular data analysis packages in Python. I use it a lot, so here is a NumPy review. Let’s get started!

NumPy’s main object is the homogeneous multidimensional array: a table of elements that all have the same type. In NumPy, dimensions are called axes.

For example, [1, 2, 3] has one axis of length 3. [[1, 2, 3], [4, 5, 6]] has two axes: the first axis has length 2 and the second axis has length 3.
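A quick way to check the two examples above is to build them as arrays and inspect `ndim` and `shape`. The last lines of this minimal sketch also illustrate what "homogeneous" means: when integers and floats are mixed, NumPy stores every element with one common type (the variable names are arbitrary).

```python
import numpy as np

# One axis of length 3
v = np.array([1, 2, 3])
print(v.ndim, v.shape)  # 1 (3,)

# Two axes: the first has length 2, the second has length 3
m = np.array([[1, 2, 3], [4, 5, 6]])
print(m.ndim, m.shape)  # 2 (2, 3)

# Homogeneous: mixing ints and a float upcasts every element to float64
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```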

Python’s standard library has a class array.array, which only handles one-dimensional arrays and offers less functionality. NumPy’s array class is called ndarray and is more powerful. Its important attributes include:

ndarray.ndim: the number of axes (dimensions) of the array, e.g. 3 for a 3-D array
ndarray.shape: a tuple of integers giving the length of each axis, e.g. (2, 3, 4) for a 3-D array
ndarray.size: the total number of elements of the array, equal to the product of the shape entries
ndarray.dtype: the type of the elements in the array
ndarray.itemsize: the size in bytes of each element of the array
ndarray.data: the buffer containing the actual elements of the array (usually you access elements by indexing instead)
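These attributes are related: `size` is the product of the `shape` entries, and the memory taken by the elements is `size * itemsize`. The sketch below checks this on a 3-D array of shape (2, 3, 4). It uses `nbytes`, an additional ndarray attribute not in the list above, and picks `dtype=np.int16` arbitrarily so that `itemsize` is a fixed 2 bytes on every platform.

```python
import numpy as np

x = np.zeros((2, 3, 4), dtype=np.int16)

print(x.ndim)      # 3
print(x.shape)     # (2, 3, 4)
print(x.size)      # 24 == 2 * 3 * 4
print(x.dtype)     # int16
print(x.itemsize)  # 2 bytes per int16 element
print(x.nbytes)    # 48 == size * itemsize

# size is the product of the shape entries
assert x.size == np.prod(x.shape)
# nbytes is the memory taken by the elements themselves
assert x.nbytes == x.size * x.itemsize

# data exposes the raw element buffer as a Python memoryview
print(type(x.data))  # <class 'memoryview'>
```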

Examples:

import numpy as np
a = np.arange(15).reshape(3, 5)  # array([[ 0,  1,  2,  3,  4], [ 5,  6,  7,  8,  9], [10, 11, 12, 13, 14]])
a.shape        # (3, 5); shape is an attribute, not a method
a.ndim         # 2
a.dtype.name   # 'int64' on most 64-bit platforms (may be 'int32' on Windows with NumPy < 2)
a.itemsize     # 8 bytes per int64 element
a.size         # 15
type(a)        # <class 'numpy.ndarray'>
b = np.array([6, 7, 8])  # array([6, 7, 8])
type(b)        # <class 'numpy.ndarray'>

Array Creation:

import numpy as np
a = np.array([2, 3, 4])
b = np.array([(1, 2, 3, 4), (5, 6, 7, 8)])  # a sequence of sequences becomes a 2-D array

c = np.zeros((3, 4))  # 2-D array of zeros with 3 rows and 4 columns (float64 by default)
d = np.zeros((3, 4, 5), dtype=np.int16)  # specify a data type
e = np.empty((2, 3))  # uninitialized: the contents are arbitrary and may vary

f = np.arange(10, 30, 5)  # array([10, 15, 20, 25]): start at 10, stop before 30, step by 5
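To confirm what the creation calls above produce, this sketch rebuilds a few of them and checks their shapes and dtypes. Note that `np.zeros` defaults to `float64` unless `dtype` is given, and that `np.arange` excludes the stop value. The contents of `e` are unspecified, so only its shape is checked.

```python
import numpy as np

b = np.array([(1, 2, 3, 4), (5, 6, 7, 8)])
c = np.zeros((3, 4))
d = np.zeros((3, 4, 5), dtype=np.int16)
e = np.empty((2, 3))
f = np.arange(10, 30, 5)

print(b.shape)           # (2, 4): two rows built from the two tuples
print(c.shape, c.dtype)  # (3, 4) float64
print(d.shape, d.dtype)  # (3, 4, 5) int16
print(e.shape)           # (2, 3); the values are whatever was in memory
print(f)                 # [10 15 20 25]; 30 itself is excluded
```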

Other useful creation functions:

zeros_like, ones, ones_like, numpy.random.rand, numpy.random.randn
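The functions listed above can be tried out as follows. `zeros_like` and `ones_like` copy the shape and dtype of an existing array, `ones` works like `zeros`, `numpy.random.rand` draws uniform samples from [0, 1), and `numpy.random.randn` draws samples from the standard normal distribution. The `template` array is an arbitrary example, and since the random values differ on every run, this sketch only checks shapes and ranges.

```python
import numpy as np

template = np.arange(6).reshape(2, 3)  # integer array used as a shape/dtype template

z = np.zeros_like(template)  # zeros with template's shape and dtype
o = np.ones((2, 3))          # ones, float64 by default
ol = np.ones_like(template)  # ones with template's shape and dtype
u = np.random.rand(2, 3)     # uniform samples in [0, 1); dimensions passed as separate args
n = np.random.randn(2, 3)    # standard normal samples (mean 0, variance 1)

print(z)
print(o)
print(ol)
print(u)
print(n)
```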

Basic operations:

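import numpy as np
a = np.array([20, 30, 40, 60])
b = np.arange(4)     # array([0, 1, 2, 3])
c = a - b            # array([20, 29, 38, 57]); elementwise subtraction
print(b**2)          # [0 1 4 9]; elementwise square
print(10*np.sin(a))  # elementwise sine (input in radians), scaled by 10
print(a < 35)        # [ True  True False False]; elementwise comparison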
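"Elementwise" means each result element depends only on the elements at the same position in the inputs. This sketch checks that by comparing the array expressions above with equivalent pure-Python list comprehensions. The comprehensions are only for illustration; the NumPy expressions are the idiomatic form and are typically much faster on large arrays. It also shows that these operations return new arrays and leave `a` unchanged.

```python
import math
import numpy as np

a = np.array([20, 30, 40, 60])
b = np.arange(4)

# Same results computed one element at a time in plain Python
diff_loop = [x - y for x, y in zip(a.tolist(), b.tolist())]
square_loop = [y ** 2 for y in b.tolist()]
sin_loop = [10 * math.sin(x) for x in a.tolist()]
less_loop = [x < 35 for x in a.tolist()]

print((a - b).tolist() == diff_loop)           # True
print((b ** 2).tolist() == square_loop)        # True
print(np.allclose(10 * np.sin(a), sin_loop))   # True (floats, so compare approximately)
print((a < 35).tolist() == less_loop)          # True

# The operations produced new arrays; a is unchanged
print(a)  # [20 30 40 60]
```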

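Unlike in many matrix languages, the product operator * works elementwise on NumPy arrays. The matrix product uses the @ operator (Python 3.5+) or the dot function or method.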

import numpy as np
a = np.array([[1, 1], [0, 1]])
b = np.array([[2, 0], [3, 4]])
print(a * b)         # elementwise product: [[2 0] [0 4]]
print(a @ b)         # matrix product (Python 3.5+): [[5 4] [3 4]]
print(a.dot(b))      # matrix product: [[5 4] [3 4]]
print(np.dot(a, b))  # matrix product: [[5 4] [3 4]]
print(np.arange(12).reshape(3, 4).sum(axis=0))  # sum of each column: [12 15 18 21]
print(np.arange(12).reshape(3, 4).min(axis=1))  # minimum of each row: [0 4 8]
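To make the difference between `*` and the matrix product concrete, and to show what `axis` means in reductions, this sketch recomputes the results above with explicit loops. Entry (i, j) of the matrix product is the sum over k of a[i, k] * b[k, j]. `axis=0` collapses the first axis (rows), leaving one value per column; `axis=1` collapses the second axis, leaving one value per row. The helper `matmul_loops` is hypothetical, written only for this comparison.

```python
import numpy as np

a = np.array([[1, 1], [0, 1]])
b = np.array([[2, 0], [3, 4]])

def matmul_loops(x, y):
    """Hypothetical helper: matrix product via the textbook triple loop."""
    rows, inner = x.shape
    inner2, cols = y.shape
    assert inner == inner2, "inner dimensions must match"
    out = np.zeros((rows, cols), dtype=x.dtype)
    for i in range(rows):
        for j in range(cols):
            # Entry (i, j) is the dot product of row i of x and column j of y
            out[i, j] = sum(x[i, k] * y[k, j] for k in range(inner))
    return out

print(matmul_loops(a, b))                         # [[5 4] [3 4]]
print(np.array_equal(matmul_loops(a, b), a @ b))  # True

m = np.arange(12).reshape(3, 4)
col_sums = [sum(m[i, j] for i in range(3)) for j in range(4)]  # collapse axis 0
row_mins = [min(m[i, j] for j in range(4)) for i in range(3)]  # collapse axis 1
print(col_sums == m.sum(axis=0).tolist())  # True
print(row_mins == m.min(axis=1).tolist())  # True
```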

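Universal Functions:

NumPy provides familiar mathematical functions such as sin, cos, and exp, called universal functions (ufuncs); they operate elementwise on an array and produce an array as output. Other commonly used functions and methods: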

all, any, argmax, argmin, argsort, average, bincount, diff, dot, floor, inner, max, mean, median, min, minimum, nonzero, outer, round, sort, std, sum, transpose, var, vectorize, where
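Here are a few of the listed functions in action on a small array. The sample values are arbitrary; the comments give what NumPy returns for them. `kind="stable"` is passed to `argsort` only so that the order of the two equal values (the 1s) is guaranteed.

```python
import numpy as np

data = np.array([3, 1, 4, 1, 5, 9, 2, 6])  # arbitrary sample values

print(np.argmax(data))                  # 5: index of the largest value (9)
print(np.argmin(data))                  # 1: index of the first smallest value (1)
print(np.argsort(data, kind="stable"))  # [1 3 6 0 2 4 7 5]: indices that would sort data
print(np.sort(data))                    # [1 1 2 3 4 5 6 9]
print(np.mean(data), np.median(data))   # 3.875 3.5
print(np.diff(data))                    # [-2  3 -3  4  4 -7  4]: differences of neighbors
print(np.bincount(data))                # [0 2 1 1 1 1 1 0 0 1]: count of each value 0..9
print(np.nonzero(data > 4))             # (array([4, 5, 7]),): indices where the condition holds
print(np.where(data > 4, data, 0))      # [0 0 0 0 5 9 0 6]: keep values > 4, else 0
print(data.max(), data.min(), data.sum())  # 9 1 31: also available as methods
```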

To summarize, these are the basics of NumPy. I’ll continue writing more blog posts to introduce NumPy.
