CS231N-How to Use Dataset

Posted on 2018-04-29 | Edited on 2018-05-30

To build a complete machine learning model, we need a dataset to train and test the model. The test set, however, must not be touched until a single evaluation at the very end; otherwise it no longer tells us how well the classifier generalizes. Therefore, we split the training data in two: a training set and a validation set.

The validation set is used to tune hyperparameters so that the classifier generalizes better.
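As a minimal sketch of such a split, assuming NumPy and synthetic data (the array names, the 80/20 ratio, and the random seed are illustrative assumptions, not something prescribed by CS231n):

```python
import numpy as np

# Hypothetical data: 100 samples with 4 features each, plus integer labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 3, size=100)

# Shuffle the sample indices, then hold out the first 20% for validation.
indices = rng.permutation(len(X))
num_val = len(X) // 5
val_idx, train_idx = indices[:num_val], indices[num_val:]

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]

print(X_train.shape, X_val.shape)  # (80, 4) (20, 4)
```

Hyperparameters would then be chosen by comparing accuracy on X_val, while the real test set stays untouched.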

When the dataset is small, cross-validation makes fuller use of it. For example, split the training set into 5 folds, use 4 folds for training and 1 fold for validation, and rotate which fold is held out, so the data yields 5 combinations. In practice, however, cross-validation is not used often because it is computationally expensive.
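The 5 combinations can be sketched with NumPy index arrays. This is only an illustration under assumed names (X_train, num_folds) and toy data; a real run would train and score a classifier inside the loop:

```python
import numpy as np

# Hypothetical training data: 10 samples with 2 features each.
X_train = np.arange(20).reshape(10, 2)
num_folds = 5

# Split the sample indices into 5 folds of equal size.
folds = np.array_split(np.arange(len(X_train)), num_folds)

splits = []
for i in range(num_folds):
    val_idx = folds[i]
    # All folds except the i-th one form the training part of this combination.
    train_idx = np.concatenate([folds[j] for j in range(num_folds) if j != i])
    splits.append((train_idx, val_idx))
    print(f"fold {i}: train on {len(train_idx)} samples, validate on {len(val_idx)}")
```

Each sample is used for validation exactly once, which is why the cost is roughly 5 times that of a single train/validation split.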

The data is split as shown below:
[Figure: data splits]

Reference:

http://cs231n.github.io/classification/

ML-NumPy Tutorial

Posted on 2018-04-29 | Edited on 2018-05-30

NumPy is one of the most popular data analysis packages in Python. I use it a lot, so here is a NumPy review. Let’s get started!

NumPy’s main object is the homogeneous multidimensional array. In NumPy, dimensions are called axes.

For example, [1,2,3] has one axis of length 3. [[1,2,3],[4,5,6]] has two axes: the first has length 2 and the second has length 3.

Python’s standard library has a class array.array, which only handles one-dimensional arrays and offers less functionality. NumPy’s array class is called ndarray and is more powerful. Its main attributes are:

ndarray.ndim: the number of axes of the array, e.g. 3 for a 3-D array
ndarray.shape: the dimensions of the array as a tuple with the length of each axis, e.g. (2,3,4) for a 3-D array
ndarray.size: the total number of elements of the array
ndarray.dtype: the type of the elements in the array
ndarray.itemsize: the size in bytes of each element of the array
ndarray.data: the buffer containing the actual elements of the array

Examples:

import numpy as np
a = np.arange(15).reshape(3,5)  # array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14]])
a.shape  # (3, 5); shape is an attribute, not a method
a.ndim  # 2
a.dtype.name  # 'int64' (platform dependent)
a.itemsize  # 8
a.size  # 15
type(a)  # <class 'numpy.ndarray'>
b = np.array([6,7,8])  # array([6, 7, 8])
type(b)  # <class 'numpy.ndarray'>

Array Creation:

import numpy as np
a = np.array([2,3,4])
b = np.array([(1,2,3,4), (5,6,7,8)])

c = np.zeros((3,4))  # 2-D array with lengths 3 and 4 along its two axes
d = np.zeros((3,4,5), dtype=np.int16)  # specify a data type
e = np.empty((2,3))  # uninitialized, output may vary

f = np.arange(10, 30, 5)  # a sequence from 10 up to 30 (not included) in steps of 5: array([10, 15, 20, 25])

Others:

zeros_like, ones, ones_like, numpy.random.rand, numpy.random.randn
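A short sketch of these creation functions follows. The variable names and the template array are made up for illustration, and the random values differ on every run:

```python
import numpy as np

template = np.array([[1, 2, 3], [4, 5, 6]])

z = np.zeros_like(template)   # same shape and dtype as template, filled with 0
o = np.ones((2, 3))           # float64 array filled with 1.0
ol = np.ones_like(template)   # same shape and dtype as template, filled with 1

u = np.random.rand(2, 3)      # uniform samples from [0, 1)
n = np.random.randn(2, 3)     # samples from the standard normal distribution

print(z.shape, z.dtype == template.dtype)  # (2, 3) True
print(o.dtype)                             # float64
print(u.shape, n.shape)                    # (2, 3) (2, 3)
```

Note that rand and randn take the dimensions as separate arguments, while zeros and ones take a single shape tuple.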

Basic operations:

a = np.array([20, 30, 40, 60])
b = np.arange(4)  # array([0,1,2,3])
c = a-b  # array([20, 29, 38, 57])
print(b**2)  # array([0, 1, 4, 9])
print(10*np.sin(a))  # 10 times the elementwise sine of a
print(a<35)  # array([True, True, False, False])

The product operator * works elementwise on NumPy arrays; the matrix product is computed with dot:

a = np.array([[1,1],[0,1]])
b = np.array([[2,0],[3,4]])
print(a*b)  # array([[2,0], [0,4]])
print(a.dot(b))  # array([[5,4],[3,4]])
print(np.dot(a, b))  # array([[5,4],[3,4]])
print(np.arange(12).reshape(3,4).sum(axis=0))  # sum of each column
print(np.arange(12).reshape(3,4).min(axis=1))  # min of each row
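To make the axis argument and the two kinds of product more concrete, here is a small sketch with the printed results. The variable names are chosen for illustration, and the @ operator is shown as an equivalent spelling of dot for 2-D arrays:

```python
import numpy as np

m = np.arange(12).reshape(3, 4)
# m is:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

col_sums = m.sum(axis=0)  # collapse the rows: one sum per column
row_mins = m.min(axis=1)  # collapse the columns: one minimum per row

print(col_sums)  # [12 15 18 21]
print(row_mins)  # [0 4 8]

a = np.array([[1, 1], [0, 1]])
b = np.array([[2, 0], [3, 4]])
elementwise = a * b  # multiplies matching entries
matrix = a @ b       # matrix product, same result as a.dot(b)

print(np.array_equal(matrix, a.dot(b)))  # True
```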

Universal Functions:

Popular use:

all, any, argmax, argmin, argsort, average, bincount, diff, dot, floor, inner, max, mean, median, min, minimum, nonzero, outer, round, re, sort, std, sum, transpose, var, vectorize, where
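A few of these functions in action, as a small sketch on a made-up array (the name scores and its values are only for illustration):

```python
import numpy as np

scores = np.array([3, 9, 1, 7])

print(scores.argmax())                   # 1, the index of the largest value (9)
print(np.argsort(scores))                # [2 0 3 1], indices that would sort the array
print(np.where(scores > 5, 1, 0))        # [0 1 0 1]
print(np.diff(scores))                   # [ 6 -8  6]
print(scores.mean(), np.median(scores))  # 5.0 5.0
```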

To summarize, this covers basic NumPy usage. I’ll continue to write more posts introducing NumPy.

Reference:

https://docs.scipy.org/doc/numpy/user/quickstart.html#further-reading

Java-Lambda Function

Posted on 2018-04-29 | Edited on 2018-05-30

Lambda expressions first appeared in Java 8. They make it easy to traverse a collection with the forEach method. Moreover, code becomes more concise when a lambda replaces an anonymous Runnable class.

1. Iteration:

Here are three ways to print each element. The first uses a for loop, the second a lambda expression, and the third a method reference. The last two look more concise.

Solution 1:

List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);

for (int element : numbers) {
    System.out.println(element);
}

Solution 2:

List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);

numbers.forEach(x -> System.out.println(x));

Solution 3:

List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);

numbers.forEach(System.out::println);

2. Event listener:

A lambda expression takes fewer lines of code than an anonymous ActionListener class, though a block body still needs its curly braces.

Solution 1:

button.addActionListener(new ActionListener() {
    @Override
    public void actionPerformed(ActionEvent e) {
        //handle the event
    }
});

Solution 2:

button.addActionListener(e -> {
    //handle the event
});

3. Predicate Interface:

The Predicate interface in java.util.function can be used for filtering. If you need to process several collections with the same logic, that logic can be encapsulated in a filter function. Here are three approaches for comparison.

Solution 1:

List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
List<String> words = Arrays.asList("a", "ab", "abc");

numbers.forEach(x -> {
    if (x % 2 == 0) {
        //process logic
    }
});

words.forEach(x -> {
    if (x.length() > 1) {
        //process logic
    }
});

Solution 2:

import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public static void main(String[] args) {
   List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
   List<String> words = Arrays.asList("a", "ab", "abc");
   filter(numbers, x -> x % 2 == 0);
   filter(words, x -> x.length() > 1);
}

// The type parameter T removes the need for casts inside the lambdas.
public static <T> void filter(List<T> list, Predicate<T> condition) {
   list.forEach(x -> {
       if (condition.test(x)) {
           //process logic
       }
   });
}

Solution 3:

public static <T> void filter(List<T> list, Predicate<T> condition) {
   // The stream keeps only the elements that satisfy the condition.
   list.stream().filter(condition).forEach(x -> {
       //process logic
   });
}

4. Map:

Use map to transform each element of the stream, then convert the result back to a List with collect.

List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
List<Integer> mapped = numbers.stream().map(x -> x * 2).collect(Collectors.toList());
mapped.forEach(System.out::println);

5. Reduce:

A reduce operation combines the elements two at a time into a single result. For example, repeatedly adding two values yields the sum of the whole list.

List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
int sum = numbers.stream().reduce((x, y) -> x + y).get();
System.out.println(sum);

6. Replace Runnable:

A lambda can replace an anonymous Runnable class, which makes the code more compact.

Solution 1:

Runnable r = new Runnable() {
    @Override
    public void run() {
        //to do something
    }
};

Thread t = new Thread(r);
t.start();

Solution 2:

Runnable r = () -> {
    //to do something
};

Thread t = new Thread(r);
t.start();

Solution 3:

Thread t = new Thread(() -> {
    //to do something
});

t.start();

Xinxin Tang
