CS231n: How to Use a Dataset

To build a complete machine learning model, we need a dataset to train and test it. However, the test set must not be touched until a single evaluation at the very end. If we tuned the model against the test set, the measured performance would no longer tell us how well the classifier generalizes to unseen data. Therefore, we split the training data in two: a training set and a validation set.
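As a rough illustration of this split, here is a minimal sketch in plain Python. The function name `split_train_val`, the 20% validation fraction, and the toy data are assumptions made for the example, not something prescribed by the course notes.

```python
import random

def split_train_val(samples, val_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and hold out a fraction for validation."""
    shuffled = list(samples)                 # copy, so the input is not modified
    random.Random(seed).shuffle(shuffled)    # fixed seed for a repeatable split
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

# Toy "training data": 100 sample ids. The test set is kept elsewhere, untouched.
data = list(range(100))
train_set, val_set = split_train_val(data, val_fraction=0.2)
```

The test set does not appear here at all: only the original training data is divided, and the test set stays aside until the final evaluation.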

The validation set is used to tune hyperparameters, so that the chosen settings generalize better to unseen data.
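A small sketch of what tuning on the validation set can look like, assuming a toy 1-D k-nearest-neighbor classifier where the hyperparameter is k. The data points, the candidate values of k, and the helper names `knn_predict` and `accuracy` are all made up for illustration.

```python
from collections import Counter

def knn_predict(train_points, query, k):
    """Predict a label by majority vote among the k nearest 1-D points."""
    nearest = sorted(train_points, key=lambda p: abs(p[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train_points, eval_points, k):
    """Fraction of eval_points whose predicted label matches the true label."""
    correct = sum(knn_predict(train_points, x, k) == y for x, y in eval_points)
    return correct / len(eval_points)

# Hypothetical toy data: (feature, label) pairs.
train_points = [(0.0, "a"), (0.5, "a"), (1.0, "a"), (1.4, "b"),
                (3.0, "b"), (3.5, "b"), (4.0, "b")]
val_points = [(0.2, "a"), (1.2, "a"), (2.8, "b"), (3.8, "b")]

# Try each candidate k and keep the one that scores best on the validation set.
candidate_ks = [1, 3, 5]
val_scores = {k: accuracy(train_points, val_points, k) for k in candidate_ks}
best_k = max(val_scores, key=val_scores.get)
```

Only after `best_k` is fixed this way would the classifier be evaluated once on the test set.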

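Sometimes the dataset is small. To make full use of it, we can use cross-validation. For example, in 5-fold cross-validation we split the training set into 5 equal parts (folds), use 4 of them for training and 1 for validation, and rotate which fold is held out. This gives 5 train/validation combinations, and the validation results are averaged across them. In practice, however, cross-validation is not used often because it is computationally expensive.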
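The fold rotation can be sketched roughly as follows. The generator name `k_fold_splits` and the 10-sample toy data are assumptions for the example, and the folds are formed by simple striding rather than shuffling.

```python
def k_fold_splits(samples, num_folds=5):
    """Yield (train, validation) pairs, holding out each fold once."""
    # Fold i takes every num_folds-th sample, starting at index i.
    folds = [samples[i::num_folds] for i in range(num_folds)]
    for i in range(num_folds):
        val_fold = folds[i]
        train_folds = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train_folds, val_fold

# Toy data: 10 sample ids split into 5 folds of 2 samples each.
data = list(range(10))
splits = list(k_fold_splits(data, num_folds=5))
```

For each hyperparameter setting, one would train on every `train_folds` list, score on the matching `val_fold`, and average the 5 validation scores. That is 5 training runs per setting, which is where the extra cost comes from.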

The data is split as shown below:
[Image: diagram of the data splits]

Reference:

http://cs231n.github.io/classification/