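L1 distance is the Manhattan distance: the sum of absolute coordinate differences.
L2 distance is the Euclidean distance: the square root of the sum of squared coordinate differences.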
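L1 norm is ||x||_1 = |x_1| + ... + |x_n|. Used as a penalty on the weights, it gives Lasso regression.
L2 norm is ||x||_2 = sqrt(x_1^2 + ... + x_n^2). Its square, used as a penalty on the weights, gives Ridge regression.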
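A minimal sketch of the norms and distances above, assuming vectors are plain Python lists of floats. The function names are made up for illustration; each distance is just the norm of the difference vector.

```python
import math

def l1_norm(x):
    # Sum of absolute values of the components.
    return sum(abs(v) for v in x)

def l2_norm(x):
    # Square root of the sum of squared components.
    return math.sqrt(sum(v * v for v in x))

def l1_distance(a, b):
    # Manhattan distance: L1 norm of the difference a - b.
    return l1_norm([p - q for p, q in zip(a, b)])

def l2_distance(a, b):
    # Euclidean distance: L2 norm of the difference a - b.
    return l2_norm([p - q for p, q in zip(a, b)])

a, b = [1.0, 2.0], [4.0, 6.0]
print(l1_distance(a, b))  # 7.0
print(l2_distance(a, b))  # 5.0
```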
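Activation function: applies a nonlinearity to each unit's output. Sigmoid and tanh squash it into a bounded range; ReLU only clips negatives to zero. Smooth (differentiable) functions give usable gradients for finding the descent direction.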
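As a rough illustration, three common activation functions written for a single float input. Which activations these notes have in mind is an assumption; sigmoid and tanh are bounded, ReLU is not.

```python
import math

def sigmoid(z):
    # Squashes any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    # Squashes any real number into (-1, 1), zero-centered.
    return math.tanh(z)

def relu(z):
    # Keeps positive inputs, clips negative inputs to 0 (unbounded above).
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z))
```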
Batch normalization: usually applied before the activation function. Normalizes each feature over the mini-batch to zero mean and unit variance, then applies a learned scale and shift.
to reduce vanishing gradients;
to allow higher learning rates;
to reduce dependence on initialization.
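A minimal sketch of the training-time batch normalization computation for one feature, assuming the mini-batch is a list of floats. The names gamma, beta and eps are illustrative (learned scale, learned shift, small constant for numerical stability); running statistics for inference are left out.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    # Statistics are computed over the mini-batch for this one feature.
    n = len(batch)
    mean = sum(batch) / n
    var = sum((v - mean) ** 2 for v in batch) / n
    # Normalize to roughly zero mean and unit variance, then scale and shift.
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]

pre_activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(pre_activations)
# Normalization comes first, the activation (ReLU here) second.
activated = [max(0.0, v) for v in normalized]
print(normalized)
print(activated)
```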
Regularization:
Add a penalty term to the loss:
L1 regularization;
L2 regularization;
Elastic net (L1 + L2).
Dropout: randomly zero units during training (not a loss term).
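The regularizers listed above can be sketched as follows, assuming the weights are a flat list of floats. The coefficient names lam, lam1, lam2 and the drop probability p are illustrative, and the dropout shown is the common "inverted" variant that rescales the kept units.

```python
import random

def l1_penalty(weights, lam):
    # lam * sum of |w|: pushes weights toward exactly zero (sparsity).
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    # lam * sum of w^2: shrinks all weights smoothly.
    return lam * sum(w * w for w in weights)

def elastic_net_penalty(weights, lam1, lam2):
    # Sum of the L1 and L2 penalties, each with its own coefficient.
    return l1_penalty(weights, lam1) + l2_penalty(weights, lam2)

def dropout(activations, p, rng):
    # Training-time inverted dropout: drop each unit with probability p
    # and scale the survivors by 1 / (1 - p). Requires 0 <= p < 1.
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

weights = [0.5, -1.5, 2.0]
data_loss = 1.0  # placeholder value standing in for the unregularized loss
total_loss = data_loss + elastic_net_penalty(weights, lam1=0.1, lam2=0.01)
print(total_loss)
print(dropout([1.0, 2.0, 3.0, 4.0], p=0.5, rng=random.Random(0)))
```

The penalties change the loss being minimized, while dropout changes the forward pass during training and is switched off at inference.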