I totally understand why Udacity developed the Deep Learning Foundation Nanodegree: its previous course, Intro to Deep Learning by Vincent, rushed to “sell” TensorFlow without laying out a solid foundation.
However, I am still skeptical about Siraj’s rap-style lectures. The fly-in video clips are actually very distracting for learners, even though he is trying to show that “these deep concepts are really fun”.
Luckily, the other lecturers are much more patient, guiding you step by step with crafted exercises to make sure you get it. That solid effort is what is worth the money.
I am now one capstone away from graduating from the Machine Learning Engineer Nanodegree, so I already know a fair amount. The learning notes here are only complementary, filling my knowledge gaps. The most valuable parts are the projects and the project feedback, which get you familiar with how things actually work.
Lessons 1-9
7 Intro to Neural Networks
gradient descent
The derivation is very nice here.
There are many error functions; one is the sum of squared differences:
$$E = \frac{1}{2}\sum_\mu \left(y^\mu - \hat{y}^\mu\right)^2$$
There are many activation functions; one is the sigmoid:
$$f(h) = \frac{1}{1 + e^{-h}}, \qquad f'(h) = f(h)\left(1 - f(h)\right)$$
weight update:
$$\Delta w_i = \eta \left(y - \hat{y}\right) f'(h)\, x_i$$
where $\eta$ is the learning rate.
For convenience, the middle term is defined as the error term:
$$\delta = \left(y - \hat{y}\right) f'(h)$$
so the weight update takes a more universal form that holds for different activation functions:
$$\Delta w_i = \eta\, \delta\, x_i$$
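As a sanity check, here is a minimal NumPy sketch of one gradient descent step for a single sigmoid unit, following the formulas above (the toy data and names like learnrate are my own, not the lesson’s code):

import numpy as np

def sigmoid(h):
    # sigmoid activation
    return 1 / (1 + np.exp(-h))

# one training record: 4 features and a target
x = np.array([0.5, -0.2, 0.1, 0.3])
y = 0.8
weights = np.array([0.1, 0.2, -0.1, 0.05])
learnrate = 0.5  # eta in the formulas above

h = np.dot(weights, x)                      # linear combination
y_hat = sigmoid(h)                          # prediction
delta = (y - y_hat) * y_hat * (1 - y_hat)   # error term: (y - y_hat) * f'(h)
weights += learnrate * delta * x            # Delta w_i = eta * delta * x_i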
backpropagation
If there is one hidden layer, the update for the hidden-to-output weights is similar; just replace the input $x$ with the hidden sigmoid activation $a$:
$$\Delta W_j = \eta\, \delta^o\, a_j$$
The update for the input-to-hidden weights needs another application of the chain rule:
$$\delta^h_j = \delta^o\, W_j\, f'(h_j), \qquad \Delta w_{ij} = \eta\, \delta^h_j\, x_i$$
Because the maximum derivative of the sigmoid function is 0.25, the gradient signal diminishes quickly as more hidden layers are added (the vanishing gradient problem).
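Putting the two chain rules together, here is a minimal sketch of one backpropagation step through a single hidden layer (the shapes, seed, and variable names are my own choices, not the project’s starter code):

import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

np.random.seed(42)
x = np.array([0.5, 0.1, -0.2])  # 3 input features
y = 0.6
learnrate = 0.5
weights_input_hidden = np.random.normal(0, 0.1, (3, 2))  # 3 inputs -> 2 hidden units
weights_hidden_output = np.random.normal(0, 0.1, 2)      # 2 hidden units -> 1 output

# forward pass
a = sigmoid(np.dot(x, weights_input_hidden))       # hidden activations
y_hat = sigmoid(np.dot(a, weights_hidden_output))  # prediction

# backward pass
delta_o = (y - y_hat) * y_hat * (1 - y_hat)              # output error term
delta_h = delta_o * weights_hidden_output * a * (1 - a)  # chain rule through the hidden layer

weights_hidden_output += learnrate * delta_o * a         # hidden layer uses activation a
weights_input_hidden += learnrate * delta_h * x[:, None] # input layer uses input x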
bonus: A great video from Frank Chen about the history of deep learning. It’s a 45-minute video, sort of a short documentary, starting in the 1950s and bringing us to the current boom in deep learning and artificial intelligence.
8 Project 1: your first neural network
Note for the bike-sharing project: I spent some time figuring out the right dimensional shape for each term. The shapes are the transpose of those in the previous exercise, and only one instance is fed in at a time.
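A quick, hypothetical way to keep track of this (not from the project code) is to print the shape of each term as you go:

import numpy as np

x_row = np.array([[0.5, -0.2, 0.1]])  # one instance as a (1, 3) row vector
x_col = x_row.T                       # the same instance as a (3, 1) column vector
print(x_row.shape, x_col.shape)       # (1, 3) (3, 1)
# printing .shape for every intermediate term quickly reveals which
# orientation each matrix product expects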
Detailed implementation: https://github.com/jychstar/NanoDegreeProject/tree/master/DeepND
Dataset documentation: https://github.com/jychstar/datasets/blob/master/bikeShare/bikeShare_DC.md.
9 model evaluation
# Classification Accuracy
from sklearn.metrics import accuracy_score
# Regression Metrics
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.linear_model import LinearRegression
# K-Fold cv
from sklearn.model_selection import KFold
kf = KFold(n_splits=3, shuffle=True)  # model_selection's KFold takes n_splits, not sizes
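For instance, a minimal cross-validation loop wiring these pieces together might look like this (the toy data is my own):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

X = np.arange(20, dtype=float).reshape(-1, 1)
y = 3 * X.ravel() + np.random.normal(0, 1, 20)

kf = KFold(n_splits=4, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):  # each fold yields train/test index arrays
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    print(mean_squared_error(y[test_idx], model.predict(X[test_idx])))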
- underfitting: error due to bias; the model oversimplifies the problem
- overfitting: error due to variance; the model overcomplicates the problem, trying to memorize the data rather than generalize
model complexity graph: training and validation errors plotted against model complexity; the sketch below shows one way to generate it
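Here is a sketch using scikit-learn’s validation_curve to produce the data behind such a graph (the estimator and parameter range are my own choices for illustration):

import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeRegressor

X = np.random.uniform(0, 1, (200, 1))
y = np.sin(4 * X.ravel()) + np.random.normal(0, 0.2, 200)

depths = np.arange(1, 11)  # tree depth as the measure of model complexity
train_scores, val_scores = validation_curve(
    DecisionTreeRegressor(), X, y,
    param_name="max_depth", param_range=depths, cv=5)

# the training score keeps improving with depth, while the validation
# score peaks and then drops as the model starts to overfit
for d, tr, va in zip(depths, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(d, round(tr, 3), round(va, 3))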
4 Apply Deep Learning
This is a fancy lesson, a collection of some interesting examples.
style transfer
# create and activate a Python 2 environment for the pre-trained checkpoint
conda create -n style python=2
source activate style
# pin the old TensorFlow version this example requires
conda install -c conda-forge tensorflow=0.11.0
conda install scipy pillow
# stylize a content image with the rain_princess checkpoint
python evaluate.py --checkpoint ./rain_princess.ckpt --in-path ./examples/content/chicago.jpg --out-path ./output_image.jpg