Thursday, February 23, 2017

Data Analyst ND 1, Statistics

7-day free trial: 2017-2-22~3.1
Program director: Mat Leonard
Content developer: Caroline Buckey

Update on 2017-2-27

I soon found there was so much overlap with my existing knowledge that I could even finish all the projects within the 7-day trial! But I have changed my goal from getting a nanodegree to filling my knowledge gaps. I am close to getting my first data scientist job, so a nanodegree is not so important to me now.
Projects recap:
  1. Bike share. The main takeaways are how to slice the dataset into a smaller one and how to use the datetime module to parse timestamps.
  2. Stroop effect. Hypothesis testing, especially the t-test.
  3. Titanic analysis. Gets you familiar with NumPy and pandas. This is the same as stage 5 (choose your path) of the Intro to Programming ND.
  4. OpenStreetMap. Focuses on data wrangling skills: parsing various data formats (CSV, Excel, JSON, XML, HTML) and extracting data from SQL or NoSQL databases.
  5. (Data Set Options) Practice R.
  6. Enron Email. This is the same as the free course “Intro to Machine Learning”.
  7. (Data Set Options) The course content covers the JavaScript plotting libraries D3 and Dimple, plus a fancy way to draw world maps!
  8. Free trial screener of Udacity's paid courses. This is the same as the free course “A/B Testing”.

P0 Bay Area Bike Share Analysis

Two of the major parts of the data analysis process are data wrangling and exploratory data analysis.
Before you even start looking at the data, think about some questions you might want to answer about the bike share data.
After all, your best analysis is only as good as your ability to communicate it.
When dealing with a lot of data, it can be useful to start by working with only a sample of the data.
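A minimal sketch of the two bike share takeaways (working with only a sample, and parsing timestamps with the datetime module). The column name `Start Date` and the timestamp format `%m/%d/%Y %H:%M` are assumptions about the trip data layout, and `sample_trips` is a hypothetical helper, not course code; adjust both to the actual file.

```python
import csv
import io
from datetime import datetime
from itertools import islice

def sample_trips(csv_file, n_rows, date_col='Start Date', fmt='%m/%d/%Y %H:%M'):
    """Read only the first n_rows trips and parse their start timestamps.

    date_col and fmt are assumptions about the trip data layout.
    """
    reader = csv.DictReader(csv_file)
    sample = []
    for row in islice(reader, n_rows):  # stop early instead of loading the whole file
        start = datetime.strptime(row[date_col], fmt)
        sample.append({'start': start, 'weekday': start.strftime('%A'), 'hour': start.hour})
    return sample

# Usage with an in-memory stand-in for the real CSV file
data = io.StringIO("Trip ID,Start Date\n1,8/29/2013 14:13\n2,8/29/2013 9:08\n3,8/30/2013 17:45\n")
for trip in sample_trips(data, 2):
    print(trip)
```

With a real file, pass the object returned by `open(path, newline='')` instead of the `StringIO` stand-in.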

P1: Statistics

The course materials in this section have 11 lessons plus a placement advisor to help you locate your knowledge gaps in case you already know some statistics.
Actually, Udacity provides 2 free courses:

Statistics placement advisor

If you are comfortable with:
  • Performing a hypothesis test using a two-sample t-test
  • Calculating a p-value and a confidence interval
  • Deciding whether to reject the null based on the results of the above
you can continue straight to Project 1.

constructs and their operational definition

Constructs are concepts that are difficult to define and measure. Scientists try to quantify them by giving them an operational definition. Units are at the heart of measurement.
  • Memory
  • Guilt
  • Love
  • Stress: levels of cortisol (the stress hormone)
  • Depression: Beck’s Depression Inventory (21 questions)
  • Anger: number of profanities uttered per minute
  • Happiness: ratio of minutes spent smiling to minutes not smiling

interpreting scatter plots

We can infer a trend, but it is not necessarily true.
Correlation does not imply causation. E.g. the Golden Arches Theory of Conflict Prevention: no 2 countries with a McDonald’s have ever gone to war with each other since opening McDonald’s.
  • To show relationships: observational studies, surveys
  • To show causation: controlled experiments. Use a double-blind design, so that neither the participants nor the experimenters know who received the treatment; this guards against the placebo effect and against anyone consciously or unconsciously altering the measurements.
Most research studies only use a sample because collecting data about an entire population is far too expensive. As a result, we expect our estimates not to be exactly accurate.
A fixed number is called a constant; a number that can change is called a variable.
\bar{x} denotes the sample mean and \mu the population mean.
We can make predictions from either correlation or causation.

standard normal distribution

Any normal distribution can be normalized by the z-score: z=\frac{x-\mu}{\sigma}
A z-table shows the probability that a value from the standard normal distribution is less than a given z-score.
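A small sketch of the z-score formula and of what a z-table stores, using only the standard library: the standard normal CDF can be written with the error function as \Phi(z) = \frac{1}{2}[1 + erf(z/\sqrt{2})]. The values x = 130, \mu = 100, \sigma = 15 are hypothetical.

```python
import math

def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma

def z_table(z):
    """P(Z < z) for a standard normal Z, i.e. the value a z-table lists."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = z_score(x=130, mu=100, sigma=15)   # hypothetical values
print(z, round(z_table(z), 4))
```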
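The standard error (SE) is the standard deviation of the distribution of sample means: SE = \frac{\sigma}{\sqrt{n}}
The central limit theorem says that, for large enough n, this distribution of sample means is approximately normal with mean \mu and standard deviation SE, whatever the shape of the population.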
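A quick simulation (an illustration, not part of the course material) comparing the spread of many sample means with \sigma/\sqrt{n}. The population is a hypothetical, deliberately skewed exponential distribution with \sigma = 2, and n = 25, so SE should come out near 0.4.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical non-normal population: exponential with sigma = 2
population = rng.exponential(scale=2.0, size=1_000_000)
sigma = population.std()
n = 25

# Draw 10,000 samples of size n (with replacement) and keep each sample mean
sample_means = rng.choice(population, size=(10_000, n)).mean(axis=1)

print('observed SD of sample means:', sample_means.std())
print('sigma / sqrt(n):            ', sigma / np.sqrt(n))
```

A histogram of `sample_means` also looks roughly bell-shaped even though the population is skewed, which is the central limit theorem at work.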
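Margin of error: about 95% of sample means fall within \frac{2\sigma}{\sqrt{n}} of \mu (more precisely, within 1.96\frac{\sigma}{\sqrt{n}}).
Critical value: 98% of sample means fall within \frac{2.33\sigma}{\sqrt{n}} of \mu, so 2.33 is the critical z-value for 98%.
The threshold for “unlikely” is called the alpha level: 5%, 1% or 0.1%. For a one-tailed critical region, these correspond to z-values of 1.65, 2.33 and 3.08.
E.g. if z = 1.82, which lies beyond 1.65, we say \bar x is significant at p < 0.05. This is interesting: a statistically significant sample mean is one that would be an outlier if the null hypothesis were true.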
We can also have two-tailed critical region, similar to


H0 (null hypothesis): the mean of the intervention group lies outside the critical region.
Ha (alternative hypothesis): the mean of the intervention group lies inside the critical region.
We can't prove that the null hypothesis is true; we can only obtain evidence to reject it.
E.g. “Most dogs have 4 legs.” (significance level = 50%)
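A sketch that replaces the z-table lookups with `statistics.NormalDist` (standard library, Python 3.8+): it prints the one- and two-tailed critical values for the three alpha levels, then applies the decision rule to the z = 1.82 example. The inverse CDF gives 3.090 for alpha = 0.001; tables may round that last digit differently.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

for alpha in (0.05, 0.01, 0.001):
    one_tailed = std_normal.inv_cdf(1 - alpha)       # right-tail critical z
    two_tailed = std_normal.inv_cdf(1 - alpha / 2)   # critical z for +/- tails
    print(f'alpha={alpha}: one-tailed z*={one_tailed:.3f}, two-tailed z*=+/-{two_tailed:.3f}')

z = 1.82                                   # the example from the notes
p_one_tailed = 1 - std_normal.cdf(z)       # area beyond z in the right tail
print(f'p (one-tailed) = {p_one_tailed:.4f}, reject H0 at alpha = 0.05: {p_one_tailed < 0.05}')
```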

t distribution

A z-test works when we know \mu and \sigma, but usually we don't.
Degrees of freedom are the number of pieces of information that can vary freely without violating any given restrictions. They are the independent pieces of information available to estimate another piece of information.
Sample standard deviation: s = \sqrt{\frac{\sum_i (x_i-\bar x)^2}{n-1}}, where n-1 is the effective sample size.
The t-distribution is a flattened form of the normal distribution, with heavier tails. As the degrees of freedom tend to infinity, it converges to the normal distribution.
Critical t-values can be looked up in a t-table.
Calculating the t-value from a sample:
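import numpy as np

def t_value(nums, mean):
    # one-sample t statistic: (x_bar - mean) / (sample_sd / sqrt(n))
    length = len(nums)
    x_bar = np.mean(nums)
    sample_sd = np.std(nums, ddof=1)  # n-1 in the denominator
    t = (x_bar - mean) / sample_sd * np.sqrt(length)
    print('x_bar={0:.2f}\n sample_sd={1:.3f}\n t={2:.3f}'.format(x_bar, sample_sd, t))
    return t

def t_value_from_stats(x_bar, ssd, mean, num):
    # same statistic from summary values; renamed so it does not overwrite t_value
    return (x_bar - mean) / ssd * np.sqrt(num)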
To get the two-tailed p-value from a t-value, use GraphPad's online calculator (Link to GraphPad).
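The same lookups can be done in code with SciPy's t-distribution. A sketch, assuming SciPy is installed: `stats.t.sf` gives the tail area (p-value) and `stats.t.ppf` the critical value a t-table would list. The sample values and \mu_0 = 5.0 are hypothetical, and `stats.ttest_1samp` serves as a cross-check.

```python
import numpy as np
from scipy import stats

def two_tailed_p(t, df):
    """P(|T| >= |t|) for a t-distribution with df degrees of freedom."""
    return 2 * stats.t.sf(abs(t), df)

sample = np.array([5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.9])   # hypothetical measurements
mu0 = 5.0                                                # hypothesized population mean
n = len(sample)
t = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))
p = two_tailed_p(t, df=n - 1)
t_critical = stats.t.ppf(0.975, df=n - 1)   # t-table value: two-tailed, alpha = 0.05
print(f't = {t:.3f}, p = {p:.4f}, critical t = {t_critical:.3f}')
print(stats.ttest_1samp(sample, mu0))      # scipy's one-sample test as a cross-check
```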
One-sample test on dependent samples (repeated measures), e.g.:
  • two conditions,
  • longitudinal (t1, t2),
  • pretest & posttest.
This approach is cheap, but the downside is carry-over effects: the 2nd measurement can be affected by the first treatment, and the order may influence the results.
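For dependent samples, the test reduces to a one-sample t-test on the per-subject differences against 0. A sketch with hypothetical pretest/posttest scores, assuming SciPy is available for the p-value and for the `stats.ttest_rel` cross-check:

```python
import numpy as np
from scipy import stats

pre = np.array([8.0, 7.5, 9.1, 6.8, 7.9, 8.4])    # hypothetical pretest scores
post = np.array([8.6, 7.9, 9.0, 7.7, 8.5, 9.1])   # same subjects, posttest

diff = post - pre                                  # one difference per subject
n = len(diff)
t = diff.mean() / (diff.std(ddof=1) / np.sqrt(n)) # one-sample t on the differences vs 0
p = 2 * stats.t.sf(abs(t), df=n - 1)              # two-tailed p-value
print(t, p)
print(stats.ttest_rel(post, pre))                 # scipy's paired test for comparison
```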

types of effect size measures

Difference measures: mean difference, and standardized difference (in units of standard deviation).
Statistical significance doesn't mean important, large, sizeable or meaningful. It means we rejected the null: the results are not likely due to chance (sampling error).
Cohen’s d measures the standardized mean difference: d = \frac{\bar x - \mu}{s}.
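A minimal sketch of Cohen's d in its one-sample form, matching the formula above and using the sample standard deviation (n-1 in the denominator). `cohens_d` is a hypothetical helper name.

```python
import numpy as np

def cohens_d(sample, mu):
    """Standardized mean difference of a sample against a reference mean mu."""
    sample = np.asarray(sample, dtype=float)
    return (sample.mean() - mu) / sample.std(ddof=1)

# Mean 3, sample SD sqrt(2.5): the mean lies about 0.63 SD above mu = 2
print(cohens_d([1, 2, 3, 4, 5], mu=2))
```

Unlike the t-statistic, d does not grow with n, which is why it measures size rather than significance.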

independent samples

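Standard deviation of the difference: s^2 = s_1^2 + s_2^2, i.e. s = \sqrt{s_1^2 + s_2^2}
Standard error: SE = \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}, which reduces to s/\sqrt{n} when n_1 = n_2 = n.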
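A sketch of the independent-samples t-statistic built from that standard error, with hypothetical data. This unpooled SE is the one Welch's test uses, so `stats.ttest_ind(..., equal_var=False)` (SciPy assumed installed) should return the same statistic; its degrees of freedom follow the Welch-Satterthwaite formula, which the notes don't cover.

```python
import numpy as np
from scipy import stats

def independent_t(x1, x2):
    """t statistic for two independent samples with SE = sqrt(s1^2/n1 + s2^2/n2)."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    se = np.sqrt(x1.var(ddof=1) / len(x1) + x2.var(ddof=1) / len(x2))
    return (x1.mean() - x2.mean()) / se

group_a = [12.1, 11.4, 13.0, 12.7, 11.9]          # hypothetical measurements
group_b = [10.9, 11.2, 10.4, 11.8, 10.1, 11.0]
print(independent_t(group_a, group_b))
print(stats.ttest_ind(group_a, group_b, equal_var=False).statistic)
```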

Sunday, February 19, 2017

Self-driving Car ND A1, finding lane

I am so excited to enroll in Udacity’s flagship program: self-driving car!
This is term one, which includes 5 projects and 24 lessons (from 2.16 to 5.29):
  • Lane-Finding Project (due 2.25)
  • Traffic Sign Classifier Project (3.27)
  • Behavioral Cloning Project (4.17)
  • Advanced Lane Finding Project (5.1)
  • Vehicle Tracking Project (5.15)
C++ will be used in term 2 and term 3, with topics in sensor fusion and path planning.
At a quick glance, I noticed quite a lot of content overlapping with other nanodegree programs. Because of this overlap, I have already finished 2/3 of the 24 lessons.
  • 3 lessons are exactly the same as the free Udacity course Intro to Machine Learning by Sebastian.
  • 5 lessons are career services (resume, LinkedIn, GitHub, interview), pretty much the same as in the MLND.
  • 7 lessons are deep learning basics. Actually, the original TensorFlow deep learning course by Vincent has just been overhauled.
So after subtracting the above, only 9 authentic lessons directly teach self-driving cars. Obviously, these lessons are never enough to build a self-driving car; they are just a window to a new world. Be prepared to actively learn much more!

Available sources

David Silver: Princeton graduate in CS (2004), previously worked on autonomous cars at Ford for 3 months.
Ryan Keenan: recovering astrophysicist (2007-2015)
By the way, the CMU team that led the DARPA Grand Challenge (1st in 2004, 2nd in 2005 due to home-assembled hardware, 1st in 2007) was led by Whittaker, who earned all his degrees in civil engineering in the 1970s. He said,
If you haven’t done everything, you haven’t done a thing.
Your mentors (Pratheerth Padman, Martijn de Boer) will:
  • Check in with you weekly to make sure that you are on track
  • Help you to set learning goals
  • Guide you to supplementary resources when you get stuck
  • Respond to any questions you have about the program.
Self-driving cars are taught using both approaches:
  • Robotics
  • Deep Learning
A free Udacity course, Introduction to Computer Vision.


Environment setup. The problem was that the opencv3 package was only compatible with Python 3.5, while the latest Python was 3.6, so I built a new environment for this project (activate it before installing packages, so they go into the new environment):
conda create --name=car python=3.5 anaconda
source activate car
pip install pillow
conda install -c menpo opencv3=3.1.0
pip install moviepy

import imageio
There are several ways to read an image file:
  1. scipy.ndimage.imread() # RGB format
  2. matplotlib.image.imread() # RGB format
  3. cv2.imread() # BGR format
Be careful if you use cv2 to read images: the color channels come back in BGR order, not RGB.
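A NumPy-only sketch of the fix: reversing the last (channel) axis turns BGR into RGB and back. With OpenCV installed, `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` does the same conversion. `bgr_to_rgb` is a hypothetical helper name.

```python
import numpy as np

def bgr_to_rgb(image):
    """Reverse the channel axis: BGR -> RGB (the same call also maps RGB -> BGR)."""
    # The reversed slice is a view with negative strides; copying it into a
    # contiguous array avoids trouble with functions that expect contiguous memory.
    return np.ascontiguousarray(image[..., ::-1])

# A 1 x 2 image in BGR order, as cv2.imread would return it: one blue pixel, one red pixel
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
rgb = bgr_to_rgb(bgr)
print(rgb[0, 0], rgb[0, 1])
```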

Finding Lane lines

5 region masking

triangle mask
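import numpy as np
import matplotlib.image as mpimg
image = mpimg.imread('test.jpg')  # hypothetical file name: any RGB road image (JPEG loads as uint8)
ysize, xsize = image.shape[0:2] # size is (y,x) format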
region_select = np.copy(image)
left_bottom = [0, 539] # point is (x,y) format
right_bottom = [900, 300]
apex = [400, 0]

# Fit lines (y=Ax+B) to identify the  3 sided region of interest
# np.polyfit() returns the coefficients [A, B] of the fit
fit_left = np.polyfit((left_bottom[0], apex[0]), (left_bottom[1], apex[1]), 1)
fit_right = np.polyfit((right_bottom[0], apex[0]), (right_bottom[1], apex[1]), 1)
fit_bottom = np.polyfit((left_bottom[0], right_bottom[0]), (left_bottom[1], right_bottom[1]), 1)

# Find the region inside the lines
XX, YY = np.meshgrid(np.arange(0, xsize), np.arange(0, ysize))
region_thresholds = (YY > (XX*fit_left[0] + fit_left[1])) &  (YY > (XX*fit_right[0] + fit_right[1])) & (YY < (XX*fit_bottom[0] + fit_bottom[1]))

# Color pixels red which are inside the region of interest
region_select[region_thresholds] = [255, 0, 0]

10 Canny Edge detection

John F. Canny developed this algorithm in 1986.
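import cv2
# image: the RGB array loaded earlier; Canny works on a single-channel image
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
# Define a kernel size for Gaussian smoothing / blurring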
kernel_size = 5 # Must be an odd number (3, 5, 7...)
blur_gray = cv2.GaussianBlur(gray,(kernel_size, kernel_size),0)
# Define our parameters for Canny and run it
low_threshold = 50
high_threshold = 100
edges = cv2.Canny(blur_gray, low_threshold, high_threshold)

13 Hough Transform

In 1962, Paul Hough devised a method for representing lines in parameter space, which is called Hough space.
y = mx + b
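Usually we work in (x, y) image space, but Hough space uses the line parameters (m, b), or (\rho, \theta) in the polar form \rho = x\cos\theta + y\sin\theta, which can also represent vertical lines (infinite slope m).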
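A from-scratch sketch of the voting idea in (\rho, \theta) space, for illustration only: each edge point votes once per \theta for the \rho of the line through it, and peaks in the accumulator are lines. This is the standard Hough transform with 1-pixel and 1-degree bins (assumed resolutions); cv2.HoughLinesP is the probabilistic variant that returns line segments instead. `hough_accumulator` is a hypothetical name.

```python
import numpy as np

def hough_accumulator(points, width, height, theta_step_deg=1):
    """Vote in (rho, theta) space for edge points given as (x, y) pixel coordinates."""
    thetas = np.deg2rad(np.arange(0, 180, theta_step_deg))
    diag = int(np.ceil(np.hypot(width, height)))      # largest possible |rho|
    # Rows cover rho = -diag..diag in 1-pixel bins; one column per theta
    acc = np.zeros((2 * diag + 1, len(thetas)), dtype=int)
    cols = np.arange(len(thetas))
    for x, y in points:
        # Each point votes once per theta for the rho of the line through it
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, cols] += 1
    return acc, thetas, diag

# 100 synthetic edge points on the diagonal y = x of a 100 x 100 image
points = [(i, i) for i in range(100)]
acc, thetas, diag = hough_accumulator(points, width=100, height=100)
rho_idx, theta_idx = np.unravel_index(np.argmax(acc), acc.shape)
print('rho =', rho_idx - diag, 'theta =', np.rad2deg(thetas[theta_idx]), 'votes =', acc.max())
```

The diagonal y = x has \rho = 0 and \theta = 135°, so all 100 points should vote for that single cell.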
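import numpy as np
import cv2

# Define the Hough transform parameters
rho = 1              # distance resolution of the Hough grid in pixels
theta = np.pi/180    # angular resolution in radians (1 degree)
threshold = 1        # minimum number of votes (intersections in Hough space)
min_line_length = 10 # minimum segment length in pixels
max_line_gap = 1     # maximum gap in pixels between connectable segments
# Make a blank the same size as our image to draw on
line_image = np.zeros_like(image)
# Run Hough on the edge-detected image (edges comes from the Canny step)
lines = cv2.HoughLinesP(edges, rho, theta, threshold, np.array([]), min_line_length, max_line_gap)
# Iterate over the output "lines" and draw lines on the blank
if lines is not None:  # HoughLinesP returns None when it finds no lines
    for line in lines:
        for x1, y1, x2, y2 in line:
            cv2.line(line_image, (x1, y1), (x2, y2), (255, 0, 0), 10)  # color and thickness are illustrative choices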

project pipeline

My pipeline consisted of the following steps (action: function used):
  1. Convert the image file to a 3D NumPy array: cv2.imread
  2. Convert the image to grayscale: cv2.cvtColor
  3. Smooth the image to suppress noise: cv2.GaussianBlur
  4. Get edges with Canny's gradient method: cv2.Canny
  5. Define a mask region: cv2.fillPoly
  6. Keep only the masked edges: cv2.bitwise_and
  7. Probabilistic Hough transform: cv2.HoughLinesP
  8. Draw lines on a blank image: cv2.line
  9. Merge the original image with the lines: cv2.addWeighted
The highlighted steps are the key ones.
The corresponding result for each step is:
Note that the Hough line points are in image coordinates, which differ from normal mathematical coordinates: the origin is at the top-left corner and y increases downward.
I used the reviewer’s suggestions to tune the parameters of cv2.HoughLinesP() and got much better results. The remaining problem is false positives. My idea is to use a queue to store the point positions closest to the car (maximum y value) and use them to pick the “right” points, then np.polyfit() those points and plot 2 nice lines.
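A sketch of the fitting half of that idea, under stated assumptions: segments are split into left/right lanes by the sign of their slope (negative slope means left, since y grows downward), near-horizontal segments are dropped as noise with an assumed |slope| < 0.5 cutoff, and each side is fitted with np.polyfit as x = a·y + b so it can be extended to fixed rows. The queue of recent points is not shown, and `average_lane_lines` is a hypothetical helper.

```python
import numpy as np

def average_lane_lines(segments, y_bottom, y_top, min_abs_slope=0.5):
    """Fit one left and one right lane line to (x1, y1, x2, y2) segments.

    Returns {'left': (x_bottom, y_bottom, x_top, y_top) or None, 'right': ...}.
    min_abs_slope is an assumed noise cutoff, not a tuned value.
    """
    groups = {'left': ([], []), 'right': ([], [])}
    for x1, y1, x2, y2 in segments:
        if x1 == x2:
            continue  # vertical segment: slope undefined, skip it
        slope = (y2 - y1) / (x2 - x1)
        if abs(slope) < min_abs_slope:
            continue  # nearly horizontal: likely noise
        xs, ys = groups['left' if slope < 0 else 'right']  # y grows downward
        xs.extend([x1, x2])
        ys.extend([y1, y2])
    result = {}
    for side, (xs, ys) in groups.items():
        if len(xs) < 2:
            result[side] = None
            continue
        a, b = np.polyfit(ys, xs, 1)  # x as a function of y
        result[side] = (int(round(a * y_bottom + b)), y_bottom,
                        int(round(a * y_top + b)), y_top)
    return result

# Hypothetical segments: two on each lane plus one horizontal noise segment
segments = [(100, 540, 250, 440), (250, 440, 400, 340),
            (560, 340, 710, 440), (710, 440, 860, 540), (0, 500, 100, 502)]
print(average_lane_lines(segments, y_bottom=540, y_top=330))
```

With cv2.HoughLinesP output (shape N x 1 x 4), pass `lines.reshape(-1, 4)` as `segments`, then draw each returned pair of points with cv2.line.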

for video

from moviepy.editor import VideoFileClip
from IPython.display import HTML
white_output = 'white.mp4'
clip1 = VideoFileClip("solidWhiteRight.mp4")
white_clip = clip1.fl_image(process_image) 
%time white_clip.write_videofile(white_output, audio=False)
HTML("""<video width="960" height="540" controls><source src="{0}"></video>""".format(white_output))
where process_image is a user-defined function that takes a 3-channel image and outputs an image with lines drawn on it.

Friday, February 17, 2017


On 2017.2.16, at the TensorFlow Dev Summit 2017, TensorFlow 1.0 was announced! It has been 15 months since it was open-sourced.
TensorFlow is also working on high-level APIs for a better user experience. tf.layers is already available, but without further details on how to train.
tf.keras will be available around TensorFlow 1.2.
I love what Lily Peng said about changing careers:
In a previous life I was a doctor, and I’ve been repurposed as a product manager at google.

how to use Tensorboard

At the Dev Summit, Dandelion demonstrated the magic of TensorBoard. The highlighted code in the slides is very impressive. video, source code and slides
Using TensorBoard takes 2 steps:
  1. Use a tf.summary.FileWriter(folder_name) object to add everything you want to show; it is stored in a folder on the local disk.
  2. In a terminal, run tensorboard --logdir=folder_name, which prints a local URL. Open it in a browser.
So most of the work is in step 1. Example code is here.
Several tricks:
  1. Use with tf.name_scope() to name a group of tensors or operations.
  2. Use name= to name a single tensor.
  3. Use writer = tf.summary.FileWriter(folder_name) to create a writer.
  4. Use writer.add_graph(sess.graph) to add the graph. Note: if you revise the graph, remember to reset it (tf.reset_default_graph()) to avoid ghost graphs.
  5. Use writer.add_summary(summary, step) to add a point to the plotted data, where summary is the result of evaluating a tf.summary.scalar/histogram/image op with sess.run() at that step.
  6. tf.summary.merge_all() is supposed to simplify the previous code, but it is currently buggy.
  7. Model saving/restoring takes just a few lines of code:
    saver = tf.train.Saver()
    saver.save(sess, "mymodel.ckpt")     # before the session ends
    saver.restore(sess, "mymodel.ckpt")  # after the session begins
Similarly, you can save/restore a dataset in 2 lines of code:
   import joblib  # older scikit-learn versions: from sklearn.externals import joblib
   joblib.dump(data, 'dataset.pkl')
   data = joblib.load('dataset.pkl')

Monday, February 13, 2017

Deep Learning ND 1, Intro to Neural Network

I totally understand why Udacity developed the Deep Learning Foundation Nanodegree: its previous course, Intro to Deep Learning by Vincent, rushes to “sell” TensorFlow without laying a solid foundation.
However, I am still skeptical about Siraj’s rap-style lectures. The fly-in video clips are actually very distracting for learners. Well, he is trying to show that “these deep concepts are really fun”.
Luckily, the other lecturers are much more patient and guide you step by step, with crafted exercises to make sure you get it. This solid effort is what makes it worth the money.
I am now one capstone away from graduating from the Machine Learning Engineer Nanodegree, so I already know quite a lot. The learning notes here are only complementary, filling my knowledge gaps. The most valuable things are the projects and the project feedback, which get you familiar with how things actually work.
Lesson 1-9

7 Intro to Neural Network

gradient descent

The derivation is very nice here.
There are many error functions, one is square of difference: E=\frac{1}{2}[y-f(\sum_i w_ix_i)]^2
There are many activation functions, one is sigmoid: \sigma(z)=\frac{1}{1+exp(-z)}
weight update: \Delta w_i =- \eta \frac{dE}{dw_i}= \eta (y-\sigma)\frac{d\sigma}{dw_i}=\eta (y-\sigma)(1-\sigma)\sigma x_i
where \eta is learning rate.
For convenience, the middle term is defined as the error term: \delta=(y-f(z))\frac{df}{dz}
so the weight update keeps the same form for any activation function: \Delta w_i = -\eta \frac{dE}{dw_i}= \eta \delta x_i
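A minimal NumPy sketch of this update for a single sigmoid unit with E = \frac{1}{2}(y - \sigma(\sum_i w_i x_i))^2. The input, target, starting weights and learning rate are hypothetical, and `weight_step` is an illustrative name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def weight_step(x, y, w, learn_rate):
    """Delta w for one sigmoid unit trained on E = 1/2 (y - sigmoid(w . x))^2."""
    output = sigmoid(np.dot(w, x))                     # f(z) with z = sum_i w_i x_i
    error_term = (y - output) * output * (1 - output)  # delta = (y - f(z)) f'(z)
    return learn_rate * error_term * x                 # Delta w_i = eta * delta * x_i

x = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical input
y = 0.5                              # hypothetical target
w = np.array([0.5, -0.5, 0.3, 0.1])
eta = 0.5
for _ in range(3):
    w = w + weight_step(x, y, w, eta)
    print(0.5 * (y - sigmoid(np.dot(w, x))) ** 2)   # squared error after each step
```

A finite-difference check of dE/dw is a cheap way to confirm the derivation: \Delta w should equal -\eta times the numerical gradient.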


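If there is one hidden layer, the update for the hidden-to-output weights w_h is similar; just replace the input x with the hidden-layer sigmoid output h: \Delta w_h = -\eta \frac{\partial E}{\partial w_h} = \eta \delta_o h
The update for the input-to-hidden weights w_i needs one more application of the chain rule:
\Delta w_i = -\eta \frac{\partial E}{\partial w_i} = \eta \delta_o \frac{\partial (w_h h)}{\partial w_i} = \eta \delta_o w_h (1-h) h x_i
Because the maximum derivative of the sigmoid function is 0.25, each sigmoid layer the error passes back through multiplies the gradient by at most 0.25 (besides the weights), so learning in the early layers slows down quickly with more hidden layers (the vanishing gradient problem).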
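A sketch of both updates for a network with one sigmoid hidden layer and a single sigmoid output, checked against finite differences, plus a numerical look at the 0.25 maximum of the sigmoid derivative. Shapes, random weights and the target 0.7 are hypothetical; `backprop_updates` is an illustrative name, not course code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_updates(x, y, w_ih, w_ho, learn_rate):
    """Weight updates for x -> sigmoid hidden layer -> single sigmoid output.

    Shapes: x (n_in,), w_ih (n_in, n_hidden), w_ho (n_hidden,), y scalar.
    Error function: E = 1/2 (y - o)^2.
    """
    h = sigmoid(x @ w_ih)                        # hidden activations
    o = sigmoid(h @ w_ho)                        # network output
    delta_o = (y - o) * o * (1 - o)              # output error term
    delta_h = delta_o * w_ho * h * (1 - h)       # error term for each hidden unit
    d_w_ho = learn_rate * delta_o * h            # eta * delta_o * h
    d_w_ih = learn_rate * np.outer(x, delta_h)   # eta * delta_o * w_h * h(1-h) * x_i
    return d_w_ih, d_w_ho

def loss(x, y, w_ih, w_ho):
    return 0.5 * (y - sigmoid(sigmoid(x @ w_ih) @ w_ho)) ** 2

# Gradient check: Delta w should equal -dE/dw when the learning rate is 1
rng = np.random.default_rng(1)
x, y = rng.normal(size=3), 0.7
w_ih, w_ho = rng.normal(size=(3, 2)), rng.normal(size=2)
d_w_ih, d_w_ho = backprop_updates(x, y, w_ih, w_ho, learn_rate=1.0)

eps = 1e-6
num_ih = np.zeros_like(w_ih)
for idx in np.ndindex(w_ih.shape):
    w_plus, w_minus = w_ih.copy(), w_ih.copy()
    w_plus[idx] += eps
    w_minus[idx] -= eps
    num_ih[idx] = -(loss(x, y, w_plus, w_ho) - loss(x, y, w_minus, w_ho)) / (2 * eps)
print(np.allclose(d_w_ih, num_ih, atol=1e-8))

# The sigmoid derivative sigma(z)(1 - sigma(z)) peaks at z = 0
z = np.linspace(-10, 10, 2001)
print((sigmoid(z) * (1 - sigmoid(z))).max())
```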
bonus: A great video from Frank Chen about the history of deep learning. It’s a 45-minute video, sort of a short documentary, starting in the 1950s and bringing us to the current boom in deep learning and artificial intelligence.

8 Project 1: your first neural network

Note for the bike-sharing project: I spent some time figuring out the right shape for each term. The shapes are the opposite (transposed) of those in the previous exercise, and each step feeds in only one instance.

9 model evaluation

# Classification Accuracy
from sklearn.metrics import accuracy_score
# Regression Metrics
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.linear_model import LinearRegression
# K-Fold cv (current API: n_splits folds; kf.split(X) yields train/test indices)
from sklearn.model_selection import KFold
kf = KFold(n_splits=5, shuffle=True)  # 5 folds is an illustrative choice
  • underfitting: error due to bias, oversimplify the problem
  • overfitting: error due to variance, overcomplicating the problem, try to remember data not generalize
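A sketch of K-fold cross-validation with the current scikit-learn API, scoring a LinearRegression by R² on each held-out fold. The data are hypothetical (a noisy linear relationship), so a well-specified model should score close to 1 on every fold; comparing training and validation scores this way is also how the points of a model complexity graph are usually produced.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))                  # hypothetical feature
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.5, size=100)     # linear target plus noise

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    scores.append(r2_score(y[test_idx], model.predict(X[test_idx])))
print(np.round(scores, 3), np.mean(scores))
```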
model complexity graph

4 Apply Deep Learning

This is a fancy lesson, a collection of some interesting examples.

style transfer

conda create -n style python=2
source activate style
conda install -c conda-forge tensorflow=0.11.0
conda install scipy pillow

python --checkpoint ./rain_princess.ckpt --in-path ./examples/content/chicago.jpg --out-path ./output_image.jpg

Deep Traffic