Friday, April 21, 2017

Competitors will need to process LIDAR and Camera frames to output a set of obstacles, removing noise and environmental returns. Participants will be able to build on the large body of work that has been put into the Kitti datasets and challenges, using existing techniques and their own novel approaches to improve the current state-of-the-art.

Specifically, students will be competing against each other in the Kitti Object Detection Evaluation Benchmark.

New datasets for both testing and training will be released in a format that adheres to the Kitti standard,

https://discussions.udacity.com/c/didi-udacity-challenge-2017

Udacity Open Source Self-Driving Car: https://www.udacity.com/self-driving-car

Round 1 - Vehicles

March 22 to April 21, 2017. The top 50 qualified teams advancing to the next round will be announced on May 1.

The first round will provide data collected from sensors on a moving car, and competitors must identify position as well as dimensions of multiple stationary and moving obstacles.

Round 2 - Vehicles, Pedestrians

The second round will also challenge participants to identify estimated orientation, in addition to added cyclists and pedestrians.

Project Submissions

Round 1: Single Vehicle Obstacles

Datasets for Round 1 feature a single vehicle obstacle that competitors will need to locate in 3D space using camera, radar, and LIDAR data. Download:

sensor calibration files (torrent)
1st dataset (torrent),30 GB
2nd dataset (torrent), 20GB

I noticed that forum participants floated a conspiracy theory about the deadline changes:

there are organizations that live by their rules, and organizations that make up the rules as they go along.
With 2000 teams and 4000 people.
It is clear they will pay you less than you are worth, hire more cheap labor than they can manage,

Response by David:

These mistakes are not a reflection on Didi. They are a reflection on Udacity. We’re running the program, and any mistakes are our responsibility.
The Voyage team, which was running this competition, spun out of Udacity in the middle of the competition, which left us in a difficult position. They are good people and continuing to help us get the competition into a good state, but it’s requiring extra time.

starter code or tutorial

https://github.com/jokla/didi_challenge_ros

http://ronny.rest/blog/ point cloud, lidar data, ROS and ROS bags

https://github.com/omgteam/Didi-competition-solution

https://github.com/hengck23/didi-udacity-2017

https://github.com/mjshiggins/ros-examples

https://github.com/udacity/didi-competition/tree/master/tracklets

You can either build ROS nodes to process messages as they are played back from a bag, or you can process bags directly with the rosbag API

Notes

I decided to give up after I downloaded the 30 GB dataset. This dataset (50 GB after unzipping) has 12 .bag files and a READ.ME, which says Udacity is developing a new dataset production approach that enables datasets to be released immediately after they are recorded. That is why a 2nd 30GB dataset was released.

Hengck’s starter_kit.pptx is the place to start. However, I just don’t have time to learn ROS and point clouds. It’s very interesting though. I may come back to indulge myself in this after I secure a job.

Wednesday, April 19, 2017

Self-driving Car ND A4, Advanced lane line, ending summary

Course 16, Project 4: Advanced Lane Finding

The following are my notes. Each section is labeled by the lesson number.

1 computer vision

Robotics can be broken down into a 3-step cycle:

sense or perceive the world
decide what to do based on that perception
perform an action to carry out that decision

Sebastian,

80% of the challenge of building a self-driving car is perception.

Use a camera because it has better spatial resolution and is much cheaper, although it lacks the 3D information that can be captured by Lidar.

4 why correct image distortion?

apparent size
apparent shape
appearance changes depending on the position in the image
objects appear closer or farther away than they actually are.

pinhole camera images are free from distortion, but lenses tend to introduce image distortion.
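To make the lens effect concrete, here is a minimal sketch of the radial part of the distortion model that calibration usually estimates (the k1, k2, k3 entries of an OpenCV-style dist vector). The function name and the coefficient value are made up for illustration, and tangential distortion is ignored.

```python
import numpy as np

def radial_distort(points, k1, k2=0.0, k3=0.0):
    """Apply radial distortion to normalized image points (x, y) centered on the optical axis."""
    points = np.asarray(points, dtype=float)
    r2 = np.sum(points ** 2, axis=1, keepdims=True)  # squared distance from the center
    factor = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    return points * factor

# hypothetical coefficient: a negative k1 pulls points toward the center (barrel distortion)
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.5, 0.5]])
distorted = radial_distort(pts, k1=-0.2)
```

Points near the center barely move, while points near the edge move a lot, which is why distortion changes apparent shape and size depending on position.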

9 Finding corners

setup

sudo pip install matplotlib --upgrade

documentation: cv2.findChessboardCorners() and cv2.drawChessboardCorners().

import cv2
import matplotlib.pyplot as plt
image = cv2.imread(fname)  # BGR, shape (960, 1280, 3)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # shape (960, 1280)
ret, corners = cv2.findChessboardCorners(gray, (nx, ny), None)  # bool, corners shape (48, 1, 2)
if ret:
    cv2.drawChessboardCorners(image, (nx, ny), corners, ret)
    plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))  # matplotlib expects RGB
    plt.show()

10 camera calibration

Course material: https://github.com/udacity/CarND-Camera-Calibration

The basic workflow:

take multiple pictures of the same object, e.g., a chessboard, whose known corner grid serves as the ground truth. These ordered 3D points can be easily created by
```
objp = np.zeros((6*8,3), np.float32)
objp[:,:2] = np.mgrid[0:8, 0:6].T.reshape(-1,2)
```
For each image, find corners by cv2.findChessboardCorners(gray, (nx, ny)) and append to imgpoints.
use the objpoints-imgpoints pairs and the gray image size to extract calibration parameters such as mtx (camera matrix, shape (3,3)) and dist (distortion coefficients, shape (1,5)).
undistort the images and compare them with the originals side by side.

ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)  # image size is (width, height)
undistort = cv2.undistort(original, mtx, dist, None, mtx)
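As a small illustration of the object points used in the workflow above, the sketch below prints nothing but builds the same grid and shows how it is reused for every image. The variable n_images is a hypothetical count of images in which corners were found.

```python
import numpy as np

nx, ny = 8, 6  # inner corners per row and per column
objp = np.zeros((ny * nx, 3), np.float32)
objp[:, :2] = np.mgrid[0:nx, 0:ny].T.reshape(-1, 2)  # (x, y) grid, z stays 0

# the same ideal grid is appended once per image with detected corners
objpoints = []
n_images = 3  # hypothetical
for _ in range(n_images):
    objpoints.append(objp)
```

The rows run along x first: objp[1] is (1, 0, 0) and objp[8] is (0, 1, 0), which matches the row-by-row order of the detected corners.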

17 perspective transform

The goal is to get lane curvature

import cv2
import numpy as np
from sklearn.externals import joblib  # removed in scikit-learn 0.23; use "import joblib" there
f= "calibration_wide/wide_dist_pickle.p"
mtx, dist = joblib.load(f)
img = cv2.imread('calibration_wide/GOPR0070.jpg')
nx, ny =8, 6  
undist = cv2.undistort(img, mtx, dist, None, mtx)
ret, corners = cv2.findChessboardCorners(undist[:,:,0], (8,6), None) 
if ret == True:
    cv2.drawChessboardCorners(img, (nx, ny), corners, ret)
    src = np.float32([corners[0], corners[nx-1], corners[-1], corners[-nx]]) # clockwise
    dst = np.float32([[0, 0],[1280, 0], [1280, 960],[0, 960]]) 
    M = cv2.getPerspectiveTransform(src, dst)
    top_down = cv2.warpPerspective(undist, M, (1280,960))
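To see what a perspective transform matrix does with four point pairs, here is a NumPy-only sketch that solves for the 3x3 matrix directly. It is an illustration of the underlying math, not OpenCV's implementation; the function names and the example coordinates are made up.

```python
import numpy as np

def perspective_matrix(src, dst):
    """Solve for M (with M[2, 2] fixed to 1) such that dst ~ M @ src in homogeneous coordinates."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(M, pt):
    """Map one (x, y) point through M and divide by the homogeneous coordinate."""
    x, y, w = M @ np.array([pt[0], pt[1], 1.0])
    return np.array([x / w, y / w])

# hypothetical road trapezoid mapped to a rectangle (top-down view)
src = [(580, 460), (700, 460), (1040, 680), (260, 680)]
dst = [(260, 0), (1040, 0), (1040, 720), (260, 720)]
M = perspective_matrix(src, dst)
```

Each of the four source corners lands on its destination corner, and every other pixel is mapped by the same matrix.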

20 get gradient by Sobel operator

The Canny edge detection algorithm is great at finding all the edges, and lines can then be found with the help of Hough space. To filter out the undesirable edges, we narrow our focus to the gradient along the x-direction, because lane lines are close to vertical in the image. This is where the Sobel operator comes in.

gradient of x: sobelx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize =3)
gradient magnitude: gradmag = np.sqrt(sobelx**2 + sobely**2)
direction of gradient: np.arctan2(np.absolute(sobely), np.absolute(sobelx)). Direction alone is not particularly useful because there are edges in every direction everywhere in the image.
combine the thresholds above.

# derivatives in the x direction (1, 0) and the y direction (0, 1)
sobelx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
sobely = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
# scale the absolute gradients into (0, 255)
abs_sobelx = np.absolute(sobelx)
abs_sobely = np.absolute(sobely)
scaled_sobelx = np.uint8(255*abs_sobelx/np.max(abs_sobelx))
scaled_sobely = np.uint8(255*abs_sobely/np.max(abs_sobely))
# select pixels based on the gradient strength (y reuses the x thresholds, an assumption)
gradx = np.zeros_like(scaled_sobelx)
gradx[(scaled_sobelx >= 20) & (scaled_sobelx <= 100)] = 1
grady = np.zeros_like(scaled_sobely)
grady[(scaled_sobely >= 20) & (scaled_sobely <= 100)] = 1
# gradient magnitude
gradmag = np.sqrt(sobelx**2 + sobely**2)
mag_binary = np.zeros_like(scaled_sobelx)
mag_binary[(gradmag >= 0) & (gradmag <= 200)] = 1
# direction of the gradient
absgraddir = np.arctan2(abs_sobely, abs_sobelx)
dir_binary = np.zeros_like(absgraddir)
dir_binary[(absgraddir >= 0) & (absgraddir <= np.pi)] = 1
# combine the selection thresholds
combined = np.zeros_like(dir_binary)
combined[((gradx == 1) & (grady == 1)) | ((mag_binary == 1) & (dir_binary == 1))] = 1
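To show why the x gradient picks out near-vertical edges such as lane lines, here is a NumPy-only sketch of the 3x3 Sobel kernels applied to a synthetic image with one vertical edge. The helper name, the border handling (edge padding) and the threshold (20, 255) are assumptions for this toy example, not what cv2.Sobel does at the borders.

```python
import numpy as np

def sobel_xy(gray):
    """Correlate with the 3x3 Sobel kernels using shifted slices (edge-padded borders)."""
    g = np.pad(gray.astype(float), 1, mode="edge")
    gx = (g[:-2, 2:] + 2 * g[1:-1, 2:] + g[2:, 2:]) - (g[:-2, :-2] + 2 * g[1:-1, :-2] + g[2:, :-2])
    gy = (g[2:, :-2] + 2 * g[2:, 1:-1] + g[2:, 2:]) - (g[:-2, :-2] + 2 * g[:-2, 1:-1] + g[:-2, 2:])
    return gx, gy

# synthetic image: dark on the left, bright on the right (one vertical edge)
gray = np.zeros((6, 8))
gray[:, 4:] = 255
gx, gy = sobel_xy(gray)

# scale |gx| into (0, 255) and keep the strong responses
scaled = np.uint8(255 * np.absolute(gx) / np.max(np.absolute(gx)))
sx_binary = np.zeros_like(scaled)
sx_binary[(scaled >= 20) & (scaled <= 255)] = 1
```

The y gradient is zero everywhere here, while the x gradient fires only in the two columns next to the edge.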

26 color space

RGB issues: B channel does not detect yellow lane line. All channels vary under different levels of brightness.

wiki of HLS and HSV color spaces here.

(Hue, Saturation, Value) and (Hue, Lightness, Saturation) are very similar.

Hue, in the range (0,179) for 8-bit images in OpenCV, represents color independent of any change in brightness.

Lightness and Value represent different ways to measure the relative lightness or darkness of a color. For example, a dark red will have a similar hue but much lower value for lightness than a light red.

Saturation is a measurement of colorfulness: the more intense a single color, the higher the saturation value; colors closer to white have a lower saturation value.

H and S channels stay fairly consistent in shadow or excessive brightness. The S channel seems to detect lane lines pretty well, as does the dark section of the H channel.
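The dark red versus light red example can be checked with Python's standard colorsys module. Note that colorsys works with values in (0, 1) and returns (hue, lightness, saturation), unlike the 8-bit ranges OpenCV uses; the two RGB triples are arbitrary picks for illustration.

```python
import colorsys

# colorsys expects RGB in [0, 1] and returns (hue, lightness, saturation)
dark_red = colorsys.rgb_to_hls(0.5, 0.0, 0.0)
light_red = colorsys.rgb_to_hls(1.0, 0.5, 0.5)

same_hue = dark_red[0] == light_red[0]
lightness_gap = light_red[1] - dark_red[1]
```

Both colors share the same hue, and only the lightness separates them, which is why hue is stable under brightness changes.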

28 color threshold

Red: (200,255)

S: (90,255)

H: (15,100)

After some experiments, S is better for yellow lines, R is better for white lines.

Use the x gradient of lightness plus the raw value of saturation to create the binary image. Threshold settings: (20,100) for the gradient, (170,255) for saturation.

33 finding the lines

np.copy() RGB image, and by setting the low-value pixels to zero, only the high-value pixels are preserved
np.sum(img_cut[:,:,0], axis = 0) sums the pixel values down each column (along y) to get a histogram over x; its 2 peaks estimate the lane line positions
use 9 windows to narrow down the position of the lane lines. In each loop, define a window by center and margin, identify the nonzero pixels (which will be used for fitting the parabola), and use np.mean() of their x positions to get the center for the next window
use np.concatenate() to merge points and use np.polyfit(y,x,2) to get coefficient of function x=f(y)
use np.linspace(min,max,num) to generate y points and use fitted coefficient to get the parabolic curves.
with initial fitting parameters at hand, search possible pixels in the neighborhood of fitting curves for subsequent images.
alternatively, use initial center points, np.convolve, and np.argmax to get the new centers and windows. But this approach doesn’t collect enough points to fit parabolic curves.
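The steps above can be sketched on a synthetic binary image as follows. This is a simplified illustration under my own assumptions (function names, margin=100, minpix=50, and the fake parabolic lanes are all made up), not the exact project code.

```python
import numpy as np

def find_lane_bases(binary):
    """Column histogram of the lower half; the peak in each half is a lane base."""
    histogram = np.sum(binary[binary.shape[0] // 2:, :], axis=0)
    midpoint = histogram.shape[0] // 2
    left = int(np.argmax(histogram[:midpoint]))
    right = int(np.argmax(histogram[midpoint:])) + midpoint
    return left, right

def sliding_window_fit(binary, x_base, nwindows=9, margin=100, minpix=50):
    """Collect lane pixels window by window from the bottom up, then fit x = f(y)."""
    nonzeroy, nonzerox = binary.nonzero()
    window_height = binary.shape[0] // nwindows
    x_current = x_base
    lane_inds = []
    for window in range(nwindows):
        y_low = binary.shape[0] - (window + 1) * window_height
        y_high = binary.shape[0] - window * window_height
        good = ((nonzeroy >= y_low) & (nonzeroy < y_high) &
                (nonzerox >= x_current - margin) & (nonzerox < x_current + margin)).nonzero()[0]
        lane_inds.append(good)
        if len(good) > minpix:
            x_current = int(np.mean(nonzerox[good]))  # recenter the next window
    lane_inds = np.concatenate(lane_inds)
    return np.polyfit(nonzeroy[lane_inds], nonzerox[lane_inds], 2)

# synthetic image with two 5-pixel-wide parabolic lanes
ys = np.arange(720)
binary = np.zeros((720, 1280), dtype=np.uint8)
for offset in (300, 1000):
    xs = (offset + 1e-4 * (ys - 719) ** 2).astype(int)
    for dx in range(-2, 3):
        binary[ys, xs + dx] = 1

left_base, right_base = find_lane_bases(binary)
left_fit = sliding_window_fit(binary, left_base)
ploty = np.linspace(0, 719, 720)
left_fitx = np.polyval(left_fit, ploty)  # fitted curve, ready for plotting
```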

34 sliding window search: convolution

use np.sum and np.convolve to get the new centers for each window. This way is more mathematical and may be more robust.
b,g,r = cv2.split(img) and img = cv2.merge((b,g,r)) to deal with different color channels.
cv2.addWeighted() to add the mask; this computes a*x1 + b*x2 + gamma
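Here is a minimal sketch of the convolution idea on a 1D column sum. The window width of 50 and the synthetic lane signal are assumptions chosen so that the answer is easy to verify by hand.

```python
import numpy as np

window_width = 50
window = np.ones(window_width)

# hypothetical column sum for one image slice: a lane 50 px wide, 80 rows tall
column_sum = np.zeros(1280)
column_sum[300:350] = 80

# in "full" mode, conv[k] is the sum of column_sum over the 50 columns ending at k,
# so the peak marks the right edge of the best window
conv = np.convolve(window, column_sum)
center = np.argmax(conv) - window_width / 2
```

The lane spans columns 300 to 349, and the estimated center comes out at 324, right in the middle.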

35 measure curvature

The radius of curvature (awesome tutorial here).

$x=Ay^2+By+C$ , A has unit of inverse length

$R_{curve}= \frac{(1+(2Ay+B)^2)^{3/2}}{|2A|}$

def curvature(fit, y):
    A, B = fit[0], fit[1]
    # R = (1 + (2Ay + B)^2)^(3/2) / |2A|
    return (1 + (2*A*y + B)**2)**1.5 / np.absolute(2*A)
# Define conversions in x and y from pixels space to meters
ym_per_pix = 30/720 # meters per pixel in y dimension
xm_per_pix = 3.7/700 # meters per pixel in x dimension

# Fit new polynomials to x,y in world space
left_fit_cr = np.polyfit(ploty*ym_per_pix, leftx*xm_per_pix, 2)
right_fit_cr = np.polyfit(ploty*ym_per_pix, rightx*xm_per_pix, 2)
# Calculate the new radii of curvature
left_curverad = ((1 + (2*left_fit_cr[0]*y_eval*ym_per_pix + left_fit_cr[1])**2)**1.5) / np.absolute(2*left_fit_cr[0])
right_curverad = ((1 + (2*right_fit_cr[0]*y_eval*ym_per_pix + right_fit_cr[1])**2)**1.5) / np.absolute(2*right_fit_cr[0])
#
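A quick sanity check of the curvature formula: sample points on a circle of known radius, fit a parabola, and see whether the formula recovers the radius. The radius of 1000 m and the 30 m stretch are arbitrary values chosen for this sketch.

```python
import numpy as np

def radius_of_curvature(fit, y):
    """R = (1 + (2Ay + B)^2)^(3/2) / |2A| for x = A*y^2 + B*y + C."""
    A, B = fit[0], fit[1]
    return (1 + (2 * A * y + B) ** 2) ** 1.5 / np.absolute(2 * A)

R_true = 1000.0                             # meters, hypothetical road radius
y = np.linspace(0, 30, 100)                 # 30 m of road ahead
x = R_true - np.sqrt(R_true ** 2 - y ** 2)  # arc of a circle through the origin
fit = np.polyfit(y, x, 2)
R_est = radius_of_curvature(fit, 0.0)
```

Over such a short stretch a parabola is a very good approximation of the circle, so R_est lands within about 1% of R_true.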

36 project tips

the project repository

camera calibration uses a different setting: a 9x6 chessboard. The images are stored in the “camera_cal” folder.
expect the radius of curvature to be around 1 km
keep track of line base and curvatures from frame to frame.
smooth out by averaging over recent frames
use cv2.warpPerspective(color_warp, Minv, (image.shape[1], image.shape[0])) to generate the filled area that represents the found lane and add it to the original image (the size argument is (width, height))
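One simple way to do the frame-to-frame smoothing is to average the last few polynomial fits. The class below is a hypothetical helper written for this note (the name and the default of 5 frames are my own choices), not the code from my repository.

```python
from collections import deque
import numpy as np

class LineSmoother:
    """Keep the last n polynomial fits and return their element-wise average."""

    def __init__(self, n=5):
        self.recent_fits = deque(maxlen=n)  # old fits drop out automatically

    def update(self, fit):
        self.recent_fits.append(np.asarray(fit, dtype=float))
        return np.mean(list(self.recent_fits), axis=0)

smoother = LineSmoother(n=2)
first = smoother.update([1.0, 1.0, 1.0])
second = smoother.update([3.0, 3.0, 3.0])
third = smoother.update([5.0, 5.0, 5.0])  # the first fit has dropped out by now
```

A larger n gives steadier lines but reacts more slowly when the road really bends.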

My implementation: https://github.com/jychstar/NanoDegreeProject

Produced video is uploaded to https://youtu.be/xZK199K9jwk

I want to highlight 3 points:

perspective transform needs some ground-truth facts to deal with the depth perception
know when to use sobel operation and color space. For this project, Red channel is enough for the “not challenged” video
know how to use a histogram to estimate the laneline base, use window search to collect qualified points, use initial fitting coefficient to quickly localize points for the subsequent images.

End of term 1 summary

First, in the 5 projects:

2 projects are about finding lanelines, either a straight line or curved lines. These projects are about computer vision.
2 projects are about identifying objects, either traffic signs or other vehicles. These projects are about deep learning.
1 project is about end-to-end learning. This is still deep learning.

Second, this nanodegree helped me demystify how an autonomous car works. Although some tricky bugs bothered me a lot, I learned a lot.

Thank you, Udacity and Sebastian!

Saturday, April 15, 2017

Self-driving Car ND A5, Vehicle detection

course 20, Project 5: Vehicle Detection and Tracking

Detailed implementation is in my github.

The following are my notes. Each section is labelled by the lesson number.

5 draw a blue box over the image

cv2.rectangle(image_to_draw_on, (x1, y1), (x2, y2), color=(0, 0, 255), thickness=6)

(x1, y1) and (x2, y2) are the x and y coordinates of any two opposing corners of the bounding box you want to draw

6 feature intuition

features	characteristics
raw pixel intensity	color and shape
histogram of pixel intensity	color only
gradients of pixel intensity	shape only

9 template matching

Compare a picture with a known template image; the area whose match score passes a threshold is selected and its location is output.

result = cv2.matchTemplate(img, ref, cv2.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
h, w = ref.shape[:2]
top_left = max_loc  # for TM_CCOEFF_NORMED the best match is the maximum
bottom_right = (top_left[0] + w, top_left[1] + h)
bbox_list.append((top_left, bottom_right))

However, template matching is only useful for things that do not vary in their appearance much.
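To illustrate what a normalized correlation score measures, here is a brute-force NumPy sketch of mean-subtracted normalized cross-correlation. It is a slow teaching version under my own naming, not OpenCV's implementation, and the random test image is synthetic.

```python
import numpy as np

def match_template_ncc(img, ref):
    """Score every placement of ref inside img; 1.0 means a perfect match."""
    H, W = img.shape
    h, w = ref.shape
    ref0 = ref - ref.mean()
    scores = np.zeros((H - h + 1, W - w + 1))
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            patch = img[y:y + h, x:x + w]
            p0 = patch - patch.mean()
            denom = np.sqrt((p0 ** 2).sum() * (ref0 ** 2).sum())
            scores[y, x] = (p0 * ref0).sum() / denom if denom > 0 else 0.0
    return scores

rng = np.random.default_rng(0)
img = rng.random((40, 60))
ref = img[10:18, 25:37].copy()  # cut the template out of the image itself

scores = match_template_ncc(img, ref)
best_y, best_x = np.unravel_index(np.argmax(scores), scores.shape)
top_left = (int(best_x), int(best_y))  # (x, y) order, as used for drawing boxes
```

The highest score sits exactly where the template was cut out. Any change in scale, rotation or lighting pattern lowers it, which is the weakness noted above.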

12 histogram of color

import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
image = mpimg.imread('cutout1.jpg') # RGB
rhist = np.histogram(image[:,:,0], bins=32, range=(0, 256))  # counts, edges: lengths are 32, 33
ghist = np.histogram(image[:,:,1], bins=32, range=(0, 256))
bhist = np.histogram(image[:,:,2], bins=32, range=(0, 256))
# Generating bin centers, one size for three
bin_edges = rhist[1]
bin_centers = (bin_edges[1:]  + bin_edges[:-1])/2
# Plot a figure with all three bar charts
plt.figure(figsize=(12,3))
plt.subplot(131)
plt.bar(bin_centers, rhist[0])
plt.xlim(0, 256)
plt.title('R Histogram')

15 explore color spaces

3D plot in RGB space and HSV space. Code snippet:

https://gist.github.com/jychstar/0daa6ea1a8a759a279092042f396049b

16 spatial binning of color

cv2.resize(rgb_image, (32,32)).ravel()

reduce the size and flatten, nothing fancy.

20 histogram of oriented gradients (HOG)

The dataset for practice is vehicles_smallset and non-vehicles_smallset, each has about 1.2k jpeg files with size 64x64 and memory size of 5M in total.

skimage.feature.hog() is a nice way to calculate the HOG of a gray image in one line of code.

There are several parameters to tune the hog:

orientations: number of directions you want to calculate.
pixels_per_cell: cell size over which each gradient histogram is computed.
cells_per_block: the local area over which the histogram counts in a given cell will be normalized.
visualise=True flag tells the function to also output a visualization called hog_image, which shows the dominant gradient direction in each cell.
feature_vector=True flag is supposed to unroll the features, like ravel(), but in my run it failed to unroll, so features has a shape of (7, 7, 2, 2, 9); this is also the shape you get with feature_vector=False, as in the call below. Note that the number of block positions along each axis is always one less than the number of cells (8 cells give 7 blocks).

By the way, the current version of skimage is 0.13, but my 0.12 version fails to upgrade to it.

from skimage.feature import hog
features, hog_image = hog(gray, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2), visualise=True, feature_vector=False)
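The (7, 7, 2, 2, 9) shape and the feature length follow from simple arithmetic, sketched below. The function name is hypothetical; the default values are the parameters used in the call above.

```python
def hog_feature_length(img_size=64, pix_per_cell=8, cell_per_block=2, orient=9):
    """Length of the unrolled HOG vector for a square single-channel image."""
    cells = img_size // pix_per_cell    # 64 / 8 = 8 cells per side
    blocks = cells - cell_per_block + 1  # blocks slide one cell at a time: 7 positions
    return blocks * blocks * cell_per_block * cell_per_block * orient

one_channel = hog_feature_length()
three_channels = 3 * one_channel
```

This gives 7 * 7 * 2 * 2 * 9 = 1764 values for one channel and 5292 for three channels.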

22 combine and normalize

import cv2
import numpy as np
import matplotlib.image as mpimg
from sklearn.preprocessing import StandardScaler
rgb = mpimg.imread(path)  # (64, 64, 3)
spatial_features = cv2.resize(rgb, (32,32)).ravel()  # (3072,)
rhist = np.histogram(rgb[:,:,0], bins=32, range=(0,256))
ghist = np.histogram(rgb[:,:,1], bins=32, range=(0,256))
bhist = np.histogram(rgb[:,:,2], bins=32, range=(0,256))
hist_features = np.concatenate((rhist[0], ghist[0], bhist[0]))  # (96,)
feature = np.concatenate((spatial_features, hist_features))  # (3168,)
scaler = StandardScaler(copy=True, with_mean=True, with_std=True)  # zero mean, unit variance per feature
scaler.fit(X)  # X stacks one feature row per image; fit and transform are separate so unseen data can reuse the fit
scaled_X = scaler.transform(X)
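What the scaler does can be written out in a few lines of NumPy. This is a sketch of the idea with made-up data: two features on very different scales, standing in for raw pixel values and histogram counts.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical training and unseen data with two features on different scales
X_train = rng.normal(loc=[5.0, 100.0], scale=[1.0, 20.0], size=(200, 2))
X_new = rng.normal(loc=[5.0, 100.0], scale=[1.0, 20.0], size=(10, 2))

# "fit": learn per-feature mean and standard deviation from the training data only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

# "transform": apply the same shift and scale to any data, including unseen samples
scaled_train = (X_train - mean) / std
scaled_new = (X_new - mean) / std
```

After scaling, no single feature group dominates just because its raw numbers are bigger.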

28 svc classifier by raw color

input feature is raw color + histogram of rgb channel, so input dimension is 2321x3169. Note that X is built by np.vstack, y is built by np.hstack

using sklearn.svm.LinearSVC() and train_test_split, it takes 0.6 s to train and get 0.98 test accuracy.

29 svc classifier by HOG

input feature is HOG of r channel, so input size: 2321x1764, takes 0.11 s, 0.975 accuracy.

if use HOG of rgb channels, input size: 2321x5292, takes 0.24s, 0.98 accuracy

32 sliding window

build a list of sliding windows. window size (128,128)

https://gist.github.com/jychstar/7eef27dbb3b7ac5e77a3f7e9724bd70f
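For readers who do not want to open the gist, here is a minimal sketch of how such a window list can be built. The function signature and the 50% overlap are my assumptions for illustration and may differ from the gist.

```python
def slide_window(img_shape, xy_window=(128, 128), xy_overlap=(0.5, 0.5)):
    """Return a list of ((x1, y1), (x2, y2)) windows covering the image."""
    height, width = img_shape[:2]
    x_step = int(xy_window[0] * (1 - xy_overlap[0]))
    y_step = int(xy_window[1] * (1 - xy_overlap[1]))
    windows = []
    for y in range(0, height - xy_window[1] + 1, y_step):
        for x in range(0, width - xy_window[0] + 1, x_step):
            windows.append(((x, y), (x + xy_window[0], y + xy_window[1])))
    return windows

windows = slide_window((720, 1280))
```

For a 1280x720 frame this yields 19 columns by 10 rows, so 190 windows at this single scale.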

34 search and classify

stack raw color, histogram of color, histogram of gradient as input features. so feature length = $16*16*3+16*3+1764*3=6108$ . So the HOG features are dominant.
train a classifier
crop the input image to a series of 64x64 images by sliding windows and resize it to the train image shape, do the same preprocessing and apply classifier
if true, save the window positions and draw on the image

35 subsample

some optimizations/tricks:

use YCrCb color space
use cells per step instead of overlap
crop lower half of the image,y=(400,656) and apply scale factor=1.5
directly slice from HOG data

37 multiple detection and false positive

create a heatmap initialized with 0,
the area covered by detected boxes adds value 1, multiple detection areas will have value more than 1
set threshold and reset small value area back to 0
labels,num = scipy.ndimage.measurements.label(heatmap) identify the objects by locations and numbers. Any non-zero values in input are counted as features and zero values are considered the background.
use labels to cut out each object and draw
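The first three steps above can be sketched with NumPy alone. The box coordinates and the threshold of 1 are made-up values; the labeling step with scipy is left out of this sketch.

```python
import numpy as np

def add_heat(heatmap, bbox_list):
    """Add 1 inside every detected box; overlapping detections stack up."""
    for (x1, y1), (x2, y2) in bbox_list:
        heatmap[y1:y2, x1:x2] += 1
    return heatmap

def apply_threshold(heatmap, threshold):
    """Zero out pixels that were not covered by more than `threshold` boxes."""
    heatmap[heatmap <= threshold] = 0
    return heatmap

# two overlapping detections on one car and one isolated false positive (hypothetical)
boxes = [((10, 10), (50, 50)), ((30, 30), (70, 70)), ((100, 20), (120, 40))]
heat = add_heat(np.zeros((80, 130)), boxes)
hot = apply_threshold(heat.copy(), threshold=1)
```

Only the 20x20 region where the two detections overlap survives, and the isolated box is removed as a false positive.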

Yuchao's blogspot