Saturday, April 15, 2017

Self-driving Car ND A5, Vehicle detection

course 20, Project 5: Vehicle Detection and Tracking
Detailed implementation is in my github.
The following are my notes. Each section is labelled by the lesson number.

5 draw a blue box over the image

cv2.rectangle(image_to_draw_on, (x1, y1), (x2, y2), color=(0, 0, 255), thickness=6)
(x1, y1) and (x2, y2) are the x and y coordinates of any two opposite corners of the bounding box you want to draw. (0, 0, 255) is blue when the image is in RGB order, as read by matplotlib.

6 feature intuition

features: characteristics
raw pixel intensity: color and shape
histogram of pixel intensity: color only
gradients of pixel intensity: shape only

9 template matching

Compare a picture with a known image (the template); the area whose match score passes the threshold is selected and its location is output.
result = cv2.matchTemplate(img, ref, cv2.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
h, w = ref.shape[:2]  # ref.shape is (height, width, channels)
top_left = max_loc  # for TM_CCOEFF_NORMED the best match is the maximum; only TM_SQDIFF and TM_SQDIFF_NORMED use the minimum
bottom_right = (top_left[0] + w, top_left[1] + h)
bbox_list.append((top_left, bottom_right))
However, template matching is only useful for things that do not vary in their appearance much.

12 histogram of color

import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
image = mpimg.imread('cutout1.jpg') # RGB
rhist = np.histogram(image[:,:,0], bins=32, range=(0, 256))  # counts, edges: lengths are 32, 33
ghist = np.histogram(image[:,:,1], bins=32, range=(0, 256))
bhist = np.histogram(image[:,:,2], bins=32, range=(0, 256))
# Generate bin centers; the edges are the same for all three channels
bin_edges = rhist[1]
bin_centers = (bin_edges[1:] + bin_edges[:-1])/2
# Plot a figure with all three bar charts (only the R panel is shown here; G and B use subplots 132 and 133)
plt.subplot(131)
plt.bar(bin_centers, rhist[0])
plt.xlim(0, 256)
plt.title('R Histogram')

15 explore color spaces

3D scatter plots of the image pixels in RGB space and HSV space.
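The original snippet is not reproduced in these notes. As a rough sketch only, a 3D scatter of pixels in both color spaces could look like the following. The random image, the helper name plot3d, and the use of matplotlib.colors.rgb_to_hsv (instead of an OpenCV conversion) are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import rgb_to_hsv
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3d projection on older matplotlib)


def plot3d(pixels, colors, axis_labels, ax):
    """Scatter N pixels of shape (N, 3) in 3D, coloring each point with its RGB value."""
    ax.scatter(pixels[:, 0], pixels[:, 1], pixels[:, 2],
               c=colors, s=8, edgecolors="none")
    ax.set_xlabel(axis_labels[0])
    ax.set_ylabel(axis_labels[1])
    ax.set_zlabel(axis_labels[2])
    return ax


# Stand-in for a small image: random RGB floats in [0, 1], shape (32, 32, 3)
rng = np.random.default_rng(0)
rgb = rng.random((32, 32, 3))
hsv = rgb_to_hsv(rgb)  # same shape; channels are H, S, V in [0, 1]

rgb_pixels = rgb.reshape(-1, 3)
hsv_pixels = hsv.reshape(-1, 3)

fig = plt.figure(figsize=(10, 5))
plot3d(rgb_pixels, rgb_pixels, "RGB", fig.add_subplot(121, projection="3d"))
plot3d(hsv_pixels, rgb_pixels, "HSV", fig.add_subplot(122, projection="3d"))
# plt.show()  # uncomment to display the figure interactively
```

With a real car cutout instead of random pixels, the idea is to look for a color space in which the car pixels cluster apart from the background.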

16 spatial binning of color

cv2.resize(rgb_image, (32,32)).ravel()
reduce the size and flatten, nothing fancy.

20 histogram of oriented gradients (HOG)

The datasets for practice are vehicles_smallset and non-vehicles_smallset; each has about 1.2k JPEG files of size 64x64, about 5 MB in total.
skimage.feature.hog() conveniently computes the HOG of a grayscale image in one line of code.
There are several parameters to tune the HOG:
  1. orientations: number of orientation bins in each gradient histogram.
  2. pixels_per_cell: cell size over which each gradient histogram is computed.
  3. cells_per_block: the local area over which the histogram counts in a given cell will be normalized.
  4. visualise=True flag tells the function to also return a visualization, hog_image, which shows the dominant gradient direction in each cell.
  5. feature_vector=True flag is supposed to unroll the features, like ravel(), but some bug caused it to fail to unroll for me. With feature_vector=False, as in the call below, features has a shape of (7, 7, 2, 2, 9): 7x7 block positions, 2x2 cells per block, 9 orientations. Note that with 2x2 blocks the number of block positions per axis (7) is always one less than the number of cells (8).
By the way, the current version of skimage is 0.13, but my 0.12 installation fails to upgrade to it.
from skimage.feature import hog
features, hog_image = hog(gray, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2), visualise=True, feature_vector=False)
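The shape (7, 7, 2, 2, 9) and the unrolled length 1764 used in later lessons follow from simple arithmetic. The sketch below reproduces that arithmetic without calling skimage; the helper names are hypothetical, and it assumes a square image with blocks that slide one cell at a time.

```python
def hog_shape(image_size, pixels_per_cell, cells_per_block, orientations):
    """Un-flattened HOG shape for a square image, assuming blocks slide one cell at a time."""
    n_cells = image_size // pixels_per_cell
    n_blocks = n_cells - cells_per_block + 1
    return (n_blocks, n_blocks, cells_per_block, cells_per_block, orientations)


def hog_length(shape):
    """Length of the feature vector after unrolling (ravel)."""
    length = 1
    for dim in shape:
        length *= dim
    return length


shape = hog_shape(image_size=64, pixels_per_cell=8, cells_per_block=2, orientations=9)
print(shape)              # (7, 7, 2, 2, 9)
print(hog_length(shape))  # 1764
```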

22 combine and normalize

import cv2
import matplotlib.image as mpimg
import numpy as np
from sklearn.preprocessing import StandardScaler
rgb = mpimg.imread(path)  # (64, 64, 3)
spatial_features = cv2.resize(rgb, (32,32)).ravel() #(3072,)
rhist = np.histogram(rgb[:,:,0], bins=32, range=(0,256))
ghist = np.histogram(rgb[:,:,1], bins=32, range=(0,256))
bhist = np.histogram(rgb[:,:,2], bins=32, range=(0,256))
hist_features = np.concatenate((rhist[0], ghist[0], bhist[0]))  # (96,)
feature = np.concatenate((spatial_features, hist_features))  # (3168,)
# X is a 2-D array with one feature vector per row, collected over all images
scaler = StandardScaler(copy=True, with_mean=True, with_std=True).fit(X)  # per-column zero mean and unit variance; fit and transform are separate so the same scaler can process unseen data
scaled_X = scaler.transform(X)
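To make the fit/transform split concrete, here is a minimal NumPy sketch of what standard scaling does. SimpleStandardScaler is a hypothetical stand-in written for illustration, not the sklearn class, and the data is synthetic.

```python
import numpy as np


class SimpleStandardScaler:
    """Hypothetical stand-in that mirrors the fit/transform split of StandardScaler."""

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        std = X.std(axis=0)
        # Constant columns have zero spread; divide by 1 there to avoid NaN
        self.scale_ = np.where(std == 0, 1.0, std)
        return self

    def transform(self, X):
        return (X - self.mean_) / self.scale_


rng = np.random.default_rng(0)
# Two feature groups on very different scales, like pixel values vs histogram counts
pixels = rng.uniform(0, 255, size=(100, 4))
counts = rng.uniform(0, 4000, size=(100, 2))
X = np.hstack((pixels, counts)).astype(np.float64)

X_train, X_new = X[:80], X[80:]
scaler = SimpleStandardScaler().fit(X_train)  # statistics come from the training rows only
scaled_train = scaler.transform(X_train)
scaled_new = scaler.transform(X_new)          # unseen rows reuse the same statistics
```

Without scaling, the histogram counts would swamp the pixel values simply because their numeric range is larger.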

28 svc classifier by raw color

The input features are raw color plus the histograms of the RGB channels, so the input dimension is 2321x3168 (3072 + 96 features per image). Note that X is built by np.vstack and y is built by np.hstack.
Using sklearn.svm.LinearSVC() and train_test_split, it takes 0.6 s to train and reaches 0.98 test accuracy.
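A minimal sketch of this training flow is shown below. The feature vectors are synthetic stand-ins rather than the real dataset, so the timing and accuracy it prints are not comparable to the numbers above. Fitting the scaler on the training split only is a choice made in this sketch.

```python
import time

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Synthetic stand-ins for extracted feature vectors (not the real dataset)
car_features = [rng.normal(1.0, 1.0, size=20) for _ in range(200)]
notcar_features = [rng.normal(-1.0, 1.0, size=20) for _ in range(200)]

# X: one row per image; y: one label per row (1 = car, 0 = not car)
X = np.vstack((car_features, notcar_features)).astype(np.float64)
y = np.hstack((np.ones(len(car_features)), np.zeros(len(notcar_features))))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

scaler = StandardScaler().fit(X_train)
svc = LinearSVC()
t0 = time.time()
svc.fit(scaler.transform(X_train), y_train)
train_seconds = time.time() - t0
accuracy = svc.score(scaler.transform(X_test), y_test)
print(f"trained in {train_seconds:.2f} s, test accuracy {accuracy:.3f}")
```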

29 svc classifier by HOG

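The input feature is the HOG of the R channel, so the input size is 2321x1764 (7*7*2*2*9 = 1764 per image); training takes 0.11 s and reaches 0.975 accuracy.
Using the HOG of all three RGB channels, the input size is 2321x5292; training takes 0.24 s and reaches 0.98 accuracy.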

32 sliding window

build a list of sliding windows. window size (128,128)
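A sketch of how such a window list could be built is shown below. The function name, the 50% overlap default, and the 1280x720 frame size in the usage lines are assumptions for illustration; only the (128, 128) window size comes from the notes.

```python
def slide_window(img_shape, x_start_stop=(None, None), y_start_stop=(None, None),
                 xy_window=(128, 128), xy_overlap=(0.5, 0.5)):
    """Return a list of ((x1, y1), (x2, y2)) windows covering the search region.

    img_shape is (height, width). A None start/stop means the image border.
    """
    x_start = 0 if x_start_stop[0] is None else x_start_stop[0]
    x_stop = img_shape[1] if x_start_stop[1] is None else x_start_stop[1]
    y_start = 0 if y_start_stop[0] is None else y_start_stop[0]
    y_stop = img_shape[0] if y_start_stop[1] is None else y_start_stop[1]

    # Step in pixels between neighboring windows (at least 1 to keep the loop finite)
    x_step = max(1, int(xy_window[0] * (1 - xy_overlap[0])))
    y_step = max(1, int(xy_window[1] * (1 - xy_overlap[1])))

    windows = []
    for y1 in range(y_start, y_stop - xy_window[1] + 1, y_step):
        for x1 in range(x_start, x_stop - xy_window[0] + 1, x_step):
            windows.append(((x1, y1), (x1 + xy_window[0], y1 + xy_window[1])))
    return windows


windows = slide_window((720, 1280))
print(len(windows))  # 190
print(windows[0])    # ((0, 0), (128, 128))
```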

34 search and classify

  1. stack raw color, histogram of color, and histogram of gradients as input features, giving a feature length of 16*16*3 + 16*3 + 1764*3 = 6108. The HOG features (5292 of the 6108) are dominant.
  2. train a classifier
  3. use sliding windows to crop the input image into a series of patches, resize each patch to the training image shape (64x64), apply the same preprocessing, and run the classifier
  4. if the prediction is true, save the window position and draw it on the image
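Steps 3 and 4 can be sketched as a loop over windows. Everything here is a simplified stand-in: resize_nearest replaces cv2.resize, and extract_features and predict are hypothetical placeholders for the real feature pipeline (including the scaler) and the trained classifier.

```python
import numpy as np


def resize_nearest(patch, size):
    """Nearest-neighbor resize to (size, size); a stand-in for cv2.resize."""
    rows = np.arange(size) * patch.shape[0] // size
    cols = np.arange(size) * patch.shape[1] // size
    return patch[rows][:, cols]


def search_windows(img, windows, extract_features, predict, train_size=64):
    """Return the windows whose patch is classified as positive."""
    hot_windows = []
    for (x1, y1), (x2, y2) in windows:
        patch = resize_nearest(img[y1:y2, x1:x2], train_size)
        features = extract_features(patch).reshape(1, -1)
        if predict(features)[0] == 1:
            hot_windows.append(((x1, y1), (x2, y2)))
    return hot_windows


def extract_features(patch):
    """Hypothetical feature extractor: raw pixels only."""
    return patch.astype(np.float64).ravel()


def predict(features):
    """Hypothetical classifier: 'bright patch' counts as a detection."""
    return (features.mean(axis=1) > 127).astype(int)


# Toy data: a dark 256x256 image with one bright 128x128 square
img = np.zeros((256, 256, 3), dtype=np.uint8)
img[128:256, 0:128] = 255
windows = [((x, y), (x + 128, y + 128)) for y in (0, 128) for x in (0, 128)]

hot = search_windows(img, windows, extract_features, predict)
print(hot)  # [((0, 128), (128, 256))]
```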

35 subsample

some optimizations/tricks:
  1. use YCrCb color space
  2. use cells per step instead of overlap
  3. crop the lower half of the image, y=(400, 656), and apply scale factor=1.5
  4. directly slice from HOG data
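The window bookkeeping behind these tricks can be sketched in plain Python. This only computes where the windows fall; the real version would compute HOG once for the cropped, rescaled strip and slice it at each position. The function name, the 1280-pixel image width, and cells_per_step=2 are assumptions for illustration.

```python
def subsample_windows(img_width, ystart, ystop, scale, pix_per_cell=8,
                      cell_per_block=2, window=64, cells_per_step=2):
    """List the search windows, in original-image pixels, reached by stepping
    over the HOG cell grid of the cropped and rescaled strip."""
    # Size of the cropped strip after shrinking it by `scale`
    width = int(img_width / scale)
    height = int((ystop - ystart) / scale)

    # Block positions available along each axis of the strip
    nxblocks = width // pix_per_cell - cell_per_block + 1
    nyblocks = height // pix_per_cell - cell_per_block + 1
    nblocks_per_window = window // pix_per_cell - cell_per_block + 1

    nxsteps = (nxblocks - nblocks_per_window) // cells_per_step + 1
    nysteps = (nyblocks - nblocks_per_window) // cells_per_step + 1

    size = int(window * scale)  # window edge length in original-image pixels
    boxes = []
    for yb in range(nysteps):
        for xb in range(nxsteps):
            # Top-left corner in cells; the HOG slice for this window would be
            # hog[ypos:ypos + nblocks_per_window, xpos:xpos + nblocks_per_window]
            xpos, ypos = xb * cells_per_step, yb * cells_per_step
            xleft, ytop = xpos * pix_per_cell, ypos * pix_per_cell
            # Map back to original-image coordinates
            x1 = int(xleft * scale)
            y1 = int(ytop * scale) + ystart
            boxes.append(((x1, y1), (x1 + size, y1 + size)))
    return boxes


boxes = subsample_windows(img_width=1280, ystart=400, ystop=656, scale=1.5)
print(len(boxes))  # 350
print(boxes[0])    # ((0, 400), (96, 496))
```

With 8 cells per window, stepping 2 cells at a time corresponds to a 75% overlap between neighboring windows.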

37 multiple detection and false positive

  1. create a heatmap initialized with 0
  2. add 1 to the area covered by each detected box; areas with multiple detections will have values greater than 1
  3. set a threshold and reset areas at or below it back to 0
  4. labels, num = scipy.ndimage.measurements.label(heatmap) (scipy.ndimage.label in newer SciPy releases) identifies the objects by location and count. Any non-zero values in the input are counted as features and zero values are considered the background.
  5. use labels to cut out each object and draw its bounding box
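The five steps above can be sketched on toy data as follows. The helper names and the detection boxes are made up for illustration, and scipy.ndimage.label is imported from its current location rather than the measurements namespace.

```python
import numpy as np
from scipy.ndimage import label


def add_heat(heatmap, bbox_list):
    """Add 1 to every pixel inside each detected box."""
    for (x1, y1), (x2, y2) in bbox_list:
        heatmap[y1:y2, x1:x2] += 1
    return heatmap


def apply_threshold(heatmap, threshold):
    """Zero out pixels whose heat is at or below the threshold."""
    heatmap[heatmap <= threshold] = 0
    return heatmap


def labeled_bboxes(labels, num):
    """One bounding box per labeled region, from the min/max pixel coordinates."""
    boxes = []
    for car_number in range(1, num + 1):
        ys, xs = np.nonzero(labels == car_number)
        boxes.append(((int(xs.min()), int(ys.min())),
                      (int(xs.max()), int(ys.max()))))
    return boxes


# Toy detections: two overlapping boxes on one object plus one isolated false positive
detections = [((10, 10), (50, 50)), ((30, 30), (70, 70)), ((100, 100), (120, 120))]
heatmap = np.zeros((150, 150), dtype=np.int32)
heatmap = apply_threshold(add_heat(heatmap, detections), threshold=1)
labels, num = label(heatmap)
boxes = labeled_bboxes(labels, num)
print(num, boxes)  # 1 [((30, 30), (49, 49))]
```

Only the region where the two boxes overlap survives the threshold, so the isolated false positive is dropped.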