course 20, Project 5: Vehicle Detection and Tracking
The detailed implementation is on my GitHub.
The following are my notes. Each section is labelled by the lesson number.
5 draw a blue box over the image
cv2.rectangle(image_to_draw_on, (x1, y1), (x2, y2), color=(0, 0, 255), thickness=6)
(x1, y1) and (x2, y2) are the x and y coordinates of any two opposing corners of the bounding box you want to draw.
6 feature intuition
features | characteristics |
---|---|
raw pixel intensity | color and shape |
histogram of pixel intensity | color only |
gradients of pixel intensity | shape only |
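
The table can be checked on a toy patch. This is only a sketch on a made-up 8x8 grayscale image (the patch and the helper names `intensity_hist` and `gradient_magnitude` are my own, not from the course): shuffling the pixels keeps the histogram but destroys the shape, while a brightness shift keeps the gradients but changes the histogram.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 8x8 grayscale patch: dark left half, bright right half.
patch = np.zeros((8, 8), dtype=np.uint8)
patch[:, 4:] = 200

def intensity_hist(img, bins=8):
    """Histogram of pixel intensity: sensitive to color, blind to shape."""
    counts, _ = np.histogram(img, bins=bins, range=(0, 256))
    return counts

def gradient_magnitude(img):
    """Gradient of pixel intensity: sensitive to shape, blind to a brightness offset."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

# Shuffling pixels destroys the shape but keeps the same set of intensities.
shuffled = rng.permutation(patch.ravel()).reshape(patch.shape)
same_hist_after_shuffle = np.array_equal(intensity_hist(patch), intensity_hist(shuffled))
same_pixels_after_shuffle = np.array_equal(patch, shuffled)

# A constant brightness shift changes the colors but not the gradients.
brighter = patch + 50  # values become 50 and 250, still within uint8
same_gradient_after_shift = np.allclose(gradient_magnitude(patch), gradient_magnitude(brighter))
same_hist_after_shift = np.array_equal(intensity_hist(patch), intensity_hist(brighter))
```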
9 template matching
Compare a picture with a known reference image; the area whose match score passes a threshold is selected and its location is output.
result = cv2.matchTemplate(img, ref, cv2.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
h, w = ref.shape[:2]  # template height and width
top_left = max_loc  # for TM_CCOEFF_NORMED the best match is the maximum; only the TM_SQDIFF methods use min_loc
bottom_right = (top_left[0] + w, top_left[1] + h)
bbox_list.append((top_left, bottom_right))
However, template matching is only useful for things that do not vary in their appearance much.
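To see what the score map means, here is a rough plain-numpy sketch of zero-mean normalized cross-correlation, which is how I understand the TM_CCOEFF_NORMED method. It is a slow brute-force stand-in, not OpenCV's implementation; the function name and the random test image are made up for illustration. The best match is where the score is highest.

```python
import numpy as np

def match_template_ccoeff_normed(img, ref):
    """Brute-force zero-mean normalized cross-correlation on 2-D arrays."""
    img = img.astype(float)
    ref = ref.astype(float)
    h, w = ref.shape
    ref_zero = ref - ref.mean()
    ref_norm = np.sqrt((ref_zero ** 2).sum())
    rows = img.shape[0] - h + 1
    cols = img.shape[1] - w + 1
    result = np.zeros((rows, cols))
    for y in range(rows):
        for x in range(cols):
            win = img[y:y + h, x:x + w]
            win_zero = win - win.mean()
            denom = ref_norm * np.sqrt((win_zero ** 2).sum())
            if denom > 0:  # flat windows keep a score of 0
                result[y, x] = (win_zero * ref_zero).sum() / denom
    return result

# Hypothetical data: a random image and a template cut out at row 5, column 12.
rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(20, 30))
ref = img[5:11, 12:20].copy()

result = match_template_ccoeff_normed(img, ref)
y, x = np.unravel_index(np.argmax(result), result.shape)
top_left = (int(x), int(y))  # (x, y) order, like max_loc from cv2.minMaxLoc
bottom_right = (top_left[0] + ref.shape[1], top_left[1] + ref.shape[0])
```

The score is 1.0 only where the window equals the template up to brightness and contrast, which is why this works poorly once the object's appearance changes.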
12 histogram of color
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np
image = mpimg.imread('cutout1.jpg') # RGB
rhist = np.histogram(image[:,:,0], bins=32, range=(0, 256)) # counts, edges: lengths are 32, 33
ghist = np.histogram(image[:,:,1], bins=32, range=(0, 256))
bhist = np.histogram(image[:,:,2], bins=32, range=(0, 256))
# Generating bin centers, one size for three
bin_edges = rhist[1]
bin_centers = (bin_edges[1:] + bin_edges[:-1])/2
# Plot a figure with all three bar charts
plt.figure(figsize=(12,3))
plt.subplot(131)
plt.bar(bin_centers, rhist[0])
plt.xlim(0, 256)
plt.title('R Histogram')
15 explore color spaces
3D scatter plots of the pixel colors in RGB space and in HSV space.
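A possible sketch of such a plot, not the course's snippet. I assume pixels are given as an (N, 3) array of RGB values in [0, 1]; `rgb_to_hsv_pixels` and `plot3d` are names I made up, and the random pixels stand in for a subsampled image.

```python
import colorsys
import numpy as np

def rgb_to_hsv_pixels(rgb_pixels):
    """Convert an (N, 3) array of RGB values in [0, 1] to HSV values in [0, 1]."""
    return np.array([colorsys.rgb_to_hsv(r, g, b) for r, g, b in rgb_pixels])

def plot3d(points, point_colors, axis_labels):
    """Scatter (N, 3) points in 3-D, coloring each point by its RGB value."""
    import matplotlib.pyplot as plt  # imported lazily: only needed when plotting
    fig = plt.figure(figsize=(6, 6))
    ax = fig.add_subplot(projection='3d')
    ax.scatter(points[:, 0], points[:, 1], points[:, 2],
               c=point_colors, edgecolors='none')
    ax.set_xlabel(axis_labels[0])
    ax.set_ylabel(axis_labels[1])
    ax.set_zlabel(axis_labels[2])
    return ax

# Hypothetical stand-in for a subsampled image: 500 random RGB pixels.
rng = np.random.default_rng(0)
rgb = rng.random((500, 3))
hsv = rgb_to_hsv_pixels(rgb)
```

Calling plot3d(rgb, rgb, 'RGB') and plot3d(hsv, rgb, 'HSV') followed by plt.show() draws the same pixels in both spaces, so you can see in which space the car pixels cluster apart from the background.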
16 spatial binning of color
cv2.resize(rgb_image, (32,32)).ravel()
reduce the size and flatten, nothing fancy.
20 histogram of oriented gradients (HOG)
The practice dataset is vehicles_smallset and non-vehicles_smallset; each has about 1.2k JPEG files of size 64x64, about 5 MB in total.
skimage.feature.hog() computes the HOG of a grayscale image in one line of code.
There are several parameters to tune the hog:
- orientations: number of orientation bins in each gradient histogram.
- pixels_per_cell: cell size over which each gradient histogram is computed.
- cells_per_block: the local area over which the histogram counts in a given cell will be normalized.
The visualise=True flag tells the function to also output a visualization called hog_image, which shows the dominant gradient direction in each cell.
The feature_vector=True flag is supposed to unroll the features, like ravel(). But it failed to unroll for me, so features has a shape of (7, 7, 2, 2, 9). Note that the call below passes feature_vector=False, which returns this block-structured shape by design. The number of block positions per side (7) is one less than the number of cells per side (8), because the 2x2-cell blocks slide one cell at a time.
By the way, the current version of skimage is 0.13, but my 0.12 installation fails to upgrade to it.
from skimage.feature import hog
features, hog_image = hog(gray, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2), visualise=True, feature_vector=False)
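A small arithmetic sketch of where the (7, 7, 2, 2, 9) shape comes from, assuming a square image and blocks that slide one cell at a time. `hog_shape` and `hog_length` are hypothetical helpers of mine, not part of skimage.

```python
def hog_shape(image_size, orientations=9, pixels_per_cell=8, cells_per_block=2):
    """Shape of the block-structured HOG output for a square image.

    Assumes blocks slide one cell at a time.
    """
    cells_per_side = image_size // pixels_per_cell           # 64 // 8 = 8
    blocks_per_side = cells_per_side - cells_per_block + 1   # 8 - 2 + 1 = 7
    return (blocks_per_side, blocks_per_side,
            cells_per_block, cells_per_block, orientations)

def hog_length(image_size, **kwargs):
    """Length of the unrolled feature vector for one channel."""
    n = 1
    for dim in hog_shape(image_size, **kwargs):
        n *= dim
    return n

shape = hog_shape(64)              # (7, 7, 2, 2, 9)
length_one_channel = hog_length(64)        # 7 * 7 * 2 * 2 * 9 = 1764
length_three_channels = 3 * hog_length(64)  # 5292
```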
22 combine and normalize
import cv2
from sklearn.preprocessing import StandardScaler
rgb = mpimg.imread(path) # (64, 64, 3)
spatial_features = cv2.resize(rgb, (32,32)).ravel() # (3072,)
rhist = np.histogram(rgb[:,:,0], bins=32, range=(0,256))
ghist = np.histogram(rgb[:,:,1], bins=32, range=(0,256))
bhist = np.histogram(rgb[:,:,2], bins=32, range=(0,256))
hist_features = np.concatenate((rhist[0], ghist[0], bhist[0])) # (96,)
feature = np.concatenate((spatial_features, hist_features)) # (3168,)
# X holds one such feature vector per row, stacked over all images
scaler = StandardScaler(copy=True, with_mean=True, with_std=True) # per column: zero mean, unit variance
scaler.fit(X) # separate fit/transform to process unseen data
scaled_X = scaler.transform(X)
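Why normalize: the 3072 spatial values lie in 0..255 while the 96 histogram counts can be much larger, so one group would dominate without scaling. Below is a plain-numpy sketch of what I understand the standard scaler to do (per-column zero mean and unit variance); `fit_scaler` and `transform` are my own hypothetical helpers and the data is random.

```python
import numpy as np

def fit_scaler(X):
    """The 'fit' step: per-column mean and (population) standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns are left unscaled
    return mean, std

def transform(X, mean, std):
    """The 'transform' step: reuse the fitted statistics, also on unseen data."""
    return (X - mean) / std

rng = np.random.default_rng(0)
# Hypothetical features: 3072 spatial values next to 96 histogram counts.
spatial = rng.integers(0, 256, size=(50, 3072)).astype(float)
hist = rng.integers(0, 4096, size=(50, 96)).astype(float)
X = np.hstack((spatial, hist))            # (50, 3168)

mean, std = fit_scaler(X)
scaled_X = transform(X, mean, std)

# An unseen sample is scaled with the statistics learned from X, not its own.
x_new = X[:1] + 1.0
scaled_new = transform(x_new, mean, std)
```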
28 svc classifier by raw color
The input features are raw color plus the histograms of the RGB channels, so the input dimension is 2321x3168. Note that X is built by np.vstack and y is built by np.hstack.
Using sklearn.svm.LinearSVC() and train_test_split, it takes 0.6 s to train and gets 0.98 test accuracy.
29 svc classifier by HOG
The input feature is the HOG of the R channel, so the input size is 2321x1764; training takes 0.11 s and test accuracy is 0.975.
With the HOG of all three RGB channels, the input size is 2321x5292; training takes 0.24 s and test accuracy is 0.98.
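For both classifiers, X and y are assembled the same way. A toy sketch with made-up 4-element feature vectors instead of the real ones; the shuffled split is a plain-numpy stand-in for train_test_split, not the sklearn function itself.

```python
import numpy as np

# Hypothetical per-image feature vectors (length 4 here, 1764 or 3168 in the real runs).
car_features = [np.full(4, 1.0), np.full(4, 2.0), np.full(4, 3.0)]
notcar_features = [np.full(4, -1.0), np.full(4, -2.0)]

# vstack: one row per image; hstack: one label per image (1 = car, 0 = not car).
X = np.vstack((car_features, notcar_features)).astype(np.float64)   # (5, 4)
y = np.hstack((np.ones(len(car_features)), np.zeros(len(notcar_features))))  # (5,)

# Shuffled 80/20 split, standing in for train_test_split.
rng = np.random.default_rng(0)
order = rng.permutation(len(y))
n_test = max(1, int(0.2 * len(y)))
test_idx, train_idx = order[:n_test], order[n_test:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
```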
32 sliding window
build a list of sliding windows. window size (128,128)
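A minimal sketch of building the window list. The 50% overlap and the 720x1280 image size are my assumptions (the notes only fix the window size), and `slide_window` here is an illustrative helper rather than the exact course function.

```python
def slide_window(img_shape, x_start_stop=(None, None), y_start_stop=(None, None),
                 xy_window=(128, 128), xy_overlap=(0.5, 0.5)):
    """Return a list of ((x1, y1), (x2, y2)) windows covering the search region."""
    height, width = img_shape[:2]
    x_start = 0 if x_start_stop[0] is None else x_start_stop[0]
    x_stop = width if x_start_stop[1] is None else x_start_stop[1]
    y_start = 0 if y_start_stop[0] is None else y_start_stop[0]
    y_stop = height if y_start_stop[1] is None else y_start_stop[1]

    # Step in pixels between neighboring windows.
    step_x = int(xy_window[0] * (1 - xy_overlap[0]))
    step_y = int(xy_window[1] * (1 - xy_overlap[1]))

    windows = []
    for top in range(y_start, y_stop - xy_window[1] + 1, step_y):
        for left in range(x_start, x_stop - xy_window[0] + 1, step_x):
            windows.append(((left, top),
                            (left + xy_window[0], top + xy_window[1])))
    return windows

# Hypothetical 720x1280 camera frame.
windows = slide_window((720, 1280, 3))
```

With these settings the frame is covered by 19 columns and 10 rows of windows.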
34 search and classify
- stack raw color, histogram of color, histogram of gradient as input features. so feature length = . So the HOG features are dominant.
- train a classifier
- crop the input image into a series of patches with the sliding windows, resize each patch to the training image shape (64x64), apply the same preprocessing, and run the classifier
- if the classifier predicts a vehicle, save the window position and draw it on the image
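The search loop can be sketched roughly as below. Everything here is a toy stand-in: `resize_nearest` replaces cv2.resize, and `extract_features` and `predict` are hypothetical callables standing in for the real feature pipeline and the trained classifier.

```python
import numpy as np

def resize_nearest(patch, size):
    """Nearest-neighbor resize to (size, size); a simple stand-in for cv2.resize."""
    rows = np.arange(size) * patch.shape[0] // size
    cols = np.arange(size) * patch.shape[1] // size
    return patch[rows][:, cols]

def search_windows(img, windows, extract_features, predict, train_size=64):
    """Return the windows whose resized patch is classified as a vehicle."""
    hits = []
    for (x1, y1), (x2, y2) in windows:
        patch = resize_nearest(img[y1:y2, x1:x2], train_size)
        if predict(extract_features(patch)):
            hits.append(((x1, y1), (x2, y2)))
    return hits

# Toy data: a black image with one bright 128x128 square in the lower left.
img = np.zeros((256, 256, 3), dtype=np.uint8)
img[128:256, 0:128] = 255
windows = [((0, 0), (128, 128)), ((128, 0), (256, 128)),
           ((0, 128), (128, 256)), ((128, 128), (256, 256))]

# Toy "classifier": a patch counts as a vehicle if it is bright on average.
hits = search_windows(img, windows,
                      extract_features=lambda p: p.ravel().astype(float),
                      predict=lambda f: f.mean() > 127)
```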
35 subsample
some optimizations/tricks:
- use YCrCb color space
- use cells per step instead of overlap
- crop the lower half of the image, y=(400, 656), and apply scale factor=1.5
- directly slice from HOG data
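A sketch of the window bookkeeping behind these tricks, with the arithmetic only (no HOG is computed). The image width of 1280, cells_per_step=2, and the 64-pixel training window are my assumptions, and `subsample_windows` is a hypothetical helper. HOG would be computed once on the cropped, rescaled strip, and each window would take its features by slicing that array.

```python
def subsample_windows(img_width=1280, y_start=400, y_stop=656, scale=1.5,
                      pix_per_cell=8, cell_per_block=2, window=64, cells_per_step=2):
    """Window boxes in original image coordinates for HOG sub-sampling."""
    # Size of the cropped strip after shrinking by the scale factor.
    scaled_w = int(img_width / scale)
    scaled_h = int((y_stop - y_start) / scale)

    # Block positions in the strip and in one 64x64 window.
    nxblocks = scaled_w // pix_per_cell - cell_per_block + 1
    nyblocks = scaled_h // pix_per_cell - cell_per_block + 1
    nblocks_per_window = window // pix_per_cell - cell_per_block + 1

    # Stepping in cells replaces the overlap fraction.
    nxsteps = (nxblocks - nblocks_per_window) // cells_per_step + 1
    nysteps = (nyblocks - nblocks_per_window) // cells_per_step + 1

    boxes = []
    size = int(window * scale)  # window size back in original pixels
    for yb in range(nysteps):
        for xb in range(nxsteps):
            ypos, xpos = yb * cells_per_step, xb * cells_per_step
            # The window's HOG features would be the slice
            # hog[ypos:ypos + nblocks_per_window, xpos:xpos + nblocks_per_window].
            left = int(xpos * pix_per_cell * scale)
            top = int(ypos * pix_per_cell * scale) + y_start
            boxes.append(((left, top), (left + size, top + size)))
    return boxes

boxes = subsample_windows()
```

With 8 cells per window, stepping 2 cells at a time corresponds to 75% overlap between neighboring windows.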
37 multiple detection and false positive
- create a heatmap initialized with 0
- every pixel covered by a detected box adds 1, so areas with multiple detections have values greater than 1
- set a threshold and reset the areas at or below it back to 0
labels, num = scipy.ndimage.label(heatmap)
identifies the objects by their locations and their number. Any non-zero values in the input are counted as features and zero values are considered the background.
- use labels to cut out each object and draw
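The heatmap and threshold steps can be sketched in plain numpy as below. The boxes and the image size are made up, and for a single surviving blob I take the bounding box directly with np.nonzero instead of the labeling call.

```python
import numpy as np

def add_heat(heatmap, boxes):
    """Add 1 to every pixel inside each detected box."""
    for (x1, y1), (x2, y2) in boxes:
        heatmap[y1:y2, x1:x2] += 1
    return heatmap

def apply_threshold(heatmap, threshold):
    """Zero out pixels whose heat is at or below the threshold."""
    heatmap[heatmap <= threshold] = 0
    return heatmap

# Hypothetical detections: two overlapping boxes on a car, one lone false positive.
boxes = [((10, 10), (50, 50)), ((30, 30), (70, 70)), ((100, 20), (120, 40))]
heatmap = np.zeros((80, 130), dtype=int)
heatmap = add_heat(heatmap, boxes)
heatmap = apply_threshold(heatmap, 1)

# Only the overlap of the first two boxes survives; take its bounding box.
ys, xs = np.nonzero(heatmap)
bbox = ((int(xs.min()), int(ys.min())), (int(xs.max()), int(ys.max())))
```

The lone box never reaches a heat above 1, so it is removed as a false positive.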