Saturday, September 16, 2017

Robot ND A3, Perception

Update: At Apple’s launch event on 2017.9.12, iPhone X used RGBD camera in the front panel for FaceID.
In term 2, we’ll be sending RGBD camera as part of the hardware package.
An example of semantic segmentation (per-pixel object classification) in the Cityscapes Dataset. See Cordts et al., 2016 for details.
In robotics, we face real world limitations when it comes to things like power consumption, weight, volume and other physical parameters of the system. The Google Tensor Processing Unit (TPU) and the NVIDIA Drive PX2 are examples of some of the cutting edge compute hardware driving onboard processing in robotics to a new level.

3D sensor

Active sensors:
  • Lidar
  • Time of flight camera: illuminating an area with an infrared light source. It captures entire field of view without any moving parts.. Based on the working principle, ToF sensors have 2 categories: pulse runtime and phase shift continuous wave.
  • Ultrasonic sensor: sends out a high-frequnecy sound pulse. It has 2 piezoelectric crystals for transmitter and receiver.
Passive sensors:
  • Monocular camera: reconstruct a 3D model from the video stream using technique called Structure from Motion. Depth is calculated via triangulation technique, which requires accurate measurement of camera pose throughout its movement. Check out this awesome paper by Saxena, Chung and Ng 2007 for the details.
  • Stereo camera: two monocular cameras separated by an accurately known distance. Depth information is obtained by comparing image frames obtained from both cameras viewing the same object or scene. The difference between position of a given object in the scene as perceived by the two cameras is called disparity.
RGB-D camera: similar to TOF camera, but the return wave modulation is performed spatially instead of temporally. The predefined patterns range from simple stripes to unique and convoluted speckle patterns. Unlike stereo cameras, they save a lot of computational resources by providing per-pixel depth values directly instead of inferring the depth information from raw image frames. It is low cost, short range and low resolution.
Most RGB-D cameras use a technique called Structured Light to obtain depth information from a scene.
By comparing between the known projected pattern and the reflected pattern, a depth map is generated much like a stereo camera.

calibration, filtering, segmentation

With 3D point cloud, the majority of the data actually corresponds to unwanted background objects or other things that must be filtered out.
No sensor is perfect. Calibration at its core is the process of comparing 2 measurements, a ground truth and a fresh measurement.
While python has OpenCV library to calibrate camera, ROS also has its own camera_calibration package that allows easy calibration of Monocular, Stereo, and RGB-D cameras.Check out this tutorial.

install python-pcl

python-pcl is a library of Python bindings for the powerful Point Cloud Library.
$ sudo pip install cython
$ cd ~/RoboND-Perception-Exercises/python-pcl
$ python build
$ sudo python install
Note the you can’t pip install pcl because the library has a lot of dependency of local environment. So you have to compile the library locally. What’s more, the RAM for VM should be at least 4GB.
if install locally in Mac,
brew install pcl
brew link --overwrite pcl
# install various other dependancies
pip install .
However, fatal error: 'pcl/point_cloud.h' file not found seems due to that I don’t have an ROS environment.

point cloud data format

Point cloud library was first released on 2010.3.
A typical .pcd file looks like this:
# .PCD v0.7 - Point Cloud Data file format
FIELDS x y z rgb
SIZE 4 4 4 4
COUNT 1 1 1 1
WIDTH 202627
VIEWPOINT 0 0 0 1 0 0 0
POINTS 202627
DATA ascii
0.038113248 -1.6091453 1.0870478 1.0624617e-39
0.035177067 -1.6083764 1.0873107 1.0624617e-39
  • latest version is 1.7.2 in 2014
  • fields can be several types:
    FIELDS x y z       # XYZ data
    FIELDS x y z rgb   # XYZ + colors
    FIELDS x y z normal_x normal_y normal_z     # XYZ + surface normals
    FIELDS j1 j2 j3    # moment invariants
  • size: size in bytes
  • type: I for int8(char), in16(short), int32(int), u for unsigned, f for float
  • for unordered point cloud, width is the number of points and height is set to 1.
  • view point: used for building transforms between different coordinate systems. translation(tx,ty,tz) + quaternion(qw,qx,qy,qz), default is (0,0,0,1,0,0,0)
  • points: number of points
  • data: storage type: ascii or binary. advantage of ascii is simplicity and speed

PointCloud vs PointCloud2

The sensor_msgs/PointCloud2 format was designed as a ROS message, and is the preferred choice for ROS applications.
The pcl/PointCloud<T> format represents the internal PCL point cloud format. For modularity and efficiency reasons, the format is templated on the point type, and PCL provides a list of templated common types which are SSE aligned.
$ rosmsg info sensor_msgs/PointCloud
$ rosmsg info sensor_msgs/PointCloud2
We can see PointCloud2 has much more meta-information for inter-communication.

Exercise-1: point cloud filtering

input: pcl.load_XYZRGB('tabletop.pcd')
output:,"xx.pcd"), pcl_viewer xx.pcd
Most commonly used filters from the Point Cloud Library:
  • VoxelGrid Downsampling Filter
  • ExtractIndices Filter
  • PassThrough Filter
  • RANSAC Plane Fitting: Random Sample Consensus. estimate what pieces of the point cloud set belong to that shape by assuming a particular model, such as a plane.The number of iterations is a trade-off between compuer time vs. model detection accuracy.
  • Outlier Removal Filter
Depth camera point cloud is very dense. To increase the processing speed without sacrificing results, we downsample the data by Voxel Grid Downsampling Filter.
Pixel: picture element. Voxel: Volume element.
The majority of the data is not useful for identifying the target (the hammer). Things like ground, table and other objects do not aid in recognizing your target and can be regarded as noise for the most part. Processing the complete point cloud including this excessive data is inefficient and leads to wastage of compute cycles. is used to process the PCD (point cloud data). visualize the pcd by pcl_viewer filename.pcd
Basic operations for pcd are implemented in
import pcl
cloud = pcl.load_XYZRGB('tabletop.pcd') # <type 'pcl._pcl.PointCloud_PointXYZRGB'>
## downsizing filter
vox = cloud.make_voxel_grid_fiter() # <type pcl._pcl.VoxelGridFilter_PointXYZRGB'>
leaf_size = 0.1
cloud_new = vox.filter() # <type 'pcl._pcl.PointCloud_PointXYZRGB'>,"cloud_name.pcd")
# PassThrough filter
passthrough = cloud.make_passthrough_filter()
filter_axis = 'z'
axis_min = 0
axis_max = 2
passthrough.set_filter_limits(axis_min, axis_max)
cloud_new = passthrough.filter()
# segmenter filter: RANSAC
seg = cloud.make_segmenter()
max_distance = 1
inliers, coefficients = seg.segment()
extract_in= cloud.extract(inliers, negative=False)
extract_out= cloud.extract(inliers, negative=True)
# Guassian noise filter, Any point with a mean distance larger than global (mean distance+x*std_dev) will be considered outlier
outlier = cloud.make_statistical_outlier_filter()
x = 1.0
cloud_new = outlier.filter()

Excercise 2: clustering for segmentation

input: camera data from a ROS node by subscribing to a topic called /sensor_stick/point_cloud.
output: after point cloud is segmented by Euclidean clustering, publish data back to topics called /pcl_table and /pcl_objects
$ cp -r ~/RoboND-Perception-Exercises/Exercise-2/sensor_stick ~/catkin_ws/src/
$ cd ~/catkin_ws
$ rosdep install --from-paths src --ignore-src --rosdistro=kinetic -y
$ catkin_make
# add following to .bashrc file
# export GAZEBO_MODEL_PATH=~/catkin_ws/src/sensor_stick/models
# source ~/catkin_ws/devel/setup.bash
$ roslaunch sensor_stick robot_spawn.launch
# what it actually launch: robot_description.launch, robot_control.launch,empty_world.launch,perception.rviz
$ cd ~/catkin_ws/src/sensor_stick/scripts/
$ ls
$ cp  # make a duplicate
$ ./
clustering tries to anwser the simple question, which components of a dataset naturally belong together?
  • k-means clustering: an interative, unsupervised learning algorithm. Begin by randomly selecting k initial centroids, then create k clusters by associate each data points with the nearest centroid. However, it may fail due to highly overlapping, complex shapes or highly sensitive to random initialization.
  • Euclidean clustering, or DBSCAN (Density-Based Spatial Clustering of Application with Noise). You may not know how many clusters to expect, but you do know something about how the points should be clustered (min_samples, max_dist). Each cluster has 3 levels of members: core member, edge member, outlier.

publish point cloud to new topics

In sensor_stick/scripts/, duplicate file to
from pcl_helper import *
rospy.init_node('clustering', anonymous=True)
pcl_sub = rospy.Subscriber("/sensor_stick/point_cloud", pc2.PointCloud2, pcl_callback, queue_size=1)
pcl_objects_pub = rospy.Publisher("/pcl_objects", PointCloud2, queue_size=1)
pcl_table_pub = rospy.Publisher("/pcl_table", PointCloud2, queue_size=1)
while not rospy.is_shutdown():
def pcl_callback(pcl_msg):
  cloud = ros_to_pcl(pcl_msg)
  # process cloud data as last excercise
  # Euclidean clustering as in next section
  ros_cloud_object = pcl_to_ros(cloud_data)
After executing ./, you should see your new topics pcl_objects and pcl_tablebeing published in RViz.
A typical type ros message is sensor_msgs.msg.PointCloud2, which has a lot of metadata such as header,height, width, fields, is_bigendian,point_step, row_step. But the major part is stored in data, which is a list of string, each string is packed by struct with up to 11 field.

Euclidean Clustering:

# Euclidean Clustering
white_cloud = XYZRGB_to_XYZ(extracted_outliers) # <type 'pcl._pcl.PointCloud'>
tree = white_cloud.make_kdtree() # <type 'pcl._pcl.KdTree'>
ec = white_cloud.make_EuclideanClusterExtraction()
ec.set_ClusterTolerance(0.02) # for hammer
cluster_indices = ec.Extract() # indices for each cluster (a list of lists)
# Assign a color to each cluster
cluster_color = get_color_list(len(cluster_indices))
color_cluster_point_list = []
for j, indices in enumerate(cluster_indices):
  for i, indice in enumerate(indices):             
[white_cloud[indice][0], white_cloud[indice][1],  white_cloud[indice][2], rgb_to_float(cluster_color[j])])
# Create new cloud containing all clusters, each with unique color
cluster_cloud = pcl.PointCloud_PointXYZRGB()
# publish to cloud
ros_cluster_cloud = pcl_to_ros(cluster_cloud)
# save to local, 'cluster.pcd')
colorlist is a static variable you should define it outside the function getcolorlist, I defined it after if _name == ‘__main‘: get_color_list.color_list = []

exercise 3: object recognizer

input: cloud data from ROS
output: clustered cloud with recognized label
Possible Features for identifying an object:
  1. object color and shape
  2. detailed color information
  3. detailed shape information
color histogram
import numpy as np
import cv2
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
image = mpimg.imread('Udacican.jpeg')    
# Define a function to compute color histogram features  
def color_hist(img, nbins=32, bins_range=(0, 256)):
    hsv_img = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
    h_hist = np.histogram(hsv_img[:,:,0], bins=nbins, range=bins_range)
    s_hist = np.histogram(hsv_img[:,:,1], bins=nbins, range=bins_range)
    v_hist = np.histogram(hsv_img[:,:,2], bins=nbins, range=bins_range)
    # Concatenate the histograms into a single feature vector
    hist_features = np.concatenate((h_hist[0], s_hist[0], v_hist[0])).astype(np.float64)
    norm_features = hist_features / np.sum(hist_features)
    return norm_features   
feature_vec = color_hist(image, nbins=32, bins_range=(0, 256))


warning: because the folder name “sensor_stick” is the same with that in exercise 2, it’s best to back up your old folder and cp -r new folder
$ cd RoboND-Perception-Exercises/Exercise-3
$ cp -r sensor_stick/ ~/catkin_ws/src/
$ cd ~/catkin_ws
$ rosdep install --from-paths src --ignore-src --rosdistro=kinetic -y
$ catkin_make
$ roslaunch sensor_stick training.launch
# actual: robot_description.launch, robot_control.launch, empty_world.launch, cloud_transformer, feature_extractor
$ cd ~/catkin_ws/src/sensor_stick/scripts
$ ls
$ rosrun sensor_stick
# generate color_histogram and normal_histogram for 7 classes, pickle.dump into "training_set.sav" at current dir
$ rosrun sensor_stick  
# load dataset, StandardScaler() for X, LabelEncoder for y, KFold cross validation for score, plot confusion marix, use whole data to train again and pickle.dump to 'model.sav'
# this code can be run independent of ros

data generation:

  1. use spawn_model() and capture_sample() from sensor_stick/ to generate ROS PointCloud2.
  2. ROS PointCloud2 data will be parsed by pc2.read_points.use compute_color_histograms() and compute_normal_histograms() from sensor_stick/ to divide each color channel and each normal direction into 32 bins, so the total features are 192. The exact number of RGB_D points are not important, because histogram measures the distribution of values, or the relative number.
  3. concantenate 192 features into one sample. There are 7 classes, each is repeated 5 times, so the total training set is (35,192).
  4. pickle.dump(labeled_features, open('training_set.sav', 'wb'))
improve :
  • in the for loop that begins with for i in range(5):. Increase this value to increase the number of times you capture features for each object.
  • To use HSV, find the line in where you’re calling compute_color_histograms() and change the flag to using_hsv=True.
  • try different binning schemes.

machine learning:

  • training_set = pickle.load(open('training_set.sav', 'rb'))
  • preprocess: StandardScaler() for X, LabelEncoder for y,
  • KFold cross validation for score, plot confusion marix
  • use whole data to train again and get the final model
  • model = {'classifier': clf, 'classes': encoder.classes_, 'scaler': X_scaler}
  • pickle.dump to 'model.sav'
  • If you get the 192 features and want to make a prediction, first you need to reformat the data with 3 operations : X_test = X_scaler.transform(np.array(features).reshape(1,192)), then you can clf.predict(X_test)

put them all together

$ cd ~/catkin_ws/src/sensor_stick/scripts/
$ cp
# finish all TO-DO
$ roslaunch sensor_stick robot_spawn.launch
$ chmod +x
$ ./


Amazon robotics challenge, 2017.7.30:
$ cd ~/catkin_ws/src/RoboND-Perception-Project/pr2_robot/scripts
$ chmod u+x
$ ./ # what it actually does:
# roslaunch pr2_robot pick_place_demo.launch
# roslaunch pr2_moveit pr2_moveit.launch
# rosrun pr2_robot pr2_motion
  • major window + cllision map
  • click on the Next button on botton left of RViz window
  • robots starts in an idle state
  • add 2 dropbox
  • raise 2 arms
  • rotate in plane
  • identify 3 tabletop objects, plan pick-place trajectories
test your code:
#compare pick_list: (the model folder has 12 models)
models = ['beer', 'bowl', 'create', 'disk_part', 'hammer', 'plastic_cup','soda_can']  # original
models = ['biscuits', 'soap', 'soap2'] # pick_list_1.yaml
models = ['biscuits', 'soap', 'book', 'soap2', 'glue'] # pick_list_2.yaml
models = ['sticky_notes', 'book','snacks', 'soap', 'biscuits','eraser', 'soap2', 'glue'] # pick_list_3.yaml
$ cd ~/catkin_ws
$ roslaunch sensor_stick training.launch
# change the 'models' list, output file name "training_set.sav"
$ rosrun sensor_stick
# change load file name, output file name  'model.sav'
$ rosrun sensor_stick  
# change  "", "pick_list" to corresponding numer
$ roslaunch pr2_robot pick_place_project.launch

# change load file name, output file name "output_1.yaml", test_scene_num
$ rosrun pr2_robot
3 scenarios to work with: .worldfiles locate at /pr2_robot/worlds/. Corresponding pick lists are .yaml files at /pr2_robot/config.
Execution plan:
  1. according to the pick_list, modify models list in the
  2. generate training set
  3. train SVM mode
  4. add trained mode to perception pipeline

output to yaml file

The task is to write a ros_client to send a request to the pr2_pick_place_server. The file srv/PickPlace.srv defines the request/response variables in terms of message type. However, each message type is an object with attributes such as data. Check details by rosmsg info xx
from std_msgs.msg import Int32
test_scene_num = Int32() = 1

from std_msgs.msg import String
object_name = String() = object_list_param[i]['name']

from geometry_msgs.msg import Pose
pick_pose = Pose()
pick_pose.position = xx # Point: x,y,z
pick_pose.orientation = xx # Quaternion: x,y,z,w
create a .yaml output file. The way it will actually work within your code is that you’ll iterate over each item in the pick list, see whether you found it in your perception analysis, then if you did, populate the pick_pose message with the centroid. Since you’ll do this with each object one at a time, a convenient way to save the messages for each object is in a list of dictionaries by make_yaml_dict()
PR2 collision avoidance: change config/sensors.yaml‘s point_cloud_topic from /pr2/voxel_grid/points to /pr2/3d_map/points.
publishing joint_angle values directly to the world_joint_controller.