Monday, June 26, 2017

T-SQL(SQL Server 2008) and SSIS

T-SQL is a dialect Microsoft SQL Server shares with Sybase SQL Server due to its legacy. The query declaratively specifies what is to be retrieved and query processor chooses the shortest possible time called query optimization. It can be seen as an extension to SQL server with some specialization.
A database modeling is not a place to express your inner creativity and find wild and crazy new ways of doing things. If you want to do wild and crazy, do it in your user interface in your application. In your database, I am afraid you want to be patient methodical step by step.
  1. what’s it for? what’s the point of this database? In most case you’re building a database to support an application. Be as specific as possible.
  2. what do you have already? Don’t just import. The existing database may have some problem. What’s the existing process? Understanding the data is essential.
  3. what tables do you need? Real ones and abstract ones.
  4. what columns do you need? what data type? so it can be efficient about storing it. Flexibility is your friend but it’s not what you’re looking for in your database. email address pattern? Be as exact as possible means sequel server will enforce rules on those columns.
  5. what’s your primary key?
  6. what relationships do you need?
Attached a database:
Launch SSMS.
Connect to your SQL Server Instance.
Right-click on Databases in the Object Explorer.
Click Attach.
In the Attach Databases window, click the Add button.
Navigate to the directory containing the.MDF and.LDF files.
Select the.MDF file, and press OK.
Press OK again to attach the database.
trouble shooting is
  • The keyword GO separates statements when more than one statement is submitted in a single batch. GO is optional when the batch contains only one statement.

data type

  • float(n), n is number of bytes for storage, 1<=n<=53
  • real, equal to float(4), value on the scale of 1e38


MS SQL Server 2008 R2 SP2 Express Edition
make sure to download the large one (1GB) which has management tool.
install SQL Server 2008 in a non-clustered environment
instance: jychstar
instance root directory: C:\Program Files\Microsoft SQL Server\MSSQL10_50.JYCHSTAR
mix mode, password: 8920113
open SQL Server management studio
log in
right click on DataBases -> New DataBases -> SampleDB
This is equivalent to
( NAME = N'SampleDB', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.JYCHSTAR\MSSQL\DATA\SampleDB.mdf' , SIZE = 3072KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )
( NAME = N'SampleDB_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.JYCHSTAR\MSSQL\DATA\SampleDB_log.ldf' , SIZE = 1024KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)
-- list table
create a table
dbo stands for database owner. If you are an administrator, dbo is the default schema.
create table dbo.table_test(
Person_ID int identity(1,1),
First_Name varchar(250),
Last_Name varchar(250),
MIddle_Name varchar(75),
Created_Data datetime not null default(getdate()),
Updated_Data datetime,
Active bit not null default(1)
identity(1,1) ensure an unique ID, which automatically increases 1 from 1.
insert into table
insert into table_test(First_Name, Last_Name, MIddle_Name)
values('Yuchao', 'Jiang','')
-- column list can be skipped if the values are in order
-- null allowed column can be dropped in column list

insert into table_test(First_Name, Last_Name, MIddle_Name)
select 'Yuchao1', 'Jiang2',''
union all
select 'Yuchao3', 'Jiang4','' 

update table_test
set First_Name = 'John'
where Last_Name = 'Jiang'
select * from table_test
where First_name like 'Yucha%'
there is no keyword “limit” in sql server, you have to do like:
with r as 
(select ROW_NUMBER() over(order by First_Name desc) AS row_num,
First_Name, Last_Name from table_test)

select * from r
where row_num <=10

select top 10 * from table_test
group and count
select a,b,cout (*) as cnt
from table_test
group by a,b
having cnt = 2
order by cnt desc
partition and rank
partition give different number in the same group, rank gives same number for the same group
select row_number() over (order by cusotermerName) as ordernumber
row_number() over (partition by vendorName order by vendorName) as vendorname
dense_rank() over (order by cusotermerName) as customernumber
cusotermerName, productName,Amount,vendorName
from sales


SSIS(sql server integration services), along with SSAS(analysis), SSRS(report), forms the so-called “business intelligence”. SSIS has a lot of overlap with ETL(Extraction, Transform and load). The main goal is to collect data from various sources, consolidate or gather data at on location for analyzing.
start -> sql server 2008 R2 -> sql server BI development studio -> new project -> template: integration services project, name: SSIS tutorial, location:C:\Users\Yuchao_Macbook\Documents\Visual Studio 2008\projects, clear “create directory for solution”
better tutorial:
SSIS is kind of visual approach to import raw data into database. You use various modules to connect things together. It has control flow, which is composed of different data flow.
However, the 2008 version seems to have some bugs. I can’t go through every detail right now.

Sunday, June 18, 2017

Self-driving Car ND B5, vehicle model

Kinematic models are simplifications of dynamic models that ignore tire forces, gravity, and mass.
Dynamic models aim to embody the actual vehicle dynamics as closely as possible.
Not all dynamic models are created equal! Some may consider more of these factors than others.
A state is represented by (x,y,\phi,v). Actuators are represented by steering wheel \delta and throttle/brake a. Simple physics. The interesting thing is that a car is not a point but has a finite size. So the actual rotation angle satisfies: L_f\Delta\psi= v\delta\Delta t.
#include <Eigen/dense>
double pi() { return M_PI; }
double deg2rad(double x) { return x * pi() / 180; }
double rad2deg(double x) { return x * 180 / pi(); }
const double Lf = 2;
using namespace Eigen;
// NOTE: state is [x, y, psi, v]
// NOTE: actuators is [delta, a]
VectorXd globalKinematic(VectorXd state,VectorXd actuators, double dt) {
  VectorXd next_state(state.size());
    double x = state[0];
    double y = state[1];
    double psi = state[2];
    double v = state[3];
    next_state[0] = x + v * cos (psi) * dt;
    next_state[1] = y + v * sin (psi) * dt;
    next_state[2] = psi + v / Lf * actuators[0] * dt;
    next_state[3] = v + actuators[1] * dt;
  return next_state;


#include <math.h>
#include <iostream>
#include <Eigen/dense>
using namespace Eigen;
// Evaluate a polynomial.
double polyeval(VectorXd coeffs, double x) {
    double result = 0.0;
    for (int i = 0; i < coeffs.size(); i++) {
        result += coeffs[i] * pow(x, i);
    return result;
// Fit a polynomial.
VectorXd polyfit(VectorXd xvals, VectorXd yvals,int order) {
    assert(xvals.size() == yvals.size());
    assert(order >= 1 && order <= xvals.size() - 1);
    MatrixXd A(xvals.size(), order + 1);
    for (int i = 0; i < xvals.size(); i++) {
        A(i, 0) = 1.0;
    for (int j = 0; j < xvals.size(); j++) {
        for (int i = 0; i < order; i++) {
            A(j, i + 1) = A(j, i) * xvals(j);
    auto Q = A.householderQr();
    auto result = Q.solve(yvals);
    return result;

int main() {
    VectorXd xvals(6),yvals(6);
    xvals << 9.261977, -2.06803, -19.6663, -36.868, -51.6263, -66.3482;
    yvals << 5.17, -2.25, -15.306, -29.46, -42.85, -57.6116;
    VectorXd coeffs = polyfit(xvals,yvals,3);
    for (double x = 0; x <= 20; x += 1.0) {
        std::cout << polyeval(coeffs,x) << std::endl;

Model predictive control

cost function:
  • cte: cross track error
  • epsi: error of heading orientation
  • v - speed_limit
  • change rate of input
Model Predictive Control (MPC) uses an optimizer to find the control inputs that minimize the cost function.We actually only execute the very first set of control inputs. This brings the vehicle to a new state and then we repeat the process.
MPC algorithm:
  1. Define the length of the trajectory, N, and duration of each timestep, dt.
  2. Define vehicle dynamics and actuator limitations along with other constraints.
  3. Define the cost function.
  1. We pass the current state as the initial state to the model predictive controller.
  2. We call the optimization solver. Given the initial state, the solver will return the vector of control inputs that minimizes the cost function. The solver we’ll use is called Ipopt.
  3. We apply the first control input to the vehicle.
In a real car, there will a latency (~100 ms) from the actuation command to the system. So the advantage of MPC over PID is that MPC try to compute a control input based on a future error, which will lead to more stability.


brew install ipopt --with-openblas
Warning: homebrew/science/ipopt dependency gcc was built with a different C++ standard library (libstdc++ from clang). This may cause problems at runtime.
🍺  /usr/local/Cellar/ipopt/3.12.5_2: 66 files, 6.4MB, 
brew install cppad
🍺  /usr/local/Cellar/cppad/20150000_3: 1,483 files, 11.8MB
core code:
#include <cppad/cppad.hpp>
#include <cppad/ipopt/solve.hpp>
#include <Eigen/QR>
auto coeffs = polyfit(ptsx, ptsy, 1);  // linear fit
double cte = polyeval(coeffs, x) - y;
double epsi = psi - atan(coeffs[1]);
Eigen::VectorXd state(6);
state << x, y, psi, v, cte, epsi; // initialize state
auto vars = mpc.Solve(state, coeffs); 
state << vars[0], vars[1], vars[2], vars[3], vars[4], vars[5];  // update state
mpc.Solve has to be specifically modified to incorporate constraints and target information.
size_t N = 25 ;  // number of timesteps
double dt = 0.05 ; // timestep
size_t n_vars = N*8 - 2; // number of independent variables
size_t n_constraints = N * 6;   // Number of constraints
Dvector vars(n_vars), vars_lowerbound(n_vars), vars_upperbound(n_vars), constraints_lowerbound(n_constraints), constraints_upperbound(n_constraints);
// ... initial values for these vectors 
FG_eval fg_eval(coeffs); // constructor
CppAD::ipopt::solve_result<Dvector> solution; // get space
// solve the problem
CppAD::ipopt::solve<Dvector, FG_eval>(options, vars, vars_lowerbound, vars_upperbound, constraints_lowerbound, constraints_upperbound, fg_eval, solution);
Note that the cost function is defined in FG_eval class method operator

Friday, June 16, 2017

Self-driving Car ND B4, PID control

PID: proportional, integral, derivative.
It is a combination of 3 feedback mechanisms. Each is at different time scale: immediate, one interval, accumulative.
def run(robot, p,i,d, n=100, speed=1.0):
    x_trajectory = []
    y_trajectory = []
    pre = robot.y
    integral = 0
    for j in range(n):
        cte = robot.y
        delta = cte -pre
        pre = cte
        integral += cte
        steer = -p*cte - i*integral - d*delta 
        robot.move(steer, speed)
    return x_trajectory, y_trajectory
A method to find the best hyperparameter is called “twiddle”. Each time apply a change to a single parameter if the error is not reduced, reduce the change amount.
def run(robot, params, n=100, speed=1.0):
    x_trajectory = []
    y_trajectory = []
    err = 0
    prev_cte = robot.y
    int_cte = 0
    p , d , i = params
    for j in range(2 * n):
        cte = robot.y
        diff_cte = cte - prev_cte
        int_cte += cte
        prev_cte = cte
        steer = -p * cte - d * diff_cte - i * int_cte
        robot.move(steer, speed)
        if j >= n:  # discard the first n points to get more stable value
            err += cte ** 2
    return x_trajectory, y_trajectory, err / n

def twiddle(tol=0.2): 
    p = [0, 0, 0]
    dp = [1, 1, 1]
    robot = make_robot()
    x_track, y_track, best_err = run(robot, p)
    it = 0
    while sum(dp) > tol:
        it += 1
        print("Iteration {}, best error = {}".format(it, best_err))
        for i in range(len(p)):
            for j in range(2):
                if j==0:
                    p[i] += dp[i]  # positive attempt
                if j==1:
                    p[i] -= 2*dp[i]  # negative attempt
                robot = make_robot()
                x_track, y_track, err = run(robot, p)
                 # works, record result,increase delta
                if err < best_err: 
                    best_err = err
                    dp[i] *= 1.1
                    print p, dp
            # neither works, roll back, decrease delta    
            p[i] += dp[i] 
            dp[i] *= 0.9
    print p, dp        
    return p, best_err


In the actual project, there is no fixed line for the car to approach. So we have to manually implement the twiddle method.

Thursday, June 15, 2017

Self-driving Car ND B3, localization

GPS only has a precision of 1~50 meters, the target is to get an accuracy of 0.1 m.
SLAM, simultaneous localization and mapping, is the computational problem of constructing or updating a map of an unknown environment while simultaneously keeping track of an agent’s location within it.
A much simpler version is pure localization posterior.
Markov Assumption, implement in a recursive structure.

11.2 Posterior distribution

bel(x_t) = p(x_t|z_{1:t},u_{1:t},m)
x is state, z is observation/measurement, u is control/motion, m is map
6 hours lidar data is 430 GB!

11.6 Code structure of input data

Be familiar with the input data: map, motion and observation.
several data files:
  • “data/map_1d.txt”
  • in_file_name_ctr: data/example01/control_data.txt
  • in_file_name_obs:data/example01/observations/observations_000001.txt there are 14 files, but empty inside.
  • in_file_name_gt: data/example01/gt_example01.txt

11.19 implement motion model

no observation data yet, only practice motion update.
for (int i=0; i< bel_x.size(); ++i){ //100
    // motion posterior:
    float posterior_motion = 0.0f;
    //loop over state space x_t-1 (convolution):
    for (int j=0; j< bel_x.size(); ++j){
        float distance_ij = i-j;
        //transition probabilities: normalized distribution
        float transition_prob = helpers.normpdf (distance_ij, controls.delta_x_f, control_std) ;
        posterior_motion +=transition_prob * bel_x_init[j];
    //update believe
    bel_x[i] = posterior_motion;
bel_x = helpers.normalize_vector(bel_x);
bel_x_init = bel_x;

11.25 observation update

1D Markov Localization, Kalman filter and particle filters are realization of the Bayes Filter.
for (int i=0; i< bel_x.size(); ++i){ //100
// motion update:
    float posterior_motion = 0.0f;
    for (int j=0; j< bel_x.size(); ++j){
        float distance_ij = i-j;
        float transition_prob = helpers.normpdf (distance_ij, controls.delta_x_f, control_std) ;
        posterior_motion +=transition_prob * bel_x_init[j];
// observation upate:  
  std::vector<float> pseudo_ranges ;
  for (unsigned int l=0; l< map_1d.landmark_list.size(); ++l){ float range_l = map_1d.landmark_list[l].x_f - pose_i;
      if(range_l > 0.0f)
      pseudo_ranges.push_back(range_l) ;
  sort(pseudo_ranges.begin(), pseudo_ranges.end());
  //define observation posterior:
  float posterior_obs = 1.0f ;
  //run over current observation vector:
  for (int z=0; z< observations.distance_f.size(); ++z){
      float pseudo_range_min;
      if(pseudo_ranges.size() > 0){
          pseudo_range_min = pseudo_ranges[0];
          pseudo_range_min = 100 ; //max range
      //estimate the posterior for observation model: 
      posterior_obs*= helpers.normpdf (observations. distance_f[z], pseudo_range_min, observation_std); 
  //update = observation_update* motion_model
  bel_x[i] = posterior_obs*posterior_motion ;
There are several interesting differences:
  1. you have only one controls but multiple observations.
  2. posterior_motion initialize with 0 but posterior_obs initializes with 1
  3. motion propablility is additive, observation propability is multiplicity.

13 particle filters

Particle filter is like a abudant version of sigma points. You randomly generate many states as your starting points, then use measurement/sense to update your believe. To reduce the randomness, gps signal can be used to have a rough location.
class robot:
    def __init__(self):
        self.x = random.random() * world_size
        self.y = random.random() * world_size
        self.orientation = random.random() * 2.0 * pi
        self.forward_noise = 0.0;
        self.turn_noise    = 0.0;
        self.sense_noise   = 0.0;
    def sense(self):
        # a list of distance to landmarks
        Z = []
        for i in range(len(landmarks)):
            x0, y0 = landmarks[i]
            dist = sqrt((self.x - x0)**2 + (self.y-y0)**2)
            dist += random.gauss(0.0, self.sense_noise)
        return Z
    def Gaussian(self, mu, sigma, x):
        # mean, variance, value
        # calculates the probability of x for 1-dim Gaussian with mean mu and var. sigma
        return exp(- ((mu - x) ** 2) / (sigma ** 2) / 2.0) / sqrt(2.0 * pi * (sigma ** 2))

    def measurement_prob(self, measurement):     
        # calculates how likely a measurement should be
        prob = 1.0;
        for i in range(len(landmarks)):
            x0, y0 = landmarks[i]
            dist = sqrt((self.x - x0) ** 2 + (self.y - y0) ** 2)
            prob *= self.Gaussian(dist, self.sense_noise, measurement[i])
        return prob

    def __repr__(self):
        return '[x=%.6s y=%.6s orient=%.6s]' % (str(self.x), str(self.y), str(self.orientation))
Importance weight and resampling wheel
myrobot = robot()
Z = myrobot.sense() # use as measurement data
N = 1000
p = []  # store each particle
for i in range(N):
    x = robot()
    x.set_noise(0.05, 0.05, 5.0)
w = [i.measurement_prob(Z) for i in p]  # get weights  
index = 0
beta = 0
mw = max(w)
p_w =[]  # store particle by its importance
for i in range(N):
    beta += random.random()*2*mw
    while beta > w[index]:
        beta -= w[index]
        index =(index+1)%N

14.4 gaussian sampling

#include <random> // Need this for sampling from distributions
#include <iostream>
using namespace std;
int main() {
    default_random_engine gen;
    double std_x = 1;
    normal_distribution<double> dist_x(0, std_x);//define function
    for (int i=0; i<5;i++)
        cout << dist_x(gen) <<endl; //not so random
    return 0;


Self-Driving Car Project Q&A | Kidnapped Vehicle
15:30 shows how to resample
26:38 shows how multiplier is used
30:33 shows about multiplier and associations

sebastian Q&A

There’s no universal definition for AI?
  • There’s no universal definition for every concept.
Difference between AI, ML, DL and Data science?
  • There is more and more overlap these days.
Advice on original research?
  • The research for me is a conjunction of solving a problem and discovering the problem you’re trying to solve. In bachelor, someone gives you a problem. In Ph.D., no one gives you a problem. The professor shows up smiles at you and says doing something interesting. If you start working on something complete, which you always have to do, otherwise you’ll fail. You find the solution not quite the answer to the problem you posed.
  • Every piece of research I have done, I started out building something. We didn’t quite know what questions were answering. It took us a while to know the most difficult part of research is to understand what question are you really asking?
the process from an AI idea to a working prototype?
  • drop everything you know. never be arrogant that your methods are the right methods. Question that.
  • build the simplest system you could imagine. Don’t do complicated stuff. Do something simple and see how far you get and then understand why you fail