Tuesday, August 30, 2016

Data Analysis with R

Take-home thought

I have spent only about 12 hours on R and finished half of this Udacity course, but I already have a quick feel for what R is.
To me, R is very similar to Matlab, but more focused and highly specialized for data visualization. The console is very powerful: you can install packages from it, much like a terminal.
R has many built-in functions specialized for statistics, so it is very handy for getting values like the median, mean, correlation, and deviation.
RStudio is a very nice IDE. It supports Rmd files, which are similar to IPython notebooks.
However, R's syntax is highly specialized for certain kinds of plots, and ggplot2 brings some syntax changes of its own. I could pick up these details quickly later if I have to.

basics

constructs

concepts difficult to define and measure:
  • Memory
  • Happiness
  • Guilt
  • Love

operational definition

anger: number of profanities uttered per min
happiness: ratio of minutes spent smiling to minutes not smiling

what’s EDA

Initial data analysis
Check the assumptions required for model fitting and hypothesis testing, handle missing values, and transform variables as needed.
Exploratory data analysis
Summarize the main characteristics of the data, generate better hypotheses, determine which variables have the most predictive power, and select appropriate statistical tools.
Develop a curious and skeptical mindset.
Install RStudio: https://www.rstudio.com/
Swirl (statistics with interactive R learning). Swirl is a software package for the R statistical programming language. Its purpose is to teach statistics and R commands interactively.
Type the following commands in the Console, pressing Enter or Return after each line:
install.packages("swirl")
library(swirl)
swirl()

package

textcat
ggplot2

learning sources

basic command

ctrl+ L: clear the console
students <- c("John", "Kate") # assignment; a vector is 1-based; the type is chr rather than string
numbers <- c(1:10)
numberOfChar = nchar(students)
data(mtcars) # load built-in data mtcars
names(mtcars)
str(mtcars)
dim(mtcars)
getwd()
setwd("/Users/yuchaojiang/Downloads/EDA_Course_Materials/lesson2")
statesInfo <- read.csv("stateData.csv")
stateSubset <- subset(statesInfo, state.region==1)
stateSubset <- statesInfo[statesInfo$state.region==1,] # equivalent method
head(stateSubset,2)
dim(stateSubset)

ggplot2

install.packages('ggplot2', dep = TRUE)
Sean Taylor
Good data science comes from good questions, not from fancy techniques, or having the right data. It comes from motivating your research with an idea that you care about, and that you think other people will care about.
Gene Wilder
“Success is a terrible thing and a wonderful thing… Just do what you love.”
John Tukey
An approximate answer to the right problem is worth a good deal more than the exact answer to an approximate problem.

data wrangling

tidyr
dplyr

pseudo_facebook

setwd("/Users/yuchaojiang/Downloads/EDA_Course_Materials/lesson3")

pf <- read.csv("pseudo_facebook.tsv", sep = '\t')
names(pf)
library(ggplot2)
qplot(x =dob_day, data = pf) +
  scale_x_continuous(breaks=1:31)

ggplot(data = pf, aes(x = dob_day)) + 
  geom_histogram(binwidth = 1) + 
  scale_x_continuous(breaks = 1:31) + 
  facet_wrap(~dob_month)

qplot(x=friend_count,data=pf, xlim=c(0,1000))

qplot(x = friend_count, data = pf, binwidth = 25) + 
  scale_x_continuous(limits = c(0, 1000), breaks = seq(0, 1000, 50))

ggplot(aes(x = friend_count), data = subset(pf, !is.na(gender))) + 
  geom_histogram() + 
  scale_x_continuous(limits = c(0, 1000), breaks = seq(0, 1000, 50)) + 
  facet_wrap(~gender)

table(pf$gender)
by(pf$friend_count,pf$gender,summary)

qplot(x=tenure/365, data = pf, binwidth = .25, 
      color = I("black"),fill = I('#F79420'))+
  scale_x_continuous(breaks=seq(1,7,1),limits=c(0,7))+
  xlab('Number of years using Facebook') + 
  ylab('Number of users in sample')

qplot(x=age, data = pf, binwidth = 1, 
      color = I("black"),fill = I('#F79420'))

summary(pf$age)

# lesson 4
qplot(age, friend_count,data=pf)

ggplot(aes(x=age, y=friend_count),data=pf)+
  geom_jitter(alpha=1/20)+
  xlim(13,90)

install.packages('dplyr')
library(dplyr)
age_groups <- group_by(pf,age)
pf.fc_by_age <-summarise(age_groups,
            friend_count_mean= mean (friend_count),
            friend_count_median= median(friend_count),
            n=n())
head(pf.fc_by_age)

ggplot(aes(x=age, y=friend_count),data=pf)+
  xlim(13,90)+
  geom_point(alpha=0.05,
             position= position_jitter(h=0),
             color= 'orange')+
  coord_trans(y='sqrt')+
  geom_line(stat='summary', fun.y=mean)+
  geom_line(stat='summary',fun.y=quantile,fun.args=list(probs= 0.1),linetype=2,color='blue')+
  geom_line(stat='summary',fun.y=quantile,fun.args=list(probs= 0.5),linetype=2,color='red')+
  geom_line(stat='summary',fun.y=quantile,fun.args=list(probs= 0.9),linetype=2,color='black')

cor.test(pf$age,pf$friend_count,method="pearson")

with(subset(pf,age<=70),cor.test(age,friend_count,method="pearson"))

with(subset(pf,age<=70),cor.test(age,friend_count,method="spearman"))

Monday, August 22, 2016

Introduction to Java programming

learning source

two types of error

compile-time error: caught by the compiler, e.g. a syntax error
run-time error: shows up only when the program runs, e.g. a logic error
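A small sketch of the second kind of error. The class and method names (ErrorKindsDemo, buggyAverage, fixedAverage) are made up for illustration and do not come from the course. The code compiles without complaint, yet one method returns the wrong number:

```java
class ErrorKindsDemo {
    // Logic error: both operands are int, so the division drops the
    // fraction before the result is widened to double.
    static double buggyAverage(int total, int count) {
        return total / count;
    }

    // Fix: cast one operand so the division is done in floating point.
    static double fixedAverage(int total, int count) {
        return (double) total / count;
    }

    public static void main(String[] args) {
        // A compile-time error would be something like a missing semicolon;
        // the compiler refuses to build the program at all.
        System.out.println(buggyAverage(7, 2)); // prints 3.0
        System.out.println(fixedAverage(7, 2)); // prints 3.5
    }
}
```

The compiler cannot catch the first method because it is valid Java; only running it (or stepping through it in a debugger) reveals the problem.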

algorithms

  1. unambiguous
  2. executable
  3. terminating

methods

  1. accessor (does not change the object's state)
  2. mutator

object

You ask an object to do work; you don't need to know how it does it.
what, not how
comment //

documentation and api

stringObject.replace(char target, char replacement)
import java.awt.Graphics2D;
import java.awt.geom.Rectangle2D;

code mold

public class XX
{
  public static void main(String args[])
  {
    System.out.println();
  }
}

public interface

public void addFriend(Person friend)
public void unFriend(Person nonfriend)
public String getFriend()

Variables

Instance variables (non-static variables)
Class variables (static variables)
Local variables

example

/**
   A simulated car that consumes gas as it drives.
*/
public class Car
{
    private double milesDriven;
    private double gasInTank;
    private double milesPerGallon;


    /**
       Constructs a car with a given fuel efficiency.
    */
    public Car(double mpg)
    {
        milesDriven = 0;
        gasInTank = 0;
        milesPerGallon = mpg;
    }


    /**
      Adds gas to the tank of this car.
      @param amount the amount of gas added to the tank
    */

    public void addGas(double amount)
    {
        gasInTank = gasInTank + amount;
    }

    /**
      Gets the current amount of gas in the tank of this car.
      @return the current gas level
    */
    public double getGasInTank()
    {
        return gasInTank;
    }

    /**
      Drives this car by a given distance.
      @param distance the distance to drive
    */
    public void drive(double distance)
    {
        this.milesDriven = this.milesDriven + distance;
        double gasConsumed = distance / this.milesPerGallon;
        this.gasInTank = this.gasInTank - gasConsumed;
    }  

    /**
      Gets the current mileage of this car.
      @return the total number of miles driven
    */
    public double getMilesDriven()
    {
        return milesDriven;
    }
}
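public class Person
{
    private String name;
    private String friends;

    public void addFriend(Person friend) { /* body not shown in these notes */ }
    public void unFriend(Person nonfriend) { /* body not shown in these notes */ }
    public String getFriend() { return friends; }
}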

fundamental data type

overflow
doubles are fuzzy
cast: (int)(3.35)
grayscale: Y = 0.2126*R + 0.7152*G + 0.0722*B
sunset effect: +25
final int MAX_RED=255;
System.out.printf("%8.2f\n", price); // field width 8, 2 decimal places, floating-point value
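The notes above can be tried out in one small class. This is only a sketch: NumbersDemo and its method names are hypothetical, and capping the sunset effect at MAX_RED is my assumption about how the +25 is meant to be applied. Locale.US is passed explicitly so the decimal separator is always a dot.

```java
import java.util.Locale;

class NumbersDemo {
    static final int MAX_RED = 255;

    // Grayscale luminance: Y = 0.2126*R + 0.7152*G + 0.0722*B
    static double luminance(int r, int g, int b) {
        return 0.2126 * r + 0.7152 * g + 0.0722 * b;
    }

    // Sunset effect: add 25 to the red channel, capped at MAX_RED (assumed).
    static int sunsetRed(int red) {
        return Math.min(red + 25, MAX_RED);
    }

    // Field width 8, 2 decimal places.
    static String formatPrice(double price) {
        return String.format(Locale.US, "%8.2f", price);
    }

    public static void main(String[] args) {
        System.out.println((int) (3.35));                        // prints 3: the cast truncates
        System.out.println(Math.round(luminance(255, 0, 0)));    // prints 54
        System.out.println(sunsetRed(250));                      // prints 255
        System.out.println("[" + formatPrice(3.14159) + "]");    // prints [    3.14]
    }
}
```

Math.round is used for the luminance instead of an (int) cast because doubles are fuzzy: a sum that should be a whole number can land just below it, and a cast would then truncate one too low.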

import java.util.Scanner;

Scanner in= new Scanner(System.in);
int age = in.nextInt();

java.lang.Math

Math.pow(a, n);
Math.sqrt(100);
Math.max(a,b);
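As a quick illustration of these three calls working together, here is a sketch that computes the hypotenuse of a right triangle. MathDemo and hypotenuse are names invented for this example.

```java
class MathDemo {
    // Pythagoras: c = sqrt(a^2 + b^2), built from Math.pow and Math.sqrt.
    static double hypotenuse(double a, double b) {
        return Math.sqrt(Math.pow(a, 2) + Math.pow(b, 2));
    }

    public static void main(String[] args) {
        System.out.println(hypotenuse(3, 4)); // prints 5.0
        System.out.println(Math.max(3, 4));   // prints 4
        System.out.println(Math.sqrt(100));   // prints 10.0
    }
}
```

Note that Math.pow and Math.sqrt always return a double, even when the arguments are whole numbers.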

decision

if ()
  {}
else if()
  {}
else
  {}
public static final int SECONDS_PER_MINUTE=60;
final int SECONDS_PER_MINUTE=60;

they are never exactly the same

str1.equals(str2); // not ==
final double EPSILON=1e-12;
Math.abs(x-y)<EPSILON; // not ==
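The two rules above can be seen in action in a short sketch. ComparisonDemo and nearlyEqual are hypothetical names; the EPSILON value is the one from the notes.

```java
class ComparisonDemo {
    static final double EPSILON = 1e-12;

    // Treat two doubles as equal when they differ by less than EPSILON.
    static boolean nearlyEqual(double x, double y) {
        return Math.abs(x - y) < EPSILON;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;
        System.out.println(sum == 0.3);            // prints false: rounding error
        System.out.println(nearlyEqual(sum, 0.3)); // prints true

        String a = "hello";
        String b = new String("hello");            // same text, different object
        System.out.println(a == b);                // prints false: compares references
        System.out.println(a.equals(b));           // prints true: compares contents
    }
}
```

A fixed EPSILON like this suits values of moderate size; for very large or very small numbers a tolerance relative to the magnitude would probably be a better choice.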

loop

for (int i=1;i<=6;i++){} // i is local variable
for (int value:values){} // values is an array

Debugger

  1. break point
  2. single step
  3. inspect variables

ArrayList vs Array

ArrayList<Double> values = new ArrayList<Double>();
methods: get(), set(), add()
double[] values=new double[10];
double[] values={32,54,67.5,29,35};
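To make the difference concrete, the sketch below sums the same numbers once from a fixed-size array and once from an ArrayList that grows as values are added. ListVsArrayDemo and the two sum methods are names made up for this example.

```java
import java.util.ArrayList;

class ListVsArrayDemo {
    // An array has a fixed length; the enhanced for loop visits each element.
    static double sum(double[] values) {
        double total = 0;
        for (double value : values) {
            total += value;
        }
        return total;
    }

    // An ArrayList grows on demand and is accessed through get() and size().
    static double sum(ArrayList<Double> values) {
        double total = 0;
        for (int i = 0; i < values.size(); i++) {
            total += values.get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        double[] fixed = {32, 54, 67.5, 29, 35};
        System.out.println(sum(fixed)); // prints 217.5

        ArrayList<Double> growing = new ArrayList<Double>();
        for (double value : fixed) {
            growing.add(value);         // add() appends at the end
        }
        growing.set(0, 0.0);            // set() overwrites an existing slot
        System.out.println(sum(growing)); // prints 185.5
    }
}
```

The ArrayList holds Double objects rather than primitive doubles; Java converts between the two automatically.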

create a package

Basically, there are two ways:
  1. put a package statement on the first line of the source file, then compile with
    javac -d . file_name.java
  2. compile into another folder with javac -d Destination_folder file_name.java
To use the classes in a package, import them by package name:
import package_name.*;
import java.util.*; // Scanner, ArrayList, Arrays, Random

Interface

public interface Drawable
{
  void draw();  // automatically public, no implementation
}
public class House implements Drawable
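A fuller sketch of the same idea follows. The Drawable interface is the one from the notes; the body of House, including the timesDrawn counter and its getter, is hypothetical and only there so the effect of calling draw() can be observed.

```java
interface Drawable {
    void draw(); // implicitly public and abstract
}

class House implements Drawable {
    private int timesDrawn = 0; // hypothetical field, for illustration only

    // The implementing method must be declared public.
    public void draw() {
        timesDrawn++;
        System.out.println("Drawing a house");
    }

    public int getTimesDrawn() {
        return timesDrawn;
    }
}

class DrawableDemo {
    public static void main(String[] args) {
        House house = new House();
        Drawable shape = house; // the caller only needs to know "what", not "how"
        shape.draw();
        shape.draw();
        System.out.println(house.getTimesDrawn()); // prints 2
    }
}
```

Code that receives a Drawable can call draw() on any implementing class without knowing which one it is.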

Sunday, August 14, 2016

IPND, stage5, iOS

Apple introduced a language called Swift in 2014, so it is pretty new and borrows many good ideas from other languages. Its grammar is very similar to JavaScript. The nice part is that Swift has the "struct", which groups related data, and its IDE, Xcode, provides immediate feedback.
I am not interested in developing a game right now, but it's good to know it.

learning sources:

support both inferred typing and explicit typing

Once the type is declared or set, it can’t change.
var mylove = "w" // inferred typing: the compiler infers String
var mylove: Character = "w" // explicit typing
var islove: Bool = true
var secretNumber = 7
var price: Double = 2.50
let pi = 3.14 //constant

Naming

keyword:
var, let, class, import, private, operator.
Hungarian Notation:
  • intNumberOfLives
  • countNumberOfLives
  • sumTotalScore
Camel Casing:
  • totalCumulativeScore
  • secondsSinceLastUpdate
  • minutesTillLaunch
Naming constant:
  • PointsPerLife
  • DefaultGreeting
  • MaxLength
  • POINTS_PER_LIFE
  • DEFAULT_GREETING
  • MAX_LENGTH
The name of a struct starts with a capital letter:
struct Student {
    let name: String
    var age: Int
    var school: String
}
var ayush = Student(name: "Ayush Saraswat", age: 19, school: "Udacity")

Saturday, August 13, 2016

IPND, stage 5, front-end

Update on 2017-3-6

Seven months ago I got a quick feel for JavaScript. I didn't proceed because I am more interested in the content than the face, and the many tricky details and fine-tuning were somewhat overwhelming to me.
Now I have the content: the knowledge discovered by applying machine learning models to lots of data. I realize the last mile is the data presentation, the packaging, the user experience. It's worth spending time on polishing high-quality content, which helps spread the insight to more viewers.

JavaScript

Some basic syntax.
console.log("Hello World!");
var a="yourname";
var b= a.replace("your","my");
var y=function(x){ return x; };
array.length;
array.slice();
a.toUpperCase();
array.pop();
array.push();
.split(" ");
.join(" ");
no class but object {};
var bio={"name":"James",
"age":32};
bio.city="Norman";  // dot notation
bio["city"]="Norman"; // square bracket notation is better because special characters and spaces are tolerated.
if (condition){}
else{}

jQuery

jQuery is a popular JavaScript library for reading and making changes to the Document Object Model (DOM). The DOM is a tree that contains information about what is actually visible on a website.
While HTML is a static document, the browser converts HTML to the DOM and the DOM can change. In fact, JavaScript’s power comes from its ability to manipulate the DOM, which is essentially a JavaScript object. When JavaScript makes something interesting happen on a website, it’s likely the action happened because JavaScript changed the DOM. jQuery is fast and easy to use, but it doesn’t do anything you can’t accomplish with vanilla (regular) JavaScript.
jQuery was first released in 2006. The latest version is 3.1.1, released on 2016.9.22. It occupies 96% of the JavaScript library market share and is deployed in 70 M websites. The runner-up, “Bootstrap”, has 7 M websites. To my surprise, D3 is only deployed in 12 K websites, probably because it is new.
jQuery’s syntax is designed to make it easier to navigate a document, select DOM elements, create animations, handle events, and develop Ajax applications.

the basics

<!doctype html>
<html>
<head>
    <meta charset="utf-8">
    <title>Demo</title>
</head>
<body>
    <a href="http://jquery.com/">jQuery</a>
    <script src="jquery.js"></script>
    <script>
    // Your code goes here.
    </script>
</body>
</html>
To run code as soon as the document is ready to be manipulated,
$( document ).ready(function() {});
$ is simply an alias for jQuery because it is shorter and faster to write. Both names are properties of the window object.
If another JavaScript library wants to use the $ namespace, you can define an alias $j for jQuery: var $j = jQuery.noConflict();
Alternatively, you can use a locally-scoped $
jQuery.noConflict();
jQuery( document ).ready(function( $ ) {
    // locally-scoped $ as an alias to jQuery.
    $( "div" ).hide();
});

// The $ variable now has the prototype meaning, which is a shortcut for
// document.getElementById(). mainDiv below is a DOM element, not a jQuery object.
window.onload = function(){
    var mainDiv = $( "main" );
}
The .attr() method acts as a setter when given 2 inputs (or an object of attributes), and as a getter when given 1 input:
$( "a" ).attr( "href", "allMyHrefsAreTheSameNow.html" );
$( "a" ).attr({
    title: "all titles are the same too!",
    href: "somethingNew.html"
});
$( "a" ).attr( "href" );
  • .html() – Get or set the HTML contents.
  • .text() – Get or set the text contents; HTML will be stripped.
  • .attr() – Get or set the value of the provided attribute.
  • .width() – Get or set the width in pixels of the first element in the selection as an integer.
  • .height() – Get or set the height in pixels of the first element in the selection as an integer.
  • .position() – Get an object with position information for the first element in the selection, relative to its first positioned ancestor. This is a getter only.
  • .val() – Get or set the value of form elements.
jQuery Object
Working directly with DOM elements is awkward. Wrapping an element in a jQuery object makes life much easier. The following pairs show equivalent code in raw JavaScript and in jQuery.
var target = document.getElementById( "target" );
target.innerHTML = "<td>Hello <b>World</b>!</td>";
$( target ).html( "<td>Hello <b>World</b>!</td>" );

var target = document.getElementById( "target" );
var newElement = document.createElement( "div" );
target.parentNode.insertBefore( newElement, target.nextSibling );
$( target ).after( newElement );
jQuery UI
This is really cool. With a single line of code, you have a pull-down calendar. A detailed introduction deserves another post.

JSON

JavaScript Object Notation. JSON is very handy for storing hierarchical information. Its high flexibility also makes it prone to bugs. Use http://jsonlint.com/ to find them.
{
  "Schools":[
    {
      "name":"Beijing University of Posts and Communications",
      "city":"Beijing, China",
      "degree":"BS",
      "major":["Applied Physics"]
    },
    {
      "name":"University of Oklahoma",
      "city":"Norman, OK, US",
      "degree":"PhD",
      "major":["Electrical and Computer Engineering"]
    }
  ]
}

Project: online resume

In the head block, load style.css and, optionally, the script from the Google Maps API (you need to obtain your own Google Maps API key).
In the body block, the "main" block has sub tags with the ids header, workExperience, projects, education, mapDiv, and lets-connect. Then come 3 script files: jQuery.js, helper.js, resumeBuilder.js. Last comes the real script; if a certain element is empty, set .style.display = "none";. Alternatively, we can set .style.backgroundColor = "black";
To guard against script injection, you can use regular expressions to catch every < and > and replace them:
var charEscape = function(_html) {
    var newHTML = _html;
    newHTML = _html.replace(/</g, "&lt;");
    newHTML = newHTML.replace(/>/g, "&gt;");
    return newHTML;
};
Frustrated by tons of errors, I installed a local JavaScript IDE: WebStorm
script path: /usr/local/bin/webstorm
The Google Maps API is cool! The "initializeMap" function is 100 lines of code, really a monster, with many embedded functions like locationFinder, createMapMarker, callback, and pinPoster.
For more details, see my updated post.