Thursday, March 23, 2017

SAS, University version

Why SAS?

SAS is short for “Statistical Analysis System”.
Timeline:
  • 1966, a prototype was developed by Barr and Goodnight, funded by the NIH.
  • 1976, they left North Carolina State University and founded SAS Institute.
  • 1985, SAS was rewritten in C so that it could run on Unix, MS-DOS, and Windows.
  • 2002, the Text Miner component was introduced.
  • 2010, a free version for students was introduced.
  • 2010-2012, SAS Institute sued World Programming, but the European Court of Justice ruled that “the functionality of a computer program and the programming language cannot be protected by copyright”.
So SAS has a long history, and its target customers are enterprises doing analytics.
Features:
  • It is web-browser based. Although starting a local server in a virtual machine seems a little complicated, this setup has the advantage of being cross-platform.
  • It can be seen as an “advanced statistical version” of Excel, with a rich GUI that people can learn quickly, plus excellent technical support.
  • Big corporations like SAS because there’s a complete ecosystem that satisfies customers’ every need.
  • Its direct competitors are Stata and SPSS (acquired by IBM).
  • When you click through the front end, the corresponding code is generated automatically in the back end. This means you get code that reproduces the exact same graph, and you can edit that code to make changes.
  • It integrates seamlessly with SQL.
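As a minimal sketch of that SQL integration, the query below uses PROC SQL to summarize the SASHELP.CARS sample table (shipped with SAS) by Origin. The output table name WORK.ORIGIN_SUMMARY and the column aliases are illustrative choices, not names from any particular guide.

```sas
/* PROC SQL runs standard SQL queries against SAS data sets */
proc sql;
    create table work.origin_summary as
    select Origin,
           count(*)      as n_models,
           avg(MPG_City) as avg_mpg_city format=5.1
    from sashelp.cars
    group by Origin
    order by avg_mpg_city desc;
quit;   /* PROC SQL ends with QUIT, not RUN */

/* The result is an ordinary SAS data set, so any other PROC can use it */
proc print data=work.origin_summary noobs;
run;
```

Because the query result is a regular data set, you can mix SQL with DATA steps and other procedures freely.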
Usage also differs by industry sector.

University Edition

This version is free. SAS University Edition includes SAS® Studio, Base SAS®, SAS/STAT®, SAS/IML®, SAS/ACCESS® and several time series forecasting procedures from SAS/ETS®.
There are two ways to get SAS running:
  1. Download the .ova file (2.2 GB) and use VirtualBox to start a local host that runs SAS locally.
  2. Use the AWS AMI “SAS University Edition”. You pay the EC2 fee, roughly $0.012-0.047 per hour, which is actually pretty cheap.
Then open http://localhost:10080/ in a new browser window, and you are good to go.

Learn

A SAS program typically consists of DATA steps, which retrieve and manipulate data (usually creating a SAS data set), and PROC steps, which analyze the data.
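data highchol;                      /* create WORK.HIGHCHOL */
    set sashelp.heart;              /* read the sample HEART data set */
    where Chol_Status = "High";     /* keep only high-cholesterol rows */
run;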
proc print data = highchol;
run;
proc print data = sashelp.cars;    /*two-level name: library.table */
    by Make;                        /* BY requires rows sorted by Make; SASHELP.CARS is stored in that order */
    var Make Model Type;
run;
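To make the DATA step / PROC step split more concrete, here is a small sketch: the DATA step derives a new column from existing ones, and PROC MEANS then summarizes that column by group. The data set name WORK.CARS_MPG and the variable MPG_AVG are hypothetical names chosen for illustration.

```sas
/* DATA step: read SASHELP.CARS and derive a new column */
data work.cars_mpg;
    set sashelp.cars;
    mpg_avg = mean(MPG_City, MPG_Highway);   /* row-wise mean of city and highway MPG */
    keep Make Model Origin Type mpg_avg;     /* keep only the columns we need */
run;

/* PROC step: summary statistics of the new column for each Origin */
proc means data=work.cars_mpg n mean min max maxdec=1;
    class Origin;
    var mpg_avg;
run;
```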

Create a library / import CSV

libname libsas 'S:/datafiles'; /* libref LIBSAS points to the folder holding the data sets; the path is shown in the file's Properties */
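data titanic;
    /* DSD honors quoted values that contain commas (such as passenger names);
       FIRSTOBS=2 skips the header row */
    infile '/folders/myfolders/train.csv' dlm=',' dsd firstobs=2 truncover;
    length Name $ 100 Sex $ 6;   /* $ marks character variables */
    input PassengerId Survived Pclass Name $ Sex $;
run;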
Using PROC IMPORT is much more convenient: you don’t need to assign the column names manually. A video guide walks through the SAS Studio snippet below.
/** FOR CSV Files uploaded from Unix/MacOS **/
FILENAME CSV "/folders/myfolders/train.csv" TERMSTR=LF;
/** Import the CSV file.  **/
PROC IMPORT DATAFILE=CSV
            OUT=WORK.MYCSV
            DBMS=CSV
            REPLACE;
run;
/** Print the results. **/
PROC PRINT DATA=WORK.MYCSV; RUN;
/** Unassign the file reference.  **/
FILENAME CSV;
Alternatively, use Tasks and Utilities -> Utilities -> Import Data, then drag and drop the file from the “Server Files and Folders” pane.
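Since train.csv is not available on every machine, here is a self-contained sketch of the same PROC IMPORT workflow: a DATA _NULL_ step writes a few made-up rows to a temporary file (FILENAME ... TEMP), PROC IMPORT reads the header row as column names, and PROC CONTENTS shows the names and types it inferred. The fileref DEMO, the table WORK.DEMO, and the sample rows are hypothetical, not the real Titanic data.

```sas
/* Write a tiny CSV with hypothetical rows to a temporary file */
filename demo temp;
data _null_;
    file demo;
    put "PassengerId,Survived,Pclass,Sex";
    put "1,0,3,male";
    put "2,1,1,female";
    put "3,1,3,female";
run;

/* PROC IMPORT takes column names from the header row (GETNAMES=YES) */
proc import datafile=demo out=work.demo dbms=csv replace;
    getnames=yes;
run;

/* Check the column names and types PROC IMPORT chose, in column order */
proc contents data=work.demo varnum;
run;

filename demo clear;   /* release the fileref */
```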

Graph

Scatterplot

ods graphics / reset imagemap;
proc sgplot data=SASHELP.CARS;
    title "Vehicle Statistics";
    scatter x=Horsepower y=MPG_City / group=Origin 
        markerattrs=(symbol=CircleFilled size=12) transparency=0.7 name='Scatter';
    xaxis grid;
    yaxis grid;
    keylegend / location=Inside across=1;
run;
ods graphics / reset;
title;
Other plots, such as bar charts and histograms, work similarly.
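For example, a histogram and a bar chart of the same SASHELP.CARS data follow the same PROC SGPLOT pattern; only the plot statement changes. This is a minimal sketch; the bin width of 2 and the titles are arbitrary choices.

```sas
/* Histogram of city MPG with a normal density curve overlaid */
proc sgplot data=sashelp.cars;
    title "Distribution of City MPG";
    histogram MPG_City / binwidth=2;
    density MPG_City;          /* default density type is normal */
    xaxis grid;
    yaxis grid;
run;
title;

/* Bar chart: VBAR without RESPONSE= plots the frequency of each Origin */
proc sgplot data=sashelp.cars;
    title "Number of Models by Origin";
    vbar Origin / datalabel;   /* print the count on each bar */
    yaxis grid;
run;
title;
```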

Certification training

The training ad is for version 9.3 (2011), while the latest version is 9.4 (2013).
There are several certification packages, with prices:
  • Base Programming: 3.1k
  • Advanced Programming: 3.8k / 2.45k
  • Predictive Modeling: 2.65k
  • Statistical Analysis: 3.05k

2 comments:

  1. SAS seems like a better alternative to Microsoft Excel, especially when complicated data analysis is required. I wonder if it is as user-friendly as Excel, though.

    1. There is a bit of a learning curve, but it is quite user-friendly. It is somewhat expensive, though.

