Why SAS?
SAS is short for “Statistical Analysis System”.
Timeline:
- 1966: a prototype was developed by Barr and Goodnight, funded by the NIH.
- 1976: they left North Carolina State University and founded SAS Institute.
- 1985: SAS was rewritten in C so that it could run on Unix, MS-DOS, and Windows.
- 2002: the Text Miner component was introduced.
- 2010: a free version for students was introduced.
- 2010-12: SAS sued World Programming, but the European Court of Justice ruled that “the functionality of a computer program and the programming language cannot be protected by copyright”.
So SAS has a long history, and its target customers are enterprises that need analytics.
Features:
- It is web-browser based. Starting a local server in a virtual machine is a little complicated, but the result is cross-platform.
- It can be seen as an “advanced statistical version” of Excel: it has a rich GUI that people can learn quickly, and it comes with excellent technical support.
- Big corporations like SAS because there’s a complete ecosystem that satisfies customers’ every need.
- Its direct competitors are Stata and SPSS (acquired by IBM).
- When you click in the front end, the corresponding code is generated automatically in the back end. This means you keep the code that reproduces the exact same graph, and you can modify it.
- It integrates with SQL seamlessly.
Usage also differs by industry sector:
University Edition
This version is free. SAS University Edition includes SAS® Studio, Base SAS®, SAS/STAT®, SAS/IML®, SAS/ACCESS® and several time series forecasting procedures from SAS/ETS®.
There are 2 approaches to get SAS running:
- Download the .ova file (2.2 GB) and import it into VirtualBox to start a local server and run SAS locally.
- Use the AWS AMI for SAS University Edition. You pay an EC2 fee of $0.012 to $0.047 per hour, which is pretty cheap.
With the local setup, open a new browser window at http://localhost:10080/ and you are good to go.
Learn
A SAS program has a DATA step, which retrieves and manipulates data and usually creates a SAS data set, and a PROC step, which analyzes the data.
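data highchol;                       /* DATA step: creates work.highchol */
    set sashelp.heart;               /* read the built-in heart data set */
    where Chol_Status = "High";      /* keep only high-cholesterol rows */
run;
proc print data = highchol;          /* PROC step: list the new data set */
run;
proc print data = sashelp.cars;      /* two-level name: library.table */
    by Make;                         /* BY processing expects data sorted by Make */
    var Make Model Type;             /* columns to print */
run;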
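The SQL integration mentioned in the feature list can be illustrated with the same data. The following is a minimal sketch, not taken from the original post: it repeats the high-cholesterol filter as a query and adds a simple aggregate. The table name highchol_sql and the column alias avg_mpg_city are made up for illustration.

```sas
proc sql;
    /* Same filter as the DATA step above, written as a query.
       highchol_sql is a hypothetical table name. */
    create table highchol_sql as
    select *
    from sashelp.heart
    where Chol_Status = "High";

    /* An aggregate query: average city MPG per region of origin. */
    select Origin,
           avg(MPG_City) as avg_mpg_city format=5.1
    from sashelp.cars
    group by Origin;
quit;
```

Note that proc sql ends with quit; rather than run;, and each query inside it runs as soon as its closing semicolon is reached.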
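Create a library / import CSV
libname libsas 'S:/datafiles'; /* physical location of the data sets, which can be found in the file's properties */
data titanic;
    /* dsd keeps quoted commas (as in the passenger names) inside one field */
    infile '/folders/myfolders/train.csv' dsd dlm=',' firstobs=2;
    /* Name and Sex are character columns, so they need a $ informat */
    input PassengerId Survived Pclass Name :$100. Sex :$10.;
run;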
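The libname statement above only assigns the libref; nothing is stored in the library until a data set is written to it with a two-level name. As a rough sketch, assuming the libsas libref is assigned and points to a writable folder, a permanent copy of the filtered heart data could be saved like this (the data set name highchol is reused here purely for illustration):

```sas
/* Assumption: libsas was assigned by the libname statement above
   and its folder is writable. */
data libsas.highchol;               /* two-level name: saved in libsas, not WORK */
    set sashelp.heart;
    where Chol_Status = "High";
run;

/* List the data sets now stored in the library (nods = names only). */
proc contents data=libsas._all_ nods;
run;
```

Data sets written to WORK disappear when the session ends, while those written to an assigned library stay on disk.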
Using proc import is much more convenient: you don’t need to assign the column names manually. The video guide uses this snippet:
/** FOR CSV Files uploaded from Unix/MacOS **/
FILENAME CSV "/folders/myfolders/train.csv" TERMSTR=LF;
/** Import the CSV file. **/
PROC IMPORT DATAFILE=CSV
OUT=WORK.MYCSV
DBMS=CSV
REPLACE;
run;
/** Print the results. **/
PROC PRINT DATA=WORK.MYCSV; RUN;
/** Unassign the file reference. **/
FILENAME CSV;
run;
Alternatively, you can use Tasks and Utilities -> Utilities -> Import Data, then drag and drop the file from “Server Files and Folders”.
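Because proc import infers column names and types on its own, it is worth checking what it decided. The sketch below assumes the WORK.MYCSV data set and the train.csv path from the snippet above; the guessingrows=max setting is an optional addition, not part of the original snippet, and relies on SAS 9.4 behaviour.

```sas
/* Inspect the names, types, and lengths that PROC IMPORT inferred. */
proc contents data=WORK.MYCSV;
run;

/* Re-import, scanning every row before choosing types and lengths. */
proc import datafile="/folders/myfolders/train.csv"
    out=WORK.MYCSV
    dbms=csv
    replace;
    getnames=yes;        /* first row holds the column names (the default) */
    guessingrows=max;    /* by default only the first rows are scanned */
run;
```

Scanning all rows is slower on large files, but it avoids truncated text columns when long values appear only late in the file.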
Graph
Scatterplot
ods graphics / reset imagemap;
proc sgplot data=SASHELP.CARS;
title "Vehicle Statistics";
scatter x=Horsepower y=MPG_City / group=Origin
markerattrs=(symbol=CircleFilled size=12) transparency=0.7 name='Scatter';
xaxis grid;
yaxis grid;
keylegend / location=Inside across=1;
run;
ods graphics / reset;
title;
Other plots, such as bar plots and histograms, follow the same pattern.
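As a sketch of how that pattern carries over, the two steps below draw a histogram and a grouped bar chart from the same SASHELP.CARS data. The titles and the choice of variables are illustrative, not from the original post.

```sas
ods graphics / reset imagemap;

/* Histogram of city MPG with a normal density curve on top. */
proc sgplot data=SASHELP.CARS;
    title "Distribution of City MPG";
    histogram MPG_City;
    density MPG_City;
run;

/* Bar chart: number of vehicles per type, clustered by origin. */
proc sgplot data=SASHELP.CARS;
    title "Vehicle Count by Type";
    vbar Type / group=Origin groupdisplay=cluster;
run;

ods graphics / reset;
title;
```

Only the plot statement changes (scatter, histogram, vbar); the surrounding proc sgplot, title, and ods graphics lines stay the same.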
Certification training
The ad is for version 9.3 (2011), while the latest version is 9.4 (2013).
There are several certification packages:
- Base programming: 3.1k
- Advanced programming: 3.8k / 2.45k
- Predictive modeling: 2.65k
- Statistical analysis: 3.05k