SAS is short for “Statistical Analysis System”.
- 1966, a prototype was developed by Barr and Goodnight, funded by NIH.
- 1976, they left North Carolina State University and founded SAS Institute.
- 1985, SAS was rewritten in C to allow it to run on Unix, MS-DOS, and Windows.
- 2002, Text Miner component was introduced.
- 2010, a free version for students was introduced.
- 2010-12, SAS sued World Programming, but the European Court of Justice ruled that “the functionality of a computer program and the programming language cannot be protected by copyright”.
So SAS has a long history, and it targets enterprise analytics customers.
- It is web-browser based. Although starting a local server in a virtual machine seems a little complicated, it has the advantage of being cross-platform.
- It can be seen as an “advanced statistical version” of Excel: it has a rich GUI that lets people learn quickly, and it comes with strong technical support.
- Big corporations like SAS because there’s a complete ecosystem that satisfies customers’ every need.
- Its direct competitors are Stata and SPSS (acquired by IBM).
- When you click in the front end, the corresponding code is generated automatically in the back end. This means you get the code that reproduces the exact same graph, and you can modify it.
- It integrates seamlessly with SQL.
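As a minimal sketch of the SQL integration mentioned in the list above, PROC SQL lets you query a SAS data set with ordinary SQL syntax. The sample table sashelp.cars ships with SAS; the column aliases n_models and avg_mpg_city are made up for illustration.

```sas
/* Summarize the built-in cars table with plain SQL */
proc sql;
    select Origin,
           count(*)      as n_models,                 /* rows per region of origin */
           avg(MPG_City) as avg_mpg_city format=5.1   /* mean city mileage */
    from sashelp.cars
    group by Origin
    order by avg_mpg_city desc;
quit;   /* PROC SQL ends with QUIT rather than RUN */
```

The result is printed like any other procedure output; adding `create table work.mpg_summary as` in front of the `select` would store it as a data set instead (mpg_summary is a hypothetical name).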
And the usage differs by industry sectors:
SAS University Edition is free. It includes SAS® Studio, Base SAS®, SAS/STAT®, SAS/IML®, SAS/ACCESS® and several time series forecasting procedures from SAS/ETS®.
There are two ways to get SAS running:
- Download the .ova file (2.2 GB), then use VirtualBox to start a local server and run SAS locally.
- Use the AWS AMI for SAS University Edition. You have to pay an EC2 fee ranging from 0.012 to 0.047 per hour, which is actually pretty cheap.
Open a new browser window at http://localhost:10080/ and you are good to go.
SAS programs have a DATA step, which retrieves and manipulates data, usually creating a SAS data set, and a PROC step, which analyzes the data.
    /* DATA step: copy the rows with high cholesterol into a new data set */
    data highchol;
        set sashelp.heart;
        where Chol_Status = "High";
    run;

    /* PROC step: print the new data set */
    proc print data = highchol;
    run;

    /* Two-level name: library.table. BY expects the data to be sorted by Make. */
    proc print data = sashelp.cars;
        by Make;
        var Make Model Type;
    run;
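To make the split between the two kinds of step more concrete, here is a small sketch in which the DATA step derives a new column and the PROC step summarizes it. The data set name cars_metric and the variable L_per_100km_city are hypothetical; the conversion assumes US gallons.

```sas
/* DATA step: derive fuel consumption in litres per 100 km */
data cars_metric;
    set sashelp.cars;
    L_per_100km_city = 235.215 / MPG_City;   /* mpg (US) to L/100 km */
    keep Make Model Origin MPG_City L_per_100km_city;
run;

/* PROC step: summarize the derived column by region of origin */
proc means data=cars_metric mean min max maxdec=1;
    class Origin;
    var L_per_100km_city;
run;
```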
Create a library / import a CSV
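    /* Assign a library: the path is the physical location of the data sets, which can be found in the file's properties */
    libname libsas 'S:/datafiles';

    /* Read the CSV by hand: dsd handles quoted fields that contain commas, firstobs=2 skips the header row */
    data titanic;
        infile '/folders/myfolders/train.csv' dsd dlm=',' firstobs=2;
        /* Name and Sex are character columns, so they need a $ informat */
        input PassengerId Survived Pclass Name :$100. Sex :$6.;
    run;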
proc import is much more convenient: you don’t need to assign the column names manually. There is a video guide that uses the snippets.
    /** For CSV files uploaded from Unix/MacOS **/
    FILENAME CSV "/folders/myfolders/train.csv" TERMSTR=LF;

    /** Import the CSV file. **/
    PROC IMPORT DATAFILE=CSV
        OUT=WORK.MYCSV
        DBMS=CSV
        REPLACE;
    RUN;

    /** Print the results. **/
    PROC PRINT DATA=WORK.MYCSV;
    RUN;

    /** Unassign the file reference. **/
    FILENAME CSV;
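Because PROC IMPORT infers each column's type from the rows it scans, it can be worth checking what it decided. The sketch below assumes the same train.csv path as above; the GUESSINGROWS value of 1000 is an arbitrary choice for illustration.

```sas
/* Import the CSV, scanning more rows before deciding column types */
proc import datafile="/folders/myfolders/train.csv"
    out=work.mycsv
    dbms=csv
    replace;
    getnames=yes;        /* take column names from the header row */
    guessingrows=1000;   /* illustrative value, not a recommendation */
run;

/* List the imported column names, types, and lengths */
proc contents data=work.mycsv;
run;
```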
Alternatively, you can use Tasks and Utilities -> Utilities -> Import Data, then drag and drop the file from “Server Files and Folders”.
    ods graphics / reset imagemap;
    proc sgplot data=SASHELP.CARS;
        title "Vehicle Statistics";
        scatter x=Horsepower y=MPG_City / group=Origin
            markerattrs=(symbol=CircleFilled size=12)
            transparency=0.7 name='Scatter';
        xaxis grid;
        yaxis grid;
        keylegend / location=Inside across=1;
    run;
    ods graphics / reset;
    title;
Other plots, such as bar plots and histograms, are similar.
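As a rough illustration of how similar the other plot types are, the sketch below swaps the scatter statement for histogram and vbar on the same sashelp.cars table. The titles are made up for the example.

```sas
/* Histogram of city mileage with a kernel density overlay */
proc sgplot data=sashelp.cars;
    title "Distribution of City MPG";
    histogram MPG_City;
    density MPG_City / type=kernel;
run;

/* Bar chart counting models per region of origin */
proc sgplot data=sashelp.cars;
    title "Number of Models by Origin";
    vbar Origin;
run;

title;   /* clear the title so it does not carry over to later output */
```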
The ad is for version 9.3 (2011), while the latest version is 9.4 (2013).
There are several certification packages:
- Base programming: 3.1 k
- Advanced programming: 3.8 k / 2.45 k
- Predictive modeling: 2.65 k
- Statistical analysis: 3.05 k