SAS Tutorial – An In-depth Guide

SAS Tutorial – An In-depth Guide

8 mins read985 Views Comment
Updated on Mar 2, 2022 09:58 IST

By Darshika Shah

2022_02_SAS-tutorial.jpg

This is a detailed SAS tutorial on the important topics of Base SAS and interview questions related to the same. This will help you to brush up on your Base SAS concepts and prepare you for the interviews.

  • Importing Excel/CSV Data into SAS

For this SAS tutorial, there are three ways in which one can Import Excel/CSV data into SAS:

  1. Using the Proc Import procedure
  2. Using the SAS Data Step in combination with the INFILE statement
  3. Using Import wizard

Now, let’s study each one of these in detail:

  1. Using the Proc Import procedure
  • PROC IMPORT helps to import an external data file to a SAS data set.
  • Can be used to import different files like csv, excel and txt.

Below is the screenshot of the dummy data (in CSV format) I have created to demonstrate practically:

Dummy data CSV

Please note that when you have commas in the data then type them in double-quotes. As you can see Salary has the commas so I have typed in double-quotes.

Below is the code for Proc Import:

proc import file="/home/darshikashah300/sasuser.v94/Dummy_data.csv"
    out=work. Output
    dbms=csv;
run;
 

Here, the first argument (file =argument) specifies the location, name of the file and file extension and this information is enclosed in the double-quotes.

The second argument (out=argument) specifies the name of the SAS output dataset.

The third argument (dbms=argument) specifies the type of file of the data to be imported. DBMS stands for the database management system.

Note: Since I have used SAS® OnDemand for Academics to run my code so first I had to upload the file (which I created as in the screenshot above) from my local system to the cloud in SAS® OnDemand for Academics using the upload option which you can easily find on the top left corner of the SAS interface.

Now to view the result we are using proc print, code for the same is below:

proc print data=work. Output;
run;
 

Please find the output below:

Output table SAS tutorial
  1. Using the SAS Data Step in combination with the INFILE statement

The Input File remains the same as in the above Proc Import step.

Code is below:

data work. output;
    infile "/home/darshikashah300/sasuser.v94/Dummy_data.csv"
	delimiter = ","
	missover 
	dsd
	firstobs=2;
	informat Region $50.;
	informat Company $50.;
	informat EmployeeName $50.;
	informat JobTitle $100.;
	informat Salary $50.;
 
             format Region $50.;
	format Company $50.;
	format EmployeeName $50.;
	format JobTitle $100.;
	format Salary $50. ;
 
	input
	Region $
              Company $
	EmployeeName $
	JobTitle $
	Salary $;
run;
 

Now to view the result we are using proc print, code for the same is below:

proc print data=work.output;
run;
 
  • Missover option tells SAS to continue reading the CSV file when it encounters the missing value.
  • Dsd option is used to tell SAS to ignore the delimiters that are enclosed between double quotes. For example, “2,34,890”.
  • Firstobs defines the first row with the data, data step does not recognize the header so we must set firstobs as 2 in our code.
  1. Using Import wizard

This is the easiest method, just need to select the Import option from the left top of the screen as shown in the below screenshot and select the file you are willing to import.

SAS Studio screenshot for SAS tutorial

Interview question

Is there any disadvantage of Proc Import?

Yes, the drawback is its lack of flexibility like you cannot specify the data type of the variables – numeric or character, for that extra step will be needed to convert them.

This problem can be solved using INFILE statement.

  • Read Character Variable of Varying Length

When we have the character variable of varying length in our data for example both first name and last name in our dataset then the usual method as discussed above doesn’t work, we can use any of the below methods:

  1. Use of COLON modifier
  2. Use LENGTH statement before INPUT statement
  3. Use of Ampersand (&) and put extra space
  1. Use of COLON modifier
data method1;
input ID Name :$30. Score;
cards;
1 DarshikaShah 22
2 OmPrakash 21
3 RavuriSaiKumar 33
;
proc print noobs;
run;
 

The output is :

Output SAS tutorial
  1. Use LENGTH statement before INPUT statement
data method2;
length Name $30.;
input ID Name $ Score;
cards;
1 DarshikaShah 22
2 OmPrakash 21
3 RavuriSaiKumar 33
;
proc print noobs;
run;
 

The output is:

Output SAS Tutorial
  1. Use of Ampersand (&) and put extra space

This is useful when we have space between two words like “Darshika Shah” instead of “DarshikaShah”.

The output:

Output SAS tutorial
  • Creating or Modifying a Variable
  1. Creating numeric variable
  2. Creating character variable
  1. Creating numeric variable

Below is the syntax :

Variable = expression

Suppose the price of the entity is changed and we need to create a variable as revised_price which is double of original_price. Here both the variables are numeric.

DATA new;
SET old;
revised_price =2* original_price;
run;
 

    2. Creating character variable

Suppose we have a character variable Type and its value should be ‘good’.

DATA new;
SET old;
Type =’good’;
run;
 
  • Dropping variables from a data set in SAS

Below are the two ways to delete the variables from the dataset :

1.DROP = dataset option

2.DROP statement

  1. DROP = dataset option
data new;
 set old (drop = obs1 obs2 obs3);
  totalsum = sum(obs1,obs2,obs3);
   run;
 

In these variables obs1,obs2 and obs3 are not available for use after the old dataset old is copied into the new dataset new. Hence,total sum will have a missing value.

  1. DROP statement
data new;
set old;
totalsum = sum(obs1,obs2,obs3);
drop obs1 obs2 obs3;
run;
 

In this, we have simply asked to sum up the values in variables obs1,obs2 and obs3 and create a new variable total sum and then drop the obs1,obs2 and obs3 variables.

  • Dealing with Missing Values in SAS

All the numeric missing values are identified by the dot (.) and all the character missing values are identified by blank( ).

CMISS can identify all the character missing values and NMISS can identify all the numeric missing values in the dataset.

  • Delete empty rows in SAS

Code to remove the row when all variables have missing/blank values :

options missing = ' ';
data new;
   set old;
   if missing(cats(of _all_)) then delete;
run;
 

1. The MISSING= system option is used to display the missing values as a single space rather than as the default period (.) options missing = ‘ ‘;

2. The CATS function concatenates the values. It also removes leading and trailing blanks. cats(of _all_) – Concatenate all the variables

3.  missing (cats(of _all_)) – Identifies all the rows in which missing values exist in all the variables.

  • INTCK vs INTNX Function with Examples

INTCK 

Whenever you want to know the number of intervals between two dates then use INTCK.

An example for this section of the SAS tutorial is below:

Number_of_months = INTCK(Month’,today(),’24Sep2022’D)
 

It will give the number of months between today () and 24 Sep 2022.

INTNX

It helps you to know or get a particular data before or after n number of intervals.

An example is below:

Date of enrolment = INTNX(year’,today(),2)
 

It will give the date after 2 years from today()’s date.

  • Convert Character Variable to Date

INPUT function helps us to convert character variable to numeric and MMDDYY10. The format is used to assign the date format.

Data outcome;
Set input;
Dob_new = input(strip(dob),MMDDYY10.);
Format dob_new MMDDYY10.;
Run;
 
  • Convert Numeric Variable to Date

PUT function helps us to convert a numeric variable to character and yymmdd8. Informat is used to read date.

data outcome;
set input;
new_date = input(put(date,8.),yymmdd8.);
format new_date date10.;
run;
 
  • Proc Sort: Identifying and storing unique and duplicate values

Proc Sort is one of the most frequently used procedures in SAS. We will study for this SAS tutorial in detail:

By using two options in Proc Sort we can remove duplicates:

1.NODUPKEY Option – The NODUPKEY option removes duplicate observations where value of a variable listed in BY variable is repeated.

2.NODUP Option – The NODUP option removes duplicate observations where values in all the variables are repeated.

DUPOUT – It is used to store all the duplicated records.

Below is the input data set :

data in;
input ID Name $ Score;
cards;
1     Darshika   95
1     Darshika   64
2     Sama       45
2     Ramya      54
3     Bunny      87
3     Mary       92
3     Bunny      87
4     Daniel     23
5     Jaya       87
5     Kia        87
6     Simran     63
8     Priya      72
;
run;
 
 
 
 

Code1 :

PROC SORT DATA = in NODUPKEY;
BY ID;
RUN;
 

Output1 :

Table 1 SAS tutorial

So, here we see that those rows are removed where the id is duplicated.

Code 2:

PROC SORT DATA = in NODUP;
 BY ID;
 RUN;
 

Output :

Why no value has been deleted when NODUP option is used?

Although ID 3 has two identical records (See observations 5 and 7), NODUP option has not removed them. It is because they are not next to one another in the dataset and SAS only looks at one record back.

To fix this issue, sort on all the variables in the dataset READIN. 

To sort by all the variables without having to list them all in the program, you can use the keyword ‘_ALL_’ in the BY statement.

Code:

PROC SORT DATA = in NODUP;
 BY _ALL_;
 RUN;
 

Output:

Table 2 SAS tutorial
  • First dot and Last dot Variables

FIRST.VARIABLE assigns the value as 1 for the first observation in a BY group and the value of 0 for all other observations in a BY group.

The dataset must be sorted in this order.

  • WHERE vs. IF Statements

Where

  • Where statement is more efficient.
  • Where is used for only filtration purpose.
  • Where can be used in Procs.

If 

  • Not very efficient.
  • Used for conditional computation.
  • Can be used only in Data step.
  • Joining and merging in SAS
  • In Joining it is not necessary that tables should be sorted but in SAS merging both the tables should be sorted.
  • In Joining it is not necessary that the joining variables should be of same name but in Merging it is necessary that variables in both the tables should have the same name. If not same, rename them.
  • Merging in SAS do not have many to many merge.

Apply This SAS Tutorial For Your Interview

Hope this comprehensive SAS tutorial has helped you understand the major concepts. Apply them so that you can impress your recruiter with practical knowledge. Also, do read the top SAS interview questions and answers.

Recently completed any professional course/certification from the market? Tell us what you liked or disliked in the course for more curated content.

Click here to submit its review with Shiksha Online.

About the Author

This is a collection of insightful articles from domain experts in the fields of Cloud Computing, DevOps, AWS, Data Science, Machine Learning, AI, and Natural Language Processing. The range of topics caters to upski... Read Full Bio