SAS-Efficient Coding Techniques-2:SUBSTRING FUNCTION


059-30: A Clever Demonstration of the SAS® SUBSTR Function


SUBSTRING FUNCTION


Here's an example of the SUBSTRING function. Find additional info on this function below (the example).
1. In this step I create a data which has only one column, column “PACKAGE” which has “SAS” written in it.
059-30: A Clever Demonstration of the SAS® SUBSTR Function
DATA ABC_NEW_COLUMN;
PACKAGE="SAS";
RUN;
2. And then I create a column which is a substring of the column “PACKAGE”. (Step 1 and Step 2 can be combined i.e. I can create the column and substring it in the same step.)
059-30: A Clever Demonstration of the SAS® SUBSTR Function
DATA ABC_SUBSRTING_COLUMN;
SET ABC_NEW_COLUMN;
LETTER_1 = SUBSTR(PACKAGE,1,1);
LETTER_2= SUBSTR(PACKAGE,2,1);
LETTER_3= SUBSTR(PACKAGE,3,1);
PACKAGE_SUBSTR = LETTER_1|| LETTER_2|| LETTER_3;
RUN;

Now the question is what would be the column “PACKAGE_SUBSTR”?
Solution:
PACKAGE has a length of 3.
The other variables, LETTER_1-LETTER_3, also have a length of 3. When the SUBSTR function creates a new variable, the length assigned to that variable is the same as the length of the incoming character string.
LETTER_1 , therefore, is an “S” followed by two blanks.
LETTER_1 is an “A” followed by two blanks.
LETTER_1 is an “S” followed by two blanks.
The fully concatenated string is nine characters long(without the "|" ,which are there just to show the length):
|S  A  S  |


059-30: A Clever Demonstration of the SAS® SUBSTR Function
Additional Info about Substring Function
 The syntax for the SUBSTR function is
SUBSTR:( string, starting position ,< length> ) .Syntax within the angle brackets (‘<’ and ‘>’) is optional.  is an optional number of characters to extract. By default the value of is such that substr  operates on the remainder of the string. For this reason, length can also be thought of as the end position of the string.
SUBSTR becomes particularly useful when it is used with INDEX function here’s an illustration:
First the syntax of INDEX:
Syntax
INDEX(source,excerpt). Arguments: source- specifies the character expression to search. Excerpt- specifies the string of characters to search for in the character expression.
Suppose I have a name field in a SAS Dataset. I have Full names in this field. Now I want to split it into First name and last name. The data is being used is created by running this part:

DATA FULL_NAMES;
INPUT  FULL_NAME$ 1-20 ;
CARDS;
PETE SAMPRAS
KATE HUDSON
BARAK OBAMA
;

RUN;


DATA FIRST_LAST_NAME;
SET FULL_NAMES;
FORMAT FIRST_NAME $15. LAST_NAME $15.;
FIRST_NAME=SUBSTR(FULL_NAME,1,INDEX(FULL_NAME," "));
LAST_NAME=SUBSTR(FULL_NAME,INDEX(FULL_NAME," "));
RUN;


So the output for “PETE SAMPRAS” would be First name=”PETE” and Last name=”SAMPRAS”.



Blog Widget by LinkWithin

Search this blog..

Loading