Wednesday, April 18, 2012

Hash a SAS Value

Sometimes it is useful to hash a value so that a unique key can be built into the data. For example, say you are looking at a system performance log that records a PID, a process name, and a user. A system reuses PIDs all the time, so a PID alone cannot uniquely identify a process over the course of a day.

In order to get a unique value, you could concatenate the values into one:

000789654 || WeeklyProcess || gertre5

We are assuming that there is no need to ever reverse the values. This is a key assumption.
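
As a rough sketch, the concatenation above could be built in a DATA step like this. The data set name and the variable names (perf_keys, pid, process, username, key) are made up for illustration, and the pipe delimiter is my own choice to keep adjacent values from running together:

```sas
/* Hypothetical names: perf_keys, pid, process, username, key */
data perf_keys;
  length pid $9 process $32 username $16 key $64;
  input pid $ process $ username $;
  /* CATX trims each value and puts the delimiter between them */
  key = catx('|', pid, process, username);
  put key=;
  datalines;
000789654 WeeklyProcess gertre5
;
run;
```

With a delimiter in place, values such as "AB" and "C" cannot produce the same key as "A" and "BC".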

There is an undocumented function in SAS called CRCXX1 that can create a numeric hash of a string. Here is some code illustrating it:

data A;
input name :$200. gender :$8. state :$20.;
x = compress(name||gender||state);
y = CRCXX1(x);
put x= y=32. ;
datalines;
Churchill,Alan Male Colorado
Churchill,John Male Colorado
;
run;

The results:

884  data A;
885  input name :$200. gender :$8. state :$20.;
886  x = compress(name||gender||state);
887  y = CRCXX1(x);
888  put x= y=32. ;
889  datalines;

x=Churchill,AlanMaleColorado y=1558070123
x=Churchill,JohnMaleColorado y=837584169
NOTE: The data set WORK.A has 2 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


892  ;
893  run;

This could be very valuable in situations where you need to tighten up processing and have some throwaway field values. The person who mentioned the undocumented function says it is good for about 1 million unique values before it starts to produce collisions. Above that, go with the MD5 function.
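
For the MD5 route, here is a minimal sketch using the same sample data. The data set name B, the variable name key, and the use of CATX with a pipe delimiter are my own choices, not part of the original example. Note that MD5 returns a character value rather than a number:

```sas
data B;
  length name $200 gender $8 state $20 key $32;
  input name $ gender $ state $;
  /* MD5 returns a 16-byte binary string; */
  /* the $HEX32. format renders it as 32 hex characters */
  key = put(md5(catx('|', name, gender, state)), $hex32.);
  put key=;
  datalines;
Churchill,Alan Male Colorado
Churchill,John Male Colorado
;
run;
```

The hex form is easier to print and compare than the raw binary value, at the cost of a 32-character key instead of a 16-byte one.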

Comments:

Andrew Z said...

Thank you. This was handy because I needed a numeric, and MD5 was giving me character.
