Wednesday, April 18, 2012

Hash a SAS Value

Sometimes it is useful to hash a value so that a unique key can be added to the data. For example, say you are looking at a system performance log that records a PID, a process name, and a user. The system reuses PIDs all the time, so a PID alone cannot identify a process uniquely over the course of a day.

In order to get a unique value, you could concatenate the values into one:

000789654 || WeeklyProcess || gertre5
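
As a minimal sketch of that concatenation in SAS, the step below joins the three fields with the CATX function. The variable names (pid, proc_name, user, key) and the '|' delimiter are assumptions made for illustration only. A delimiter is worth keeping in the key, because without one the pairs 'ab','c' and 'a','bc' would produce the same string.

```sas
data _null_;
  length key $64;
  /* Hypothetical field values taken from the example above */
  pid       = '000789654';
  proc_name = 'WeeklyProcess';
  user      = 'gertre5';
  /* CATX strips leading/trailing blanks from each value and joins them with '|' */
  key = catx('|', pid, proc_name, user);
  put key=;
run;
```

The log should show key=000789654|WeeklyProcess|gertre5, which can then be passed to a hash function.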

We are assuming that there is no need to ever reverse the values. This is a key assumption.

There is an undocumented function in SAS called CRCXX1 that can create a unique hash. Here is some code illustrating it:

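data A;
input name :$200. gender :$8. state :$20.;
/* Concatenate the fields and remove all blanks */
x = compress(name||gender||state);
/* Undocumented function: returns a numeric hash of the string */
y = CRCXX1(x);
put x= y=32. ;
datalines;
Churchill,Alan Male Colorado
Churchill,John Male Colorado
;
run;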

The results:

884  data A;
885  input name :$200. gender :$8. state :$20.;
886  x = compress(name||gender||state);
887  y = CRCXX1(x);
888  put x= y=32. ;
889  datalines;

x=Churchill,AlanMaleColorado y=1558070123
x=Churchill,JohnMaleColorado y=837584169
NOTE: The data set WORK.A has 2 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

892  ;
893  run;
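
To confirm that the hash really is unique for the data at hand, one option is to compare the number of distinct input strings with the number of distinct hash values. This is only a sketch and assumes the data set A and the variables x and y created above. The column aliases are made up for illustration.

```sas
proc sql;
  /* If distinct_hashes is smaller than distinct_keys, at least two
     different strings were mapped to the same hash value. */
  select count(distinct x) as distinct_keys,
         count(distinct y) as distinct_hashes
  from A;
quit;
```

For the two rows above, both counts should be 2.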

This could be very valuable in situations where you need to tighten up processing and have some throwaway field values. The person who mentioned the undocumented function says it is good for up to about 1 million unique values before it starts to produce collisions. Above that, go with the MD5 function.
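
For the larger case, here is a hedged sketch of the same key built with the documented MD5 function. The data set name B and the variables key and digest are assumptions for illustration. MD5 returns a 16-byte binary character string, so the $HEX32. format is used to make it readable. The result is character, not numeric.

```sas
data B;
  input name :$200. gender :$8. state :$20.;
  length key $230 digest $32;
  /* Join the fields with a delimiter so adjacent values cannot run together */
  key = catx('|', name, gender, state);
  /* TRIM drops the trailing padding blanks before hashing;
     $HEX32. renders the 16-byte digest as 32 hex characters */
  digest = put(md5(trim(key)), $hex32.);
  put key= digest=;
datalines;
Churchill,Alan Male Colorado
Churchill,John Male Colorado
;
run;
```

Trimming before hashing is a design choice: it keeps the digest independent of the declared length of key.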


Andrew Z said...

Thank you. This was handy because I needed a numeric, and MD5 was giving me character.

