SAS Admin: Find and Kill Runaway Compute Sessions (Linux)

Managing resources on a SAS Compute Server is essential, and it requires regular monitoring and cleanup to keep the server functioning properly.

One such task is finding runaway SAS jobs: processes that are orphaned, or that have been running for an unusually long time, and keep consuming resources. There are many ways to do this, but one method I like is a Perl script that finds and kills such runaway jobs. A nice feature of this script is that it first tries to terminate each process gracefully with signal 15 (SIGTERM). If the process is still running a few seconds later, it uses kill -9 (SIGKILL) to remove it.
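The terminate-then-kill idea can be sketched on its own for a single PID. This is only an illustration, not the script from this post: terminate_then_kill and its grace-period argument are hypothetical names, and the full script below takes a different route (it re-runs ps after a pause instead of tracking individual PIDs). The demo at the end forks a throwaway child so that nothing real gets killed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Hypothetical helper: send SIGTERM, wait up to $grace seconds for the
# process to exit, and fall back to SIGKILL if it is still there.
sub terminate_then_kill {
    my ($pid, $grace) = @_;

    # kill returns the number of processes signalled; 0 means the PID
    # does not exist or we are not allowed to signal it
    return 'not signalled' unless kill 'TERM', $pid;

    for (1 .. $grace) {
        sleep 1;
        # reaps the process if it happens to be our own child; has no
        # effect on unrelated processes
        waitpid($pid, WNOHANG);
        # signal 0 sends nothing, it only tests whether the PID still exists
        return 'terminated' unless kill 0, $pid;
    }

    kill 'KILL', $pid;
    return 'killed';
}

# Demo on a throwaway child process that would otherwise sleep for a minute
my $pid = fork() // die "fork failed: $!";
if ($pid == 0) { sleep 60; exit 0; }
print "child $pid: ", terminate_then_kill($pid, 3), "\n";
```

Waiting and checking before the SIGKILL gives SAS a chance to clean up its work files, which is the reason for trying signal 15 first.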

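When scheduling the script in crontab, you can pass the number of days a process is allowed to run (1 to 100) as an argument. Without a valid argument the script uses its built-in default of 8 days. The sample entry further down runs the script every day at 23:15 with a limit of 2 days.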

PS: It was fun learning and writing Perl for a change 🙂

15 23 * * * /sasadmin/script/kill_job/kill_long_running_sas_processes.pl 2



#!/bin/perl



use strict;

use warnings;

use POSIX;

use Scalar::Util qw(looks_like_number);



my $daylimit = 8;



# append process info to a log file

open LOG, ">>/sashome/admin/script/kill_job/log/kill_long_running_sas_processes.log" or die "Cannot open log file: $!";



# if the parameter looks like a number between 1 and 100, use it as the number of days that processes are allowed to run; otherwise keep the default set above

if (looks_like_number($ARGV[0]) && $ARGV[0] >= 1 && $ARGV[0] <=100 )

{

        $daylimit = $ARGV[0];

}



# try to terminate with SIGTERM (15), sleep for a short period, then run again with SIGKILL (9)

&ProcessPS(15);

sleep(5);

&ProcessPS(9);



close PSOUTPUT;

close LOG;

#########################################################

sub ProcessPS

{ # run ps and kill PIDs that have run too long.  Accepts the signal to send as its parameter.



        my $signal = $_[0];



        print LOG "\n***** Long-running process kill procedure run at ",strftime('%Y-%m-%d %H:%M:%S',localtime),"  with day limit = ",$daylimit," and signal ",$signal,"\n";



        # retrieve the processes including elapsed time.  Filter for those with the 'sas' command.

        open PSOUTPUT, 'ps -e -o  user,pid,comm,etime,cmd | grep "sas " |' or die "Cannot run ps: $!";



        while (<PSOUTPUT>)

        {

                # parse the output on whitespace; the limit of 5 keeps the full command line, arguments included, in $cmd

                chomp;

                my ($user,$pid,$comm,$etime,$cmd) = split(' ',$_,5);



                # parse the elapsed time into days and hh:mm:ss.  Days is optional and only present once the process has run for 24 hours or more.

                # the format of etime is [[DD-]hh:]mm:ss

                my ($days,$time) = split('-',$etime);



                # if $days is formatted as a number, then the above parse yielded days as well as time

                # if the number of days the process has been running is greater than or equal to the day limit, perform action

                # also don't process anything with user = root

                # also process only sas commands

                if (looks_like_number($days) && ($days >= $daylimit) && ($user ne 'root') && ($comm eq 'sas'))

                {

                        # 15 (SIGTERM) requests a graceful termination; 9 (SIGKILL) forces the kill

                        kill $signal, $pid; # comment out this line for a dry run that only logs

                        # log what was killed and when the kill was issued

                        print LOG $pid,'  ',$user,'  ',$etime,'  ',$cmd,"\n";

                }

        }

}
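
The script compares only the day part of etime, so its limit is always a whole number of days. If a finer limit were ever needed, etime could be converted to seconds first. A minimal sketch of that conversion, assuming the [[DD-]hh:]mm:ss format and Perl 5.10 or later (for the // operator); etime_to_seconds is a hypothetical helper and is not part of the script above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: convert a ps etime string of the form
# [[DD-]hh:]mm:ss into seconds.  Returns undef for anything else,
# such as the ELAPSED header line.
sub etime_to_seconds {
    my ($etime) = @_;
    return unless defined $etime
        && $etime =~ /^(?:(?:(\d+)-)?(\d+):)?(\d+):(\d+)$/;

    # days and hours are optional in the ps output, so default them to 0
    my ($days, $hours, $minutes, $seconds) = ($1 // 0, $2 // 0, $3, $4);
    return (($days * 24 + $hours) * 60 + $minutes) * 60 + $seconds;
}

# Sample values in each of the three shapes ps can print
for my $sample ('05:12', '03:05:12', '2-03:05:12') {
    my $secs = etime_to_seconds($sample);
    printf "%-12s %d seconds\n", $sample, $secs;
}
```

With a helper like this, the day check could be replaced by a comparison against a limit expressed in seconds, for example 36 * 3600 for a day and a half.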