Friday, June 7, 2013

Application log monitoring and Oracle database connection monitoring through bash scripts

The day before yesterday I was assigned a task: write a script that does the following.
Task
1. Monitor a log, say java_abc.log, of the Java application.
2. Watch for errors reported in java_abc.log regarding the DB connection.
3. If such an error is reported, check the DB connection remotely using the sqlplus login command, retrying at three intervals: 5 sec, 10 sec and 15 sec.
4. If the DB connection is restored, restart the Java app and notify the respective teams by email.
5. If the DB connection is not restored, simply notify the DBA and the other respective teams.

Required:
Password-less ssh between DBSERVER and APPSERVER.

                                                Password less ssh
DBSERVER <------------------------------------------------------> APPSERVER

Scripts on APPSERVER
check_log (Nagios plugin script, can be found on the Nagios site)
java_log_monitor
java_app_restart <--- your application restart script

Script on DBSERVER
dbMonitor

Flow
1. The APPSERVER crontab executes java_log_monitor every 5 minutes.
2. java_log_monitor calls the check_log script to search java_abc.log for the error patterns below:
ORA (to catch all Oracle errors, like ORA-12528, TNS:listener: all appropriate instances are blocking new connections) and
java.lang.RuntimeException: Couldn't get the connection
3. If either pattern is reported in java_abc.log, it executes the dbMonitor script on DBSERVER.
4. dbMonitor returns 0 (connection ok) or 1 (connection not ok) to the java_log_monitor script.
5. If dbMonitor returns 0 (connection ok), java_log_monitor calls the java_app_restart script to restart the Java application and notifies the respective teams by email.
6. If dbMonitor returns 1 (connection not ok), java_log_monitor calls dbMonitor on DBSERVER again up to three times (after 5 sec, 10 sec and 15 sec) to check the DB status.
7. If dbMonitor still returns 1 (connection not ok), java_log_monitor notifies the respective teams.
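
Step 1 of the flow can be set up with a crontab entry on APPSERVER. Below is a minimal sketch, assuming the monitor script is saved as /u01/monitor/java_log_monitor (the post does not state its exact path) and is run with bash. add_cron_line is a hypothetical helper name, not part of the original scripts.

```bash
#!/bin/bash
# Assumed cron entry: run the monitor every 5 minutes.
# The script path is an assumption; adjust it to where java_log_monitor lives.
CRON_LINE='*/5 * * * * /bin/bash /u01/monitor/java_log_monitor'

# Hypothetical helper: reads an existing crontab on stdin and prints it
# with CRON_LINE appended exactly once, so it is safe to run repeatedly.
add_cron_line() {
    # Drop any existing copy of the line; grep exits 1 if nothing is left
    grep -v -F -x -e "$CRON_LINE" || true
    printf '%s\n' "$CRON_LINE"
}

# To install it for the current user (not executed here):
#   crontab -l 2>/dev/null | add_cron_line | crontab -
```

Filtering the line out before appending it keeps the crontab from collecting duplicate entries if the install command is run more than once.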

Hope this info will help someone...
Please email me if you have any query....

Database MONITOR script

Please read the flow described above first to understand how this script is used.

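#!/bin/bash
# The line below loads the oracle user's environment when the script is executed remotely through ssh.
# "." is used instead of "source" so it also works when the script is started with sh.
. /home/oracle/.bash_profile
# The exit status of the script is grep's: 0 if sqlplus printed "Connected", 1 otherwise
echo "exit" | sqlplus -L dbowner/dbownerpwd@dbname | grep Connected > /dev/null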
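
The value that dbMonitor hands back over ssh is simply the exit status of the final grep in the pipeline. Below is a minimal sketch of that idea, with the check pulled into a function so it can be tried without a database. db_is_connected is a hypothetical name, and the sample sqlplus output strings are illustrative only, not captured from a real session.

```bash
#!/bin/bash
# Hypothetical helper: reads sqlplus output on stdin and returns
# 0 if the word "Connected" appears, 1 otherwise.
db_is_connected() {
    grep -q Connected
}

# Illustrative output samples (assumptions, not real session captures)
ok_output='Connected to:
Oracle Database'
bad_output='ERROR:
ORA-12528: TNS:listener: all appropriate instances are blocking new connections'

if printf '%s\n' "$ok_output" | db_is_connected; then
    echo "connection ok"
fi
if ! printf '%s\n' "$bad_output" | db_is_connected; then
    echo "connection not ok"
fi
```

In the real script the function would sit at the end of the same pipeline: echo "exit" | sqlplus -L dbowner/dbownerpwd@dbname | db_is_connected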

Java Log Monitoring script

#!/bin/bash
logFile="/u01/monitor/batchcheck.log"
cd /u01/monitor/ || exit 1

# Append a message to the monitor log with a fresh timestamp
log() {
        echo "[ $(date +%r) ] $*" >> "$logFile"
}

checkLogResult=$(sh check_log -F /u01/java_abc/apache-tomcat-5.5.31/applications/logs/java_abc.log -O /u01/monitor/java_abc.log.old -q "ORA|java.lang.RuntimeException: Couldn't get the connection")

if [ $? -eq 0 ]
then
        log "Pattern not found. java_abc.log looks [   OK   ]"
        exit 0
fi

log "Oops! DB connection error occurred..."
echo "$checkLogResult" >> "$logFile"
log "Checking db connection.."

# Check once immediately, then retry after 5, 10 and 15 seconds
for delay in 0 5 10 15
do
        if [ "$delay" -gt 0 ]
        then
                log "Retrying in $delay sec..."
                sleep "$delay"
        fi
        if ssh oracleuser@DBSERVER 'sh /home/oracle/scripts/dbMonitor'
        then
                log "dbCheck says, DB connection looks [ OK ]"
                log "Restarting JAVA APPS ..."
                sh /u01/monitor/java_app_restart
                log "JAVA Apps restarted. You must have received email notification for same."
                exit 0
        fi
        log "dbCheck says, DB connection looks [ NOT OK ]"
done

# All four checks failed
log "We have a serious problem with db...Contact DBA"
log "Sending email notification..."
sh /u01/monitor/CriticleEmail.sh
log "Email notification sent."
exit 1
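
One thing to keep in mind: check_log exits with 0 (OK) when no new match is found, 2 (CRITICAL) when the pattern matched, and 3 (UNKNOWN) when, for example, the log file does not exist. Treating every non-zero status as a DB error can therefore start the DB check for an unrelated problem. Below is a minimal sketch of telling these cases apart. classify_check_log and the action names it prints are hypothetical, not part of the original scripts.

```bash
#!/bin/bash
# Hypothetical helper: maps a check_log exit status to the action the
# monitor should take. The status values are the STATE_* constants
# defined in check_log.
classify_check_log() {
    case "$1" in
        0) echo "ok" ;;             # STATE_OK: no new pattern match
        2) echo "check_db" ;;       # STATE_CRITICAL: pattern found in new log lines
        *) echo "monitor_error" ;;  # STATE_WARNING / STATE_UNKNOWN: the check itself had a problem
    esac
}

# Example: a CRITICAL status from check_log leads to the DB check
classify_check_log 2
```

In java_log_monitor this could be used right after the check_log call, for example action=$(classify_check_log $?), and only the check_db action would go on to call dbMonitor.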

Log checker script (check_log)

#! /bin/sh
#
# Log file pattern detector plugin for Nagios
# Usage: ./check_log -F <log_file> -O <old_log_file> -q <pattern>
#
# Description:
#
# This plugin will scan a log file (specified by the <log_file> option)
# for a specific pattern (specified by the <pattern> option).  Successive
# calls to the plugin script will only report *new* pattern matches in the
# log file, since a copy of the log file from the previous run is saved
# to <old_log_file>.
#
# Output:
#
# On the first run of the plugin, it will return an OK state with a message
# of "Log check data initialized".  On successive runs, it will return an OK
# state if *no* pattern matches have been found in the *difference* between the
# log file and the older copy of the log file.  If the plugin detects any
# pattern matches in the log diff, it will return a CRITICAL state and print
# out a message in the following format: "(x) last_match", where "x" is the
# total number of pattern matches found in the file and "last_match" is the
# last entry in the log file which matches the pattern.
#
# Examples:
#
# Check for login failures in the syslog...
#
#   check_log -F /var/log/messages -O ./check_log.badlogins.old -q "LOGIN FAILURE"
#
#

# Paths to commands used in this script.  These
# may have to be modified to match your system setup.
# TV: removed PATH restriction. Need to think more about what this means overall
#PATH=""

ECHO="/bin/echo"
GREP="/bin/egrep"
DIFF="/usr/bin/diff"
TAIL="/usr/bin/tail"
CAT="/bin/cat"
RM="/bin/rm"
CHMOD="/bin/chmod"
TOUCH="/bin/touch"

PROGNAME=`/bin/basename $0`
PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`
REVISION=`echo '$Revision: 1.8 $' | sed -e 's/[^0-9.]//g'`

STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

#. $PROGPATH/utils.sh

print_usage() {
    echo "Usage: $PROGNAME -F logfile -O oldlog -q query"
    echo "Usage: $PROGNAME --help"
    echo "Usage: $PROGNAME --version"
    echo "Example: ./check_log -F /u01/java_abc/apache-tomcat-5.5.31/applications/logs/java_abc.log -O /u01/monitor/java_abc.log.old -q Error"
}

print_help() {
    #print_revision $PROGNAME $REVISION
    echo ""
    print_usage
    echo ""
    #echo "Log file pattern detector plugin for Nagios"
    echo "Log file pattern detector plugin for batch service"
    echo ""
}

# Make sure the correct number of command line
# arguments have been supplied

if [ $# -lt 1 ]; then
    print_usage
    exit $STATE_UNKNOWN
fi

# Grab the command line arguments

#logfile=$1
#oldlog=$2
#query=$3
exitstatus=$STATE_WARNING #default
while test -n "$1"; do
    case "$1" in
        --help)
            print_help
            exit $STATE_OK
            ;;
        -h)
            print_help
            exit $STATE_OK
            ;;
        --version)
            # print_revision comes from utils.sh, which is not sourced here
            echo "$PROGNAME $REVISION"
            exit $STATE_OK
            ;;
        -V)
            echo "$PROGNAME $REVISION"
            exit $STATE_OK
            ;;
        --filename)
            logfile=$2
            shift
            ;;
        -F)
            logfile=$2
            shift
            ;;
        --oldlog)
            oldlog=$2
            shift
            ;;
        -O)
            oldlog=$2
            shift
            ;;
        --query)
            query=$2
            shift
            ;;
        -q)
            query=$2
            shift
            ;;
        -x)
            exitstatus=$2
            shift
            ;;
        --exitstatus)
            exitstatus=$2
            shift
            ;;
        *)
            echo "Unknown argument: $1"
            print_usage
            exit $STATE_UNKNOWN
            ;;
    esac
    shift
done

# If the source log file doesn't exist, exit

if [ ! -e "$logfile" ]; then
    $ECHO "Log check error: Log file $logfile does not exist!"
    exit $STATE_UNKNOWN
elif [ ! -r "$logfile" ] ; then
    $ECHO "Log check error: Log file $logfile is not readable!"
    exit $STATE_UNKNOWN
fi

# If the old log file doesn't exist, this must be the first time
# we're running this test, so copy the original log file over to
# the old diff file and exit

if [ ! -e "$oldlog" ]; then
    $CAT "$logfile" > "$oldlog"
    $ECHO "Log check data initialized..."
    exit $STATE_OK
fi

# The old log file exists, so compare it to the original log now

# The temporary file that the script should use while
# processing the log file.
if [ -x /bin/mktemp ]; then
    tempdiff=`/bin/mktemp /tmp/check_log.XXXXXXXXXX`
else
    tempdiff=`/bin/date '+%H%M%S'`
    tempdiff="/tmp/check_log.${tempdiff}"
    $TOUCH $tempdiff
    $CHMOD 600 $tempdiff
fi

$DIFF "$logfile" "$oldlog" | $GREP -v "^>" > "$tempdiff"

# Count the number of matching log entries we have
count=`$GREP -c "$query" "$tempdiff"`

# Get the last matching entry in the diff file
lastentry=`$GREP "$query" "$tempdiff" | $TAIL -1`

$RM -f "$tempdiff"
$CAT "$logfile" > "$oldlog"

if [ "$count" = "0" ]; then # no matches, exit with no error
    $ECHO "Log check ok - 0 pattern matches found"
    exitstatus=$STATE_OK
else # Print total match count and the last entry we found
    $ECHO "($count) $lastentry"
    exitstatus=$STATE_CRITICAL
fi

exit $exitstatus
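
The core of check_log is the diff between the current log and the saved copy: lines that exist only in the current log are printed by diff with a "<" prefix, and only those lines are searched for the pattern. Below is a minimal sketch of that technique on two throwaway files. count_new_matches is a hypothetical helper name and the log lines are made up for illustration.

```bash
#!/bin/bash
# Hypothetical helper: count lines matching pattern $3 that are present
# in the current log $1 but not in the saved copy $2.
count_new_matches() {
    # diff prefixes lines only in $1 with "<" and lines only in $2 with ">";
    # grep -c exits 1 when the count is 0, hence the "|| true"
    diff "$1" "$2" | grep -v '^>' | grep -c -E "$3" || true
}

# Build a saved copy and a current log with one extra (made-up) error line
workdir=$(mktemp -d)
printf 'app started\nrequest served\n' > "$workdir/app.log.old"
printf 'app started\nrequest served\nORA-12528: TNS:listener: all appropriate instances are blocking new connections\n' > "$workdir/app.log"

new_matches=$(count_new_matches "$workdir/app.log" "$workdir/app.log.old" "ORA|Couldn't get the connection")
echo "new matches: $new_matches"

rm -rf "$workdir"
```

Because the saved copy is overwritten with the current log after each run, an error line is reported only once, on the first run that sees it.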

Password-less SSH

This is a simple thing, so I am not going too deep into it.
Say I want password-free ssh from Client A to Server B.
I do the following.
On Client A, create an RSA key pair:

#ssh-keygen -t rsa

No need to feed in any info, just press Enter at each prompt.

This creates the id_rsa.pub file in the hidden .ssh folder under your user's home directory.
The contents of id_rsa.pub must be copied into the authorized_keys file in the hidden .ssh folder under the home directory of the intended user on Server B.
Use the command below to accomplish this:

#ssh-copy-id -i ~/.ssh/id_rsa.pub USERABC@SERVER_B

It will ask for the password; just enter USERABC's password.

Once completed, test with a normal ssh command:

#ssh USERABC@SERVER_B
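
Because java_log_monitor runs from cron, nobody is there to type a password, so it is worth confirming that the key-based login really works without any prompt. Below is a minimal sketch, assuming OpenSSH. can_ssh_without_password is a hypothetical helper name; BatchMode=yes makes ssh fail instead of asking for a password.

```bash
#!/bin/bash
# Hypothetical helper: returns 0 if key-based login to "$1" works
# without any password prompt, non-zero otherwise.
can_ssh_without_password() {
    # BatchMode=yes disables password prompts; ConnectTimeout avoids hanging
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Usage (not executed here):
#   can_ssh_without_password USERABC@SERVER_B && echo "password-less ssh works"
```

Running the remote command "true" keeps the test side-effect free: it only proves that the login succeeds.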