GoldenGate progress monitoring script

As noted in earlier posts, we are in the midst of building out an active-active site using Oracle RAC and Oracle GoldenGate. We want to ensure we know when either the extract or replicat processes are lagging. We found that at times, the extract or replicat processes would indicate they are not behind, but found the Oracle redo logs had not been scanned for a period of time, even though there was a small amount of changes that should have been captured.

As such, we not only want to know when the extract or replicat is lagging, but also the most recent time the redo logs have been considered for “replicable” changes.

Below is what we are starting with in terms of accomplishing this goal.

#!/usr/bin/awk -f
#----------------------------------------------------------------------------------------------------------
#Author:        Steve Howard
#Date:          March 12, 2012
#Purpose:       Simple parser for ggsci output in GoldenGate.
#----------------------------------------------------------------------------------------------------------

{
  if ($1 ~ "EXTRACT") {
    EXT = $2
    getline;
    LAG=$3
    getline;
    if ($0 ~ "Oracle Redo Logs") {
      DO = "y"
      while (DO == "y") {
        getline;
        TIME=$1" "$2
        gsub(",","",$4)
        THREAD=$4
        split($2,t,".")
        gsub("-"," ",$1)
        gsub(":"," ",t[1])
        LAST_READ=systime() - mktime($1" "t[1])
        printf("%-10s %-20s %s %s redo thread %i last read %i seconds ago\n","Extract",EXT,LAG,TIME,THREAD,LAST_READ)
        if (LAST_READ > 200) {
          print "You are "LAST_READ" seconds behind on thread "THREAD"..."
        }
        getline;
        if (length($0) == 0) {
          DO = "n"
        }
      }
    }
    else if ($0 ~ "File") {
      getline;
      TIME=$1" "$2
      split($2,t,".")
      gsub("-"," ",$1)
      gsub(":"," ",t[1])
      LAST_READ=systime() - mktime($1" "t[1])
      printf("%-10s %-20s %s Extract trail file last read %i seconds ago\n","DataPump",EXT,LAG,LAST_READ)
    }
  }
  else if ($1 ~ "REPLICAT") {
    EXT = $2
    getline;
    LAG=$3
    getline;
    if ($0 ~ "File") {
      getline;
      TIME=$1" "$2
      split($2,t,".")
      gsub("-"," ",$1)
      gsub(":"," ",t[1])
      LAST_READ=systime() - mktime($1" "t[1])
      printf("%-10s %-20s %s replicat trail file last read %i seconds ago\n","Replicat",EXT,LAG,LAST_READ)
    }
  }
}

…with sample output shown below…

expressdb1:oracle:ecomm1:/u01/app/oracle/acfsdata/ggate001/oracle# echo "info *" | ggsci | awk -f k.awk
Extract    CMHEXT               00:00:00 2012-03-12 11:40:13 redo thread 1 last read 350 seconds ago
You are 350 seconds behind on thread 1...
Extract    CMHEXT               00:00:00 2012-03-12 11:45:13 redo thread 2 last read 15 seconds ago
DataPump   CMHPMP               00:00:00 Extract trail file last read 8571 seconds ago
Replicat   CMHREP               00:00:00 replicat trail file last read 50517 seconds ago
expressdb1:oracle:ecomm1:/u01/app/oracle/acfsdata/ggate001/oracle# cat k.awk

We will end up either pushing this to a web server that is accessible from something like Nagios or SolarWinds, and generating alerts from that. Conversely, we could make a system() call in awk that would email when the redo logs hadn’t been read in more than 300 seconds, or whatever.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.