Who is generating all the redo?

Recently I was tasked with answering why we generated over 500GB of redo in a little over one day, when we normally generate an average of 100GB. We are in the midst of building out a remote data center which will house a RAC protected by Dataguard. Our initial bandwidth to send redo from the production database is severely limited, so for a few months we are stuck with a link that will allow a maximum of 400GB of throughput on this link daily.

Since 500GB are more than 400GB, we needed to understand who had generated all of this redo, and whether it may reoccur. Of course, we found that this was done over two months ago! I immediately mentioned that we would have to utilize the Oracle AWR to tell us who and/or what had done this. Thankfully, we retain 90 days of AWR with half hour snapshots.

Confidently, I strode back to my desk and begin to crack through the DBA_HIST views. To my surprise, I found no statistic that would tell me who was plowing through our redo logs like a creature in the movie “Tremors”.

I did find a way to roughly identify this information, however, that I hope will be beneficial to you. I first checked DBA_HIST_SQL_STAT for any statement that had a far greater number of executions for the period in question, than they did on average.

I found none.

I then looked into the dba_hist_seg_stat view, and found the PHYSICAL_WRITES_DELTA column. Ah hah! I had my answer. To my dismay, I found that this didn’t have the information, either.

After mulling around for another hour or so, I went back to the dba_hist_seg_stat view. It was then I noticed the db_block_changes_delta column. I then identified the segment that experienced the most changes during the period, using what is below…

select to_char(begin_interval_time,'YYYY_MM_DD HH24:MI') snap_time,
        dhsso.object_name,
        sum(db_block_changes_delta)
  from dba_hist_seg_stat dhss,
         dba_hist_seg_stat_obj dhsso,
         dba_hist_snapshot dhs
  where dhs.snap_id = dhss.snap_id
    and dhs.instance_number = dhss.instance_number
    and dhss.obj# = dhsso.obj#
    and dhss.dataobj# = dhsso.dataobj#
    and begin_interval_time between to_date('2008_05_02 12','YYYY_MM_DD HH24')
                                           and to_date('2008_05_04 12','YYYY_MM_DD HH24')
  group by to_char(begin_interval_time,'YYYY_MM_DD HH24:MI'),
           dhsso.object_name

The answer immediately jumped out at me. It was a table that I recognized, but I didn’t know why it would bubble to the top of the list.

To confirm what I thought, I then ran the following query to examine the changes during the period to what they were historically.

select to_char(begin_interval_time,'YYYY_MM_DD HH24:MI') snap_time,
        sum(db_block_changes_delta)
  from dba_hist_seg_stat dhss,
         dba_hist_seg_stat_obj dhsso,
         dba_hist_snapshot dhs
  where dhs.snap_id = dhss.snap_id
    and dhs.instance_number = dhss.instance_number
    and dhss.obj# = dhsso.obj#
    and dhss.dataobj# = dhsso.dataobj#
    and dhsso.object_name = 'OT_PENDING'
  group by to_char(begin_interval_time,'YYYY_MM_DD HH24:MI')

I found that 90% of the block changes over a three month period *all* occurred in the day and half window I was examining. Now we were really on to something!

I gathered all SQL statements that were run during the period that referenced the table, using what is below…

select to_char(begin_interval_time,'YYYY_MM_DD HH24:MI'),
         dbms_lob.substr(sql_text,4000,1),
         dhss.instance_number,
         dhss.sql_id,executions_delta,rows_processed_delta
  from dba_hist_sqlstat dhss,
         dba_hist_snapshot dhs,
         dba_hist_sqltext dhst
  where upper(dhst.sql_text) like '%OT_PENDING%'
    and dhss.snap_id=dhs.snap_id
    and dhss.instance_Number=dhs.instance_number
    and dhss.sql_id = dhst.sql_id

I found a statement that was being executed an average number of times, relatively speaking. However, it was updating millions of rows, whereas it normally would update hundreds, if that.

I took the SQL_ID for the statement and searched through DBA_HIST_ACTIVE_SESS_HISTORY, using what is below…

select instance_number, to_char(sample_time,'yyyy_mm_dd hh24:mi:ss'),
         user_id,
         program
  from dba_hist_active_sess_history
  where sql_id = 'bxjz5d4p6x02u'
    and snap_id between 43212 and 43220

I found that a SQL*PLUS session from a development server was executing the statement, and the execution times exactly mirrored our jump in redo generation. Now I won’t even go into the lack of processes in this shop and how often developers directly access this database, as that is a topic for a different post.

However, the information for which we were searching had been found!

4 comments for “Who is generating all the redo?

  1. Keith Carter
    December 14, 2009 at 11:10 PM

    I need to shorten my interval time. Thanks for nugget of information!

  2. Manish Nashikkar
    December 26, 2013 at 3:33 PM

    Excellent Article !!!

  3. Vimal
    July 29, 2014 at 10:29 AM

    It has been a very helpful post. Thanks for writing it & sharing the information.

  4. Luis Bermudez
    February 18, 2015 at 12:11 PM

    Wonderful article !
    Thanks for sharing !

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.