Which transaction is being rolled back?

This morning, we had an issue in a development database where we couldn’t drop an UNDO tablespace as a transaction was currently using it. We saw nothing in the GV$TRANSACTION view, so we knew it must be rolling back a **failed** transaction. Sure enough, a quick run through GV$FAST_START_TRANSACTIONS showed that a large transaction was being “undone”. However, how could we identify what the culprit was? We looked at the session (as found in GV$FAST_START_SERVERS) , which was of course reading undo. We also ran logminer against the online redo logs to see what was being recorded. The work being done by the session did not result in any useful troubleshooting SQL, as may be expected.

We ended up writing a python script that would dump the datafile block the session was reading at a given point in time.

#!/home/oracle/local/bin/python
"""
--------------------------------------------------------------------------------------------
Author:         Steve Howard
Date:           November 21, 2011
Purpose:        Simple program to dump block for session waiting on single block read
--------------------------------------------------------------------------------------------
"""

import cx_Oracle, sys

conn = cx_Oracle.connect(mode = cx_Oracle.SYSDBA)
cursor = conn.cursor()
found=False
cursor.execute("select value from v$parameter where name = 'diagnostic_dest'")
for row in cursor:
  diag = row[0]

cursor.prepare("select p1,p2 from v$session where sid = :1 and event = 'db file sequential read'")

while found == False:
  cursor.execute("select p1,p2 from v$session where sid = :1 and event = 'db file sequential read'",[int(sys.argv[1])])

  for row in cursor:
    found = True
    cursor2 = conn.cursor()
    cursor2.execute("alter system dump datafile " + str(row[0]) + " block " + str(row[1]))
    cursor2.execute("select spid from v$process where addr = (select paddr from v$session where sid = sys_context('userenv','sid'))")
    for r in cursor2:
      print "Look for a tracefile under ",diag," with ",r[0]," in the name."

When we did this, we found it was in fact an UNDO block that it was reading (surprise, surprise).

It looked like what is below…

Block dump from disk:
buffer tsn: 5 rdba: 0x0143cd2b (5/249131)
scn: 0x0631.fb757259 seq: 0x04 flg: 0x04 tail: 0x72590204
frmt: 0x02 chkval: 0x2f3a type: 0x02=KTU UNDO BLOCK
Hex dump of block: st=0, typ_found=1

Of course, the SCN of the block itself was brand new, so we couldn’t reliably use it to determine how old the change was that was being undone. However, we did find that the segment being operated on could be found using what is below…

13:14:25 oracle@oradb03dxdu /u02/admin >grep objd ./diag/rdbms/oradev/oradev3/trace/oradev3_ora_31640.trc | awk '{s[$9]+=1} END {for (i in s) {print i,s[i]}}'
10645591 59
10645590 25
13:14:35 oracle@oradb03dxdu /u02/admin >

We correctly assumed that the objd (10645590 and 10645591, above) referred to the object_id of the segment with which the UNDO was associated. We simply used the ID’s to look up the entries in the DBA_OBJECTS view. At that point, we could at least identify the segments, which in our case was sufficient to be able to pinpoint the source by reviewing DBA_HIST_SQLTEXT for recent DML statements against the noted segments.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.