Poor man’s parallel DML

We were recently required to get rid of 250 million rows in a partitioned audit table. While the table is partitioned, we have global indexes that would be rendered unusable after dropping or truncating a partition. The UPDATE GLOBAL INDEXES clauses I have never found to be immensely useful, as all it does is issue the rebuild command for you. Searches are still incredibly slow during the rebuild if you have skip_unusable_indexes = true.

We are in the midst of designing a rebuild that use LOCAL indexes, but for now, there is a large impact to users from DDL on “old” partitions.

Parallel DML can be faster, but it’s an all or nothing operation. In other words, you sit there and twiddle your thumbs while the delete runs, chewing up UNDO space as it goes. Call me impatient, but I like immediate viewable results 🙂

As such, I ended up writing what is below using threads in python. I am not claiming by any means that this is the best way to do it, but it was incredibly useful for us.

What it does is create ten connections to the database instance, and then uses locks to communicate amongst each connection as to what is being deleted in 5000 row blocks at a time.

This works best when you can identify the range to be deleted by some key, so each connection can select a range. We could identify the starting JRN_SEQ value to delete, and the maximum value to delete. We just created threads to chew on each range of 5,000 rows between these two values, and printed progress to standard output.

We deleted 250 million rows in about three hours using this. You may get more bang for your buck by running more threads, or using higher row counts in each thread.

#!/home/oracle/local/bin/python

import cx_Oracle
import thread
import Queue
from threading import Lock
import sys

#-------------------------------------------------------------------------------------------

pool = cx_Oracle.SessionPool(user="*****",
                             password="*********",
                             dsn="*****",
                             min=10,
                             max=15,
                             increment=1,
                             threaded=True)

qp = Queue.Queue(0)
counter = 756808657

#-------------------------------------------------------------------------------------------

def deleteRows():
  con = pool.acquire()
  con.cursor().execute("alter session set db_file_multiblock_read_count = 128");
  con.cursor().execute("alter session set events '10046 trace name context forever, level 12'");
  #we are doing our own "parallel", so we don't want to kill the box :)
  con.cursor().execute("alter session disable parallel query");
  global counter
  while counter < 1017476540:
    try:
      id = getLock()
      cursor = con.cursor()
      cursor.arraysize = 1000
      try:
        cursor.execute("delete from xwc.xwc_change_history where jrn_seq between :b1 and :b2 and jrn_seq < 1017476540",b1=id,b2=id+4999);
        con.commit()
        cursor.close()
      except cx_Oracle.DatabaseError, exc:
        error, = exc.args
        print >> sys.stderr, tableName
        print >> sys.stderr, error.message
    except:
      print sys.exc_info()[1]

#-------------------------------------------------------------------------------------------

def doIt():

  for _ in range(10):
    thread.start_new_thread(deleteRows, ())

  thread.start_new_thread(printQueue(),())

#-------------------------------------------------------------------------------------------

def getLock():
  lock = Lock()
  global counter
  if lock.acquire(True):
    LOOP = False
    try:
      counter = counter + 5000
      qp.put(str(counter) + " " + str(counter + 4999))
      return counter
    finally:
      lock.release()

#-------------------------------------------------------------------------------------------

#this procedure adds and pops off a queue print statements as ID's are gotten by the delete workhorse 
#procedure above.  If we don't do this, our screen looks more "cluttered"

def printQueue():
  while True:
    s = qp.get();
    print s
  print "all have been processed"

#-------------------------------------------------------------------------------------------

if __name__ == '__main__':
  doIt()

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.