High latch free waits on Oracle XA transactions

We experienced large waits as well as the associated CPU spikes, when using distributed XA transactions in an ATG/JBOSS application server to Oracle database environment.

Oracle “support” was, per usual, of absolutely no help is diagnosing the situation. I honestly don’t know where they get their pool of support “analysts”. I guess they analyze how to quickly get something off their plate, collect your support payment and…ahhhh, never mind.

What is below is a subset of the class that creates the XA objects using the Oracle classes, and then simply does a select from an inventory table using a variable value for the primary key being fetched.

      XAConnection XACLocal = OXADSLocal.getXAConnection();
      Connection ConnLocal = XACLocal.getConnection();
      XAResource XARLocal = XACLocal.getXAResource();

      byte [] gid = new byte[64];
      gid[0] = (byte) 1;

      byte [] bqidLocal = new byte[64];
      bqidLocal[0] = (byte) 1;
      int x = testXaLatch.r.nextInt();
      Xid XidLocal = new OracleXid(x, gid, bqidLocal);

      //What is below will result in a latch free event, mostly due to the "global tx hash mapping" latch
      XARLocal.start(XidLocal, XAResource.TMNOFLAGS);
      PreparedStatement pst = ConnLocal.prepareStatement("select * from dcs_inventory where inventory_id = ?");
      String y = Integer.toString(testXaLatch.r.nextInt(INVENTORY_IDS.size()));
      pst.setString(1,y);
      ResultSet rst  = pst.executeQuery();
      while (rst.next()) {
        //nothing
      }
      XARLocal.end(XidLocal, XAResource.TMSUCCESS);

We saw far fewer of these events when we ran what is functionally the same code, but invoked the PL/SQL API as shown below…

import java.sql.*;
import javax.sql.*;
import java.util.*;
import java.math.*;
import javax.transaction.xa.*;
import oracle.jdbc.xa.OracleXid;
import oracle.jdbc.xa.client.*;

public class testXADB implements Runnable {

  Thread t;
  static Random r;
  static ArrayList INVENTORY_IDS;

  public static void main(String args[]) {
    try {
      Class.forName("oracle.jdbc.driver.OracleDriver");
      Connection conn = DriverManager.getConnection("jdbc:oracle:thin:***************************");
      Statement stm = conn.createStatement();
      stm.setFetchSize(10000);
      r = new Random();
      ResultSet rstInventoryIds = stm.executeQuery("select inventory_id from dcs_inventory");
      INVENTORY_IDS = new ArrayList();
      while (rstInventoryIds.next()) {
        INVENTORY_IDS.add(rstInventoryIds.getString(1));
      }
      rstInventoryIds.close();
      System.out.println(new java.util.Date());
      for (int i = 1; i <= 200; i++) {
        testXADB r1 = new testXADB();
      }
    }
    catch(Exception e) {
      e.printStackTrace();
    }
  }

  testXADB() {
    try {
      t = new Thread(this);
      t.start();
    }
    catch (Exception e) {
      e.printStackTrace();
    }
  }

  public void run() {
    try {

      Connection conn = DriverManager.getConnection("jdbc:oracle:thin:***********************************************");
      conn.setAutoCommit(false);
      CallableStatement pst = conn.prepareCall("declare " +
                                                 "l number;" +
                                                 "xid number := ?;" +
                                               "begin " +
                                                 "l := dbms_xa.xa_start(dbms_xa_xid(xid,hextoraw('1'),hextoraw('1')), dbms_xa.tmnoflags);" +
                                                 "for r in (select * from atg_core_prod.dcs_inventory where inventory_id = ?) loop " +
                                                 "  null;" +
                                                 "end loop;" +
                                                 "l := dbms_xa.xa_end(dbms_xa_xid(xid), dbms_xa.tmsuccess);" +
                                               "end;");
      long START_TIME = System.currentTimeMillis();

      while (System.currentTimeMillis() - START_TIME < (20 * 1000 * 60)) {
        int x = testXADB.r.nextInt();
        pst.setInt(1,x);
        String y = INVENTORY_IDS.get(testXADB.r.nextInt(INVENTORY_IDS.size()));
        pst.setString(2,y);
        pst.execute();
      }
    }
    catch(Exception e) {
      e.printStackTrace();
    }
  }
}

Our initial thought was that the performance discrepancy may be due to the fact the code is executing directly in the database, but that didn't make sense as the latch appears to be taken for the duration of time between the start and end procedures. In other words, you could sleep for a year between start and end, and the latch wouldn't be held for that duration (thank goodness). Conceptually, this means the software used to call the procedure shouldn't matter.

For those interested, we did find the following is the tree taken in a distributed transaction from straight Java JDBC code. We found this by decompiling the classes in the ojdbc6.jar file.

client driver calls java_xa database package via a CallableStatement in the client side driver
the java_xa PL/SQL API calls a java class stored in the database
The java class in the database calls the native library for handling XA in the database

Our assumption is that the native code is the same code invoked by dbms_xa (wrapped code).

More to come on this, these are just initial notes.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.