Skipping a transaction in GoldenGate

We are the midst of a mission critical implementation that involves an active-active setup. We have chosen GoldenGate to manage the distribution of data between the two sites (soon to be three).

This is the first of a series of articles and tips on our findings as we use the software.

We have spent a lot of time understanding the conflict detection and resolution capabilities of the software, and wanted to “break it” so we could document how to fix it. In our case, we simply inserted the exact same row on each database at the same time. Each failed with an ORA-0001 error, otherwise known as “unique constraint violated”.

Since this was a contrived test case, the fix was fairly simple. We simply wanted to skip the transaction on each database and continue with replicat processing. In real life, it will rarely if ever be this simple in terms of a “real” problem.

There are two ways to do this, the first of which is much easier. We show the second method for the sake of completeness, as well as shedding some light on another GoldenGate utility.

FIRST METHOD:

In this method, we run the replicat executable at the command line, rather than ussing ggsci to run it internally. We pass our replicat parameter file to it, as well as the skiptransaction argument. We could also skip several transactions if we knew the CSN (commit sequence number) at which we wanted to start.

After we wait a few seconds and are confident it has gotten past the transaction to be skipped, we issue a kill against our replicat process, and then restart it. Finally, we show the tail of our ggserr.log file to show the transaction was skipped and that our replicat has been successfully restarted.

expressdb1:oracle:ecomm1:/u01/app/oracle/acfsdata/ggate001/oracle# replicat paramfile dirprm/cmhrep.prm skiptransaction



Opened trail file /u01/app/oracle/acfsdata/ggate001/oracle/dirdat/r1000005 at 2012-03-11 15:42:13

2012-03-11 15:42:13  INFO    OGG-01370  User requested START SKIPTRANSACTION. The current transaction will be skipped. Transaction ID 44.16.324, position Seqno 5, RBA 377497.

MAP resolved (entry ATL.TEST):
  map ATL.TEST, target cmh.test, sqlexec (id check_conflict, on update, beforefilter, query "select d from cmh.test where c = :p1", params (p1 = c)), filter (on update, before.d <> check_conflict.d, raiseerror 9999);
Using following columns in default map by name:
  C, D

Using the following key columns for target table CMH.TEST: C.


Wildcard MAP resolved (entry ATL.*):
  map ATL.TEST, target cmh.TEST;
Detected duplicate MAP entry.  Using prior MAP specification.

Quit
expressdb1:oracle:ecomm1:/u01/app/oracle/acfsdata/ggate001/oracle# ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.1.1.1.2 OGGCORE_11.1.1.1.2_PLATFORMS_111004.2100
Linux, x86, 32bit (optimized), Oracle 11g on Oct  4 2011 23:53:33

Copyright (C) 1995, 2011, Oracle and/or its affiliates. All rights reserved.



GGSCI (expressdb1) 1> start cmhrep

Sending START request to MANAGER ...
REPLICAT CMHREP starting


GGSCI (expressdb1) 2> exit
expressdb1:oracle:ecomm1:/u01/app/oracle/acfsdata/ggate001/oracle# tail -f ggserr.log
2012-03-11 15:42:12  INFO    OGG-00995  Oracle GoldenGate Delivery for Oracle, cmhrep.prm:  REPLICAT starting.
2012-03-11 15:42:13  INFO    OGG-00996  Oracle GoldenGate Delivery for Oracle, cmhrep.prm:  REPLICAT started.
2012-03-11 15:42:13  INFO    OGG-01370  Oracle GoldenGate Delivery for Oracle, cmhrep.prm:  User requested START SKIPTRANSACTION. The current transaction will be skipped. Transaction ID 44.16.324, position Seqno 5, RBA 377497.
2012-03-11 15:42:39  INFO    OGG-00987  Oracle GoldenGate Command Interpreter for Oracle:  GGSCI command (oracle): start cmhrep.
2012-03-11 15:42:39  INFO    OGG-00963  Oracle GoldenGate Manager for Oracle, mgr.prm:  Command received from GGSCI on host 192.168.1.50 (START REPLICAT CMHREP ).
2012-03-11 15:42:39  INFO    OGG-00975  Oracle GoldenGate Manager for Oracle, mgr.prm:  REPLICAT CMHREP starting.
2012-03-11 15:42:39  INFO    OGG-00995  Oracle GoldenGate Delivery for Oracle, cmhrep.prm:  REPLICAT CMHREP starting.
2012-03-11 15:42:40  INFO    OGG-00996  Oracle GoldenGate Delivery for Oracle, cmhrep.prm:  REPLICAT CMHREP started.

SECOND METHOD:

We start by showing the error at the bottom of our ggserr.log file.

2012-03-07 20:42:44  WARNING OGG-00869  Oracle GoldenGate Delivery for Oracle, rep1.prm:  OCI Error ORA-00001: unique constraint (CMH.ORDER_PK) violated (status = 1), SQL .
2012-03-07 20:42:44  WARNING OGG-01004  Oracle GoldenGate Delivery for Oracle, rep1.prm:  Aborted grouped transaction on 'CMH.ORDERS', Database error 1 (OCI Error ORA-00001: unique constraint (CMH.ORDER_PK) violated (status = 1), SQL ).
2012-03-07 20:42:44  WARNING OGG-01003  Oracle GoldenGate Delivery for Oracle, rep1.prm:  Repositioning to rba 6343157 in seqno 0.
2012-03-07 20:42:44  WARNING OGG-01154  Oracle GoldenGate Delivery for Oracle, rep1.prm:  SQL error 1 mapping ATL.ORDERS to CMH.ORDERS OCI Error ORA-00001: unique constraint (CMH.ORDER_PK) violated (status = 1), SQL .
2012-03-07 20:42:44  WARNING OGG-01003  Oracle GoldenGate Delivery for Oracle, rep1.prm:  Repositioning to rba 6343157 in seqno 0.
2012-03-07 20:42:44  ERROR   OGG-01296  Oracle GoldenGate Delivery for Oracle, rep1.prm:  Error mapping from ATL.ORDERS to CMH.ORDERS.
2012-03-07 20:42:44  ERROR   OGG-01668  Oracle GoldenGate Delivery for Oracle, rep1.prm:  PROCESS ABENDING.

We then get the current RBA in use by the replicat. This is the actual byte position in the trail file at which the replicat process will issue an fseek() call and begin processing when started.


expressdb1:oracle:ecomm1:/u01/app/oracle/acfsdata/ggate001/oracle# ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.1.1.1.2 OGGCORE_11.1.1.1.2_PLATFORMS_111004.2100
Linux, x86, 32bit (optimized), Oracle 11g on Oct  4 2011 23:53:33

Copyright (C) 1995, 2011, Oracle and/or its affiliates. All rights reserved.



GGSCI (expressdb1) 1> info rep1

REPLICAT   REP1      Last Started 2012-03-07 19:32   Status ABENDED
Checkpoint Lag       00:06:26 (updated 00:24:59 ago)
Log Read Checkpoint  File /u01/app/oracle/acfsdata/ggate001/oracle/dirdat/r1000000
                     2012-03-07 20:36:18.000204  RBA 6343157

From the information above, we need the name of the current trail file (/u01/app/oracle/acfsdata/ggate001/oracle/dirdat/r1000000) and the RBA.

We need to start the replicat process at the next record after the RBA above (6343157), since this one has the problem. Please note that we can’t simply increment the RBA by 1, as the RBA is an address in the file that must start with a format that GoldenGate recognizes as valid. If we change the replicat to start from an invalid address, it will throw an exception similar to the following in the ggserr.log file.

Incompatible record in /u01/app/oracle/acfsdata/ggate001/oracle/dirdat/r1000004, rba 3926317 (getting header).

To get the “real” address of the next record at which we would like to start processing, we use the logdump executable provided as part of the stock GoldenGate installation.

expressdb1:oracle:ecomm1:/u01/app/oracle/acfsdata/ggate001/oracle# logdump

Oracle GoldenGate Log File Dump Utility
Version 11.1.1.1.2 OGGCORE_11.1.1.1.2_PLATFORMS_111004.2100

Copyright (C) 1995, 2011, Oracle and/or its affiliates. All rights reserved.



Logdump 422 >open /u01/app/oracle/acfsdata/ggate001/oracle/dirdat/r1000000
Current LogTrail is /u01/app/oracle/acfsdata/ggate001/oracle/dirdat/r1000000
Logdump 423 >position 6343157
Reading forward from RBA 6343157
Logdump 424 >next

2012/03/07 20:36:18.000.204 Insert               Len    78 RBA 6343157
Name: ATL.ORDERS
After  Image:                                             Partition 4   G  s
 0000 0005 0000 0001 3100 0100 2800 0000 2437 3032 | ........1...(...$702
 6631 3463 642d 6233 3032 2d34 6261 632d 3863 6161 | f14cd-b302-4bac-8caa
 2d65 3565 6130 6336 6464 3939 3400 0200 1500 0032 | -e5ea0c6dd994......2
 3031 322d 3033 2d30 373a 3230 3a33 363a 3138      | 012-03-07:20:36:18

Logdump 425 >next

2012/03/07 20:54:29.001.164 Insert               Len   113 RBA 6343347
Name: ATL.ORDERS
After  Image:                                             Partition 4   G  b
 0000 0028 0000 0024 6563 3831 3331 3965 2d66 3234 | ...(...$ec81319e-f24
 372d 3438 6337 2d39 3831 662d 6266 3138 3637 3735 | 7-48c7-981f-bf186775
 3938 3366 0001 0028 0000 0024 3462 3462 6565 6330 | 983f...(...$4b4beec0
 2d63 6138 392d 3436 6639 2d38 6262 332d 6333 3665 | -ca89-46f9-8bb3-c36e
 3266 6537 6432 3736 0002 0015 0000 3230 3132 2d30 | 2fe7d276......2012-0
 332d 3037 3a32 303a 3534 3a32 39                  | 3-07:20:54:29

Logdump 426 >

Because we looked at the discard file we specified in the replicat parameter file, we knew the offending record had 1 for the primary key value…yeah, as I noted, the test case is contrived 🙂

In logdump, we first open the trail file obtained above. We then typed position RBA, where RBA is the RBA we obtained in the “info replicat” command above, and then “next”. This takes us to the position in trail file from which GoldenGate would start processing. As noted, we want to move to the next record and obtain its RBA. We simply type next, and see the RBA from which we would like to start processing is 6343347. We then exit from logdump, and alter our replicat to start from this RBA. Please note if your replicat is already running (it couldn’t be in our test case), you will have to stop it before issuing the alter command below.

expressdb1:oracle:ecomm1:/u01/app/oracle/acfsdata/ggate001/oracle# ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.1.1.1.2 OGGCORE_11.1.1.1.2_PLATFORMS_111004.2100
Linux, x86, 32bit (optimized), Oracle 11g on Oct  4 2011 23:53:33

Copyright (C) 1995, 2011, Oracle and/or its affiliates. All rights reserved.



GGSCI (expressdb1) 1> alter replicat rep1, extrba 6343347
REPLICAT altered.


GGSCI (expressdb1) 2> start rep1

Sending START request to MANAGER ...
REPLICAT REP1 starting


GGSCI (expressdb1) 3> exit

Using either method above, if you issue a “stats replicat yourrepname” you should see transactions being processed by the replicat.

13 comments for “Skipping a transaction in GoldenGate

  1. daz
    May 9, 2012 at 11:24 AM

    what if the error is the last transaction, what number do we use. or do we have to force another dummy txn through?

  2. May 9, 2012 at 12:09 PM

    Hi daz,

    Do you mean you have something like the following (syntactically incorrect)…

    insert parent values(1)
    insert child values(‘b’,1,sysdate)
    delete from gl_entries;

    …and you want to skip the last one (the delete from an important table) that you don’t want to go through? If that this the case and you want the first two transactions (the two inserts), you could go two routes:

    1)

    alter the replicat to end at a time (if that is granular enough). Get the timestamp from logdump on the trail file that contains the “bad” transaction. Something like…

    Logdump 824 >open dirdat/r1000071
    Current LogTrail is /u01/app/oracle/acfsmounts/ggate/oracle2/dirdat/r1000071
    Logdump 825 >detail on
    Logdump 826 >ghdr on
    Logdump 827 >reclen 2000
    Reclen set to 2000
    Logdump 828 >skip 50
    Logdump 829 >n
    ___________________________________________________________________
    Hdr-Ind : E (x45) Partition : . (x04)
    UndoFlag : . (x00) BeforeAfter: A (x41)
    RecLength : 73 (x0049) IO Time : 2012/05/08 23:16:32.001.120
    IOType : 5 (x05) OrigNode : 255 (xff)
    TransInd : . (x01) FormatType : R (x52)
    SyskeyLen : 0 (x00) Incomplete : . (x00)
    AuditRBA : 45 AuditPos : 11957120
    Continued : N (x00) RecCount : 1 (x01)

    2012/05/08 23:16:32.001.120 Insert Len 73 RBA 10467
    Name: ATGDB_CORE.SRCH_UPDATE_QUEUE
    After Image: Partition 4 G m
    0000 000c 0000 0008 3635 3334 3737 3543 0001 000a | ……..6534775C….
    0000 0006 3230 3030 3031 0002 000a 0000 0000 0000 | ….200001……….
    0000 006d 0003 000b 0000 0007 7072 6f64 7563 7400 | …m……..product.
    0400 0a00 0000 0000 0000 0000 00 | ………….
    Column 0 (x0000), Len 12 (x000c)
    Column 1 (x0001), Len 10 (x000a)
    Column 2 (x0002), Len 10 (x000a)
    Column 3 (x0003), Len 11 (x000b)
    Column 4 (x0004), Len 10 (x000a)

    Logdump 830 >

    …then…

    alter replicat yourreplicat, end 2012-05-08 23:16:32.001120

    2)

    Use logdump and save the “good” records (the ones leading up to but not including the “bad” transaction) to a file, and run your replicat from that. You could then alter the replicat to start from the RBA just after the “bad” transaction.

    Keep in mind that whenever you start skipping stuff, you take some risk, so make sure you understand the business data and constraints.

    Thanks,

    Steve

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.