Skipping a transaction in GoldenGate

We are the midst of a mission critical implementation that involves an active-active setup. We have chosen GoldenGate to manage the distribution of data between the two sites (soon to be three).

This is the first of a series of articles and tips on our findings as we use the software.

We have spent a lot of time understanding the conflict detection and resolution capabilities of the software, and wanted to “break it” so we could document how to fix it. In our case, we simply inserted the exact same row on each database at the same time. Each failed with an ORA-0001 error, otherwise known as “unique constraint violated”.

Since this was a contrived test case, the fix was fairly simple. We simply wanted to skip the transaction on each database and continue with replicat processing. In real life, it will rarely if ever be this simple in terms of a “real” problem.

There are two ways to do this, the first of which is much easier. We show the second method for the sake of completeness, as well as shedding some light on another GoldenGate utility.

FIRST METHOD:

In this method, we run the replicat executable at the command line, rather than ussing ggsci to run it internally. We pass our replicat parameter file to it, as well as the skiptransaction argument. We could also skip several transactions if we knew the CSN (commit sequence number) at which we wanted to start.

After we wait a few seconds and are confident it has gotten past the transaction to be skipped, we issue a kill against our replicat process, and then restart it. Finally, we show the tail of our ggserr.log file to show the transaction was skipped and that our replicat has been successfully restarted.

expressdb1:oracle:ecomm1:/u01/app/oracle/acfsdata/ggate001/oracle# replicat paramfile dirprm/cmhrep.prm skiptransaction



Opened trail file /u01/app/oracle/acfsdata/ggate001/oracle/dirdat/r1000005 at 2012-03-11 15:42:13

2012-03-11 15:42:13  INFO    OGG-01370  User requested START SKIPTRANSACTION. The current transaction will be skipped. Transaction ID 44.16.324, position Seqno 5, RBA 377497.

MAP resolved (entry ATL.TEST):
  map ATL.TEST, target cmh.test, sqlexec (id check_conflict, on update, beforefilter, query "select d from cmh.test where c = :p1", params (p1 = c)), filter (on update, before.d <> check_conflict.d, raiseerror 9999);
Using following columns in default map by name:
  C, D

Using the following key columns for target table CMH.TEST: C.


Wildcard MAP resolved (entry ATL.*):
  map ATL.TEST, target cmh.TEST;
Detected duplicate MAP entry.  Using prior MAP specification.

Quit
expressdb1:oracle:ecomm1:/u01/app/oracle/acfsdata/ggate001/oracle# ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.1.1.1.2 OGGCORE_11.1.1.1.2_PLATFORMS_111004.2100
Linux, x86, 32bit (optimized), Oracle 11g on Oct  4 2011 23:53:33

Copyright (C) 1995, 2011, Oracle and/or its affiliates. All rights reserved.



GGSCI (expressdb1) 1> start cmhrep

Sending START request to MANAGER ...
REPLICAT CMHREP starting


GGSCI (expressdb1) 2> exit
expressdb1:oracle:ecomm1:/u01/app/oracle/acfsdata/ggate001/oracle# tail -f ggserr.log
2012-03-11 15:42:12  INFO    OGG-00995  Oracle GoldenGate Delivery for Oracle, cmhrep.prm:  REPLICAT starting.
2012-03-11 15:42:13  INFO    OGG-00996  Oracle GoldenGate Delivery for Oracle, cmhrep.prm:  REPLICAT started.
2012-03-11 15:42:13  INFO    OGG-01370  Oracle GoldenGate Delivery for Oracle, cmhrep.prm:  User requested START SKIPTRANSACTION. The current transaction will be skipped. Transaction ID 44.16.324, position Seqno 5, RBA 377497.
2012-03-11 15:42:39  INFO    OGG-00987  Oracle GoldenGate Command Interpreter for Oracle:  GGSCI command (oracle): start cmhrep.
2012-03-11 15:42:39  INFO    OGG-00963  Oracle GoldenGate Manager for Oracle, mgr.prm:  Command received from GGSCI on host 192.168.1.50 (START REPLICAT CMHREP ).
2012-03-11 15:42:39  INFO    OGG-00975  Oracle GoldenGate Manager for Oracle, mgr.prm:  REPLICAT CMHREP starting.
2012-03-11 15:42:39  INFO    OGG-00995  Oracle GoldenGate Delivery for Oracle, cmhrep.prm:  REPLICAT CMHREP starting.
2012-03-11 15:42:40  INFO    OGG-00996  Oracle GoldenGate Delivery for Oracle, cmhrep.prm:  REPLICAT CMHREP started.

SECOND METHOD:

We start by showing the error at the bottom of our ggserr.log file.

2012-03-07 20:42:44  WARNING OGG-00869  Oracle GoldenGate Delivery for Oracle, rep1.prm:  OCI Error ORA-00001: unique constraint (CMH.ORDER_PK) violated (status = 1), SQL .
2012-03-07 20:42:44  WARNING OGG-01004  Oracle GoldenGate Delivery for Oracle, rep1.prm:  Aborted grouped transaction on 'CMH.ORDERS', Database error 1 (OCI Error ORA-00001: unique constraint (CMH.ORDER_PK) violated (status = 1), SQL ).
2012-03-07 20:42:44  WARNING OGG-01003  Oracle GoldenGate Delivery for Oracle, rep1.prm:  Repositioning to rba 6343157 in seqno 0.
2012-03-07 20:42:44  WARNING OGG-01154  Oracle GoldenGate Delivery for Oracle, rep1.prm:  SQL error 1 mapping ATL.ORDERS to CMH.ORDERS OCI Error ORA-00001: unique constraint (CMH.ORDER_PK) violated (status = 1), SQL .
2012-03-07 20:42:44  WARNING OGG-01003  Oracle GoldenGate Delivery for Oracle, rep1.prm:  Repositioning to rba 6343157 in seqno 0.
2012-03-07 20:42:44  ERROR   OGG-01296  Oracle GoldenGate Delivery for Oracle, rep1.prm:  Error mapping from ATL.ORDERS to CMH.ORDERS.
2012-03-07 20:42:44  ERROR   OGG-01668  Oracle GoldenGate Delivery for Oracle, rep1.prm:  PROCESS ABENDING.

We then get the current RBA in use by the replicat. This is the actual byte position in the trail file at which the replicat process will issue an fseek() call and begin processing when started.


expressdb1:oracle:ecomm1:/u01/app/oracle/acfsdata/ggate001/oracle# ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.1.1.1.2 OGGCORE_11.1.1.1.2_PLATFORMS_111004.2100
Linux, x86, 32bit (optimized), Oracle 11g on Oct  4 2011 23:53:33

Copyright (C) 1995, 2011, Oracle and/or its affiliates. All rights reserved.



GGSCI (expressdb1) 1> info rep1

REPLICAT   REP1      Last Started 2012-03-07 19:32   Status ABENDED
Checkpoint Lag       00:06:26 (updated 00:24:59 ago)
Log Read Checkpoint  File /u01/app/oracle/acfsdata/ggate001/oracle/dirdat/r1000000
                     2012-03-07 20:36:18.000204  RBA 6343157

From the information above, we need the name of the current trail file (/u01/app/oracle/acfsdata/ggate001/oracle/dirdat/r1000000) and the RBA.

We need to start the replicat process at the next record after the RBA above (6343157), since this one has the problem. Please note that we can’t simply increment the RBA by 1, as the RBA is an address in the file that must start with a format that GoldenGate recognizes as valid. If we change the replicat to start from an invalid address, it will throw an exception similar to the following in the ggserr.log file.

Incompatible record in /u01/app/oracle/acfsdata/ggate001/oracle/dirdat/r1000004, rba 3926317 (getting header).

To get the “real” address of the next record at which we would like to start processing, we use the logdump executable provided as part of the stock GoldenGate installation.

expressdb1:oracle:ecomm1:/u01/app/oracle/acfsdata/ggate001/oracle# logdump

Oracle GoldenGate Log File Dump Utility
Version 11.1.1.1.2 OGGCORE_11.1.1.1.2_PLATFORMS_111004.2100

Copyright (C) 1995, 2011, Oracle and/or its affiliates. All rights reserved.



Logdump 422 >open /u01/app/oracle/acfsdata/ggate001/oracle/dirdat/r1000000
Current LogTrail is /u01/app/oracle/acfsdata/ggate001/oracle/dirdat/r1000000
Logdump 423 >position 6343157
Reading forward from RBA 6343157
Logdump 424 >next

2012/03/07 20:36:18.000.204 Insert               Len    78 RBA 6343157
Name: ATL.ORDERS
After  Image:                                             Partition 4   G  s
 0000 0005 0000 0001 3100 0100 2800 0000 2437 3032 | ........1...(...$702
 6631 3463 642d 6233 3032 2d34 6261 632d 3863 6161 | f14cd-b302-4bac-8caa
 2d65 3565 6130 6336 6464 3939 3400 0200 1500 0032 | -e5ea0c6dd994......2
 3031 322d 3033 2d30 373a 3230 3a33 363a 3138      | 012-03-07:20:36:18

Logdump 425 >next

2012/03/07 20:54:29.001.164 Insert               Len   113 RBA 6343347
Name: ATL.ORDERS
After  Image:                                             Partition 4   G  b
 0000 0028 0000 0024 6563 3831 3331 3965 2d66 3234 | ...(...$ec81319e-f24
 372d 3438 6337 2d39 3831 662d 6266 3138 3637 3735 | 7-48c7-981f-bf186775
 3938 3366 0001 0028 0000 0024 3462 3462 6565 6330 | 983f...(...$4b4beec0
 2d63 6138 392d 3436 6639 2d38 6262 332d 6333 3665 | -ca89-46f9-8bb3-c36e
 3266 6537 6432 3736 0002 0015 0000 3230 3132 2d30 | 2fe7d276......2012-0
 332d 3037 3a32 303a 3534 3a32 39                  | 3-07:20:54:29

Logdump 426 >

Because we looked at the discard file we specified in the replicat parameter file, we knew the offending record had 1 for the primary key value…yeah, as I noted, the test case is contrived 🙂

In logdump, we first open the trail file obtained above. We then typed position RBA, where RBA is the RBA we obtained in the “info replicat” command above, and then “next”. This takes us to the position in trail file from which GoldenGate would start processing. As noted, we want to move to the next record and obtain its RBA. We simply type next, and see the RBA from which we would like to start processing is 6343347. We then exit from logdump, and alter our replicat to start from this RBA. Please note if your replicat is already running (it couldn’t be in our test case), you will have to stop it before issuing the alter command below.

expressdb1:oracle:ecomm1:/u01/app/oracle/acfsdata/ggate001/oracle# ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.1.1.1.2 OGGCORE_11.1.1.1.2_PLATFORMS_111004.2100
Linux, x86, 32bit (optimized), Oracle 11g on Oct  4 2011 23:53:33

Copyright (C) 1995, 2011, Oracle and/or its affiliates. All rights reserved.



GGSCI (expressdb1) 1> alter replicat rep1, extrba 6343347
REPLICAT altered.


GGSCI (expressdb1) 2> start rep1

Sending START request to MANAGER ...
REPLICAT REP1 starting


GGSCI (expressdb1) 3> exit

Using either method above, if you issue a “stats replicat yourrepname” you should see transactions being processed by the replicat.

13 comments for “Skipping a transaction in GoldenGate

  1. daz
    May 9, 2012 at 11:24 AM

    what if the error is the last transaction, what number do we use. or do we have to force another dummy txn through?

  2. May 9, 2012 at 12:09 PM

    Hi daz,

    Do you mean you have something like the following (syntactically incorrect)…

    insert parent values(1)
    insert child values(‘b’,1,sysdate)
    delete from gl_entries;

    …and you want to skip the last one (the delete from an important table) that you don’t want to go through? If that this the case and you want the first two transactions (the two inserts), you could go two routes:

    1)

    alter the replicat to end at a time (if that is granular enough). Get the timestamp from logdump on the trail file that contains the “bad” transaction. Something like…

    Logdump 824 >open dirdat/r1000071
    Current LogTrail is /u01/app/oracle/acfsmounts/ggate/oracle2/dirdat/r1000071
    Logdump 825 >detail on
    Logdump 826 >ghdr on
    Logdump 827 >reclen 2000
    Reclen set to 2000
    Logdump 828 >skip 50
    Logdump 829 >n
    ___________________________________________________________________
    Hdr-Ind : E (x45) Partition : . (x04)
    UndoFlag : . (x00) BeforeAfter: A (x41)
    RecLength : 73 (x0049) IO Time : 2012/05/08 23:16:32.001.120
    IOType : 5 (x05) OrigNode : 255 (xff)
    TransInd : . (x01) FormatType : R (x52)
    SyskeyLen : 0 (x00) Incomplete : . (x00)
    AuditRBA : 45 AuditPos : 11957120
    Continued : N (x00) RecCount : 1 (x01)

    2012/05/08 23:16:32.001.120 Insert Len 73 RBA 10467
    Name: ATGDB_CORE.SRCH_UPDATE_QUEUE
    After Image: Partition 4 G m
    0000 000c 0000 0008 3635 3334 3737 3543 0001 000a | ……..6534775C….
    0000 0006 3230 3030 3031 0002 000a 0000 0000 0000 | ….200001……….
    0000 006d 0003 000b 0000 0007 7072 6f64 7563 7400 | …m……..product.
    0400 0a00 0000 0000 0000 0000 00 | ………….
    Column 0 (x0000), Len 12 (x000c)
    Column 1 (x0001), Len 10 (x000a)
    Column 2 (x0002), Len 10 (x000a)
    Column 3 (x0003), Len 11 (x000b)
    Column 4 (x0004), Len 10 (x000a)

    Logdump 830 >

    …then…

    alter replicat yourreplicat, end 2012-05-08 23:16:32.001120

    2)

    Use logdump and save the “good” records (the ones leading up to but not including the “bad” transaction) to a file, and run your replicat from that. You could then alter the replicat to start from the RBA just after the “bad” transaction.

    Keep in mind that whenever you start skipping stuff, you take some risk, so make sure you understand the business data and constraints.

    Thanks,

    Steve

  3. daz
    May 10, 2012 at 3:53 AM

    sorry i realised i was too brief and wasn’t clear enough. your reply has great info though 🙂

    what i mean is if i want to skip a whole transaction using logdump, you say type “next” but if the final transaction was the transaction i want to skip. is that possible? as theres no following “extrba” number we can put in.

    on a test VM system i have set up 2 11g Oracle databases that are 2 way replicated, and deleted a row that didnt exist in one of the dbs .. so i see:
    GGSCI (darren-linux64) 1>
    EXTRACT EXT1 Last Started 2012-05-09 16:16 Status RUNNING
    Checkpoint Lag 00:00:00 (updated 00:00:02 ago)
    Log Read Checkpoint Oracle Redo Logs
    2012-05-10 09:48:40 Seqno 328, RBA 16411648
    SCN 1509.2381968671 (6483487618335)

    EXTRACT EXT2 Last Started 2012-05-09 16:47 Status RUNNING
    Checkpoint Lag 00:00:00 (updated 00:00:08 ago)
    Log Read Checkpoint Oracle Redo Logs
    2012-05-10 09:48:36 Seqno 78, RBA 3127296
    SCN 0.956824 (956824)

    REPLICAT REP1 Last Started 2012-05-09 17:22 Status ABENDED
    Checkpoint Lag 00:34:24 (updated 16:27:01 ago)
    Log Read Checkpoint File /u01/app/goldengate/11.2.1.0.0/dirdat/rt000000
    2012-05-09 16:48:18.037525 RBA 2451

    REPLICAT REP2 Last Started 2012-05-09 16:39 Status RUNNING
    Checkpoint Lag 00:00:00 (updated 00:00:07 ago)
    Log Read Checkpoint File /u01/oracle/software/goldengate/dirdat/r2000000
    First Record RBA 0

    GGSCI (darren-linux64) 2> info rep1

    REPLICAT REP1 Last Started 2012-05-09 17:22 Status ABENDED
    Checkpoint Lag 00:34:24 (updated 16:27:51 ago)
    Log Read Checkpoint File /u01/app/goldengate/11.2.1.0.0/dirdat/rt000000
    2012-05-09 16:48:18.037525 RBA 2451

    GGSCI (darren-linux64) 3>

    in logdump i try the “Next” method to skip the failed TXN and just get REP1 restarted:

    Logdump 16 >open /u01/app/goldengate/11.2.1.0.0/dirdat/rt000000
    Current LogTrail is /u01/app/goldengate/11.2.1.0.0/dirdat/rt000000
    Logdump 17 >position 2451
    Reading forward from RBA 2451
    Logdump 18 >next

    2012/05/09 16:48:18.037.525 Delete Len 42 RBA 2451
    Name: TEST.TEST_TAB
    Before Image: Partition 4 G e
    0000 000a 0000 0000 0000 0000 0001 0001 000a 0000 | ………………..
    0000 0000 0000 0001 0002 000a 0000 0000 0000 0000 | ………………..
    0002 | ..

    Logdump 20 >next
    Logdump 21 >

    but as that was the final replicated TXN ..i cant get a number to put in. how do we skip the transaction in this case, is the only way to put though a good transaction to follow it?

  4. May 10, 2012 at 1:34 PM

    Hi Darren,

    I *think* I see what you are asking. So you have a brand new replicat (I assume by the 00000 number), and the last transaction in the trail is the one you want to skip?

    I would issue an…

    alter extract pumpextractname, etrollover

    …on the source, and on the target…

    stop replicat replicatname
    alter replicat replicatname, extseqno 1
    start replicat replicatname

    That should start a brand new extract and replicat trail file to talk with each other, and from which you can start processing transactions.

    Thanks,

    Steve

  5. May 31, 2012 at 12:41 PM

    Great post.
    I posted the link on my blog.
    This was the only writeup that helped me fix a record problem over the weekend.

    Much Appreciated.

  6. power
    October 25, 2012 at 3:59 PM

    Hi steve,

    I follow your posts , they are good and makes sense and easy to understand.

    I have skipped many transactions using logdump and how can I see all those skipped trasactions now (and info about the tablename and schemaname relating to the skipped transactions), To what extent skipping the transactions will effect the gg replication.

    My replicat is abended due an GGS error 218 Error mapping sour.table to targe.table (the error is due to an update sql on the tabele)(both the schema and table names are identical in the source and target dbs)
    1.Supplemental logging is enable(db level(MIN) and table level)
    2.structure of the table is identical on both sides.
    3.No triggers or contraints are defined on table
    4.Present no.of row on both side are equal for the table
    5.discard file has many discards with the update SQL of this table. the discards are due to (Aborted compressed key update from sour.table to targe.table (target format).

    How can I see the complete sql which replicat is trying to replicate on target(Logdump does not give complete sql)

    Help me out to start the replicat back. and also let me know what are the things to follow to troubleshoot the abended extracts and replicats)

  7. October 25, 2012 at 9:11 PM

    **To what extent skipping the transactions will effect the gg replication.**

    It depends on what you skipped. You can break data on the target if you skip things that in fact should not be skipped. It really depends on your application. If you skip deletes that don’t exist on the target, that may not be a problem. If you skip an update to a quantity column in an inventory table, that will most likely be a problem.

    **My replicat is abended due an GGS error 218 Error mapping sour.table to targe.table (the error is due to an update sql on the tabele)(both the schema and table names are identical in the source and target dbs)
    1.Supplemental logging is enable(db level(MIN) and table level)
    2.structure of the table is identical on both sides.
    3.No triggers or contraints are defined on table
    4.Present no.of row on both side are equal for the table
    5.discard file has many discards with the update SQL of this table. the discards are due to (Aborted compressed key update from sour.table to targe.table (target format).**

    Post the error in the ggserr.log file as well as your replicat parameter file.

    **How can I see the complete sql which replicat is trying to replicate on target(Logdump does not give complete sql)**

    Add showsyntax to your replicat parameter file, and it should show the SQL for failed transactions.

    **Help me out to start the replicat back. and also let me know what are the things to follow to troubleshoot the abended extracts and replicats**

    Keep in mind you shouldn’t have that many issues to begin with. If you do, you really need to go back to the drawing board and ensure your design is sound.

    If you have in fact been manually skipping transactions, there really is no automated way to print those. If you have been using a filter in the replicat, you can configure it to print the reason when a rule is not met.

    I would encourage you to take a good GoldenGate class. It sounds like you know the basic syntax, and a good instructor would probably really get you over the hump.

  8. power
    October 25, 2012 at 11:57 PM

    2012-10-20 01:00:46 GGS INFO 320 Oracle GoldenGate Delivery for Oracle, replicatetar.prm: REPLICAT REPLICATETAR starting.
    2012-10-20 01:00:46 GGS INFO 320 Oracle GoldenGate Delivery for Oracle, replicatetar.prm: REPLICAT REPLICATETAR started.
    2012-10-20 01:00:46 GGS WARNING 109 Oracle GoldenGate Delivery for Oracle, replicatetar.prm: No unique key is defined for table tablename. All viable columns will be used to represent the key, but may not guarantee uniqueness. KEYCOLS may be used to define the key.
    2012-10-20 01:00:46 GGS WARNING 218 Oracle GoldenGate Delivery for Oracle, replicatetar.prm: Aborted grouped transaction on ‘sourc.table’, Database error 100 (retrieving bind info for query).
    2012-10-20 01:00:46 GGS WARNING 218 Oracle GoldenGate Delivery for Oracle, replicatetar.prm: Repositioning to rba 176692764 in seqno 101.
    2012-10-20 01:00:46 GGS WARNING 218 Oracle GoldenGate Delivery for Oracle, replicatetar.prm: SQL error 1403 mapping source.table to target.table.
    2012-10-20 01:00:46 GGS WARNING 218 Oracle GoldenGate Delivery for Oracle, replicatetar.prm: Repositioning to rba 176692764 in seqno 101.
    2012-10-20 01:00:46 GGS ERROR 218 Oracle GoldenGate Delivery for Oracle, replicatetar.prm: Error mapping from source.table to target.table.
    2012-10-20 01:00:46 GGS ERROR 190 Oracle GoldenGate Delivery for Oracle, replicatetar.prm: PROCESS ABENDING.

    Thansks for your info.

  9. power
    October 26, 2012 at 10:44 AM

    Replicat paramfile
    replicat replicatetar
    assumetargetdefs
    discardfile /location/10.4/dirdsc/replicatetar.dsc, MEGABYTES 50, APPEND
    discardrollover at 10:00 on monday
    userid usergg , password *********
    DDL INCLUDE MAPPED &
    EXCLUDE INSTR ‘TRIGGER’ &
    EXCLUDE INSTR ‘@’ &
    INCLUDE ALL objtype PUBLICSYNONYM &
    INCLUDE objtype ROLE
    DDLSUBST ‘ON DELETE CASCADE’ WITH ‘DISABLE’ INCLUDE ALL
    MAP SOURCE.* , TARGET SCHEMA.*;
    APPLYNOOPUPDATES

  10. October 26, 2012 at 11:54 AM

    Add showsyntax to the replicat parameter file, and it should show what the failing SQL is.

    See page 16 of the following URL…

    http://docs.oracle.com/cd/E35209_01/doc.1121/e35180.pdf

  11. Dave
    January 2, 2013 at 7:09 PM

    Hey Steve,

    Good post. However, I have a question about the following

    “alter replicat yourreplicat, end 2012-05-08 23:16:32.001120”

    I don’t see this in the OGG reference as one of the alter replicat options. Please advise.

    Thanks
    Dave

  12. January 3, 2013 at 6:57 PM

    Hi Dave,

    You are correct, it is a parameter, but it is only applicable for the parameter file and not a runtime change.

    See page 193 at http://docs.oracle.com/cd/E35209_01/doc.1121/e29399.pdf for details.

    Thanks,

    Steve

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.