Distributed Transactions Trouble Shooting Jp
分布トランザクションのトラブルシューティング

Reference:(Chinese)
http://hi.baidu.com/wa0362/blog/item/93e400efddd841ebcf1b3e7d.html

distruibuted 2pc分散TXをシミュレーションする。

分散処理をコミットした後、速すぎでシミュレーションができないため、コミット前にシミュレーションする。

environment orcl01(ORCL01.TEST.COM),orcl02(ORCL02.TEST.COM) version 10.2.0.2環境orcl01 ( ORCL01.TEST.COM ) 、 orcl02 ( ORCL02.TEST.COM )バージョン10.2.0.2

--- orcl01 dblink:
create public database link pu_link connect to hr identified by hr using 'orcl02';
 
-- orcl01 execute distributed sql:
update hr.employees set salary=salary+1000 where employee_id=200;
update employees@pu_link set salary=salary+1000 where employee_id=200;
-- orcl02 simulate nic failure
ifconfig eth0 down

orcl01 is hanged before commit, and auto-recover when network is recovered.
but the administrator runs shutdown

-- orcl01
shutdown abort

the following is orcl01 alert log and trace file after startup again:

-- altert log
 
Thu Mar 27 17:48:24 2008
SMON: enabling cache recovery
Thu Mar 27 17:48:25 2008
DISTRIB TRAN ORCL01.TEST.COM.8705ca3e.5.32.251
is local tran 5.32.251 (hex=05.20.fb)
insert pending prepared tran, scn=498537 (hex=0.00079b69)
Successfully onlined Undo Tablespace 1.
Thu Mar 27 17:48:26 2008
SMON: enabling tx recovery
Thu Mar 27 17:48:26 2008
Database Characterset is UTF8
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=16, OS id=7938
Thu Mar 27 17:48:34 2008
Completed: ALTER DATABASE OPEN
Thu Mar 27 17:49:22 2008
Errors in file /u01/app/oracle/oracle/product/10.2.0/db_1/admin/orcl01/bdump/orcl01_reco_7032.trc:
ORA-12560: TNS:protocol adapter error
Thu Mar 27 17:49:54 2008
Errors in file /u01/app/oracle/oracle/product/10.2.0/db_1/admin/orcl01/bdump/orcl01_reco_7032.trc:
ORA-12560: TNS:protocol adapter error
 
--- trace file saids cannot connect database, trans 5.32.251
[oracle@dg1 bdump]$ more /u01/app/oracle/oracle/product/10.2.0/db_1/admin/orcl01/bdump/orcl01_reco_7032.trc
。。。
 
*** SERVICE NAME:(SYS$BACKGROUND) 2008-03-27 17:49:22.716
*** SESSION ID:(108.1) 2008-03-27 17:49:22.716
*** 2008-03-27 17:49:22.716
ERROR, tran=5.32.251, session#=1, ose=0:
ORA-12560: TNS:protocol adapter error
*** 2008-03-27 17:49:54.732
ERROR, tran=5.32.251, session#=1, ose=0:
ORA-12560: TNS:protocol adapter error
*** 2008-03-27 17:50:43.748
ERROR, tran=5.32.251, session#=1, ose=0:
ORA-12560: TNS:protocol adapter error
 
-- error is raised when query hr.employees:
SQL> select employee_id,salary from hr.employees;
select employee_id,salary from hr.employees
                                  *
ERROR at line 1:
ORA-01591: lock held by in-doubt distributed transaction 5.32.251

Detail of error message:

ORA-01591: lock held by in-doubt distributed transaction string 
Cause: Trying to access resource that is locked by a dead two-phase commit 
transaction that is in prepared state.
Action: DBA should query the pending_trans$ and related tables, and attempt 
to repair network connection(s) to coordinator and commit point. If timely 
repair is not possible, DBA should contact DBA at commit point if known or 
end user for correct outcome, or use heuristic default if given to issue a 
heuristic commit or abort command to finalize the local portion of the 
distributed transaction.

status of dba_2pc_pending dict
state of 5.32.251 is prepared.
we can find the database is pu_link.test.com with dba_2pc_neighbors dict.

SQL> col local_tran_id format a13
SQL> col global_tran_id format a30
SQL> col state format a8
SQL> col mixed format a3
SQL> col host format a10
SQL> col commit# format a10
SQL> select local_tran_id, global_tran_id, state, mixed, host, commit#
2 from dba_2pc_pending;
 
LOCAL_TRAN_ID GLOBAL_TRAN_ID                            STATE       MIX   HOST       COMMIT#
--------------------   ------------------------------------------     --------      ---    ----------   ----------
5.32.251               ORCL01.TEST.COM.8705ca3e.5.32. prepared   no    dg1          498537
                             251
 
SQL> col local_tran_id format a13
SQL> col in_out format a6
SQL> col database format a25
SQL> col dbuser_owner format a15
SQL> col interface format a3
SQL> select local_tran_id, in_out, database, dbuser_owner, interface 
2 from dba_2pc_neighbors;
 
LOCAL_TRAN_ID IN_OUT DATABASE                  DBUSER_OWNER    INT
--------------------    --------- -------------------------     ------------------          ---
5.32.251             in                                                SYS                           N
5.32.251               out         PU_LINK.TEST.COM     SYS                           C

we have 2 choices
1. commit/rollback manually
2. rollback after fix environment

under product environment, we many have no time to execute choice 2.

rollback manually
based on prepared status, until orcl01 under prepared status, orcl02 cannot be commited.
to keep the consistence we had better

rollback force '5.32.251'

commit force

SQL> COMMIT FORCE '5.32.251';
 
Commit complete.
 
orcl01
SQL> select employee_id,salary from hr.employees;
 
EMPLOYEE_ID     SALARY
------------------    ----------
        198             2600
        199            2600
        200            4800
orcl02
SQL> select employee_id,salary from hr.employees;
 
EMPLOYEE_ID     SALARY
---------------------    ----------
        198               2600
        199               2600
        200               4700
 
SQL> select local_tran_id, global_tran_id, state, mixed, host, commit#
2 from dba_2pc_pending;
 
LOCAL_TRAN_ID GLOBAL_TRAN_ID                            STATE         MIX HOST       COMMIT#
-------------             -------------------------------------------       -----------     ---   ----------     ----------
5.32.251               ORCL01.TEST.COM.8705ca3e.5.32. forced c     no    dg1
                             251                                                       ommit
 
DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('5.32.251');