分布トランザクションのトラブルシューティング
Reference:(Chinese)
http://hi.baidu.com/wa0362/blog/item/93e400efddd841ebcf1b3e7d.html
distruibuted 2pc分散TXをシミュレーションする。
分散処理をコミットした後、速すぎでシミュレーションができないため、コミット前にシミュレーションする。
environment orcl01(ORCL01.TEST.COM),orcl02(ORCL02.TEST.COM) version 10.2.0.2環境orcl01 ( ORCL01.TEST.COM ) 、 orcl02 ( ORCL02.TEST.COM )バージョン10.2.0.2
--- orcl01 dblink: create public database link pu_link connect to hr identified by hr using 'orcl02'; -- orcl01 execute distributed sql: update hr.employees set salary=salary+1000 where employee_id=200; update employees@pu_link set salary=salary+1000 where employee_id=200;
-- orcl02 simulate nic failure ifconfig eth0 down
orcl01 is hanged before commit, and auto-recover when network is recovered.
but the administrator runs shutdown
-- orcl01 shutdown abort
the following is orcl01 alert log and trace file after startup again:
-- altert log Thu Mar 27 17:48:24 2008 SMON: enabling cache recovery Thu Mar 27 17:48:25 2008 DISTRIB TRAN ORCL01.TEST.COM.8705ca3e.5.32.251 is local tran 5.32.251 (hex=05.20.fb) insert pending prepared tran, scn=498537 (hex=0.00079b69) Successfully onlined Undo Tablespace 1. Thu Mar 27 17:48:26 2008 SMON: enabling tx recovery Thu Mar 27 17:48:26 2008 Database Characterset is UTF8 replication_dependency_tracking turned off (no async multimaster replication found) Starting background process QMNC QMNC started with pid=16, OS id=7938 Thu Mar 27 17:48:34 2008 Completed: ALTER DATABASE OPEN Thu Mar 27 17:49:22 2008 Errors in file /u01/app/oracle/oracle/product/10.2.0/db_1/admin/orcl01/bdump/orcl01_reco_7032.trc: ORA-12560: TNS:protocol adapter error Thu Mar 27 17:49:54 2008 Errors in file /u01/app/oracle/oracle/product/10.2.0/db_1/admin/orcl01/bdump/orcl01_reco_7032.trc: ORA-12560: TNS:protocol adapter error --- trace file saids cannot connect database, trans 5.32.251 [oracle@dg1 bdump]$ more /u01/app/oracle/oracle/product/10.2.0/db_1/admin/orcl01/bdump/orcl01_reco_7032.trc 。。。 *** SERVICE NAME:(SYS$BACKGROUND) 2008-03-27 17:49:22.716 *** SESSION ID:(108.1) 2008-03-27 17:49:22.716 *** 2008-03-27 17:49:22.716 ERROR, tran=5.32.251, session#=1, ose=0: ORA-12560: TNS:protocol adapter error *** 2008-03-27 17:49:54.732 ERROR, tran=5.32.251, session#=1, ose=0: ORA-12560: TNS:protocol adapter error *** 2008-03-27 17:50:43.748 ERROR, tran=5.32.251, session#=1, ose=0: ORA-12560: TNS:protocol adapter error -- error is raised when query hr.employees: SQL> select employee_id,salary from hr.employees; select employee_id,salary from hr.employees * ERROR at line 1: ORA-01591: lock held by in-doubt distributed transaction 5.32.251
Detail of error message:
ORA-01591: lock held by in-doubt distributed transaction string
Cause: Trying to access resource that is locked by a dead two-phase commit
transaction that is in prepared state.
Action: DBA should query the pending_trans$ and related tables, and attempt
to repair network connection(s) to coordinator and commit point. If timely
repair is not possible, DBA should contact DBA at commit point if known or
end user for correct outcome, or use heuristic default if given to issue a
heuristic commit or abort command to finalize the local portion of the
distributed transaction.
status of dba_2pc_pending dict
state of 5.32.251 is prepared.
we can find the database is pu_link.test.com with dba_2pc_neighbors dict.
SQL> col local_tran_id format a13 SQL> col global_tran_id format a30 SQL> col state format a8 SQL> col mixed format a3 SQL> col host format a10 SQL> col commit# format a10 SQL> select local_tran_id, global_tran_id, state, mixed, host, commit# 2 from dba_2pc_pending; LOCAL_TRAN_ID GLOBAL_TRAN_ID STATE MIX HOST COMMIT# -------------------- ------------------------------------------ -------- --- ---------- ---------- 5.32.251 ORCL01.TEST.COM.8705ca3e.5.32. prepared no dg1 498537 251 SQL> col local_tran_id format a13 SQL> col in_out format a6 SQL> col database format a25 SQL> col dbuser_owner format a15 SQL> col interface format a3 SQL> select local_tran_id, in_out, database, dbuser_owner, interface 2 from dba_2pc_neighbors; LOCAL_TRAN_ID IN_OUT DATABASE DBUSER_OWNER INT -------------------- --------- ------------------------- ------------------ --- 5.32.251 in SYS N 5.32.251 out PU_LINK.TEST.COM SYS C
we have 2 choices
1. commit/rollback manually
2. rollback after fix environment
under product environment, we many have no time to execute choice 2.
rollback manually
based on prepared status, until orcl01 under prepared status, orcl02 cannot be commited.
to keep the consistence we had better
rollback force '5.32.251'
commit force
SQL> COMMIT FORCE '5.32.251'; Commit complete. orcl01 SQL> select employee_id,salary from hr.employees; EMPLOYEE_ID SALARY ------------------ ---------- 198 2600 199 2600 200 4800 orcl02 SQL> select employee_id,salary from hr.employees; EMPLOYEE_ID SALARY --------------------- ---------- 198 2600 199 2600 200 4700 SQL> select local_tran_id, global_tran_id, state, mixed, host, commit# 2 from dba_2pc_pending; LOCAL_TRAN_ID GLOBAL_TRAN_ID STATE MIX HOST COMMIT# ------------- ------------------------------------------- ----------- --- ---------- ---------- 5.32.251 ORCL01.TEST.COM.8705ca3e.5.32. forced c no dg1 251 ommit DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('5.32.251');