客服微信
作者:炎燚小寳
原文链接:TDPUB数加社区
一、事件概述
客户反馈OGG进程无法停止,提示有大事务在运行,无法停止,最终定位到异常会话的游标没有正常关闭导致
二、故障分析
(一)故障现象
数据库2节点OGG的EXT进程无法停止。
(二)故障分析
1. 查看OGG日志。
GGSCI (xx2) 8> stop extract EJ1_FP Sending STOP request to EXTRACT EJ1_FP ... STOP request pending. There are open, long-running transactions. Before you stop Extract, make the archives containing data for those transactions available for when Extract restarts. To force Extract to stop, use the SEND EXTRACT EJ1_FP, FORCESTOP command. Oldest redo log file necessary to restart Extract is: Redo Log Sequence Number 616, RBA 135673360. 2022-10-21 10:55:41 WARNING OGG-01742 Command sent to EXTRACT EJ1_FP returned with an invalid response.
2. 查看当前的长事务。
GGSCI (xx2) 8> send extract EJ1_FP,showtrans duration 20 MIN Sending showtrans request to EXTRACT EJ1_FP ... Oldest redo log files necessary to restart Extract are: Redo Thread 1, Redo Log Sequence Number 47308, SCN 4585.1856782301 (19694281834461), RBA 575467024 Redo Thread 2, Redo Log Sequence Number 70563, SCN 4585.1916741249 (19694341793409), RBA 310660624 ------------------------------------------------------------ XID: 214.19.198565 Items: 0 Extract: EJ1_FP Redo Thread: 1 Start Time: 2022-10-13:09:40:20 SCN: 4585.1856782301 (19694281834461) Redo Seq: 47308 Redo RBA: 575467024 Status: Running ------------------------------------------------------------ XID: 234.31.119068 Items: 0 Extract: EJ1_FP Redo Thread: 1 Start Time: 2022-10-13:10:45:36 SCN: 4585.1858630286 (19694283682446) Redo Seq: 47378 Redo RBA: 817386000 Status: Running ------------------------------------------------------------ XID: 6.19.504269 Items: 0 Extract: EJ1_FP Redo Thread: 1 Start Time: 2022-10-18:13:22:51 SCN: 4585.1922052463 (19694347104623) Redo Seq: 47529 Redo RBA: 246746640 Status: Running ------------------------------------------------------------ XID: 54.10.981207 Items: 0 Extract: EJ1_FP Redo Thread: 2 Start Time: 2022-10-18:08:19:13 SCN: 4585.1916741249 (19694341793409) Redo Seq: 70563 Redo RBA: 310660624 Status: Running
3. 通过UNDO事务定位到SQL源头
select inst_id,saddr,sid,serial#,paddr,username,sql_id,status from gv$session where taddr in (select addr from gv$transaction where xidusn in (54,6,234,214)) 登录用户为sys,程序类型为sqlplus发起 SQLID:g72kdvcacxvtf DELETE FROM PLAN_TABLE WHERE STATEMENT_ID=:1 SQLID:9babjv8yq8ru3 BEGIN DBMS_OUTPUT.GET_LINES(:LINES, :NUMLINES); END;
4. 综合以上日志初步分析:ORACLE在执行DELETE FROM PLAN_TABLE WHERE STATEMENT_ID=:1一直未提交,导致长事务产生,进一步影响OGG进程关闭。
三、故障定位
根据以上日志报告分析,ORACLE内部递归调用的删除SQL一直未提交,产生长事务,为此次故障的根本原因。
四、故障解决措施及建议
(一)解决措施
如果我们单纯为了解决问题,KILL对应的长事务即可。但如果想知道的问题根源,该怎么办?
五、模拟重现长事务等待
SQL中涉及的目标表是PLAN_TABLE,跟执行计划有关,同时源头用户是SYS且程序为sqlplus,是不是有会话通过操作系统认证登录数据库查看执行计划了?而DBMS_OUTPUT.GET_LINES函数只是输出执行计划附带的而已。
接下来进行模拟测试
---session1--- sqlplus / as sysdba登录数据库,生成一条SQL的执行计划,不退出会话。此操作用10046记录。 -bash-4.1$ sqlplus / as sysdba SQL*Plus: Release 11.2.0.4.0 Production on Sat Oct 15 06:15:39 2022 Copyright (c) 1982, 2013, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options SQL> oradebug setmypid alter session set events '10046 trace name context forever ,level 12'; Statement processed. SQL> Session altered. SQL> set autot trace SQL> select 1 from t1; Execution Plan ---------------------------------------------------------- Plan hash value: 2990879143 ----------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost (%CPU)| Time | ----------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 1 (0)| 00:00:01 | | 1 | INDEX FULL SCAN | IDX_IDNAME | 1 | 1 (0)| 00:00:01 | ----------------------------------------------------------------------- Note ----- - dynamic sampling used for this statement (level=2) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 1 consistent gets 0 physical reads 0 redo size 519 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed SQL> alter session set events '10046 trace name context off'; Session altered. SQL> oradebug close_trace Statement processed. SQL> oradebug tracefile_name /u01/app/oracle/diag/rdbms/d11gr4/d11gr4/trace/d11gr4_ora_14810.trc ---session2--- 查看是否有INACTIVE的事务 SQL> SELECT ktuxesiz FROM x$ktuxe WHERE ktuxesta != 'INACTIVE'; KTUXESIZ ---------- 2 SQL> select addr,xidusn,xidslot,xidsqn,status from v$transaction ; ADDR XIDUSN XIDSLOT XIDSQN STATUS ---------------- ---------- ---------- ---------- ---------------- 00000000E6B17E38 9 24 1605 ACTIVE SQL> select sid,username,sql_id,PREV_SQL_ID,sql_hash_value from v$session where taddr='00000000E6B17E38'; SID USERNAME SQL_ID PREV_SQL_ID SQL_HASH_VALUE ---------- ------------------------------ ------------- ------------- -------------- 152 SYS g72kdvcacxvtf 0 SQL> select sql_fulltext from v$sql t where t.sql_id='g72kdvcacxvtf'; SQL_FULLTEXT -------------------------------------------------------------------------------- DELETE FROM PLAN_TABLE WHERE STATEMENT_ID=:1 DELETE FROM PLAN_TABLE WHERE STATEMENT_ID=:1 DELETE FROM PLAN_TABLE WHERE STATEMENT_ID=:1 --session 3-- 查看10046日志,果然找到上述SQL PARSING IN CURSOR #18446604434621518272 len=45 dep=0 uid=0 oct=7 lid=0 tim=237421908196 hv=349105966 ad='e11266d0' sqlid='g72kdvcacxvtf' DELETE FROM PLAN_TABLE WHERE STATEMENT_ID=:1 END OF STMT PARSE #18446604434621518272:c=45,e=45,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=2637181423,tim=237421908195 BINDS #18446604434621518272: Bind#0 oacdty=01 mxl=128(42) mxlc=00 mal=00 scl=00 pre=00 oacflg=03 fl2=1000000 frm=01 csi=873 siz=128 off=0 kxsbbbfp=ffff80ffbdc5d8b0 bln=128 avl=14 flg=05 value="PLUS4294967295"
从以上测试结果可以验证,就是用户通过操作系统认证以sys登录数据库,查看执行计划,未关闭会话导致的长事务,如何避免该问题咧?查看完执行计划之后关闭会话即可,这样会话对应的游标会自动提交,有兴趣的读者可以验证此点。
六、处理结果
通过以上处理,长事务己消失,OGG进程可以正常关闭。
GGSCI (XX2) 9> send extract EJ1_FP,showtrans duration 20 MIN Sending showtrans request to EXTRACT EJ1_FP ... No transactions found Oldest redo log files necessary to restart Extract are: Redo Thread 1, Redo Log Sequence Number 47308, SCN 4585.1856782301 (19694281834461), RBA 575467024 Redo Thread 2, Redo Log Sequence Number 70563, SCN 4585.1916741249 (19694341793409), RBA 31066062