大数跨境
0
0

【问题诊断】Oracle 11g ORA-7445 [opiaba] 和 ORA-600 [17147]错误案例分享

【问题诊断】Oracle 11g ORA-7445 [opiaba] 和 ORA-600 [17147]错误案例分享 数据库运维之道
2024-03-17
2
导读:数据库实例出现异常重启,经排查是PL/SQL语句块中嵌入多个UPDATE语句,SQL语句中绑定变量数量超过了65535个限制触发BUG问题。
本期将为大家分享“Oracle 11g ORA-7445 [opiaba] 和 ORA-600 [17147]错误”的解决方案。
关键词:ORA-7445[opiaba] 、ORA-600 [17147]、65535 binds、ORA-01006: bind variable

数据库版本:11.2.0.4.181016,操作系统版本:redhat linux 7.4,两节点的RAC集群。其中一个数据库实例出现异常重启,经排查是PL/SQL语句块中嵌入多个UPDATE语句,SQL语句中绑定变量数量超过了65535个限制触发BUG问题。

1、收到数据库告警信息后,第一时间登录数据库服务器,然后检查数据库运行状态。
集群环境:srvctl status database -d ywzd2 -vInstance ywzd21 is running on node ywzd-db03. Instance status: Open.Instance ywzd22 is running on node ywzd-db04. Instance status: Open.
单机环境:sqlplus / as sysdbaSQL> select startup_time,status from v$instance;STARTUP_TIME STATUS----------------------- ------------06-DEC-2023 12:29:59 OPEN

    2、检查数据库故障期间的alert日志,可以看到实例遇到ORA-07445[opiaba()+639]和ORA-00600[17147]内部错误,接着核心PMON 进程崩溃,导致了 DB 实例重启。

第一段日志出现ORA-07445[opiaba()+639]错误Wed Dec 06 12:29:21 2023Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x185806F, opiaba()+639] [flags: 0x0, count: 1]Errors in file /u01/app/oracle/diag/rdbms/ywzd2/ywzd21/trace/ywzd21_ora_538783.trc  (incident=418676):ORA-07445: exception encountered: core dump [opiaba()+639] [SIGSEGV] [ADDR:0x0] [PC:0x185806F] [SI_KERNEL(general_protection)] []Incident details in: /u01/app/oracle/diag/rdbms/ywzd2/ywzd21/incident/incdir_418676/ywzd21_ora_538783_i418676.trc第二段日志出现ORA-00600[17147]错误Wed Dec 06 12:29:47 2023Errors in file /u01/app/oracle/diag/rdbms/ywzd2/ywzd21/trace/ywzd21_pmon_512337.trc  (incident=411292):ORA-00600: internal error code, arguments: [17147], [0x14DAE09578], [], [], [], [], [], [], [], [], [], []Incident details in: /u01/app/oracle/diag/rdbms/ywzd2/ywzd21/incident/incdir_411292/ywzd21_pmon_512337_i411292.trcUse ADRCI or Support Workbench to package the incident.See Note 411.1 at My Oracle Support for error and packaging details.Wed Dec 06 12:29:48 2023Dumping diagnostic data in directory=[cdmp_20231206122948], requested by (instance=1, osid=512337 (PMON)), summary=[incident=411292].Errors in file /u01/app/oracle/diag/rdbms/ywzd2/ywzd21/trace/ywzd21_pmon_512337.trc:ORA-00600: internal error code, arguments: [17147], [0x14DAE09578], [], [], [], [], [], [], [], [], [], []PMON (ospid: 512337): terminating the instance due to error 472System state dump requested by (instance=1, osid=512337 (PMON)), summary=[abnormal instance termination].System State dumped to trace file /u01/app/oracle/diag/rdbms/ywzd2/ywzd21/trace/ywzd21_diag_512357_20231206122952.trcInstance terminated by PMON, pid = 512337第三段日志集群实例自动重启Wed Dec 06 12:29:59 2023Starting ORACLE instance (normal)
批量检查命令:grep -ib1E 'opiaba|\[17147|terminating the instance due to error 472' /u01/app/oracle/diag/rdbms/ywzd2/ywzd21/trace/alert_ywzd21.log

    3、进一步检查用户进程的跟踪文件,该进程正在执行一条非常长的、使用了超过 10 万个绑定变量的SQL语句。该PLSQL语句块包含大量的UPDATE语句,并且绑定变量个数超过65535个,触发ora-7445 [opiaba]错误。

========= Dump for incident 328879 (ORA 600 [17147]) ========----- Beginning of Customized Incident Dump(s) -----********** Internal heap ERROR 17147 addr=0x12b0accc78 *********...
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)----- Current SQL Statement for this session (sql_id=btng01wnhnvnw) -----begin
update 业务表1set 字段1 = :1 ,字段2 = :2 ,字段3 = :3 ,字段4 = :4 ,字段5 = :5 ,字段6 = :6 ,字段7 = '1'where 字段8 = :7and 字段9 = :8and 字段10 = :9 ;
....(sql 太长,部分省略)...
update 业务表Nset 字段1 = :100381 ,字段2 = :100382 ,字段3 = :100383where 字段4 = :100384and 字段5 = :100385and 字段6 = :100386 ;end;

    4、进一步检查PMON进程的跟踪文件,堆栈信息显示数据库遇到“[03]: kgherror [KGH]”错误。

dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)----- SQL Statement (None) -----Current SQL information unavailable - no cursor....
----- Incident Context Dump -----Address: 0x7fff22dbad78Incident ID: 327263Problem Key: ORA 600 [17147]Error: ORA-600 [17147] [0x12B0ACCC78] [] [] [] [] [] [] [] [] [] [][00]: dbgexExplicitEndInc [diag_dde][01]: dbgeEndDDEInvocationImpl [diag_dde][02]: dbgeEndDDEInvocation [diag_dde][03]: kgherror [KGH]<-- Signaling[04]: kghfrmrg [KGH][05]: kghfre [KGH][06]: kghsfx [KGH][07]: kghunfhp [KGH][08]: kghfrh_internal [KGH][09]: kghfrh [KGH][10]: kksFreeHeap [cursor][11]: kksLockDelete [cursor][12]: kksLockOperation [cursor][13]: kgllkdl [LIBCACHE][14]: kgllkds [LIBCACHE][15]: kssxdl [VOS][16]: kssdel [VOS][17]: kssdch [VOS][18]: ksuxds [ksu][19]: kssxdl [VOS][20]: kssdel [VOS][21]: kssdch [VOS][22]: ksudlp [ksu][23]: kssxdl [VOS][24]: kssdel [VOS][25]: ksuxdl [ksu][26]: ksuxda_del_proc [ksu][27]: ksucln_dpc_cleanup [ksu][28]: ksucln_dpc_dfs [ksu][29]: ksucln_dpc_main [ksu][30]: ksucln_dpc [ksu][31]: ksucln [ksu][32]: ksbrdp [background_proc][33]: opirip []

    5、对这种批量更新的大量使用绑定变量的长SQL/PLSQL,DB 系统在设计时未预料到此种情形,从而触发Oracle  BUG导致数据库重启。结合官方文档1466343.1找到两种解决方式,分别为应用补丁12578873或修改应用程序减少绑定变量。

1、打补丁可以避免实例崩溃,跟业务部门沟通补丁修复窗口。
2、业务部门优化SQL语句,减少绑定变量个数,将同一条SQL/PLSQL的绑定变量个数控制在65535以内。
3、排查数据库是否还存在相似的问题,通过SQL审查数据库是否存在绑定变量大于1000个的情况。
select sql_id, count(*) bind_count  from v$sql_bind_capture t where t.CHILD_NUMBER = 0 group by t.SQL_IDhaving count(*) > 1000 order by count(*);
4、堆栈调用函数学习
-------------堆栈调用函数-------------------kgdsdst   skdstdst                operating system dependent kernel dump a stack traceksedst1                 kernel service (VOS) error debug dump the call stack stack trace 1 helperksedst                  kernel service (VOS) error debug dump the call stackdbkedDefDump            debug and diagnostics RDBMS diagnostic data extractor Dumper do DDE default dumpksedmp                  kernel service (VOS) error debug dump process statessexhd                  operating system dependent system dump routine__sighandler        opiabaopiprs                  oracle program interface parsekksParseChildCursor     kernel compile shared objects (cursor) parse child cursorrpiswu2                 recursive program interface switch user in recursive sqlkksLoadChild            kernel compile shared objects (cursor) load childkxsGetRuntimeLock       kernel execution shared cursor get runtime lockkksfbc                  kernel compile shared objects (cursor) find bound cursorkkspsc0                 kernel compile shared objects (cursor) parse shared cursorkksParseCursor          kernel compile shared objects (cursor) parse cursoropiosq0                 oracle program interface prepare to parse a sql command 0kpooprx                 kernel programmatic interface oracle open, parse, and optionally executekpoal8                  kernel programmatic interface oracle V8 bundled executionopiodr                  oracle program interface oracle code request driver, route the current requestttcpip                  two task common pipe read/writeopitsk                  oracle program interface two task function dispatcheropiino                  oracle program interface initialize opiopiodr opidrv                  oracle program interface route current request driver, entry side into two task interfacesou2o                   main oracle executable entry pointopimai_real             oracle program interface main real oracle start pointssthrdmain              operating system dependent system main for every thread in a threaded oraclemain                    general c starting function

对于一个提交的SQL语句,存在两种可选的解析过程, 一种叫做硬解析,一种叫做软解析。 一个硬解析需要经解析、制定执行路径、优化访问计划等许多的步骤。硬解释不仅仅耗费大量的cpu,更重要的是会占据重要的闩(latch)资源,严重的影响系统的规模的扩大(即限制了系统的并发行)。当一个sql语句提交后,首先检查一下共享缓冲池(shared pool)里有没有与之完全相同的语句,如果有的话只须执行软解析即可,否则就得进行硬解析。 而唯一使得oracle 能够重复利用执行计划的方法就是采用绑定变量。绑定变量的实质就是用于替代sql语句中的常量的替代变量。绑定变量能够使得每次提交的sql语句都完全一样。 
数据库每15分钟会捕获一次绑定变量,但不会捕获每次SQL执行的变量值。同时当SQL语句硬解析时,SQL对应的绑定变量会被捕获。通过视图v$sql_bind_capture查询捕获绑定变量。
1、创建测试表create table test01 as select * from dba_tables;2、修改共享游标参数值为SIMILARalter session set CURSOR_SHARING=SIMILAR; 3、第一次执行语句1select table_name from test01 where table_name='OBJ$';4、查看语句1的SQL_ID和执行次数set linesize 1000col sql_text for a80select sql_id,EXECUTIONS,LOADS,sql_text from v$sql where sql_text like 'select table_name from test01 where table_name%';SQL_ID          EXECUTIONS      LOADS SQL_TEXT------------- ---------- ---------- ----------------------------------------------------------------------------------------------------9dhppyxtr9qck           1      1 select table_name from test01 where table_name=:"SYS_B_0"5、通过SQL_ID获取语句1的绑定变量值col value_string for a30select child_address,to_char(LAST_CAPTURED,'dd-mm-yyyy hh24:mi:ss'),VALUE_STRING from v$sql_bind_capture where sql_id='9dhppyxtr9qck';CHILD_ADDRESS     TO_CHAR(LAST_CAPTUR VALUE_STRING---------------- ------------------- ------------------------------0000000068BD8220 17-03-2024 20:32:52 OBJ$6、查看参数_cursor_bind_capture_interval的值为900秒,默认15分钟set linesize 1000col name for a50col value for a30select nam.ksppinm NAME,val.KSPPSTVL VALUE from x$ksppi nam, x$ksppsv val where nam.indx = val.indx and nam.ksppinm like '%_cursor_bind_capture_interval%' order by 1;7、修改动态参数为10秒捕获一次alter system set "_cursor_bind_capture_interval"=10;8、再次执行语句1,参数值不一样,传入参数须为4个字节,避免硬解析select table_name from test01 where table_name='TAB$';9、再次查询绑定变值,已从COL$变为TAB$CHILD_ADDRESS     TO_CHAR(LAST_CAPTUR VALUE_STRING---------------- ------------------- ------------------------------0000000068BD8220 17-03-2024 20:35:56 TAB$

  • Bug 12578873 - ORA-7445 [opiaba] when using ore than 65535 bind variables (文档 ID 12578873.8)

  • Instance crashed after ORA-7445 [opiaba] and ORA-600 [17147] (Doc ID 1466343.1)

  • ORA-7445 [opiaba] (Doc ID 737378.1)

  • Not Every Bind Values Captured in v$sql_bind_capture (Doc ID 1370816.1)


以上就是本期关于“Oracle 11g ORA-7445 [opiaba] 和 ORA-600 [17147]错误”的案例分享。希望能给大家带来帮助!
欢迎关注“数据库运维之道”公众号,一起学习数据库技术!
欢迎扫码进“数据库运维之道2群”,此群用于数据库技术交流,禁止发广告!

可以加我的微信,交个朋友或讨论数据库解决方案,请备注”姓名单位“,谢谢!

【声明】内容源于网络
0
0
数据库运维之道
数据库领域原创技术号,专注于Oracle、MySQL、TDSQL、HotDB、TiDB、达梦等数据库研究,深入数据库技术原理,分布式数据库,开源数据库,国产数据库,前沿数据库技术。
内容 22
粉丝 0
数据库运维之道 数据库领域原创技术号,专注于Oracle、MySQL、TDSQL、HotDB、TiDB、达梦等数据库研究,深入数据库技术原理,分布式数据库,开源数据库,国产数据库,前沿数据库技术。
总阅读0
粉丝0
内容22