Case Study: Mismatched MTU Between Two Hosts Causes Repeated Host and RAC Restarts

Published: 2024/10/8

Source: JiekeXu之路, by JiekeXu

More at: https://www.modb.pro/

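Last Sunday, an MTU mismatch on AIX caused a host and its RAC database to restart over and over. The operating system engineers saw from monitoring that one host kept rebooting, found nothing wrong with the hardware, and handed the incident over to the database engineers. The hosts were part of a disaster recovery environment built on SVC storage replication.
The RAC database is version 11.2.0.4 and the operating system is AIX 7.1.

MOS says this problem occurs in versions 10.1.0.2 to 11.2.0.2, but this database is 11.2.0.4, so I am writing down the troubleshooting process.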

Unable To Start ASM RAC Instances Due To ORA-27303: Remote Port MTU Does Not Match Local MTU. (Doc ID 947223.1)

Oracle Database - Enterprise Edition - Version 10.1.0.2 to 11.2.0.2 [Release 10.1 to 11.2]

MTU

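MTU stands for Maximum Transmission Unit: the largest packet, in bytes, that a link can carry in one piece. The goal is to find the largest MTU that does not produce fragmentation. PPPoE over ADSL uses an MTU of 1492; other DSL types use the standard setting of 1500, the same as ordinary Ethernet. TCP/IP splits the data to be sent into smaller packets. The figure of 576 bytes that is often quoted is not a fixed packet size: it is the smallest datagram size that every IPv4 host must be able to accept.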
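To make "the largest MTU that does not fragment" concrete, here is a minimal sketch of the arithmetic behind a ping-based probe. It assumes IPv4 with a 20-byte header (no options) and an 8-byte ICMP header; the function name max_icmp_payload is made up for illustration.

```shell
#!/bin/sh
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet.
# Assumption: 20-byte IPv4 header without options plus 8-byte ICMP header.
max_icmp_payload() {
    echo $(( $1 - 28 ))
}

max_icmp_payload 1500   # standard Ethernet, prints 1472
max_icmp_payload 9000   # jumbo frames, prints 8972
```

A probe payload larger than this value for the path's real MTU has to be fragmented, which is the signal to look for when testing an interconnect.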

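The AIX system error log is stored at /var/adm/ras/errlog.

errpt -a lists the detailed entries; see the man page for full usage.

The AIX boot log is stored at /var/adm/ras/bootlog.

This log tracks problems that occur during boot, including the codes shown on the server's LCD panel. It is maintained by the alog facility, and alog -o -t boot prints its contents.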

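Troubleshooting

Node 2 of the RAC was the problem machine: the host rebooted every five or six minutes, and the database on node 1 also went down a few minutes after each startup. That was odd. Could host 2 crashing repeatedly really bring down instance 1? Curiosity kept me digging. Below is the alert log from node 1.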

Sun Mar 15 19:06:11 2020
ALTER SYSTEM SET local_listener=' (ADDRESS=(PROTOCOL=TCP)(HOST=192.168.XX.12)(PORT=1521))' SCOPE=MEMORY SID='test1';
ALTER DATABASE MOUNT /* db agent *//* {0:11:33} */
This instance was first to mount
ORA-00210: cannot open the specified control file
ORA-00202: control file: '+DATA/test/control02.ctl'
ORA-17503: ksfdopn:2 Failed to open file +DATA/test/control02.ctl
ORA-15001: diskgroup "DATA" does not exist or is not mounted
ORA-15077: could not locate ASM instance serving a required diskgroup
ORA-00210: cannot open the specified control file
ORA-00202: control file: '+DATA/test/control01.ctl'
ORA-17503: ksfdopn:2 Failed to open file +DATA/test/control01.ctl
ORA-15001: diskgroup "DATA" does not exist or is not mounted
ORA-15077: could not locate ASM instance serving a required diskgroup
ORA-205 signalled during: ALTER DATABASE MOUNT /* db agent *//* {0:11:33} */...
Sun Mar 15 19:06:12 2020
Shutting down instance (abort)

Below is the alert log from node 2:

Thu Mar 12 23:33:16 2020
NOTE: ASMB terminating
Errors in file /app/oracle/diag/rdbms/test/test2/trace/test2_asmb_31457440.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 2 Serial number: 1
Errors in file /app/oracle/diag/rdbms/test/test2/trace/test2_asmb_31457440.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 2 Serial number: 1
ASMB (ospid: 31457440): terminating the instance due to error 15064
Instance terminated by ASMB, pid = 31457440
Sun Mar 15 19:51:34 2020
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING =

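A side note: many people who are not familiar with the database do not know where the alert log lives. To find it, log in to the database and look at the directory that the dump destination parameter points to (background_dump_dest in 11g); the alert log is in that directory.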
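As a rough shortcut when logging in is not possible, the trace paths quoted in this post suggest the usual 11g diagnostic layout. The sketch below only builds the expected path from three inputs; the helper name alert_log_path is hypothetical, and the alert_<SID>.log file name and directory layout are assumptions that should be checked against the dump destination parameter on the real system.

```shell
#!/bin/sh
# Build the expected alert log path for a database instance.
# Assumed layout: <base>/diag/rdbms/<db name>/<SID>/trace/alert_<SID>.log
# $1 = diagnostic base, $2 = database name, $3 = instance SID
alert_log_path() {
    printf '%s/diag/rdbms/%s/%s/trace/alert_%s.log\n' "$1" "$2" "$3" "$3"
}

# Values taken from the trace file paths shown in this post.
alert_log_path /app/oracle test test2
```

The result can be passed straight to tail, for example tail -200 "$(alert_log_path /app/oracle test test2)".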

The node 1 ASM log reports ORA-27303: additional information: Remote port MTU does not match local MTU. [local: 1500, remote: 9000] (169.254.1XX.XX8). Together with ORA-27301 and ORA-27302, this is where the real problem shows up.
The excerpt below is from the +ASM1 alert log:

GMON started with pid=19, OS id=19857586
Sun Mar 15 18:18:10 2020
MMON started with pid=20, OS id=19398838
Sun Mar 15 18:18:10 2020
MMNL started with pid=21, OS id=24838354
lmon registered with NM - instance number 1 (internal mem no 0)
Reconfiguration started (old inc 0, new inc 10)
ASM instance
List of instances:
 1 2 (myinst: 1)
 Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
KSXP IPC protocol is incompatible with instance 2
Errors in file /app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lmon_24707194.trc:
ORA-27300: OS system dependent operation:config_check failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvalpid
ORA-27303: additional information: Remote port MTU does not match local MTU. [local: 1500, remote: 9000] (169.254.1XX.XX8)
KSXP IPC protocol is incompatible with instance 2
Errors in file /app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lmon_24707194.trc:
ORA-27300: OS system dependent operation:config_check failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvalpid
ORA-27303: additional information: Remote port MTU does not match local MTU. [local: 1500, remote: 9000] (169.254.1XX.XX8)
 Communication channels reestablished

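Next, the node 2 ASM log. ASM logs are kept under the ASM instance's own directory. Besides the database and ASM alert logs, RAC also has the cluster log and the logs of the individual cluster processes, which I will cover another time. The node 2 ASM log is under $ORACLE_BASE/diag/asm/+asm/+ASM2/trace.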

cd $ORACLE_BASE/diag/asm/+asm/+ASM2/trace
tail -999f /app/grid/diag/asm/+asm/+ASM2/trace/alert*
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options.
ORACLE_HOME = /app/product/11.2.0/grid
System name: AIX
Node name: test2
Release: 1
Version: 7
Machine: 00C0A6D84C00
Using parameter settings in server-side spfile +OCR/test-cluster/asmparameterfile/registry.253.1004572351
System parameters with non-default values:
  large_pool_size = 12M
  instance_type = "asm"
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskgroups = "OCR"
  asm_diskgroups = "ARCH"
  asm_diskgroups = "DATA"
  asm_power_limit = 1
  diagnostic_dest = "/app/grid"
Cluster communication is configured to use the following interface(s) for this instance
  169.254.1XX.XX8
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Sun Mar 15 19:00:30 2020
PMON started with pid=2, OS id=22216894
Sun Mar 15 19:00:30 2020
PSP0 started with pid=3, OS id=30539926
Sun Mar 15 19:00:31 2020
VKTM started with pid=4, OS id=28573718 at elevated priority
VKTM running at (10)millisec precision with DBRM quantum (100)ms
Sun Mar 15 19:00:31 2020
GEN0 started with pid=5, OS id=22282482
Sun Mar 15 19:00:31 2020
DIAG started with pid=6, OS id=29360284
Sun Mar 15 19:00:31 2020
PING started with pid=7, OS id=21889258
Sun Mar 15 19:00:31 2020
DIA0 started with pid=8, OS id=21823656
Sun Mar 15 19:00:31 2020
LMON started with pid=9, OS id=28180628
Sun Mar 15 19:00:31 2020
LMD0 started with pid=10, OS id=29819036
* System load used for high load check
* New Low - High Load Threshold Range = [55296 - 73728]
Sun Mar 15 19:00:31 2020
LMS0 started with pid=11, OS id=30998728 at elevated priority
Sun Mar 15 19:00:31 2020
LMHB started with pid=12, OS id=27328766
Sun Mar 15 19:00:31 2020
MMAN started with pid=13, OS id=22741166
Sun Mar 15 19:00:31 2020
DBW0 started with pid=14, OS id=21495954
Sun Mar 15 19:00:31 2020
LGWR started with pid=15, OS id=21168264
Sun Mar 15 19:00:31 2020
CKPT started with pid=16, OS id=21954804
Sun Mar 15 19:00:31 2020
SMON started with pid=17, OS id=30081278
Sun Mar 15 19:00:31 2020
RBAL started with pid=18, OS id=27787438
Sun Mar 15 19:00:31 2020
GMON started with pid=19, OS id=29425830
Sun Mar 15 19:00:31 2020
MMON started with pid=20, OS id=27721860
Sun Mar 15 19:00:31 2020
MMNL started with pid=21, OS id=30146734
lmon registered with NM - instance number 2 (internal mem no 1)
Reconfiguration started (old inc 0, new inc 4)
ASM instance
List of instances:
 1 2 (myinst: 2)
 Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
KSXP IPC protocol is incompatible with instance 1
Errors in file /app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_lmon_28180628.trc:
ORA-27300: OS system dependent operation:config_check failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvalpid
ORA-27303: additional information: Remote port MTU does not match local MTU. [local: 9000, remote: 1500] (169.254.1XX.XX)
KSXP IPC protocol is incompatible with instance 1
Errors in file /app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_lmon_28180628.trc:
ORA-27300: OS system dependent operation:config_check failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvalpid
ORA-27303: additional information: Remote port MTU does not match local MTU. [local: 9000, remote: 1500] (169.254.1XX.XX)
 Communication channels reestablished

These entries match what MOS (Doc ID 947223.1) describes: the MTU mismatch is what caused the host to restart.
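Picking the mismatch out of a long alert log by eye is tedious, so here is a small sketch that extracts the local and remote MTU values from ORA-27303 lines. The function name extract_mtu_mismatch is made up for illustration, and the pattern assumes the message format shown in the logs above.

```shell
#!/bin/sh
# Read alert log text on stdin and print "local remote" for every
# ORA-27303 line that carries a "[local: N, remote: M]" pair.
extract_mtu_mismatch() {
    sed -n 's/.*ORA-27303:.*\[local: \([0-9][0-9]*\), remote: \([0-9][0-9]*\)].*/\1 \2/p'
}

# Demo with two lines copied from the node 1 ASM log; prints "1500 9000".
printf '%s\n' \
    'ORA-27302: failure occurred at: skgxpvalpid' \
    'ORA-27303: additional information: Remote port MTU does not match local MTU. [local: 1500, remote: 9000] (169.254.1XX.XX8)' \
    | extract_mtu_mismatch
```

On a real system the input would come from the alert log itself, for example cat /app/grid/diag/asm/+asm/+ASM1/trace/alert* | extract_mtu_mismatch | sort -u.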

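When I checked the MTU on host 2, I was surprised to find it set to 9000, while everything on node 1 was 1500. This had to be changed, and a change to the MTU parameter needs a reboot to take effect permanently. So en2 and en3 had to be set to 1500. The way to check the MTU on different systems is as follows: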

On AIX:
# lsattr -El
Example:
[celcaix4]/usr/sbin> lsattr -El en5
alias4 IPv4 Alias including Subnet Mask True
alias6 IPv6 Alias including Prefix Length True
arp on Address Resolution Protocol (ARP) True
authority Authorized Users True
broadcast Broadcast Address True
mtu 1500 Maximum IP Packet Size for This Device True
 mtu 8232 index 1
inet 127.xxx.xxx.1 netmask ff000000
eri0: flags=1000843 mtu 1500 index
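To read just the MTU from lsattr output like the example above, a short filter is enough. This is a sketch, not part of the MOS note: the function name mtu_from_lsattr is hypothetical, and it assumes the attribute name is the first column and its value the second.

```shell
#!/bin/sh
# Read `lsattr -El <interface>` output on stdin and print the value
# of the first attribute row named "mtu".
mtu_from_lsattr() {
    awk '$1 == "mtu" { print $2; exit }'
}

# Demo with rows copied from the example output; prints 1500.
printf '%s\n' \
    'arp on Address Resolution Protocol (ARP) True' \
    'mtu 1500 Maximum IP Packet Size for This Device True' \
    | mtu_from_lsattr
```

On an AIX host this would be used as lsattr -El en2 | mtu_from_lsattr, once per interconnect interface on each node.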

Resolution

The commands below apply the setting temporarily; the permanent change only takes effect after a reboot.

ifconfig en2 mtu 1500

ifconfig en3 mtu 1500

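After the change I restarted the database and monitored it for half an hour. Neither the host nor the database went down again. As for how the two nodes ended up with different MTU values, it was already late at night, so I did not dig into it at the time.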
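A quick way to confirm that a fix like this has held is to compare the MTU of the interconnect interfaces across nodes. The sketch below assumes the values have already been collected into lines of "node interface mtu"; the function name check_mtu_consistent and the host name test1 are hypothetical (test2 is the node name shown in the ASM log).

```shell
#!/bin/sh
# stdin: one "node interface mtu" triple per line.
# Prints every line whose MTU differs from the first line's MTU and
# returns 1 if any differ, 0 if all agree.
check_mtu_consistent() {
    awk '
        NR == 1 { first = $3 }
        $3 != first {
            bad = 1
            print "mismatch: " $1 " " $2 " mtu " $3 " (expected " first ")"
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Demo with the values seen before the fix.
printf '%s\n' 'test1 en2 1500' 'test2 en2 9000' \
    | check_mtu_consistent || echo "MTU values differ between nodes"
```

Running the same check from a scheduled job would catch a node drifting back to 9000 before the cluster stack notices.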

與50位技術專家面對面20年技術見證,附贈技術全景圖

總結

以上是生活随笔為你收集整理的两个主机mtu不相同_案例详解:MTU不一致导致主机和RAC不断重启的全部內容,希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯,歡迎將生活随笔推薦給好友。