We start with a two-node cluster with all components up.
linux2:oracle:tst10g2:/home/oracle>./crsstat.ksh
HA Resource Target State
----------- ------ -----
ora.linux1.ASM1.asm ONLINE ONLINE on linux1
ora.linux1.LISTENER_LINUX1.lsnr ONLINE ONLINE on linux1
ora.linux1.gsd ONLINE ONLINE on linux1
ora.linux1.ons ONLINE ONLINE on linux1
ora.linux1.vip ONLINE ONLINE on linux1
ora.linux2.ASM2.asm ONLINE ONLINE on linux2
ora.linux2.LISTENER_LINUX2.lsnr ONLINE ONLINE on linux2
ora.linux2.gsd ONLINE ONLINE on linux2
ora.linux2.ons ONLINE ONLINE on linux2
ora.linux2.vip ONLINE ONLINE on linux2
ora.tst10g.ReqMan.cs ONLINE ONLINE on linux1
ora.tst10g.ReqMan.tst10g1.srv ONLINE ONLINE on linux1
ora.tst10g.ReqMan.tst10g2.srv ONLINE ONLINE on linux2
ora.tst10g.db ONLINE ONLINE on linux2
ora.tst10g.tst10g1.inst ONLINE ONLINE on linux1
ora.tst10g.tst10g2.inst ONLINE ONLINE on linux2
I then gracefully shut down all components on the second node, which is what we did on oh1xpwcdb01 on Wednesday afternoon…
linux2:oracle:tst10g2:/home/oracle>srvctl stop instance -d tst10g -i tst10g2
linux2:oracle:tst10g2:/home/oracle>srvctl stop asm -n linux2
linux2:oracle:tst10g2:/home/oracle>srvctl stop nodeapps -n linux2
…and we then see that all components on linux2 are down, including the linux2 VIP, which has not moved to linux1.
linux2:oracle:tst10g2:/home/oracle>./crsstat.ksh
HA Resource Target State
----------- ------ -----
ora.linux1.ASM1.asm ONLINE ONLINE on linux1
ora.linux1.LISTENER_LINUX1.lsnr ONLINE ONLINE on linux1
ora.linux1.gsd ONLINE ONLINE on linux1
ora.linux1.ons ONLINE ONLINE on linux1
ora.linux1.vip ONLINE ONLINE on linux1
ora.linux2.ASM2.asm OFFLINE OFFLINE
ora.linux2.LISTENER_LINUX2.lsnr OFFLINE OFFLINE
ora.linux2.gsd OFFLINE OFFLINE
ora.linux2.ons OFFLINE OFFLINE
ora.linux2.vip OFFLINE OFFLINE
ora.tst10g.ReqMan.cs ONLINE ONLINE on linux1
ora.tst10g.ReqMan.tst10g1.srv ONLINE ONLINE on linux1
ora.tst10g.ReqMan.tst10g2.srv ONLINE OFFLINE
ora.tst10g.db ONLINE ONLINE on linux2
ora.tst10g.tst10g1.inst ONLINE ONLINE on linux1
ora.tst10g.tst10g2.inst OFFLINE OFFLINE
linux2:oracle:tst10g2:/home/oracle>
If we ping linux2-vip from another machine (here, linux1), the address is unreachable.
linux1:oracle:tst10g1:/home/oracle>ping -c 4 linux2-vip
PING linux2-vip (192.168.1.154) 56(84) bytes of data.
From linux1.home (192.168.1.50): icmp_seq=1 Destination Host Unreachable
From linux1.home (192.168.1.50) icmp_seq=1 Destination Host Unreachable
From linux1.home (192.168.1.50) icmp_seq=2 Destination Host Unreachable
From linux1.home (192.168.1.50) icmp_seq=3 Destination Host Unreachable
--- linux2-vip ping statistics ---
3 packets transmitted, 0 received, +4 errors, 100% packet loss, time 2009ms, pipe 3
linux1:oracle:tst10g1:/home/oracle>
We then restart all components for the next part of our test, and see everything is up…
linux2:oracle:tst10g2:/home/oracle>srvctl start nodeapps -n linux2
linux2:oracle:tst10g2:/home/oracle>srvctl start asm -n linux2
linux2:oracle:tst10g2:/home/oracle>srvctl start instance -d tst10g -i tst10g2
linux2:oracle:tst10g2:/home/oracle>./crsstat.ksh
HA Resource Target State
----------- ------ -----
ora.linux1.ASM1.asm ONLINE ONLINE on linux1
ora.linux1.LISTENER_LINUX1.lsnr ONLINE ONLINE on linux1
ora.linux1.gsd ONLINE ONLINE on linux1
ora.linux1.ons ONLINE ONLINE on linux1
ora.linux1.vip ONLINE ONLINE on linux1
ora.linux2.ASM2.asm ONLINE ONLINE on linux2
ora.linux2.LISTENER_LINUX2.lsnr ONLINE ONLINE on linux2
ora.linux2.gsd ONLINE ONLINE on linux2
ora.linux2.ons ONLINE ONLINE on linux2
ora.linux2.vip ONLINE ONLINE on linux2
ora.tst10g.ReqMan.cs ONLINE ONLINE on linux1
ora.tst10g.ReqMan.tst10g1.srv ONLINE ONLINE on linux1
ora.tst10g.ReqMan.tst10g2.srv ONLINE ONLINE on linux2
ora.tst10g.db ONLINE ONLINE on linux2
ora.tst10g.tst10g1.inst ONLINE ONLINE on linux1
ora.tst10g.tst10g2.inst ONLINE ONLINE on linux2
linux2:oracle:tst10g2:/home/oracle>
We then simulate a failure by taking down the network interface to which the VIP is assigned on linux2…
linux2:oracle:tst10g2:/home/oracle>sudo /sbin/ifdown eth0
root's password:
eth0 device: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
eth0 configuration: eth-id-00:14:d1:10:09:ba
…and see that the listener on linux2 is down and the linux2 VIP is now running on linux1…
linux1:oracle:tst10g1:/home/oracle>./crsstat.ksh
HA Resource Target State
----------- ------ -----
ora.linux1.ASM1.asm ONLINE ONLINE on linux1
ora.linux1.LISTENER_LINUX1.lsnr ONLINE ONLINE on linux1
ora.linux1.gsd ONLINE ONLINE on linux1
ora.linux1.ons ONLINE ONLINE on linux1
ora.linux1.vip ONLINE ONLINE on linux1
ora.linux2.ASM2.asm ONLINE ONLINE on linux2
ora.linux2.LISTENER_LINUX2.lsnr ONLINE OFFLINE
ora.linux2.gsd ONLINE ONLINE on linux2
ora.linux2.ons ONLINE ONLINE on linux2
ora.linux2.vip ONLINE ONLINE on linux1
ora.tst10g.ReqMan.cs ONLINE ONLINE on linux1
ora.tst10g.ReqMan.tst10g1.srv ONLINE ONLINE on linux1
ora.tst10g.ReqMan.tst10g2.srv ONLINE ONLINE on linux2
ora.tst10g.db ONLINE ONLINE on linux2
ora.tst10g.tst10g1.inst ONLINE ONLINE on linux1
ora.tst10g.tst10g2.inst ONLINE ONLINE on linux2
linux1:oracle:tst10g1:/home/oracle>
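The relocation can also be spotted mechanically rather than by eye. The following is only a sketch: relocated_vips is a hypothetical helper name, and it assumes the three-column layout printed by crsstat.ksh and that the node name in ora.<node>.vip contains no dots.

```shell
#!/bin/sh
# Hypothetical helper: report VIP resources that are ONLINE on a node other
# than the one named in the resource (ora.<node>.vip).
# Reads crsstat.ksh-style lines on stdin: "<resource> <target> <state> on <host>".
relocated_vips() {
    awk '
        $1 ~ /^ora\.[^.]+\.vip$/ && $3 == "ONLINE" && $4 == "on" {
            split($1, part, ".")          # part[2] is the home node name
            if (part[2] != $5)
                print part[2] " VIP is running on " $5
        }
    '
}

# Example using the two VIP lines from the listing after ifdown.
relocated_vips <<'EOF'
ora.linux1.vip ONLINE ONLINE on linux1
ora.linux2.vip ONLINE ONLINE on linux1
EOF
```

In practice the input would come from something like ./crsstat.ksh | relocated_vips. A VIP that is simply OFFLINE, as after the clean shutdown, is not reported, because it has not moved anywhere.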
Basically:
1. A clean shutdown will not fail a VIP over to a surviving node, because no failure has occurred. This is why all nodes are included in a JDBC URL: if one node's VIP is down, the client can still connect through another node's address.
2. In a failure, the VIP *is* failed over to a surviving node.
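To make the JDBC point concrete, here is a rough sketch of how a thin-driver URL listing every node's VIP might be assembled. The values are assumptions, not taken from the test system: jdbc_url is a hypothetical helper, the service name ReqMan is inferred from the resource names above, 1521 is only the default listener port, and linux1-vip is assumed by analogy with linux2-vip.

```shell
#!/bin/sh
# Sketch: build an Oracle thin-driver JDBC URL with one ADDRESS per VIP.
# Usage: jdbc_url <service_name> <port> <vip-host>...
jdbc_url() {
    service=$1
    port=$2
    shift 2
    addresses=""
    for host in "$@"; do
        addresses="${addresses}(ADDRESS=(PROTOCOL=TCP)(HOST=${host})(PORT=${port}))"
    done
    printf 'jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(FAILOVER=on)%s)(CONNECT_DATA=(SERVICE_NAME=%s)))\n' \
        "$addresses" "$service"
}

# Assumed values: service ReqMan, port 1521, host name linux1-vip.
jdbc_url ReqMan 1521 linux1-vip linux2-vip
```

With both addresses listed, a client that cannot reach linux2-vip after a clean shutdown can try linux1-vip instead. After a failure, the relocated VIP should still answer on the surviving node, so the client can move on to the next address without waiting for a network timeout.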