This post provides a simple test of the viability of creating and using an Oracle database cluster on NFS storage. The basic configuration is shown, along with the test scenarios and their results. A complete installation of an Oracle RAC environment is not covered.
At a high level, the installation and configuration require only the following:
Create NFS server
Create and export directory on NFS server
Create two Linux servers
Create cluster on Linux servers using NFS filesystem for storage
Create database on cluster using NFS filesystem for storage
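The steps above hinge on exporting the directory from the NFS server and mounting it on each RAC node with options suitable for shared Oracle files. Below is a minimal sketch; the paths and subnet are assumptions, and the mount options are the ones Oracle documents for RAC on Linux NFS (hard mounts, NFSv3 over TCP, and actimeo=0 to disable attribute caching):

```shell
# On nfs01 - export the shared directory to the RAC subnet (subnet is an assumption)
# /etc/exports:
#   /u01/oradata  192.168.30.0/24(rw,sync,no_wdelay,insecure,no_root_squash)
exportfs -a

# On each RAC node - mount with Oracle's documented NFS options for shared files
mount -t nfs -o rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=32768,wsize=32768,actimeo=0 \
    nfs01:/u01/oradata /u01/oradata
```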
CONFIGURATION
All environments are virtualized, and consist of the following components:
Windows Server 2008 for Active Directory and DNS services
CentOS 6.5 (2.6.32-431.el6.x86_64 kernel) for the following guests:
* rac01
* rac02
* nfs01
On each of the Linux guests, there is a separate network interface for the public IP, the private cluster communication IP, and the storage IP. This more closely resembles production and allows us to test the failure of individual components.
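For reference, the per-node interface layout might look like the following sketch; the device names, subnets, and addresses are assumptions for illustration, not the actual lab values:

```shell
# Hypothetical interface layout on each RAC node:
#   eth0 - public network        (e.g. 192.168.10.0/24)
#   eth1 - private interconnect  (e.g. 192.168.20.0/24)
#   eth2 - NFS storage network   (e.g. 192.168.30.0/24)
#
# Example /etc/sysconfig/network-scripts/ifcfg-eth2 (storage interface):
DEVICE=eth2
BOOTPROTO=static
IPADDR=192.168.30.11
NETMASK=255.255.255.0
ONBOOT=yes
```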
Below is the output of our base installation:
rac01:oracle:nfsdb:/home/oracle>./crsstat.sh
NAME                      TARGET     STATE      SERVER       STATE_DETAILS
------------------------- ---------- ---------- ------------ ------------------
ora.LISTENER.lsnr         ONLINE     ONLINE     rac01
ora.LISTENER.lsnr         ONLINE     ONLINE     rac02
ora.asm                   OFFLINE    OFFLINE    rac01        Instance Shutdown
ora.asm                   OFFLINE    OFFLINE    rac02
ora.gsd                   OFFLINE    OFFLINE    rac01
ora.gsd                   OFFLINE    OFFLINE    rac02
ora.net1.network          ONLINE     ONLINE     rac01
ora.net1.network          ONLINE     ONLINE     rac02
ora.ons                   ONLINE     ONLINE     rac01
ora.ons                   ONLINE     ONLINE     rac02
ora.LISTENER_SCAN1.lsnr   ONLINE     ONLINE     rac02
ora.cvu                   ONLINE     ONLINE     rac02
ora.nfsdb.db              ONLINE     ONLINE     rac01        Open
ora.nfsdb.db              ONLINE     ONLINE     rac02        Open
ora.oc4j                  ONLINE     ONLINE     rac01
ora.rac01.vip             ONLINE     ONLINE     rac01
ora.rac02.vip             ONLINE     ONLINE     rac02
ora.scan1.vip             ONLINE     ONLINE     rac02
rac01:oracle:nfsdb:/home/oracle>
We show where the cluster-related files are stored…
rac01:oracle:nfsdb:/u01/oradata/storage>ls -lrt
total 23540
-rw-r-----. 1 root dba 272756736 Sep  2 09:55 ocr
-rw-r-----. 1 grid dba  21004800 Sep  2 10:00 vdsk
We then show the database file locations…
rac01:oracle:nfsdb:/u01/oradata/db>ls -lrt
total 8
drwxr-x---. 5 oracle dba 4096 Sep  2 09:00 NFSDB
drwxr-x---. 2 oracle dba 4096 Sep  2 09:53 nfsdb
rac01:oracle:nfsdb:/u01/oradata/db>ls -lrt NFSDB/datafile/
total 3022556
-rw-r-----. 1 oracle dba  30416896 Sep  2 09:09 o1_mf_temp_b0chx0oq_.tmp
-rw-r-----. 1 oracle dba 545267712 Sep  2 09:10 o1_mf_sysaux_b0chqslw_.dbf
-rw-r-----. 1 oracle dba   5251072 Sep  2 09:10 o1_mf_users_b0chqsrt_.dbf
-rw-r-----. 1 oracle dba 328343552 Sep  2 09:10 o1_mf_example_b0chxg1g_.dbf
-rw-r-----. 1 oracle dba  26222592 Sep  2 09:10 o1_mf_undotbs2_b0cj8z4d_.dbf
-rw-r-----. 1 oracle dba 104865792 Sep  2 09:10 o1_mf_undotbs1_b0chqsoo_.dbf
-rw-r-----. 1 oracle dba 754982912 Sep  2 09:10 o1_mf_system_b0chqsc3_.dbf
-rw-r-----. 1 oracle dba  20979712 Sep  2 09:53 o1_mf_temp_b0clwoqd_.tmp
-rw-r-----. 1 oracle dba   5251072 Sep  2 09:54 o1_mf_users_b0clqtm9_.dbf
-rw-r-----. 1 oracle dba  26222592 Sep  2 10:00 o1_mf_undotbs2_b0clxo1j_.dbf
-rw-r-----. 1 oracle dba 524296192 Sep  2 10:00 o1_mf_sysaux_b0clqtgb_.dbf
-rw-r-----. 1 oracle dba  36708352 Sep  2 10:00 o1_mf_undotbs1_b0clqtjh_.dbf
-rw-r-----. 1 oracle dba 734011392 Sep  2 10:00 o1_mf_system_b0clqt70_.dbf
rac01:oracle:nfsdb:/u01/oradata/db>
…and finally, the state of each instance of the database…
SQL> select host_name,status from gv$instance;

HOST_NAME            STATUS
-------------------- ------------
rac01.howard.local   OPEN
rac02.howard.local   OPEN

SQL>
TESTING
Oracle failure algorithm
Before describing our test scenarios, some background on how different failures are handled by the Oracle software would be beneficial.
Oracle handles failures based on a two-pronged approach:
1) Network communication – This is checked once per second by each node in the cluster to ensure it can successfully reach the others using the private cluster connection
2) Voting disk checks – Once per second, the Oracle software issues a 512-byte pwrite() call to the cluster voting disk at a node-specific offset. Each node in the cluster has its own 512-byte “slot” in the voting disk file in which it records its status
A failure of either of the above checks, whether on one server or several, can be handled in an orderly manner. The timeouts are defined as follows:
* Voting disk access failures will be allowed 200 seconds to self-heal, as long as the network heartbeats between nodes in the cluster are successful
* Network heartbeat failures will be allowed 30 seconds to self-heal, as long as the failing node(s) can continue to write their status to the shared voting disk for the other nodes to see
If, after the respective time period above, the failure condition is still in place, the failing node will “commit suicide” by taking itself out of the cluster. Oracle 11.2.0 introduced fencing that does not always require a node reboot: the Oracle High Availability service (“OHAS”) triggers actions in the clusterware that result in nodes removing themselves from, and adding themselves back to, the active cluster list.
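The voting-disk mechanics described above can be illustrated with a scratch file: each node owns a fixed 512-byte slot and overwrites only that slot, so every node can read every other node's status. Below is a rough sketch using dd against a local file; the file name, slot count, and payload are invented for illustration and are not what the clusterware actually writes:

```shell
# Create a scratch "voting disk" with four 512-byte slots
VDSK=/tmp/vdsk_demo
dd if=/dev/zero of=$VDSK bs=512 count=4 2>/dev/null

# Hypothetical node 2 writes its status into slot 2 only (offset 512),
# leaving the other slots untouched (conv=notrunc keeps the file intact)
NODE=2
printf 'node2 alive' | dd of=$VDSK bs=512 seek=$((NODE - 1)) count=1 conv=notrunc 2>/dev/null

# Any node can read slot 2 back to check node 2's status
dd if=$VDSK bs=512 skip=1 count=1 2>/dev/null | tr -d '\0'
```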
Test harness
Stun various servers by pausing the guest in the VirtualBox manager
Run a Java command-line program that opens a threaded connection to each instance in the cluster. This is used to exercise the database and print the results of that activity. The source is included as APPENDIX A
Specific tests
Single server has a failed connection to the NFS server
When we run our test and fail the storage connection for node 2, notice that no activity occurs for 207 seconds after the last successful update. This is because the row is locked by the failed node, and the lock is not released until that node has been evicted. Notice also that this time comprises the 200-second timeout provided for any possible resolution of disk access failures, plus a few seconds for the actual eviction.
Tue Sep 02 12:31:15 GMT-05:00 2014 updated table in thread 2
Tue Sep 02 12:31:16 GMT-05:00 2014 updated table in thread 1
Tue Sep 02 12:31:16 GMT-05:00 2014 updated table in thread 2
Tue Sep 02 12:31:17 GMT-05:00 2014 updated table in thread 1
Tue Sep 02 12:31:17 GMT-05:00 2014 updated table in thread 2
Tue Sep 02 12:31:18 GMT-05:00 2014 updated table in thread 1
java.sql.SQLException: No more data to read from socket
        at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1199)
        at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:308)
        at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:199)
        at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:542)
        at testIt.run(testIt.java:32)
        at java.lang.Thread.run(Thread.java:637)
Tue Sep 02 12:34:45 GMT-05:00 2014 updated table in thread 1
Tue Sep 02 12:34:46 GMT-05:00 2014 updated table in thread 1
Tue Sep 02 12:34:47 GMT-05:00 2014 updated table in thread 1
Tue Sep 02 12:34:48 GMT-05:00 2014 updated table in thread 1
Tue Sep 02 12:34:49 GMT-05:00 2014 updated table in thread 1
NFS server pause
In this test, we simulate a high availability event in which the enterprise NFS server stack fails a component of its environment. To simulate this, we pause the NFS server in the VirtualBox Manager UI for about ten seconds.
As we can see, the software was blocked for about 13 seconds, but no cluster reconfiguration occurred.
rac01:oracle:nfsdb1:/home/oracle>java testIt
Tue Sep 02 13:05:20 GMT-05:00 2014 updated table in thread 1
Tue Sep 02 13:05:20 GMT-05:00 2014 updated table in thread 2
Tue Sep 02 13:05:21 GMT-05:00 2014 updated table in thread 1
Tue Sep 02 13:05:21 GMT-05:00 2014 updated table in thread 2
Tue Sep 02 13:05:22 GMT-05:00 2014 updated table in thread 1
Tue Sep 02 13:05:22 GMT-05:00 2014 updated table in thread 2
Tue Sep 02 13:05:23 GMT-05:00 2014 updated table in thread 1
Tue Sep 02 13:05:23 GMT-05:00 2014 updated table in thread 2
Tue Sep 02 13:05:24 GMT-05:00 2014 updated table in thread 1
Tue Sep 02 13:05:24 GMT-05:00 2014 updated table in thread 2
Tue Sep 02 13:05:37 GMT-05:00 2014 updated table in thread 1
Tue Sep 02 13:05:37 GMT-05:00 2014 updated table in thread 2
Tue Sep 02 13:05:38 GMT-05:00 2014 updated table in thread 1
Tue Sep 02 13:05:38 GMT-05:00 2014 updated table in thread 2
Tue Sep 02 13:05:39 GMT-05:00 2014 updated table in thread 1
Tue Sep 02 13:05:39 GMT-05:00 2014 updated table in thread 2
Tue Sep 02 13:05:40 GMT-05:00 2014 updated table in thread 1
Tue Sep 02 13:05:40 GMT-05:00 2014 updated table in thread 2
NFS server pause when the database storage is accessed separately from the cluster devices
This test is not provided, as the storage is expected to be accessed over a common network interface. If necessary, it would be trivial to add to the test list: all that would be required is a separate interface for the database storage, along with a separately mounted filesystem.
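If that test were added, the only configuration change would be along the lines of the following sketch; the hostnames, paths, and the idea of two server-side addresses (one per storage interface) are hypothetical:

```shell
# Hypothetical /etc/fstab entries splitting clusterware files and database
# files across two storage interfaces on the NFS server:
# nfs01-stor1:/u01/oradata/storage  /u01/oradata/storage  nfs  rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=32768,wsize=32768,actimeo=0  0 0
# nfs01-stor2:/u01/oradata/db       /u01/oradata/db       nfs  rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=32768,wsize=32768,actimeo=0  0 0
```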
APPENDIX A
import java.sql.*;
import java.util.*;

public class testIt implements Runnable {
    String server;
    Thread t;
    static Random r;

    public static void main(String args[]) {
        r = new Random();
        testIt t1 = new testIt("1");
        testIt t2 = new testIt("2");
    }

    testIt(String server) {
        try {
            t = new Thread(this);
            this.server = server;
            t.start();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void run() {
        try {
            Class.forName("oracle.jdbc.driver.OracleDriver");
            Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:system/welcome@rac0" + server + ":1521:nfsdb" + server);
            PreparedStatement pst = conn.prepareStatement("update test set c = ?");
            while (true) {
                pst.setInt(1, r.nextInt());
                pst.execute();
                System.out.println(new java.util.Date().toString() + '\t'
                    + "updated table in thread " + this.server);
                Thread.sleep(1000);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}