Configuring I/O For Raw Partitions

Configuring I O For Raw Partitions

General

Raw devices allow Oracle to bypass the OS cache. A raw device can be assigned or bound to block devices such as whole disks or disk partitions. When a raw device is bound to a disk or partition, any reads or writes to the raw device will cause the disk subsystem to perform raw I/Os with the disk. A raw I/O through the /dev/raw interface bypasses the kernel's block buffer cache which is normally utilized for block device reads/writes. By bypassing the cache the physical device is accessed directly which allows applications such as Oracle databases to have more control over the I/O. In fact, Oracle does it's own data caching and raw devices allow Oracle to ensure that data gets written to the disk immediately without OS caching.

Since Automatic Storage Management (ASM) is the recommended option for large amounts of storage in RAC environments, the focus of this article and section is on the usage of raw devices and block devices for ASM. ASM offers many advantages over conventional filesystems. The ASM filesystem is not buffered and supports async I/O. It allows you to group sets of physical disks to logical entities as diskgroups. You can add or remove disks without downtime. In fact, you could move a whole database from one SAN storage to another SAN without downtime. Also, ASM spreads I/O over all the available disks automatically to avoid hot spots. ASM does also it's own striping and offers mirroring. ASM can be setup using the ASM library driver or raw devices. Starting with 10g R2, neither is necessarily required, see next note.

NOTE:

Since raw I/O is now being deprecated by the Linux community and RHEL 4, Oracle 10g R2 no longer requires raw devices for the database. Oracle 10g R2 automatically opens all block devices such as SCSI disks using the O_DIRECT flag, thus bypasses the OS cache. But for older Oracle Database and RHEL versions raw devices are still a recommended option for ASM and datafiles. For more information on using block devices, see Using Block Devices for Oracle 10g Release 2 in RHEL 4. Unfortunately, Oracle Clusterware R2 OUI still requires raw devices or a Cluster File System.

CAUTION:

The name of the devices are assigned by Linux and is determined by the scan order of the bus. Therefore, the device names are not guaranteed to persist across reboots. For example, SCSI device /dev/sdb can change to /dev/sda if the scan order of the controllers is not configured. To force the scan order of the controllers, aliases can be set in /etc/modprobe.conf. For example:

alias scsi_hostadapter1 aic7xxx
alias scsi_hostadapter2 lpfc

These settings will guarantee that the Adaptec adapter for local storage is used first and then the Emulex adapter(s) for SAN storage. Fortunately, RHEL 4 has already addressed this issue by delaying the loading of lpfc (Emulex) and various qla (QLogic) drivers until after all other SCSI devices have been loaded. This means that the alias settings in this example would not be required in RHEL 4. For more information, see Red Hat Enterprise Linux AS 4 Release Notes.

Be also careful when adding/removing devices which can change device names on the system. Starting Oracle with incorrect device names or raw devices can cause damages to the database. For stable device naming in Linux 2.4 and 2.6, see Optimizing Linux I/O.

Basics of Raw Devices

To bind the first raw device /dev/raw/raw1 to the /dev/sdz SCSI disk or LUN you can execute the following command:

  1. raw /dev/raw/raw1 /dev/sdz

Now when you run the dd command on /dev/raw/raw1, it will write directly to /dev/sdz bypassing the OS block buffer cache:
(Warning: the following command will overwrite data on /dev/sdz)

  1. dd if=/dev/zero of=/dev/sdz count=1

To permanently bind /dev/raw/raw1 to /dev/sdz, add an entry to the /etc/sysconfig/rawdevices file:

/dev/raw/raw1 /dev/sdz

Now when you run /etc/init.d/rawdevices it will read the /etc/sysconfig/rawdevices file and execute the raw command for each entry:

/etc/init.d/rawdevices start

To have /etc/init.d/rawdevices run each time the system boot, it can be activated by executing the following command:

chkconfig rawdevices on

Note for each block device you need to use another raw device. To bind the third raw device to the second partition of /dev/sdz, the entry in /etc/sysconfig/rawdevices would look like this:

/dev/raw/raw3 /dev/sdz2

Or to bind the 100th raw device to /dev/sdz, the entry in /etc/sysconfig/rawdevices would look like this:

/dev/raw/raw100 /dev/sdz

Using Raw Devices for Oracle Databases

Many guides and documentations show instructions on using the devices in /dev/raw/ for configuring raw devices for datafiles. I do not recommend to use the raw devices in /dev/raw/ for the following reason: When you configure raw devices for Oracle datafiles, you also have to change ownership and permissions of the devices in /dev/raw/ to allow Oracle to read and write to these raw devices. But all device names in /dev/raw/ are owned by the dev RPM. So when the Linux systems administrator upgrades the dev RPM, which may happen as part of an OS update, then all device names in /dev/raw/ will automatically be recreated. This means that ownership and permissions must be set each time the dev RPM gets upgraded. Therefore I recommend to create all raw devices for Oracle datafiles in an Oracle data directory such as /u02.

For example, to create a new raw device for the system datafile system01.dbf in /u02/orcl/, execute the following command:

  1. mknod /u02/orcl/system01.dbf c 162 1

This command creates a new raw device called /u02/orcl/system01.dbf with minor number 1, which is equivalent to the first raw device /dev/raw/raw1. The major number 162 designates the device as a raw device. A major number always identifies the driver associated with the device.

To grant oracle:dba read and write permissions, execute:

  1. chown oracle.dba /u02/orcl/system01.dbf
  2. chown 660 /u02/orcl/system01.dbf

To bind this new raw device to the first partition of /dev/sdb, add the following line to the /etc/sysconfig/rawdevices file:

/u02/orcl/system01.dbf /dev/sdb1

To activate the raw device, execute:

/etc/init.d/rawdevices start

Here is an example for creating raw devices for ASM:

# mknod /u02/oradata/asmdisks/disk01 c 162 1
# mknod /u02/oradata/asmdisks/disk02 c 162 2
# mknod /u02/oradata/asmdisks/disk03 c 162 3
# mknod /u02/oradata/asmdisks/disk03 c 162 4

# chown oracle.dba /u02/oradata/asmdisks/disk01
# chown oracle.dba /u02/oradata/asmdisks/disk02
# chown oracle.dba /u02/oradata/asmdisks/disk03
# chown oracle.dba /u02/oradata/asmdisks/disk04

# chmod 660 /u02/oradata/asmdisks/disk01
# chmod 660 /u02/oradata/asmdisks/disk02
# chmod 660 /u02/oradata/asmdisks/disk03
# chmod 660 /u02/oradata/asmdisks/disk04

And the /etc/sysconfig/rawdevices file would look something like this if you use EMC PowerPath:
/u02/oradata/asmdisks/disk01 /dev/emcpowera
/u02/oradata/asmdisks/disk02 /dev/emcpowerb
/u02/oradata/asmdisks/disk03 /dev/emcpowerc
/u02/oradata/asmdisks/disk04 /dev/emcpowerd

In this example, 4 raw devices have been created using minor numbers 1 through 4. This means that the devices /dev/raw/raw1../dev/raw/raw4 should not be used by any application on the system. But this should not be an issue since all raw devices should be configured in one place, which is the /etc/sysconfig/rawdevices file. Note that you could also partition the LUNs or disks and configure a raw device for each disk partition.

Using Block Devices for Oracle 10g Release 2 in RHEL 4

For Oracle 10g Release 2 in RHEL 4 it is not recommended to use raw devices but to use block devices instead. Raw I/O is still available in RHEL 4, but it is now a deprecated interface. In fact, raw I/O has been deprecated by the Linux community. It has been replaced by the O_DIRECT flag, which can be used for opening block devices to bypass the OS cache. Unfortunately, Oracle Clusterware R2 OUI has not been updated and still requires raw devices or a Cluster File System. There is also another bug, see bug number 5021707 at http://www.oracle.com/technology/tech/linux/validated-configurations/html/vc_dell6850-rhel4-cx500-1_1.html.

By default, reading and writing to block devices are buffered I/Os. Oracle 10g R2 now automatically opens all block devices such as SCSI disks using the O_DIRECT flag, thus bypassing the OS cache. For example, when you create disk groups for ASM and you want to use the SCSI block devices /dev/sdb and /dev/sdc, you can simply set the Disk Discovery Path to "/dev/sdb, /dev/sdc" to create the ASM disk group. There is no need to create raw devices and to point the Disk Discovery Path to it.

Using the ASM example from Using Raw Devices for Oracle Databases, the Oracle data directory could be setup the following way:

$ ln -s /dev/emcpowera /u02/oradata/asmdisks/disk01
$ ln -s /dev/emcpowerb /u02/oradata/asmdisks/disk02
$ ln -s /dev/emcpowerc /u02/oradata/asmdisks/disk03
$ ln -s /dev/emcpowerd /u02/oradata/asmdisks/disk04

And the following command needs to be executed after each reboot:
# chown oracle.dba /u02/oradata/asmdisks/*

You need to ensure that the ownership of block devices is changed to oracle:dba or oracle:oinstall. Otherwise Oracle can't access the block devices and ASM disk discovery won't list them. You also need to ensure that the ownership of block devices is set after each reboot since Linux changes the ownership of block devices back to "brw-rw—— 1 root disk" at boot time.