Balanced Performance and Fault-tolerance

Assumptions:
- Box runs only a single Oracle instance and typical O/S services.
- This reference configuration provides a reasonable balance between performance and hardware fault-tolerance.
- The physical device most likely to fail is a disk drive. In fact, disk drives are almost guaranteed to fail
at some point in their usable lifetime.
- Actual throughput will vary according to the specific hardware, O/S version and patch level, Oracle version
and patch level, hardware driver versions, type of application, workload variance, logical transaction / physical
transaction ratio, physical transaction / disk I/O ratio, and numerous other characteristics specific to the particular
installation. Based on an analysis of 1999 TPC-C benchmark results, and depending on the above factors, the configuration
should provide approximately 2000 to 10000 tpm (transactions per minute) in production use. However, your actual
throughput will depend entirely on the specifics of your installation and applications.
- This reference configuration can be "scaled-down" by removing resources and "scaled-up"
(to a point) by adding resources. The RAID configuration can also be modified to shift the balance among performance,
fault-tolerance, and cost: for example, to improve performance at the expense of hardware fault-tolerance, or to
improve performance while keeping hardware fault-tolerance constant at a higher hardware cost.
- There is no such thing as a "standard configuration". Every site, application, and installation
will have its own unique requirements and workload.
- The three main factors to consider in hardware configuration are performance requirements, hardware fault-tolerance
requirements, and cost constraints.
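As a rough illustration of the performance / fault-tolerance / cost trade-off described above, the Python sketch below tabulates textbook characteristics of the three RAID levels used in this configuration: usable capacity (a proxy for cost per usable disk), drive failures survived, and the write penalty (physical I/Os per random logical write: 2 for a mirror, and 4 for RAID5's read-data, read-parity, write-data, write-parity cycle). The `raid_profile` function and `RaidProfile` type are hypothetical names, and RAID1 is modeled only as a two-disk mirror.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RaidProfile:
    level: str
    disks: int
    usable_disks: int        # disks' worth of space available for data
    failures_tolerated: int  # drive failures survived without data loss
    write_penalty: int       # physical I/Os per random logical write


def raid_profile(level: str, disks: int) -> RaidProfile:
    """Textbook characteristics of the RAID levels used in this configuration."""
    if level == "RAID0":
        if disks < 1:
            raise ValueError("RAID0 needs at least 1 disk")
        return RaidProfile(level, disks, disks, 0, 1)
    if level == "RAID1":
        if disks != 2:
            raise ValueError("this sketch models RAID1 as a two-disk mirror")
        return RaidProfile(level, disks, 1, 1, 2)
    if level == "RAID5":
        if disks < 3:
            raise ValueError("RAID5 needs at least 3 disks")
        return RaidProfile(level, disks, disks - 1, 1, 4)
    raise ValueError(f"unsupported RAID level: {level}")


for level, n in [("RAID0", 1), ("RAID1", 2), ("RAID5", 3), ("RAID5", 5)]:
    p = raid_profile(level, n)
    overhead = 1 - p.usable_disks / p.disks  # share of raw space lost to redundancy
    print(f"{level} x{n}: usable={p.usable_disks}, "
          f"survives {p.failures_tolerated} failure(s), "
          f"write penalty={p.write_penalty}, capacity overhead={overhead:.0%}")
```

The wider the RAID5 array, the smaller its capacity overhead, but every random write still costs four physical I/Os; that penalty is why the redo logs are kept off RAID5 in this configuration.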
Configuration Details:
- Processors: Two 400+ MHz CPUs should be more than sufficient to handle the computational load. Maximum CPU utilization
consistently above 90% indicates a need for additional processors.
- Memory: One gigabyte should be sufficient to provide an SGA size that does not inhibit instance performance
while providing enough resources to the O/S to prevent excessive paging. With all instance-level measurements within
the target range, an O/S paging rate greater than 5 pages/second typically indicates a need for additional physical
memory.
- Disks: 18 SCSI disks, each individually capable of supporting 60-80 random I/Os per second and 80-100 sequential
I/Os per second when empty. This workload capacity drops to approximately 85% of empty performance when the
disk is half full, and to 50% of empty performance when the disk is completely full.
- RAID Adapters: RAID is done at the hardware level using SCSI RAID adapters with battery-backed cache. Ideally,
three single-channel adapters should be used. Alternatively, because RAID adapters have a low probability of failure,
one adapter with three separate channels could be used, but three separate adapters provide a higher level
of fault-tolerance. These RAID adapters should support online replacement of failed drives and online expansion
of disk arrays.
- Disk arrays:
| Array | RAID Level | Number of disks | Workload capacity (empty) - writes/second | Workload capacity (empty) - reads/second |
|-------|------------|-----------------|-------------------------------------------|------------------------------------------|
| A1    | RAID1      | 2               | 60                                        | 120                                      |
| A2    | RAID5      | 3               | 45                                        | 120                                      |
| A3    | RAID5      | 5               | 75                                        | 240                                      |
| A4    | RAID5      | 5               | 75                                        | 240                                      |
| A5    | RAID0      | 1               | 80                                        | 80                                       |
| A6    | RAID0      | 1               | 80                                        | 80                                       |
| A7    | RAID0      | 1               | 80                                        | 80                                       |
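The workload figures in the disk array table appear to follow simple arithmetic, which the Python sketch below reproduces under stated assumptions: 60 random I/Os per second per disk for arrays A1 to A4 (the low end of the 60-80 range) and 80 sequential I/Os per second for the single redo log drives A5 to A7. Write capacity is total disk I/Os divided by the write penalty, and RAID5 read capacity is counted over n-1 disks, which is what the table's figures imply. The fullness derating uses the 100% / 85% / 50% points from the disk description, with straight-line interpolation between them as an added assumption. The names `capacity`, `derate`, `ARRAYS`, and `DERATE_POINTS` are hypothetical.

```python
# Write penalty: physical I/Os per random logical write.
WRITE_PENALTY = {"RAID0": 1, "RAID1": 2, "RAID5": 4}

# (array, RAID level, disks, assumed I/Os per second per disk)
ARRAYS = [
    ("A1", "RAID1", 2, 60), ("A2", "RAID5", 3, 60), ("A3", "RAID5", 5, 60),
    ("A4", "RAID5", 5, 60), ("A5", "RAID0", 1, 80), ("A6", "RAID0", 1, 80),
    ("A7", "RAID0", 1, 80),
]

# Fraction of empty-disk capacity at a given fullness (points from the text);
# interpolating linearly between the points is an assumption.
DERATE_POINTS = [(0.0, 1.00), (0.5, 0.85), (1.0, 0.50)]


def capacity(level, disks, per_disk_iops):
    """Return (writes/second, reads/second) for an empty array."""
    writes = disks * per_disk_iops / WRITE_PENALTY[level]
    # The table counts RAID5 reads over n-1 disks (a conservative choice).
    read_disks = disks - 1 if level == "RAID5" else disks
    return writes, read_disks * per_disk_iops


def derate(fullness):
    """Fraction of empty-disk capacity left when the disks are `fullness` full."""
    if not 0.0 <= fullness <= 1.0:
        raise ValueError("fullness must be between 0.0 and 1.0")
    for (x0, y0), (x1, y1) in zip(DERATE_POINTS, DERATE_POINTS[1:]):
        if fullness <= x1:
            return y0 + (y1 - y0) * (fullness - x0) / (x1 - x0)
    return DERATE_POINTS[-1][1]


print(f"total disks: {sum(d for _, _, d, _ in ARRAYS)}")
half = derate(0.5)
for name, level, disks, iops in ARRAYS:
    w, r = capacity(level, disks, iops)
    print(f"{name} {level:5} empty: {w:3.0f} w/s {r:3.0f} r/s | "
          f"half full: {w * half:3.0f} w/s {r * half:3.0f} r/s")
```

Comparing the derated figures against measured per-array I/O rates gives an early warning that an array is approaching saturation as it fills.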
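The processor and memory rules of thumb in the configuration details can be expressed as a simple check over periodic monitoring samples. In this hypothetical Python sketch, "consistently above 90%" is interpreted as at least 80% of samples exceeding the threshold (the 80% cutoff is an assumption, not from the text), and the paging rule is applied to the average paging rate of samples in which all instance-level measurements were within target. How the samples are collected (for example from `sar` or `vmstat`) is left out; `Sample` and `sizing_hints` are illustrative names.

```python
from dataclasses import dataclass

CPU_LIMIT_PCT = 90.0        # from the text: max CPU consistently above 90%
PAGING_LIMIT_PER_SEC = 5.0  # from the text: paging above 5 pages/second


@dataclass
class Sample:
    max_cpu_pct: float    # maximum CPU utilization in the interval
    pages_per_sec: float  # O/S paging rate in the interval
    instance_ok: bool     # all instance-level measurements within target


def sizing_hints(samples, consistency=0.8):
    """Return hardware upgrade hints; `consistency` is an assumed cutoff."""
    hints = []
    if not samples:
        return hints
    over_cpu = sum(s.max_cpu_pct > CPU_LIMIT_PCT for s in samples)
    if over_cpu / len(samples) >= consistency:
        hints.append("add processors")
    # Only judge paging when the instance itself is within its target range.
    healthy = [s.pages_per_sec for s in samples if s.instance_ok]
    if healthy and sum(healthy) / len(healthy) > PAGING_LIMIT_PER_SEC:
        hints.append("add physical memory")
    return hints


samples = [Sample(95, 8, True), Sample(93, 6, True), Sample(97, 2, False)]
print(sizing_hints(samples))
```

Restricting the paging check to intervals where the instance is healthy mirrors the text's condition: high paging caused by an oversized SGA is a tuning problem rather than a reason to buy memory.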
Performance and Fault-tolerance Characteristics:
- Operating system, Oracle software, and related configuration files and settings protected by hardware fault-tolerance.
Nothing will function if the O/S and software are unavailable due to device failure.
- O/S paging on hardware fault-tolerant device with performance impact minimized. O/S cannot function with pagefile
on failed device.
- Large number of drives in each RAID5 array provides reasonable overall read and write performance while maintaining
fault-tolerance and cost-effectiveness.
- Hardware and software fault-tolerance for Oracle redo logs maximized with performance impact minimized through
Oracle software mirroring of the redo logs. Critical transaction logs needed for complete recovery are protected
from device failure as well as software write errors. Workload capacity of single drives will typically be sufficient
for sequential redo log writes as long as no other files exist on the drives.
- Oracle datafiles on fault-tolerant devices with reasonable performance characteristics. Writes during checkpoint
activity are done asynchronously, thus helping overcome RAID5 write performance penalties. The striping characteristics
of RAID5 provide reasonably good read performance for datafiles. Because each array survives a single drive failure,
downtime for media recovery is more likely to be avoided.
- Multiple arrays for datafiles provide reasonable flexibility for I/O balancing.
- Oracle archive logs on fault-tolerant device with reasonable performance characteristics. Archive log writes
are done asynchronously and performance degradation can be overcome by adding redo log groups. Database halts due
to failed archive log destination device are avoided.
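Having two datafile arrays (A3 and A4) means datafiles can be placed to spread I/O between them. One way to plan that placement is a greedy heuristic: take datafiles from heaviest to lightest and put each on the array whose utilization would be lowest afterward, weighting writes by the array's lower write capacity. The Python sketch below assumes the empty-disk capacities from the disk array table (240 reads/second and 75 writes/second per five-disk RAID5 array); the datafile names and I/O rates are invented for illustration, and this is a planning aid, not an Oracle feature.

```python
from dataclasses import dataclass, field


@dataclass
class Array:
    name: str
    read_cap: float                             # reads/second (empty disks)
    write_cap: float                            # writes/second (empty disks)
    files: list = field(default_factory=list)
    utilization: float = 0.0                    # fraction of capacity assigned

    def load(self, reads, writes):
        """Fraction of this array's capacity a datafile would consume."""
        return reads / self.read_cap + writes / self.write_cap


def balance(datafiles, arrays):
    """Greedy placement of (name, reads/s, writes/s) datafiles onto arrays."""
    heaviest_first = sorted(
        datafiles, key=lambda f: max(a.load(f[1], f[2]) for a in arrays),
        reverse=True)
    for name, reads, writes in heaviest_first:
        best = min(arrays, key=lambda a: a.utilization + a.load(reads, writes))
        best.files.append(name)
        best.utilization += best.load(reads, writes)
    return arrays


# Hypothetical datafiles: (name, reads/second, writes/second).
datafiles = [("users01.dbf", 40, 10), ("orders01.dbf", 60, 15),
             ("index01.dbf", 50, 12), ("hist01.dbf", 20, 3)]
for a in balance(datafiles, [Array("A3", 240, 75), Array("A4", 240, 75)]):
    flag = "  <-- over capacity" if a.utilization > 1 else ""
    print(f"{a.name}: {a.files} utilization={a.utilization:.2f}{flag}")
```

Any array whose planned utilization exceeds 1.0 is a sign that the workload needs more spindles, not just a different placement.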
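The claim that adding redo log groups helps the archiver keep up rests on a simple mechanism: in ARCHIVELOG mode, LGWR cannot reuse a log group until the archiver has copied it, so extra groups give the archiver more time to absorb a burst of redo. The Python sketch below is a deliberately simplified simulation in one-second steps under assumed rates (checkpoint completion, multiple archiver processes, and multiplexed log members are ignored); `simulate` and all of its parameters are hypothetical. It counts the seconds in which LGWR could not write all requested redo because the next group was still waiting to be archived.

```python
from collections import deque


def simulate(groups, group_mb, archive_mb_per_s, redo_mb_per_s):
    """Count seconds in which LGWR stalled waiting for a group to be archived.

    All sizes and rates are whole megabytes; redo_mb_per_s lists the redo
    generated in each one-second step.
    """
    if groups < 2:
        raise ValueError("Oracle requires at least two redo log groups")
    pending = deque()      # full groups not yet archived, oldest first
    archived = 0           # MB of the oldest pending group already copied
    current, fill = 0, 0   # group LGWR is writing and MB written into it
    backlog = 0            # redo requested but not yet written
    stalled = 0
    for demand in redo_mb_per_s:
        # The archiver copies full groups in order at a fixed rate.
        budget = archive_mb_per_s
        while pending and budget > 0:
            step = min(budget, group_mb - archived)
            archived += step
            budget -= step
            if archived == group_mb:
                pending.popleft()
                archived = 0
        # LGWR writes this second's redo plus anything left over.
        backlog += demand
        blocked = False
        while backlog > 0:
            if fill == group_mb:             # log switch needed
                nxt = (current + 1) % groups
                if nxt in pending:           # next group not archived yet
                    blocked = True
                    break
                current, fill = nxt, 0
            step = min(group_mb - fill, backlog)
            fill += step
            backlog -= step
            if fill == group_mb:
                pending.append(current)      # full group is ready to archive
        stalled += blocked
    return stalled


burst = [5] * 8 + [0] * 20  # 8-second burst at 5 MB/s, then idle
for n in (2, 3, 4):
    print(n, "groups:", simulate(n, group_mb=10, archive_mb_per_s=2,
                                 redo_mb_per_s=burst), "stalled seconds")
```

With these assumed numbers, two groups stall LGWR during the burst while three or four absorb it. Extra groups only buffer bursts: if the average redo rate exceeds the archiving rate, the archiver falls behind regardless of how many groups exist.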