System Sizing

One of the most frequently asked, yet most difficult questions to answer is: "How much box do I need for this application?" The answer is that it depends upon a variety of attributes. This topic is very important: acceptable performance is crucial to user acceptance of an application. Note also that the harder the operating system is flexed, the more certain types of instabilities begin to effect the system availability. Below is a list of attributes to consider:

1) The type of workload. Is it of a decision support or data warehousing nature (ie. primarily read-only during the day with nigthly batch updates) or is it an online transaction processing application (ie. primarily insert/update/delete activity during the day with some batch jobs at night)?

2) The number of concurrent users. As the number of concurrent users increases, the required resources (to maintain the same level of performance) increase at a higher non-linear rate due to characteristics such as contention and locking. Depending upon the specific workload, more than twice as much machine may be needed to service twice as many users.

3) The expected rate of transactions. In this case, "transactions" should be taken to mean large volumes of concurrent updates involving small numbers of rows each (OLTP) or smaller numbers of reads involving larger numbers of rows each (DSS). Note that DSS applications will tend to have a much larger variability in the rate of transactions.

4) Users' performance expectations. A one or two second response time for an OLTP system is generally accepted as sufficient. Three or four second response times are likely to generate complaints, and responses five seconds or greater are generally viewed as unacceptable. Variability of response times in sufficiently sized and reasonably well-tuned OLTP systems tends to be reasonably small (plus or minus 50%), and will primarily be a function of the number of concurrent users conducting transactions. DSS systems have much greater variability in response times (plus or minus several hundred percent), and this variability will be a function of the number of rows accessed by any particular query and the number of concurrent users. A read-only DSS-type query may access only 100 rows with one set of criteria, and 1,000,000 rows with a different set of criteria, so even measuring DSS response time can be a difficult task. DSS applications require careful management of the users' performance expectations.

5) Configuration and versions of the hardware, operating system, database system, and application software. Some combinations perform much better than other combinations for specific purposes. In general, later versions of ORACLE on NT tend to be more stable and use resources more efficiently, as do patched versions of various initial releases.

6) Availability requirements. If the database simply must be available 23.5 hours per day, then additional hardware resources will be needed to perform hot backups, full exports, etc., without significantly interrupting users of the application.

7) Budget. How much monetary resources are allocated to not only the purchase of new hardware and software, but also support of the hardware and software once it has been implemented.


Arriving at a reasonable estimate of the required throughput (typically expressed in transactions per minute) should be the end result of collection and analysis of the attributes listed above. Most vendor-supplied benchmarks will be expressed in transactions per minute (or some measurement that can be converted to these units). Once the estimated required transactions per minute has been determined, this number can then be compared to benchmarks of systems with a variety of configurations, as well as compared to other organizations running similar applications on similar platforms.

A revealing source of benchmark information is the Transaction Processing Performance Council's web-site (www.tpc.org). On approximately a monthly basis, TPC publishes an updated spreadsheet containing all valid reported benchmarks for a variety of hardware, O/S, and database configurations. More detailed information on any particular benchmark can be ordered for a fee. This information can typically be used to gain insight into how to size your database server for a given set of characteristics. Note, however, that this information should be understood as "laboratory results". You specific installation may perform within 20% of the posted benchmark, or it may only perform within 80% of the posted benchmark. There will be variability based upon your specific environment, so carry that expectation with your study of this information. This benchmark data should be used to provide insight into your particular environment. Other vendors of large software systems may publish their own application-specific benchmark information for a variety of platforms. Ask your application vendor for this information, or search their website. In fact, it may be beneficial to compare the application vendor-specific benchmark data to the TPC benchmark data for a reasonably similar configuration.

After estimating the machine resource requirements by matching the throughput requirements to published benchmark information, typically a good rule of thumb is to take what you think you need and multiply it by 1.5 or 2. If you are going to err, then it would be much preferable to err on the positive side rather than the negative side. If you purchased too much machine, then your organization will certainly "grow into it" over time. If you purchase too little machine, then you have no choice but to add resources, which requires an interruption in service to a system with which the user community is already displeased. The best suggestion is to take the conservative approach and buy more resources than it appears will be necessary. As stated earlier, an ORACLE instance on NT with sufficient resources will offer less variability in performance as well as running in a much more stable manner than a database instance will entirely insufficient resources.

On a fixed purchasing or upgrade budget, on which resource (CPU, memory, disk) is the money best spent? Whether the ORACLE/NT server in question is an appropriately configured and reasonably well-tuned database server that has the maximum number of CPU's (ie. 4-8 processors), the maximum amount of memory (ie. 2-4 Gb), and the maximum number of disk arrays and array controllers (ie. 8-12 disk arrays on seperate array controller channels), or it is a smaller box with a single CPU, 64-256 Mb of memory, and a small number of disks and controllers, the bottleneck could be any of the hardware resources. In fact, if one hardware resource is alleviated from being the performance bottleneck, a different hardware resource will always assume that role. Determining the source of the bottleneck (and in most cases, how to alleviate the bottleneck) is a reasonably straight-forward process and is covered in detail in following articles in this section. Note that application specific tuning can have a large impact on general level measures, and should not be ignored. Determining when performance is acceptable is a decision each organization will have to make with their own requirements considered. The true bottleneck can only be determined by performing the necessary analysis. In an upgrade situation, this approach will insure that the budget is spent on resources that truly will improve performance. In a new purchase situation, it will typically not be possible to analyze the existing performance, so the numbers must be estimated as accurately as possible, then monitored to determine the accuracy of the approximation. In this way, the accuracy of decisions on future new purchases can be improved.

What if the organization has absolutely no idea what the throughput requirements are or will be? In an upgrade scenario, monitoring of the current situation should provide the necessary information. In a new purchase situation, estimation should still be possible, especially if the new purchase is a replacement for an existing system. The platforms may even be different, but the throughput requirements are likely to be at least the same order of magnitude. Even if transaction rates are very difficult to estimate from other sources, the organization will surely know the approximate number of users. There is a finite number of transactions that it is humanly possible to perform over a fixed period of time. In this situation, the estimated rate of transactions should be calculated as a function of the number of users, including adjustments for batch jobs, hot backups, etc.