Like most database systems, Pick machines often spend a good deal of their time performing Selects, Sselects, and Sorts. Regardless of whether your selects occur during the day or in your overnight batch processing, getting the best performance on these jobs will improve both the actual select time and all other jobs that run at the same time. How then do you build the most efficient Access sentences?
The first requirement is to understand how the typical Access processor works. There seems to be considerable misinformation floating around regarding the various parts of the Sselect process. The most obvious efficiency issues have already been dealt with in the Access run time. For example, in multiple selection clauses, Access will stop evaluating selections as soon as one is sufficient to include or exclude a record from the dynaset. And contrary to popular misconceptions, no matter where they are placed in the source sentence, the work of evaluating the sort by clauses never happens until the record is actually selected. This does not mean that we are powerless to help improve the efficiency of the process.
Contrary to widespread belief, the first significant rule is to never break a Sort or Select into two pieces unless both the following conditions are true. One, the first Select comes from a saved list or a key-file or index file, and two, the first piece of the select is guaranteed to yield a very small percentage of the actual file. While this term is relative, you can assume you need to have substantially less than one active record per group which means you want well under five percent of the active file selected. Several customers were convinced that breaking selects up will speed up performance. Actual tests prove what system knowledge should tell us -- retrieving records twice is more expensive than doing it once. It will always take longer -- from a few milliseconds to many, many hours depending on the sizes of the files and memory hit ratios.
The next important issue is to order the selection clauses in the most efficient possible order. In the extreme, it is possible to change the performance of a Select by more than a hundred to one range. Here's how to do it.
All Access selection clauses are a subset of the following form.
SELECT FILE (WITH COND1 AND WITH COND2) OR (WITH COND3 AND WITH COND4)
Access will process the Or'ed clauses in the order they are presented
in the input sentence. This obviously means that if you know which of the
Or'ed clauses is the most likely to succeed for the least energy, it should
be put first in the sentence to reduce the extra work required to evaluate
clauses less likely to be evaluated as true. Simple attributes without
any correla-tives require the least energy to evaluate, followed by more
complex multi-valued attributes and those using A or F correlatives. The
largest amounts of energy go into correlatives containing Translate functions.
The sub-clauses that have And's, are a bigger problem. By the same logic, it's desirable to have the least likely to succeed And clause first. Access will stop evaluating as soon as any part of the sub-clause can't be met. However, there is a slight hitch to making this happen. In an attempt to pass through the record only once, Access will automatically reorder the And clauses by sorting on the dictionary AMC (attribute 2). I'll repeat that - Access will ignore the order you place the And's in the input sentence, and will order them based on attribute two of the dictionary items.
The secret of a least cost ordering system then is to adjust the dictionary definitions of the most expensive type of correlative to have higher AMC's. This is actually easier than it sounds. The most expensive correlatives are A, F, and Translates. The secret here is that for F and A correlatives, the AMC on line two is not used by the correlative. Its only use is this reordering effect in Select clauses. If we were to arbitrarily assign a high number (for example, AMC 98) to simple F and A's, they would be sorted after most other selection attributes.
Translates are usually even more expensive than simple A or F's. This is both because of the actual CPU time to do the record reads and because the Translate generally converts the selection process from a simple sequential pass of the selected file to a random read of the translate file. On the Sequoia, this converts the process from about 40 to 50 frames per second to perhaps 8 to 10 records per second. If the average frame holds 10 to 20 records, the penalty for a translate is a slowdown ranging from a low of perhaps 40 to 1, to possibly over 125 to 1. Now the tricky part -- you can insert an A or F correlative in front of a Translate (as many dictionary attributes already have). This allows you to use the AMC on line 2 to control the select sequencing problem by setting it to an even higher value (perhaps AMC 99). If you avoid 3 out of 4 translates with this trick, it will speed up your Sselect by a factor of four. This can be an utterly amazing improvement for an extremely low time investment and a nearly zero risk change!
Another potential problem is when the actual With clauses are constructed poorly. The usual example involves selecting by date ranges. If the selection is done like this - WITH ORDER.DATE > "01/01/93" AND WITH ORDER.DATE < "12/31/93" then 'order.date' needs to be evaluated twice. Imagine the cost if the attribute is translated to another file! If the clause is constructed like this - WITH ORDER.DATE > "01/01/93" AND < "12/31/93" the attribute is only evaluated once (and I feel the sentence is more readable).
Lastly, take the time to review and organize your critical dictionaries. Over time, dictionaries often accumulate redundant and unnecessary processing codes because dictionary builders tend to play it safe or don't fully understand Access and how the dictionaries work. A few hours spent here can have far reaching effects on the efficiency of your whole machine.