Review: 8
Well written but a little on the
simplistic side. The rules are useful, but for anyone who
understands what the rules mean, the explanations are probably too
simplistic to help much.
Concepts
- There are several design levels that can be optimized, with
varying degrees of results:
- System structures: The architecture of the software.
How do modules communicate with each other, etc.
- Intramodular structure: choice of algorithms, data
structures, etc. Large speedups can often be had at this level.
- Writing efficient code: Usually gives about 5x to 10x
improvement.
- Assembly: In general, good compilers produce as good
assembly as people do, but specific cases can be improved.
- System calls: Probably not changeable by the programmer.
- Hardware: Go to more specialized/expensive/etc. hardware
if necessary.
- Trading space for time:
- Data structure augmentation: "The time required for
common operations on data can often be reduced by augmenting the
structure with extra information or by changing the information within
the structure so that it can be accessed more easily." (e.g.
reference counters, hints [if hint applies, use quick algorithm with
assumptions otherwise revert to slow algorithm])
- Store precomputed results: take the computation out of
the loop, store it in a table (e.g. fast sin() table, or a character
mapping, cache precomputed results [i.e. when computing Fibonacci(5),
store Fib(0), Fib(1), ... Fib(4)])
- Caching: "Data that is accessed most often should be the
cheapest to access."
- Lazy evaluation: Don't evaluate something until it is
necessary. You could make a fast sin() function that on startup
computes all the possible sine values and stores them. This would
be a waste of time if sin() isn't called very frequently. (In this
case, it would be better to hardcode them and use rule 2)
- Packing: "Dense storage representations can decrease
storage costs by increasing the time required to store and retrieve
data" (e.g. bit vectors, a compressed filesystem, overlaying (e.g.
C unions) etc.)
- Interpreters: "The space required to represent a program
can often be decreased by the use of interpreters in which common
sequences of operations are represented compactly." Bentley gives
the example of someone writing a terminal emulator. Space was at a
premium but time wasn't (people don't interact frequently on a CPU time
scale). He made it very small by building an interpreter for the
emulator (i.e. and interpreter for the interpreter).
- Of course, these are reversible, too: getting rid of those
redundant speed improving fields in rule 1 will shrink up the data
structure.
- Loop rules
- Move code out of the loops:
for (i = 0;
i < n; ++i)
x[i] = x[i] + sqrt(Pi/2); // Compute sqrt(Pi/2) once before the
loop
- Combining tests: "An efficient inner loop should contain
as few tests as possible, and preferably only one. The programmer
should therefore try to simulate some of the exit conditions of the loop
by other exit conditions"
i =
1;
x[n+1] = t;
while (i < n && x[i] != t)
i = 1;
i++;
while (x[i] != t) i++;
if (i <= n)
if (i <= n)
...
...
else
else
...
...
- Loop unrolling: "A large cost of some short loops is in
modifying the loop indices. That cost can often be reduced by
unrolling the loop" Might want to do for i=0 to N/5; { A; A; A; A; A; } instead
of for i=0 to N; { A; }.
- Transfer-driven loop unrolling: "If a large cost of an
inner loop is devoted to trivial assignments, then those assignments
can often be removed by repeating the code and changing the use of
variables. Specifically, to remove the assignment i = j, the subsequent code must
treat j as though it were i.
- Unconditional branch removal: "A fast loop should contain
no unconditional branches. An unconditional branch at the end of
the loop can be removed by "rotating" the loop to have a conditional
branch at the bottom" (this is most applicable in low-level
languages)
- Loop fusion: "If two nearby loops operate on the same set
of elements, then combine their operational parts and use only one set
of loop control operations."
- Exploit algebraic identities
- Short-circuiting monotonic functions: "If we wish to test
whether some [monotonically increasing] function of several variables is
over a certain threshold, then we need not evaluate any of the variables
once the threshold has been reached."
- Reordering tests: "Logical tests should be arranged such
that the inexpensive and often successful tests precede expensive and
rarely successful tests."
- Precompute logical functions: "A logical function over a
small finite domain can be replaced by a lookup table that represents
the functions." (e.g. a character type function)
- Boolean variable elimination:
S1;
if (test)
if (test)
{ S1; S2; }
S2;
else
else
{ S1; S2; }
S3;
- Collapsing procedure hierarchies: "The run times of the
elements of a set of procedures that (nonrecursively) call themselves
can often be reduced by rewriting procedures in line and binding the
passed variables."
- Exploit common cases: "Procedures should be organized to
handle all cases correctly and common cases efficiently"
- Transformations on recursive procedures: tail recursion
can be often be replaced with a loop. "If the procedure contains
only one recursive call on itself, then it is not necessary to store the
return address on the stack ... It is necessary, thought to keep
track of the call depth in some other way." "It is often more
efficient to solve small subproblems by use of an auxiliary procedure,
rather than by recurring down to problems of size zero or one."
- Parallelism: "A program should be structured to exploit
as much of the parallelism as possible in the underlying hardware."
(In modern hardware, making sure that arrays are organized to
efficiently exploit the cache is helpful)
- Compile time initialization (a.ka. constant propogation):
(i.e. C++ global const variables) This tries to let the
compiler replace expressions with a constant.
- Exploit algebraic identities
- Common subexpression elimination: "If the same expression
is evaluated twice with none of its variables latered between
evaluations, then the second evaluation can be avoided by storing the
result of the first and using that in place of the second."
- Pairing computation: "If two similar expressions ar
efrequently evaluated together, then we should make a new procedure that
evaluates them as a pair." He gives an example that sine and
cosine are expensive to compute separately, but that if you are already
calculating one, the other can be calculated for a little extra cost.
- Exploit word parallelism: "Use the full word width of the
underlying computer architecture to evaluate expensive expressions"
(i.e. packing)
- Code simplification: "Most fast programs are simple.
Therefore, keep code simple to make it faster."
- Problem simplification: "To increase the efficiency of a
program, simplify the problem it solves."
- Relentless suspicion: "Question the necessity of each
instruction in a time-critical piece of code and each field in a
space-critical data structure."
- Early binding: "Move work forward in time.
Specifically do work now just once in hope of avoiding doing it
many times later."
Copyright © 2003 by Geoffrey
Prewett