Single-processor performance data. General features include
-- an initial rise as the matrix fills the cache,
-- a subsequent falloff once the matrix exceeds the cache.
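The rise-then-falloff shape follows from the working set outgrowing the cache. One rough way to estimate where the falloff begins is to find the largest N with arrays x 8 x N^2 <= cache size. This sketch assumes the kernel touches `arrays` square matrices of 8-byte doubles and that the relevant cache size is known. Both are assumptions: neither the kernel's data layout nor the cache sizes are given here.

```python
import math

BYTES_PER_DOUBLE = 8  # assumption: 64-bit floating-point matrix elements


def working_set_bytes(n: int, arrays: int = 1) -> int:
    """Bytes occupied by `arrays` square n x n matrices of doubles."""
    return arrays * BYTES_PER_DOUBLE * n * n


def largest_n_in_cache(cache_bytes: int, arrays: int = 1) -> int:
    """Largest n whose working set still fits in cache_bytes.

    Past this n the data no longer fits in cache, which is roughly
    where the falloff in the performance curves would be expected.
    """
    return math.isqrt(cache_bytes // (arrays * BYTES_PER_DOUBLE))


if __name__ == "__main__":
    # Hypothetical cache sizes, chosen only for illustration.
    for cache in (1 << 20, 4 << 20):
        n = largest_n_in_cache(cache)
        print(f"{cache >> 20} MiB cache: one matrix fits up to N = {n}")
```

With a hypothetical 1 MiB cache this gives N = 362, and with 4 MiB it gives N = 724. The estimate ignores cache associativity and line effects, so real curves may bend earlier.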
The code used in all examples is the rate-limiting part of the
entire code, accounting for 99% of its run time.
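Benchmarking only this kernel is representative because it dominates the run time. Amdahl's law makes this concrete. Suppose a fraction f of the run time is spent in the kernel, and the kernel is sped up by a factor s. The whole code then speeds up by 1/((1 - f) + f/s). With f = 0.99, the kernel's performance essentially determines the whole code's performance, and the overall gain is capped at 100x. The following sketch works through that arithmetic; the speedup factors are illustrative only.

```python
def overall_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-code speedup when only the kernel gets faster."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    serial = 1.0 - kernel_fraction  # time outside the kernel, unaffected
    return 1.0 / (serial + kernel_fraction / kernel_speedup)


if __name__ == "__main__":
    f = 0.99  # fraction of run time spent in the rate-limiting kernel
    for s in (2.0, 10.0, 1e9):
        print(f"kernel {s:g}x faster -> code {overall_speedup(f, s):.2f}x faster")
```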
Origin. Note the striking run-to-run instability.
[Claimed single-processor maximum = 390 Mflops]
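Run-to-run instability like this is usually quantified by repeating each timing and reporting the spread of the results rather than a single number. Below is a minimal sketch of that approach. Here `kernel` is a hypothetical stand-in for the benchmarked code and `flops` is its operation count; neither is specified in the original.

```python
import statistics
import time
from typing import Callable


def mflops_over_runs(kernel: Callable[[], object], flops: int, runs: int = 5) -> dict:
    """Time `kernel` `runs` times and summarize the Mflops rates obtained."""
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        kernel()
        # Guard against a zero reading from a very fast kernel.
        elapsed = max(time.perf_counter() - start, 1e-9)
        rates.append(flops / elapsed / 1e6)  # Mflops for this run
    median = statistics.median(rates)
    return {
        "min": min(rates),
        "median": median,
        "max": max(rates),
        # Relative spread; a large value signals run-to-run instability.
        "spread": (max(rates) - min(rates)) / median,
    }


if __name__ == "__main__":
    n = 200
    a = [[1.0] * n for _ in range(n)]
    x = [1.0] * n

    def matvec() -> None:
        # Hypothetical kernel y = A x: n*n multiply-adds, i.e. 2*n*n flops.
        [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

    print(mflops_over_runs(matvec, flops=2 * n * n))
```

Reporting the minimum, median, and maximum keeps an unstable machine from being summarized by one lucky or unlucky run.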
Exemplar. Of the many options available in the compiler, only data
prefetching does anything of value; amazingly, it is not the compiler default.
The lack of any performance tools makes detailed examination impossible.
[Claimed single-processor maximum = 720 Mflops]
T3E. The design of the chip makes the claimed maximum rate always
a chimera. Note the low performance for large matrices; flat performance
requires use of the SGI preprocessor.
[Claimed single-processor maximum = 600 Mflops]
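Each caption quotes a claimed single-processor maximum. Comparing a measurement against that figure is simple arithmetic. The rate is Mflops = flops / (seconds x 10^6), and the fraction of the claim achieved is the measured rate divided by the claimed rate. In the sketch below, the claimed figures come from the captions, while the example timing is hypothetical.

```python
CLAIMED_PEAK_MFLOPS = {  # claimed single-processor maxima from the captions
    "Origin": 390.0,
    "Exemplar": 720.0,
    "T3E": 600.0,
}


def mflops(flops: float, seconds: float) -> float:
    """Rate in millions of floating-point operations per second."""
    if seconds <= 0.0:
        raise ValueError("seconds must be positive")
    return flops / seconds / 1e6


def fraction_of_claim(machine: str, measured_mflops: float) -> float:
    """Measured rate as a fraction of the machine's claimed maximum."""
    return measured_mflops / CLAIMED_PEAK_MFLOPS[machine]


if __name__ == "__main__":
    # Hypothetical measurement: one 1000 x 1000 matrix-vector product
    # (2 * 1000**2 flops) completing in 0.02 s.
    rate = mflops(2 * 1000**2, 0.02)
    print(f"T3E: {rate:.0f} Mflops = {fraction_of_claim('T3E', rate):.0%} of claim")
```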