Performance Comparison 16 December 1997

Single-processor performance data. General features include
 -- an initial rise as the matrix fills the cache,
 -- a subsequent falloff once the cache is exceeded.
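
A size sweep of the kind behind such curves can be sketched as follows. This is only an illustration under stated assumptions: the kernel here is a dense matrix-vector product chosen as a stand-in (the actual benchmark code is not reproduced on this page), 8-byte reals are assumed, and the function names are hypothetical. In pure Python the interpreter overhead will largely mask cache effects, so the rise and falloff would only be expected to show clearly if the same loop structure were written in a compiled language.

```python
import time


def matvec(a, x, n):
    """Dense n x n matrix-vector product; a is a flat row-major list."""
    y = [0.0] * n
    for i in range(n):
        row = i * n
        s = 0.0
        for j in range(n):
            s += a[row + j] * x[j]
        y[i] = s
    return y


def matvec_flops(n):
    # One multiply and one add per matrix element.
    return 2 * n * n


def working_set_bytes(n, word=8):
    # Matrix plus input and output vectors; 8-byte reals are an assumption.
    return word * (n * n + 2 * n)


def measure_mflops(n, repeats=5):
    """Return the best Mflops rate seen over several timed runs."""
    a = [1.0] * (n * n)
    x = [1.0] * n
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        matvec(a, x, n)
        elapsed = time.perf_counter() - t0
        # Guard against a zero reading on a coarse clock.
        best = min(best, max(elapsed, 1e-9))
    return matvec_flops(n) / best / 1.0e6


if __name__ == "__main__":
    # Hypothetical sweep sizes; compare the working set with the cache size.
    for n in (16, 32, 64, 128, 256):
        print(n, working_set_bytes(n), round(measure_mflops(n), 2))
```

Comparing the printed working-set size against the cache size of the machine indicates where the falloff should begin. Taking the best of several runs is one possible choice; reporting the spread instead would expose run-to-run variation.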

	The code used in all examples is the 
	rate-limiting part (99%) of the entire code.
Origin. Note the striking instability from run to run. [Single processor claimed maximum = 390 Mflops] [Plot 1]
Exemplar. Of the many compiler options available, only data prefetch does anything of value; amazingly, it is not the compiler default. The lack of any performance tools makes detailed examination impossible. [Single processor claimed maximum = 720 Mflops] [Plot 2]
T3E. The design of the chip makes the claimed maximum rate always a chimera. Note the low performance for large matrices; flat performance requires use of the SGI preprocessor. [Single processor claimed maximum = 600 Mflops] [Plot 3: t3e_large.gif]

