Lars tests


-------------------------
Tests the ability to perform repeated multiply-add operations on data held in registers.
-------------------------
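
The kind of loop being timed can be sketched as follows. This is a minimal C++ sketch and not the original test code: the names MaddResult and time_multiply_add, the constants, and the convention of counting 2 flops per multiply-add are all assumptions made for illustration. Each of the N chains depends only on its own previous value, so a pipelined floating-point unit can overlap them. Whether the values actually stay in registers depends on the compiler and its flags.

```cpp
#include <cassert>
#include <chrono>

// Result of one timing run (illustrative names, not from the original test).
struct MaddResult {
    double mflops;    // measured rate, assuming 2 flops per multiply-add
    double checksum;  // sum of final values; using it keeps the loop from being discarded
};

// Time `iterations` passes over N independent multiply-add chains.
// N is a compile-time constant so the inner loop can be fully unrolled
// and the N values can be kept in registers (this is up to the compiler).
template <int N>
MaddResult time_multiply_add(long iterations) {
    assert(iterations > 0);

    double x[N];
    for (int k = 0; k < N; ++k) {
        x[k] = 1.0 + 0.001 * k;  // distinct starting values
    }
    // x -> a*x + b with these constants converges toward 1, so values stay bounded.
    const double a = 0.999999;
    const double b = 0.000001;

    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        for (int k = 0; k < N; ++k) {
            x[k] = a * x[k] + b;  // chain k depends only on its own previous value
        }
    }
    const auto stop = std::chrono::steady_clock::now();

    const double seconds = std::chrono::duration<double>(stop - start).count();
    double checksum = 0.0;
    for (int k = 0; k < N; ++k) {
        checksum += x[k];
    }

    const double flops = 2.0 * N * static_cast<double>(iterations);
    return MaddResult{flops / seconds / 1.0e6, checksum};
}
```

Under these assumptions, calling time_multiply_add<2>, <4>, <8> and <16> with the same iteration count would correspond to the four loop sizes reported for each machine.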

Fowler: Alpha EV6, 466 MHz (clock frequency not confirmed)
        four simultaneous instruction steps

Loop over  2 independent multiply-add = 230 Mflops
Loop over  4 independent multiply-add = 462 Mflops
Loop over  8 independent multiply-add = 625 Mflops
Loop over 16 independent multiply-add = 351 Mflops

Note: leguin and cherryh have the Campbell architecture,
	with a 500 MHz clock speed

Campbell: Alpha EV5, 250 MHz (clock frequency not confirmed)
        two simultaneous instruction steps

cxx compiler
Loop over  2 independent multiply-add = 60 Mflops
Loop over  4 independent multiply-add = 59 Mflops
Loop over  8 independent multiply-add = 58 Mflops
Loop over 16 independent multiply-add = 53 Mflops

g++ compiler
Loop over  2 independent multiply-add = 124 Mflops
Loop over  4 independent multiply-add = 246 Mflops
Loop over  8 independent multiply-add = 280 Mflops
Loop over 16 independent multiply-add = 253 Mflops

Blacks: Sun Ultra 10 at 300 MHz = 600 Mflops peak
	(Sun supposedly has a great compiler)

Loop over  2 independent multiply-add = 199 Mflops
Loop over  4 independent multiply-add = 397 Mflops
Loop over  8 independent multiply-add = 348 Mflops
Loop over 16 independent multiply-add = 309 Mflops

Lifu: IBM RS/6000 at 67 MHz = 267 Mflops peak

Loop over  2 independent multiply-add =  89 Mflops
Loop over  4 independent multiply-add = 177 Mflops
Loop over  8 independent multiply-add = 267 Mflops
Loop over 16 independent multiply-add = 267 Mflops

---------------------------------

Beowulf: Pentium II at 400 MHz = 400 Mflops peak.

Loop over  2 independent multiply-add: 107 Mflops
Loop over  4 independent multiply-add: 246 Mflops
Loop over  8 independent multiply-add: 356 Mflops
Loop over 16 independent multiply-add: 332 Mflops

Origin2000: MIPS R12000 at 300 MHz = 600 Mflops peak.

Loop over  2 independent multiply-add: 299 Mflops
Loop over  4 independent multiply-add: 599 Mflops
Loop over  8 independent multiply-add: 599 Mflops
Loop over 16 independent multiply-add: 599 Mflops

T3E: DEC alpha at 300 MHz = 300 Mflops peak.

Loop over  2 independent multiply-add:  86 Mflops
Loop over  4 independent multiply-add: 160 Mflops
Loop over  8 independent multiply-add: 200 Mflops
Loop over 16 independent multiply-add:  74 Mflops
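
Where a peak rate is quoted, the measured figures can be read as a fraction of that peak, and the peak divided by the clock rate gives the flops per cycle it implies. The helper names below are hypothetical and the sketch only restates that arithmetic in C++.

```cpp
#include <cassert>

// Fraction of the quoted peak reached by a measured rate (both in Mflops).
double fraction_of_peak(double measured_mflops, double peak_mflops) {
    assert(peak_mflops > 0.0);
    return measured_mflops / peak_mflops;
}

// Floating-point operations per clock cycle implied by a quoted peak.
double flops_per_cycle(double peak_mflops, double clock_mhz) {
    assert(clock_mhz > 0.0);
    return peak_mflops / clock_mhz;
}
```

For example, the T3E's best figure gives fraction_of_peak(200, 300), about 0.67, and the Origin2000's quoted peak gives flops_per_cycle(600, 300), which is 2.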

----------------------------------


Your comments and suggestions are appreciated.

To cite this page:
Lars tests
<http://www.physics.ohio-state.edu>
Edited by: wilkins@mps.ohio-state.edu on