Lars tests
-------------------------
Tests the ability to perform repeated multiply-add operations on data held in registers.
-------------------------
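The original test source is not reproduced on this page. As a rough illustration only, a loop over N independent multiply-adds might look like the sketch below. The function name `madd_mflops`, the constants, and the convention of counting each multiply-add as 2 floating-point operations are all assumptions, not details taken from the original test.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper (not from the original page): runs `iters` passes over
// N independent multiply-add chains and returns the measured rate in Mflops.
// Each update acc[k] = acc[k] * x + y is counted as 2 floating-point operations.
template <std::size_t N>
double madd_mflops(std::size_t iters, double& checksum) {
    assert(iters > 0);

    double acc[N];
    for (std::size_t k = 0; k < N; ++k) acc[k] = 1.0 + 0.001 * k;

    // With these constants every chain drifts toward 1.0, so nothing overflows.
    const double x = 0.999999;
    const double y = 1.0e-6;

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iters; ++i) {
        // The N updates do not depend on each other, so a compiler can keep
        // them in registers and a pipelined FPU can overlap them.
        for (std::size_t k = 0; k < N; ++k) acc[k] = acc[k] * x + y;
    }
    const auto stop = std::chrono::steady_clock::now();

    // Summing the results keeps the compiler from discarding the loop.
    checksum = 0.0;
    for (std::size_t k = 0; k < N; ++k) checksum += acc[k];

    const double seconds = std::chrono::duration<double>(stop - start).count();
    return 2.0 * static_cast<double>(N) * static_cast<double>(iters) / seconds / 1.0e6;
}
```

Calling `madd_mflops<2>`, `madd_mflops<4>`, `madd_mflops<8>`, and `madd_mflops<16>` with a large iteration count would give one figure per row in the lists below. Results depend heavily on compiler and optimization flags, as the cxx and g++ numbers for Campbell suggest.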
Fowler: Alpha EV6 at 466 MHz
four simultaneous instruction steps
Loop over 2 independent multiply-add = 230 Mflops
Loop over 4 independent multiply-add = 462 Mflops
Loop over 8 independent multiply-add = 625 Mflops
Loop over 16 independent multiply-add = 351 Mflops
Note: leguin and cherryh have the same architecture as campbell,
with a 500 MHz clock speed
Campbell: Alpha EV5 at 250 MHz
two simultaneous instruction steps
cxx compiler
Loop over 2 independent multiply-add = 60 Mflops
Loop over 4 independent multiply-add = 59 Mflops
Loop over 8 independent multiply-add = 58 Mflops
Loop over 16 independent multiply-add = 53 Mflops
g++ compiler
Loop over 2 independent multiply-add = 124 Mflops
Loop over 4 independent multiply-add = 246 Mflops
Loop over 8 independent multiply-add = 280 Mflops
Loop over 16 independent multiply-add = 253 Mflops
Blacks: Sun Ultra 10 at 300 MHz = 600 Mflops peak
(Sun supposedly has a great compiler)
Loop over 2 independent multiply-add = 199 Mflops
Loop over 4 independent multiply-add = 397 Mflops
Loop over 8 independent multiply-add = 348 Mflops
Loop over 16 independent multiply-add = 309 Mflops
Lifu: IBM RS/6000 at 67 MHz = 267 Mflops peak
Loop over 2 independent multiply-add = 89 Mflops
Loop over 4 independent multiply-add = 177 Mflops
Loop over 8 independent multiply-add = 267 Mflops
Loop over 16 independent multiply-add = 267 Mflops
---------------------------------
Beowulf: Pentium II at 400 MHz = 400 Mflops peak.
Loop over 2 independent multiply-add: 107 Mflops
Loop over 4 independent multiply-add: 246 Mflops
Loop over 8 independent multiply-add: 356 Mflops
Loop over 16 independent multiply-add: 332 Mflops
Origin2000: MIPS R12000 at 300 MHz = 600 Mflops peak.
Loop over 2 independent multiply-add: 299 Mflops
Loop over 4 independent multiply-add: 599 Mflops
Loop over 8 independent multiply-add: 599 Mflops
Loop over 16 independent multiply-add: 599 Mflops
T3E: DEC alpha at 300 MHz = 300 Mflops peak.
Loop over 2 independent multiply-add: 86 Mflops
Loop over 4 independent multiply-add: 160 Mflops
Loop over 8 independent multiply-add: 200 Mflops
Loop over 16 independent multiply-add: 74 Mflops
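The peak figures quoted above appear to be the clock rate times a number of floating-point operations per cycle (for example 600 / 300 = 2 for the Origin2000 and 400 / 400 = 1 for the Pentium II). That per-cycle factor is inferred from the quoted numbers rather than stated on the page. Under that assumption, a minimal sketch for comparing a measured rate against peak, using the hypothetical helpers `peak_mflops` and `fraction_of_peak`:

```cpp
#include <cassert>

// Hypothetical helpers, not from the original page.
// Peak rate in Mflops = clock in MHz * floating-point operations per cycle.
double peak_mflops(double clock_mhz, double flops_per_cycle) {
    assert(clock_mhz > 0.0 && flops_per_cycle > 0.0);
    return clock_mhz * flops_per_cycle;
}

// Fraction of peak reached by a measured rate.
double fraction_of_peak(double measured_mflops, double peak) {
    assert(peak > 0.0);
    return measured_mflops / peak;
}
```

For instance, `fraction_of_peak(599.0, peak_mflops(300.0, 2.0))` corresponds to the Origin2000 rows with 4 or more multiply-adds, and `fraction_of_peak(356.0, peak_mflops(400.0, 1.0))` to the best Beowulf row.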
----------------------------------
To cite this page:
Lars tests
<http://www.physics.ohio-state.edu/~wilkins/computing/benchmark/sing_proc_perf.html>
[Thursday, 04-Dec-2008 17:26:31 EST]
Edited by: wilkins@mps.ohio-state.edu on
Friday, 10-Dec-1999 14:34:12 EST