# Performance Comparison 6 February 1998

Figure ¹

```
      SUBROUTINE matvec8a(n,m,a,x,y)
      INTEGER i,j,n,m
      DOUBLE PRECISION x, y, a
      DIMENSION x(8,m), y(8,n), a(m,n)
      DO i=1, n
        DO j=1, m
          y(1,i)=y(1,i)+a(j,i)*x(1,j)
          y(2,i)=y(2,i)+a(j,i)*x(2,j)
          y(3,i)=y(3,i)+a(j,i)*x(3,j)
          y(4,i)=y(4,i)+a(j,i)*x(4,j)
          y(5,i)=y(5,i)+a(j,i)*x(5,j)
          y(6,i)=y(6,i)+a(j,i)*x(6,j)
          y(7,i)=y(7,i)+a(j,i)*x(7,j)
          y(8,i)=y(8,i)+a(j,i)*x(8,j)
        END DO
      END DO
      RETURN
      END
```

```
#include
#include
#include
#include
#include

extern "C" {
  int mp_my_threadnum_();
  int mp_numthreads_();
#define MATVEC matvec8a_
  void MATVEC(int*,int*,double*,double*,double*);
}

main()
{
  clock_t time1, time2;
  double tt1=0.0,tt2=0.0;
  unsigned sec = 1;
  const int DIM=1440;
  const int ORD = 8;
  int dim=DIM;
  int dim1;
  int thds = mp_numthreads_();
  const int LOOP=int(100.e9/(DIM*DIM)/ORD);

  cout << "\n*******************************************\n";
  cout << "pragM: DIM = " << DIM << ", LOOP = " << LOOP << ", "
       << "Threads = " << thds << "\n";
  cout.flush();

  int i,j,k,pnum;
  int pflags[64], flags_tot;
  double *out, tot_out=0.0;
  out=new double[64];
  for(i=0;i<64;i++) out[i] = 0;
  for(i=0;i<64;i++) pflags[i] = 0;

  double B[DIM], a[DIM][ORD], c[DIM][ORD], A[DIM][DIM];

#pragma parallel shared(tt1,tt2,a,c,A,dim,dim1,pflags) \
  local(time1,time2,flags_tot,i,j,k,pnum)
  {
    time1=clock();
#pragma pfor
    for(i=0;i
```
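For checking results without a Fortran compiler, the `matvec8a` kernel can be restated in C++. The sketch below is not part of the original benchmark: the name `matvec8a_ref`, the `std::vector` arguments, and the size checks are assumptions made for illustration. It keeps the Fortran column-major layout, so `x(k,j)` maps to `x[(k-1) + 8*(j-1)]`, `y(k,i)` to `y[(k-1) + 8*(i-1)]`, and `a(j,i)` to `a[(j-1) + m*(i-1)]`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference version of the Fortran matvec8a kernel.
// All arrays are flat and column-major, as Fortran stores them:
//   x has 8*m entries, y has 8*n entries, a has m*n entries.
// Like the Fortran routine, it accumulates into y without clearing it.
void matvec8a_ref(int n, int m,
                  const std::vector<double>& a,
                  const std::vector<double>& x,
                  std::vector<double>& y)
{
    const std::size_t ORD = 8;
    const std::size_t nn = static_cast<std::size_t>(n);
    const std::size_t mm = static_cast<std::size_t>(m);

    // Guard against mismatched sizes (an addition; the Fortran code has none).
    assert(a.size() == mm * nn);
    assert(x.size() == ORD * mm);
    assert(y.size() == ORD * nn);

    for (std::size_t i = 0; i < nn; ++i) {
        for (std::size_t j = 0; j < mm; ++j) {
            const double aji = a[j + mm * i];      // a(j+1, i+1)
            for (std::size_t k = 0; k < ORD; ++k) {
                // y(k+1, i+1) += a(j+1, i+1) * x(k+1, j+1)
                y[k + ORD * i] += aji * x[k + ORD * j];
            }
        }
    }
}
```

The Fortran routine unrolls the eight updates by hand. The inner `k` loop here performs the same eight updates and leaves any unrolling to the compiler, so it serves as a correctness reference and not as a tuned kernel.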

Your comments and suggestions are appreciated.

Edited by: wilkins@mps.ohio-state.edu [February 1998]