Fri Aug 07, 2020 6:09 pm

## Matrix Multiply Embedded Python and GPU

Topics related to the set of Machine Learning libraries and Matrix processing algorithms
Does anyone know a better data structure for matrices being passed to embedded python where a GPU will multiply them? By better I mean the conversion to a GPU format will take the least amount of time.

In the following implementation, it takes the embedded python code approximately 21 seconds to convert to GPU format the 2 passed matrices that are 10,000 by 20,000 and 20,000 by 10,000. But, it only takes the GPU approximately 3 1/3 seconds to multiply the 2 matrices once the matrices are converted.

In the following implementation, the embedded python code computes the amount of time is takes to a) convert the 2 matrices to GPU format; b) do matrix multiplication, and c) convert the resultant matrix to a ecl dataset format.
The embedded python code puts the times in a file called TimeOfMatrixMultiply.txt. The directory where the file is stored is /var/lib/HPCCSystms/myeclagent.

(NOTE: All the ecl code I show below was ran on hthor.)

NOTE:I deployed my hpcc system on AWS using a g2.2xlarge instance type, which comes with a GPU. Also in region us-west-2, I used the AMI, ami-638c1e03, which comes with all the software you will need, except hpcc.

Code: Select all
`import python;matrec := RECORD SET OF REAL arow;END;DATASET(matrec) MatrixMultiply(DATASET(matrec) A, unsigned nrowsA, unsigned ncolsA,DATASET(matrec) B, unsigned nrowsB, unsigned ncolsB) := embed(Python)    import numpy as np    import re    import cudamat as cm    import gnumpy as gp    import sys    sys.stdout=open('TimeOfMatrixMultiply.txt','w')        def ECLDataset2NPArray(s,r,c):      zarray=np.empty([r,c],dtype=float)      i=0      for row in s:        zarray[i]=np.asarray(row)        i+=1      return zarray        from timeit import default_timer as timer    start1 = timer()    Aarray=ECLDataset2NPArray(A,nrowsA,ncolsA)    Barray=ECLDataset2NPArray(B,nrowsB,ncolsB)    cm.cublas_init()    cm.CUDAMatrix.init_random()    A_cm = gp.garray(Aarray)    B_cm = gp.garray(Barray)    ecl2cm_time=timer() - start1    print("Time to just convert 2 ecl matrices (%d X %d) and (%d X %d) to cm was %f seconds" % (nrowsA,ncolsA,nrowsB,ncolsB,ecl2cm_time))    time2=timer()    C_cm = A_cm.dot(B_cm)    mm_time=timer() - time2    print("Just GPU matrix multiple of 2 matrices (%d X %d) and (%d X %d)  took %f seconds" % (nrowsA,ncolsA,nrowsB,ncolsB,mm_time))    time3=timer()    cm.cublas_shutdown()    Carray=gp.as_numpy_array(C_cm)    ecl_Carray=Carray.tolist()    cm2ecl_time=timer() - time3    print("Time to just convert 2 cm matrices (%d X %d) and (%d X %d) to ecl was %f seconds" % (nrowsA,ncolsA,nrowsB,ncolsB,cm2ecl_time))    return ecl_Carrayendembed;//The A and B matrices (datasets) were created by "make2MatricesAndStoreOnDisk_b.ecl".A:=DATASET('~hthor::tlh::AMatrix',matrec,THOR);B:=DATASET('~hthor::tlh::BMatrix',matrec,THOR);NRowsA:=COUNT(A);NColsA:=COUNT(A[1].arow);NRowsB:=COUNT(B);NColsB:=COUNT(B[1].arow);MM:=MatrixMultiply(A,NRowsA,NColsA,B,NRowsB,NColsB);OUTPUT(MM,,'tlh::AMatrixAndBMatrixProduct',OVERWRITE)`

The following embedded python will get the timings.
Code: Select all
`import python;timings := RECORD  STRING line;END;DATASET(timings) readTimings() := EMBED(python)  timings = open('/var/lib/HPCCSystems/myeclagent/TimeOfMatrixMultiply.txt').readlines()  return timingsENDEMBED;OUTPUT(readTimings());`

The following ecl code makes the 2 matrices that are multiplied.
Code: Select all
`matrec := RECORD SET OF REAL arow;END;MakeMatrix(UNSIGNED NRows, UNSIGNED NCols) := FUNCTION    REAL RealRandom() := FUNCTION  r:= RANDOM()/4294967295;  return r;  END;  rec0 := RECORD    REAL cell;  END;     rec1 := RECORD    DATASET(rec0) arow;  END;  rec1 genMatrix(rec1 r, UNSIGNED row_num, UNSIGNED ncols) := TRANSFORM   SELF.arow:=NORMALIZE(DATASET([{0}],rec0), ncols, TRANSFORM(rec0,SELF.cell:=RealRandom()));  END;  rec1DS:=NORMALIZE(DATASET([{DATASET([{0}],rec0)}],rec1),NRows,genMatrix(LEFT,COUNTER,NCols));  matrec Dataset2Set(rec1 L) := TRANSFORM    SELF.arow := SET(L.arow,cell);  END;    RETURN PROJECT(rec1DS,Dataset2Set(LEFT));END;NRows1:= 10000;NCols1:= 20000;A:=MakeMatrix(NRows1, NCols1);OUTPUT(A,,'tlh::AMatrix',OVERWRITE);NRows2:= NCols1;NCols2:= NRows1;B:=MakeMatrix(NRows2, NCols2);OUTPUT(B,,'tlh::BMatrix',OVERWRITE);`
tlhumphrey2

Posts: 259
Joined: Mon May 07, 2012 6:23 pm

Tim,

I ran your MakeMatrix() code (changing to 10 rows and 20 columns) and saw that each row ends up with exactly the same data. Is that intentional?

HTH,

Richard
rtaylor
Community Advisory Board Member

Posts: 1549
Joined: Wed Oct 26, 2011 7:40 pm

Richard,

No, numbers in different rows should not be the same. I'm surprised they are.

Could the problem be the NORMALIZE I have in the transform, genMatrix, of a 2nd NORMALIZE? Plus, maybe I'm thinking procedurally instead of declaratively???

Tim
tlhumphrey2

Posts: 259
Joined: Mon May 07, 2012 6:23 pm

Tim,

I'm not entirely sure why your code was getting the same row repeated each time, but here's my version of your code that DOES generate a different row for each record.
Code: Select all
`MakeMatrix(UNSIGNED NRows, UNSIGNED NCols) := FUNCTION  rec0 := {REAL cell};  matrec := {SET OF REAL arow};  REAL RealRandom :=  RANDOM()/4294967295;  RealDS := DATASET(NRows*NCols,TRANSFORM(rec0,SELF.cell := RealRandom));  MatRec DS2Set(INTEGER C) := TRANSFORM    EndRec := C*NCols;     BegRec := (EndRec - NCols) + 1;     SELF.arow := SET(RealDS[BegRec..EndRec],cell);  END;  RETURN DATASET(NRows,DS2Set(COUNTER));END;NRows1:= 10;NCols1:= 20;NRows2:= NCols1;NCols2:= NRows1;A:=MakeMatrix(NRows1, NCols1);OUTPUT(A,,'tlh::AMatrix',OVERWRITE);B:=MakeMatrix(NRows2, NCols2);OUTPUT(B,,'tlh::BMatrix',OVERWRITE);`
Note that I'm just generating all the cells in one pass, then just splitting them into their separate recs.

HTH,

Richard
rtaylor
Community Advisory Board Member

Posts: 1549
Joined: Wed Oct 26, 2011 7:40 pm

Richard,

I like your code. It does everything I wanted to do inside MakeMatrix. And, it is much shorter than my code.

I did notice one odd behavior of the code -- matrices A and B have the same values. Again, I don't understand this behavior. I entered a jira asking why (HPCC-19025).

Tim
tlhumphrey2

Posts: 259
Joined: Mon May 07, 2012 6:23 pm