Fri Aug 17, 2018 5:23 pm
Login Register Lost Password? Contact Us


Matrix Multiply Embedded Python and GPU

Topics related to the set of Machine Learning libraries and Matrix processing algorithms

Fri Jan 26, 2018 3:44 pm Change Time Zone

Does anyone know a better data structure for matrices being passed to embedded python where a GPU will multiply them? By better I mean the conversion to a GPU format will take the least amount of time.

In the following implementation, it takes the embedded python code approximately 21 seconds to convert to GPU format the 2 passed matrices that are 10,000 by 20,000 and 20,000 by 10,000. But, it only takes the GPU approximately 3 1/3 seconds to multiply the 2 matrices once the matrices are converted.

In the following implementation, the embedded python code computes the amount of time is takes to a) convert the 2 matrices to GPU format; b) do matrix multiplication, and c) convert the resultant matrix to a ecl dataset format.
The embedded python code puts the times in a file called TimeOfMatrixMultiply.txt. The directory where the file is stored is /var/lib/HPCCSystms/myeclagent.

(NOTE: All the ecl code I show below was ran on hthor.)

NOTE:I deployed my hpcc system on AWS using a g2.2xlarge instance type, which comes with a GPU. Also in region us-west-2, I used the AMI, ami-638c1e03, which comes with all the software you will need, except hpcc.

Code: Select all
import python;

matrec := RECORD
SET OF REAL arow;
END;

DATASET(matrec) MatrixMultiply(DATASET(matrec) A, unsigned nrowsA, unsigned ncolsA,DATASET(matrec) B, unsigned nrowsB, unsigned ncolsB) := embed(Python)
    import numpy as np
    import re
    import cudamat as cm
    import gnumpy as gp

    import sys
    sys.stdout=open('TimeOfMatrixMultiply.txt','w')
   
    def ECLDataset2NPArray(s,r,c):
      zarray=np.empty([r,c],dtype=float)
      i=0
      for row in s:
        zarray[i]=np.asarray(row)
        i+=1
      return zarray
   
    from timeit import default_timer as timer

    start1 = timer()
    Aarray=ECLDataset2NPArray(A,nrowsA,ncolsA)
    Barray=ECLDataset2NPArray(B,nrowsB,ncolsB)
    cm.cublas_init()
    cm.CUDAMatrix.init_random()
    A_cm = gp.garray(Aarray)
    B_cm = gp.garray(Barray)
    ecl2cm_time=timer() - start1
    print("Time to just convert 2 ecl matrices (%d X %d) and (%d X %d) to cm was %f seconds" % (nrowsA,ncolsA,nrowsB,ncolsB,ecl2cm_time))

    time2=timer()
    C_cm = A_cm.dot(B_cm)
    mm_time=timer() - time2
    print("Just GPU matrix multiple of 2 matrices (%d X %d) and (%d X %d)  took %f seconds" % (nrowsA,ncolsA,nrowsB,ncolsB,mm_time))

    time3=timer()
    cm.cublas_shutdown()
    Carray=gp.as_numpy_array(C_cm)
    ecl_Carray=Carray.tolist()
    cm2ecl_time=timer() - time3
    print("Time to just convert 2 cm matrices (%d X %d) and (%d X %d) to ecl was %f seconds" % (nrowsA,ncolsA,nrowsB,ncolsB,cm2ecl_time))
    return ecl_Carray

endembed;

//The A and B matrices (datasets) were created by "make2MatricesAndStoreOnDisk_b.ecl".
A:=DATASET('~hthor::tlh::AMatrix',matrec,THOR);
B:=DATASET('~hthor::tlh::BMatrix',matrec,THOR);
NRowsA:=COUNT(A);
NColsA:=COUNT(A[1].arow);
NRowsB:=COUNT(B);
NColsB:=COUNT(B[1].arow);

MM:=MatrixMultiply(A,NRowsA,NColsA,B,NRowsB,NColsB);

OUTPUT(MM,,'tlh::AMatrixAndBMatrixProduct',OVERWRITE)

The following embedded python will get the timings.
Code: Select all
import python;

timings := RECORD
  STRING line;
END;

DATASET(timings) readTimings() := EMBED(python)
  timings = open('/var/lib/HPCCSystems/myeclagent/TimeOfMatrixMultiply.txt').readlines()
  return timings
ENDEMBED;

OUTPUT(readTimings());

The following ecl code makes the 2 matrices that are multiplied.
Code: Select all
matrec := RECORD
SET OF REAL arow;
END;

MakeMatrix(UNSIGNED NRows, UNSIGNED NCols) := FUNCTION
 
  REAL RealRandom() := FUNCTION
  r:= RANDOM()/4294967295;
  return r;
  END;

  rec0 := RECORD
    REAL cell;
  END;
   
  rec1 := RECORD
    DATASET(rec0) arow;
  END;

  rec1 genMatrix(rec1 r, UNSIGNED row_num, UNSIGNED ncols) := TRANSFORM
   SELF.arow:=NORMALIZE(DATASET([{0}],rec0), ncols, TRANSFORM(rec0,SELF.cell:=RealRandom()));
  END;

  rec1DS:=NORMALIZE(DATASET([{DATASET([{0}],rec0)}],rec1),NRows,genMatrix(LEFT,COUNTER,NCols));

  matrec Dataset2Set(rec1 L) := TRANSFORM
    SELF.arow := SET(L.arow,cell);
  END;
 
  RETURN PROJECT(rec1DS,Dataset2Set(LEFT));
END;

NRows1:= 10000;
NCols1:= 20000;
A:=MakeMatrix(NRows1, NCols1);
OUTPUT(A,,'tlh::AMatrix',OVERWRITE);

NRows2:= NCols1;
NCols2:= NRows1;
B:=MakeMatrix(NRows2, NCols2);
OUTPUT(B,,'tlh::BMatrix',OVERWRITE);
tlhumphrey2
 
Posts: 250
Joined: Mon May 07, 2012 6:23 pm

Mon Jan 29, 2018 3:22 pm Change Time Zone

Tim,

I ran your MakeMatrix() code (changing to 10 rows and 20 columns) and saw that each row ends up with exactly the same data. Is that intentional?

HTH,

Richard
rtaylor
Community Advisory Board Member
Community Advisory Board Member
 
Posts: 1369
Joined: Wed Oct 26, 2011 7:40 pm

Mon Jan 29, 2018 4:21 pm Change Time Zone

Richard,

No, numbers in different rows should not be the same. I'm surprised they are.

Could the problem be the NORMALIZE I have in the transform, genMatrix, of a 2nd NORMALIZE? Plus, maybe I'm thinking procedurally instead of declaratively???

Tim
tlhumphrey2
 
Posts: 250
Joined: Mon May 07, 2012 6:23 pm

Mon Jan 29, 2018 8:56 pm Change Time Zone

Tim,

I'm not entirely sure why your code was getting the same row repeated each time, but here's my version of your code that DOES generate a different row for each record.
Code: Select all
MakeMatrix(UNSIGNED NRows, UNSIGNED NCols) := FUNCTION
  rec0 := {REAL cell};
  matrec := {SET OF REAL arow};
  REAL RealRandom :=  RANDOM()/4294967295;
  RealDS := DATASET(NRows*NCols,TRANSFORM(rec0,SELF.cell := RealRandom));

  MatRec DS2Set(INTEGER C) := TRANSFORM
    EndRec := C*NCols;
    BegRec := (EndRec - NCols) + 1;
    SELF.arow := SET(RealDS[BegRec..EndRec],cell);
  END;
  RETURN DATASET(NRows,DS2Set(COUNTER));
END;

NRows1:= 10;
NCols1:= 20;
NRows2:= NCols1;
NCols2:= NRows1;

A:=MakeMatrix(NRows1, NCols1);
OUTPUT(A,,'tlh::AMatrix',OVERWRITE);

B:=MakeMatrix(NRows2, NCols2);
OUTPUT(B,,'tlh::BMatrix',OVERWRITE);
Note that I'm just generating all the cells in one pass, then just splitting them into their separate recs.

HTH,

Richard
rtaylor
Community Advisory Board Member
Community Advisory Board Member
 
Posts: 1369
Joined: Wed Oct 26, 2011 7:40 pm

Mon Jan 29, 2018 9:10 pm Change Time Zone

Richard,

I like your code. It does everything I wanted to do inside MakeMatrix. And, it is much shorter than my code.

I did notice one odd behavior of the code -- matrices A and B have the same values. Again, I don't understand this behavior. I entered a jira asking why (HPCC-19025).

Tim
tlhumphrey2
 
Posts: 250
Joined: Mon May 07, 2012 6:23 pm


Return to Machine Learning

Who is online

Users browsing this forum: No registered users and 1 guest