

Converting from old ML library to new ML_Core one

Topics related to the set of Machine Learning libraries and Matrix processing algorithms

Sat Sep 08, 2018 2:46 am

Hello!

I'm trying to convert my ECL code from the old ML library to the new one (ML_Core and related bundles).
I have some very simple code available on GitHub: https://github.com/lpezet/hpcc_vs_sas/tree/master/AerobicFitnessPrediction

Main.ecl uses the old ML library, while MainMLCore.ecl is the conversion to ML_Core.
The former works perfectly. With the latter (ML_Core) I'm getting a "Not a positive definite matrix" error.

Does anyone have any hints on how to convert from the old ML library to ML_Core (any gotchas, for example)?
Or maybe something about the "Not a positive definite matrix" error?
I do see differences between ML and ML_Core (like the new "work item" field when calling ToField()), but I still can't figure out what's wrong with my code.


Thanks for the help!
lpezet
 
Posts: 59
Joined: Wed Sep 10, 2014 3:14 am

Mon Sep 10, 2018 2:49 pm

lpezet,

I'm attempting to reproduce the error you are getting. However, MainMLCore.ecl in your GitHub repo IMPORTs LinearRegression, and I don't see that module in either ecl-ml or ML_Core.

Tim
tlhumphrey2
 
Posts: 256
Joined: Mon May 07, 2012 6:23 pm

Mon Sep 10, 2018 2:59 pm

lpezet,

I found LinearRegression.

Tim
tlhumphrey2
 
Posts: 256
Joined: Mon May 07, 2012 6:23 pm

Mon Sep 10, 2018 7:13 pm

Your problem is caused by this line of code:
X := oFields( Number IN [ 1, 2, 4, 6, 5, 7 ] );


Basically, the above line of code creates a sparse matrix where all elements of the 3rd column are zero. Why? Because missing elements are considered to be elements with value zero and all the elements of the 3rd column are missing in X.

So, you have created a matrix that is NOT positive definite.
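To make the reasoning above concrete, here is a minimal sketch in plain Python (not ECL, and not the ML_Core or PBblas code). It assumes the solver factors X'X with a Cholesky-style routine, which is what the "Not a positive definite matrix" message suggests but which this thread does not confirm. The helper names `gram`, `cholesky`, and `is_positive_definite` are made up for the illustration. The rows are the first five records from the data in this thread, restricted to columns 1, 2, and 4.

```python
import math

def gram(X):
    """Return X'X for a matrix given as a list of rows."""
    cols = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(cols)]
            for i in range(cols)]

def cholesky(A):
    """Textbook Cholesky factorization; raises ValueError on a non-positive pivot."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("Not a positive definite matrix")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def is_positive_definite(A):
    """True if the Cholesky factorization of A succeeds."""
    try:
        cholesky(A)
        return True
    except ValueError:
        return False

# Columns 1, 2 and 4 of the first five records, stored densely.
dense = [[44.0, 89.47, 11.37],
         [40.0, 75.07, 10.07],
         [44.0, 85.84, 8.65],
         [42.0, 68.15, 8.17],
         [38.0, 89.02, 9.22]]

# The same data with the missing column 3 read as zeros.
with_gap = [[r[0], r[1], 0.0, r[2]] for r in dense]

print(is_positive_definite(gram(dense)))     # factorization succeeds
print(is_positive_definite(gram(with_gap)))  # the zero column gives a zero pivot
```

The all-zero column produces an all-zero row and column in X'X, so the factorization hits a zero pivot and stops, no matter what the other columns contain.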
tlhumphrey2
 
Posts: 256
Joined: Mon May 07, 2012 6:23 pm

Mon Sep 10, 2018 11:56 pm

Hi Tim!

I'm not sure I understand. X and Y are simply the data for my independent and dependent variables respectively. X will just have data for Age (1), Weight (2), RunTime (4), RunPulse (5), RestPulse (6), and MaxPulse (7). And my dependent variable is Oxygen (3).

Is there a difference between ECL-ML and ML_Core in that regard?


Thanks!
lpezet
 
Posts: 59
Joined: Wed Sep 10, 2014 3:14 am

Tue Sep 11, 2018 12:46 pm

I believe there must be a difference between ML and ML_Core because you didn't have this problem with ML.

In ML_Core, OLS uses PBblas to multiply matrices, and your X basically looks like the following, where all elements of the 3rd column (Oxygen) are missing:
44, 89.47, , 11.37, 62, 178, 182
40, 75.07, , 10.07, 62, 185, 185
44, 85.84, ,  8.65, 45, 156, 168
42, 68.15, ,  8.17, 40, 166, 172
38, 89.02, ,  9.22, 55, 178, 180
47, 77.45, , 11.63, 58, 176, 176
40, 75.98, , 11.95, 70, 176, 180
43, 81.19, , 10.85, 64, 162, 170
44, 81.42, , 13.08, 63, 174, 176
38, 81.87, ,  8.63, 48, 170, 186
44, 73.03, , 10.13, 45, 168, 168
45, 87.66, , 14.03, 56, 186, 192
45, 66.45, , 11.12, 51, 176, 176
47, 79.15, , 10.60, 47, 162, 164
54, 83.12, , 10.33, 50, 166, 170
49, 81.42, ,  8.95, 44, 180, 185
51, 69.63, , 10.95, 57, 168, 172
51, 77.91, , 10.00, 48, 162, 168
48, 91.63, , 10.25, 48, 162, 164
49, 73.37, , 10.08, 67, 168, 168
57, 73.37, , 12.63, 58, 174, 176
54, 79.38, , 11.17, 62, 156, 165
52, 76.32, ,  9.63, 48, 164, 166
50, 70.87, ,  8.92, 48, 146, 155
51, 67.25, , 11.08, 48, 172, 172
54, 91.63, , 12.88, 44, 168, 172
51, 73.71, , 10.47, 59, 186, 188
57, 59.08, ,  9.93, 49, 148, 155
49, 76.32, ,  9.40, 56, 186, 188
48, 61.24, , 11.50, 52, 170, 176
52, 82.78, , 10.50, 53, 170, 172


PBblas treats missing elements as zeros. So, as far as PBblas is concerned, column 3 contains all zeros, which makes X NOT positive definite.

One way to get around this problem is to break oRawData into oRawX and oRawY before you use ToField to convert them to a NumericField dataset.
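The difference between filtering after conversion and splitting before it can be sketched as follows. This is plain Python, not ECL: `to_field` is a hypothetical stand-in for a ToField-style conversion that numbers the columns it is given from 1, and `implied_width` encodes the assumption (taken from the explanation above) that a sparse reader sizes the matrix by the largest field number it sees. The Oxygen values are not shown in this thread, so they are left as `None`.

```python
def to_field(rows, keep):
    """Hypothetical ToField-like conversion: emit (id, number, value) cells,
    numbering the kept columns 1..n in the order given."""
    cells = []
    for rec_id, row in enumerate(rows, start=1):
        for number, position in enumerate(keep, start=1):
            cells.append((rec_id, number, row[position]))
    return cells

def field_numbers(cells):
    """Sorted list of the distinct field numbers present in the cells."""
    return sorted({number for _, number, _ in cells})

def implied_width(cells):
    """Assumed column count of the sparse matrix: the largest field number."""
    return max(number for _, number, _ in cells)

# First two records from the thread; position 2 (field 3, Oxygen) is not shown there.
rows = [[44, 89.47, None, 11.37, 62, 178, 182],
        [40, 75.07, None, 10.07, 62, 185, 185]]

# Convert everything, then filter out field 3: the numbering keeps a gap.
all_cells = to_field(rows, keep=[0, 1, 2, 3, 4, 5, 6])
x_filtered = [c for c in all_cells if c[1] in (1, 2, 4, 5, 6, 7)]

# Split the raw columns first, then convert each part: numbering is contiguous.
x_split = to_field(rows, keep=[0, 1, 3, 4, 5, 6])
y_split = to_field(rows, keep=[2])

print(field_numbers(x_filtered), implied_width(x_filtered))
print(field_numbers(x_split), implied_width(x_split))
print(field_numbers(y_split))
```

In the filtered version the independent variables still span seven field numbers with nothing in field 3, while the split version yields six contiguous fields for X and a single field 1 for Y.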

Roger Dev will have to give you more details. He is the expert.
tlhumphrey2
 
Posts: 256
Joined: Mon May 07, 2012 6:23 pm

Tue Sep 11, 2018 2:48 pm

Oh wow. That was it.
As you suggested, I used TABLE to split the columns I needed into two separate datasets, one for the independent variables and one for the dependent variable, before calling ML_Core.ToField(), and now it works.

Thanks a lot Tim!
lpezet
 
Posts: 59
Joined: Wed Sep 10, 2014 3:14 am

