## ML - calculate Euclidean distance

I have just started exploring the HPCC ML module. I am trying to use KNN to classify my test dataset (which contains 4 feature fields). Please let me know if my approach is right:


1. Read my training and test data files into datasets.
2. Use ML.ToField on the datasets.
3. Call ML.Cluster.Distances with the training and test datasets as parameters. Does this compute the Euclidean distance (is that the default for the third parameter of Distances?) between every row on the left and every row on the right, taking all features into account?
4. Call ML.Cluster.Closest on the result of the previous step. Does this find the closest neighbour for each row? How do I pass x to it to get the x closest neighbours?
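Steps 3 and 4 amount to computing every test-to-training distance and then keeping the x smallest per test row. Below is a minimal Python sketch of that idea, independent of the HPCC ML library. The names `all_distances` and `k_closest` and the (test_id, train_id, distance) layout are hypothetical. They are not the actual output format of ML.Cluster.Distances or ML.Cluster.Closest.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def all_distances(test, train):
    """Distance from every test row to every training row.

    test and train map a row id to its feature vector; the result is a
    list of (test_id, train_id, distance) triples (hypothetical layout).
    """
    return [(tid, rid, euclidean(tv, rv))
            for tid, tv in test.items()
            for rid, rv in train.items()]

def k_closest(triples, k):
    """Keep the ids of the k nearest training rows for each test row."""
    by_test = defaultdict(list)
    for tid, rid, d in triples:
        by_test[tid].append((d, rid))
    # Sort each test row's candidates by distance and keep the first k
    return {tid: [rid for _, rid in sorted(pairs)[:k]]
            for tid, pairs in by_test.items()}

train = {1: [0.0, 0.0, 0.0, 0.0], 2: [1.0, 1.0, 1.0, 1.0], 3: [5.0, 5.0, 5.0, 5.0]}
test = {10: [0.9, 0.9, 1.0, 1.0]}
print(k_closest(all_distances(test, train), 2))  # {10: [2, 1]}
```

Here `k` plays the role of the "x" in step 4: it is simply how many of the sorted distances are kept per test row.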

- Gayathri

- Gayathri_Jayaraman
**Posts:** 75 **Joined:** Wed May 08, 2013 5:03 am

Your steps 1 and 2 are correct. However, I believe you want a supervised KNN learning algorithm. In that setup, you use your training set with a learning algorithm to learn some kind of model, which you then use for classification. With KNN, the model is actually all the rows of your training set. The classifier compares each new row of independent variables (features, or X) with the rows of your training set. The class (Y, or dependent variable) assigned to each new row is that of the closest training rows.
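To make the "model is the training set" point concrete, here is a minimal Python sketch. It is not the HPCC ML API. The class name `NearestNeighbour` is hypothetical, and it uses only the single closest training row (k = 1). "Training" just stores the rows, and classification compares each new row against them.

```python
import math

class NearestNeighbour:
    """1-nearest-neighbour classifier: fitting only stores the data."""

    def fit(self, X, y):
        # The model is simply the training rows and their class labels.
        self.X = [list(row) for row in X]
        self.y = list(y)
        return self

    def predict_one(self, row):
        # Compare the new row with every stored row and return the label
        # of the closest one by Euclidean distance (math.dist, Python 3.8+).
        best = min(range(len(self.X)), key=lambda i: math.dist(self.X[i], row))
        return self.y[best]

    def predict(self, rows):
        return [self.predict_one(r) for r in rows]

model = NearestNeighbour().fit([[0, 0], [10, 10]], ["a", "b"])
print(model.predict([[1, 2], [9, 8]]))  # ['a', 'b']
```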

Look at ML.Tests.Explanatory.KNN_KDTree.ecl, which is an example using KNN_KDTree (in ML.Lazy.ecl). The same module also contains KNN, which you use just like KNN_KDTree.


- tlhumphrey2
**Posts:** 240 **Joined:** Mon May 07, 2012 6:23 pm

Yes Tim, I want to implement supervised learning using KNN, with Euclidean distance as the measure. I have 2 labelled sets: a training set and a test set.

I want to use the training set to learn and then predict labels for the test set, so that I can cross-check the predictions against the test set's labels.

This is what I want to do:

For each row of the test set:

- Compute the Euclidean distance (over all features) to every row of the training set
- Take the k closest distances
- Assign the most frequent label among those k neighbours as the current row's label
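The loop above, plus the cross-check against the test labels, can be sketched in Python. This is a language-neutral illustration rather than ECL or the HPCC ML API. The function names and the tiny data set are made up, and "max label" is read as the most frequent label (majority vote) among the k neighbours.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, row, k):
    """Label one test row by majority vote among its k nearest training rows."""
    # Euclidean distance (over all features) to every training row, nearest first
    dists = sorted(((math.dist(x, row), label) for x, label in zip(train_X, train_y)),
                   key=lambda t: t[0])
    # Labels of the k closest neighbours, still ordered by distance
    nearest = [label for _, label in dists[:k]]
    # Most frequent label; on a tie, Counter keeps first-seen order,
    # so the label whose first occurrence is nearest wins
    return Counter(nearest).most_common(1)[0][0]

def accuracy(train_X, train_y, test_X, test_y, k):
    """Fraction of test rows whose predicted label matches the known label."""
    preds = [knn_predict(train_X, train_y, row, k) for row in test_X]
    return sum(p == t for p, t in zip(preds, test_y)) / len(test_y)

train_X = [[1, 1, 1, 1], [1, 2, 1, 1], [8, 8, 8, 8], [9, 8, 9, 8]]
train_y = ["A", "A", "B", "B"]
test_X = [[1, 1, 2, 1], [8, 9, 8, 8]]
test_y = ["A", "B"]
print(accuracy(train_X, train_y, test_X, test_y, k=3))  # 1.0
```

Choosing an odd k for a two-class problem avoids most voting ties.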

To implement this, given a training matrix X and a test matrix Y, for each row y in Y I need to compute sqrt((x1-y1)^2 + (x2-y2)^2 + ...) against every row x in X. Can I achieve this using ML.Cluster.Distances?
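Written out for the four features in this data set, the distance between a training row x and a test row y is the following. Because the square root is monotonic, ranking neighbours by the squared sum gives the same k closest rows.

$$
d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{j=1}^{4} (x_j - y_j)^2}
= \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2 + (x_4 - y_4)^2}
$$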

- Gayathri

- Gayathri_Jayaraman
**Posts:** 75 **Joined:** Wed May 08, 2013 5:03 am

You might be able to use ML.Cluster.Distances, but I suspect it will be difficult, because that function was set up only for the clustering algorithms in ML.Cluster.

- tlhumphrey2
**Posts:** 240 **Joined:** Mon May 07, 2012 6:23 pm

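```ecl
IMPORT ML;

// Euclidean distance between two records in ML.Types.NumericField form.
// Assumes a and b each hold the features of a single record, one row per
// feature, identified by the field number.
REAL euclidean_distance(DATASET(ML.Types.NumericField) a,
                        DATASET(ML.Types.NumericField) b) := FUNCTION
  // Pair up matching features by field number and square each difference;
  // a field present in only one record is dropped by this inner JOIN
  temp := JOIN(a, b, LEFT.number = RIGHT.number,
               TRANSFORM(ML.Types.NumericField,
                         SELF.id     := LEFT.id;  // carried through, not used
                         SELF.number := LEFT.number;
                         SELF.value  := POWER(LEFT.value - RIGHT.value, 2)));
  // Sum the squared differences and take the square root
  RETURN SQRT(SUM(temp, temp.value));
END;
```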
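For readers outside ECL, the same computation can be mirrored in Python. This sketch assumes each record is held as (id, number, value) triples, loosely mimicking the layout of ML.Types.NumericField, and that each argument describes a single record. As with the inner JOIN, a field present in only one record is ignored.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two records stored as (id, number, value) triples."""
    # Index the second record's values by field number (the JOIN key)
    b_values = {number: value for _, number, value in b}
    # Square the difference for each field number present in both records
    squares = [(value - b_values[number]) ** 2
               for _, number, value in a if number in b_values]
    return math.sqrt(sum(squares))

a = [(1, 1, 1.0), (1, 2, 2.0), (1, 3, 3.0), (1, 4, 4.0)]
b = [(7, 1, 1.0), (7, 2, 2.0), (7, 3, 6.0), (7, 4, 8.0)]
print(euclidean_distance(a, b))  # 5.0
```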

- vivekaxl
**Posts:** 11 **Joined:** Wed Oct 15, 2014 3:43 am
