Naive Bayes Algorithm - Challenges in Prediction

Topics related to the set of Machine Learning libraries and Matrix processing algorithms

Mon Aug 15, 2016 10:04 pm

Hi,
I used the ML module to implement the Naive Bayes algorithm for a classification problem similar to sentiment analysis. The independent variables are a bag of words and the dependent variable is a classification id. I built a training dataset of around 10K records covering 3 classification ids and created the model from it. I then ran the model against the original dataset of 400K records, which includes the training records. I expected the training records, at least, to be classified correctly, but when I looked at the results they were not. How do I resolve this issue and improve the accuracy of the predicted classification?

Regards,
Subbu
kps_mani
 
Posts: 24
Joined: Wed Mar 04, 2015 3:42 pm

Tue Aug 23, 2016 2:59 pm

Before using your own data, I would run the example ML.Tests.Explanatory.Naive_Bayes.ecl and make sure the model it produces does a good job of classifying the training set (when I ran it, it did).

Then, use the example as a guide to setting up your code to train and classify your own data.
tlhumphrey2
 
Posts: 250
Joined: Mon May 07, 2012 6:23 pm

Tue Aug 23, 2016 3:21 pm

Hi,
I could not find ML.Tests.Explanatory.Naive_Bayes.ecl in the ML Beta version of the library. Can you please let me know whether you have the latest version of ML?

Regards,
Subbu
kps_mani
 
Posts: 24
Joined: Wed Mar 04, 2015 3:42 pm

Tue Aug 23, 2016 3:49 pm

I shortened the example code, ML.Tests.Explanatory.NaiveBayes.ecl, to the following:

Code:
IMPORT ML;
//NaiveBayes classifier
trainer:= ML.Classify.NaiveBayes();

// Monk Dataset - Discrete dataset 124 instances x 6 attributes + class
MonkData:= ML.Tests.Explanatory.MonkDS.Train_Data;
ML.ToField(MonkData, fullmds, id);
full_mds:=PROJECT(fullmds, TRANSFORM(ML.Types.DiscreteField, SELF:= LEFT));
indepDataD:= full_mds(number>1);
depDataD := full_mds(number=1);
// Learning Phase
D_Model:= trainer.LearnD(indepDataD, depDataD);
dmodel:= trainer.Model(D_model);
// Classification Phase
D_classDist:= trainer.ClassProbDistribD(indepDataD, D_Model); // Class Probability Distribution
D_results:= trainer.ClassifyD(indepDataD, D_Model);
OUTPUT(D_results);
// Performance Metrics
D_compare:= ML.Classify.Compare(depDataD, D_results);   // Comparing results with original class
OUTPUT(D_compare);


The last line of this code, "OUTPUT(D_compare);", outputs statistics that compare the predicted classes with the training set's dependent data. You should see an accuracy of around 80%.
tlhumphrey2
 
Posts: 250
Joined: Mon May 07, 2012 6:23 pm

Tue Aug 23, 2016 6:52 pm

The following tree diagram shows the folder layout of ecl-ml. Note where the Explanatory folder is (under ML, then Tests); Naive_Bayes.ecl is there.
Code:
+---docs
|   \---images
+---Examples
|   \---Sentilyze
|       +---KeywordCount
|       \---NaiveBayes
+---ML
|   +---DMat
|   +---Docs
|   +---LDA
|   +---Mat
|   +---Regression
|   |   +---Dense
|   |   \---Sparse
|   +---StepRegression
|   +---StepwiseLogistic
|   +---SVM
|   |   \---LibSVM
|   |       \---Test
|   \---Tests
|       +---Benchmarks
|       +---Deprecated
|       +---Explanatory
|       \---Validation
+---PBblas
|   +---BLAS
|   +---Block
|   +---LAPACK
|   \---Tests
+---TS
|   \---Demo
\---VL
    \---XSLT
tlhumphrey2
 
Posts: 250
Joined: Mon May 07, 2012 6:23 pm

Tue Aug 23, 2016 8:17 pm

Hi,
Here is the outcome of the model on the training dataset for classification ids 10, 20 and 30.

classifier  c_actual  c_modeled  cnt
1           10        10         33
1           10        20         566
1           10        30         30
1           20        10         37
1           20        20         1660
1           20        30         84
1           30        10         38
1           30        20         49
1           30        30         2196

classifier  c_modeled  precision
1           10         30.55555555555556
1           20         72.96703296703296
1           30         95.06493506493507

classifier  accuracy
1           82.86810142765822

Why are classification ids 10 and 20 modeled with so much lower precision than classification id 30? Any ideas or suggestions?

Regards,
Subbu
kps_mani
 
Posts: 24
Joined: Wed Mar 04, 2015 3:42 pm
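
A note on reading the cross-assignment table above: the reported precision is computed per modeled class, but grouping the same counts by actual class (recall) shows where the records go. The following is a minimal sketch in plain ECL that recomputes it from the counts in the post. The names ConfRec, conf and recallByClass are illustrative only and are not part of the ML library.

```ecl
// Cross-assignment counts copied from the post above (c_actual, c_modeled, cnt).
ConfRec := RECORD
  UNSIGNED2 c_actual;
  UNSIGNED2 c_modeled;
  UNSIGNED4 cnt;
END;

conf := DATASET([{10, 10, 33},  {10, 20, 566},  {10, 30, 30},
                 {20, 10, 37},  {20, 20, 1660}, {20, 30, 84},
                 {30, 10, 38},  {30, 20, 49},   {30, 30, 2196}], ConfRec);

// Recall per actual class: correctly modeled records divided by
// all records whose true class is c_actual.
recallByClass := TABLE(conf,
  {
    conf.c_actual,
    UNSIGNED4 n_actual  := SUM(GROUP, conf.cnt),
    UNSIGNED4 n_correct := SUM(GROUP, IF(conf.c_actual = conf.c_modeled, conf.cnt, 0)),
    REAL8 recall_pct    := 100.0 * SUM(GROUP, IF(conf.c_actual = conf.c_modeled, conf.cnt, 0))
                                 / SUM(GROUP, conf.cnt)
  },
  conf.c_actual);

OUTPUT(recallByClass, NAMED('RecallByClass'));
```

Worked by hand from the same counts, class 10 has 33 of 629 records modeled correctly (about 5%), class 20 has 1660 of 1781 (about 93%) and class 30 has 2196 of 2283 (about 96%). Most class 10 records (566 of 629) are modeled as class 20, which is also what pulls down the precision of class 20.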

Tue Aug 23, 2016 8:34 pm

I have attached the dependent and independent variables used for the training dataset for your reference. Can you please look at them and comment on why classification is not working properly for classification ids 10 and 20?

Regards,
Subbu
Attachments
independenttrainervariable.txt
(197.07 KiB) Downloaded 154 times
dependenttrainervariable.txt
(68.75 KiB) Downloaded 156 times
kps_mani
 
Posts: 24
Joined: Wed Mar 04, 2015 3:42 pm

Tue Aug 23, 2016 8:44 pm

The 2 files you attached aren't in the format needed as input to the ML classifiers. Attach your ECL code and I will modify it so the dependent and independent data are in the correct format, then attach the changed code to my next post.
tlhumphrey2
 
Posts: 250
Joined: Mon May 07, 2012 6:23 pm
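
For reference, the example code earlier in this thread passes its data through ML.ToField and then selects rows by a "number" column, which suggests a long layout with one row per cell (record id, field number, value) rather than one wide row per record. The sketch below, in plain ECL, reshapes a small made-up wide dataset into that shape. Everything in it is an assumption for illustration: WideRec, CellRec, the column names and the sample values are hypothetical, and CellRec is only a local stand-in, not the library's definition of ML.Types.DiscreteField.

```ecl
// Hypothetical wide training data: one row per document, class first,
// then one count column per word in the word bag.
WideRec := RECORD
  UNSIGNED8 id;
  UNSIGNED4 class_id;
  UNSIGNED4 word1;
  UNSIGNED4 word2;
END;

wide := DATASET([{1, 10, 0, 2},
                 {2, 20, 1, 0},
                 {3, 30, 3, 1}], WideRec);

// Local stand-in for an (id, number, value) cell layout (assumed shape).
CellRec := RECORD
  UNSIGNED8 id;
  UNSIGNED4 number;
  INTEGER4  value;
END;

// Emit one cell per column: number 1 is the class, 2 and 3 are the words.
CellRec toCell(WideRec L, UNSIGNED C) := TRANSFORM
  SELF.id     := L.id;
  SELF.number := C;
  SELF.value  := CHOOSE(C, L.class_id, L.word1, L.word2);
END;

cells := NORMALIZE(wide, 3, toCell(LEFT, COUNTER));

// Same split as in the example code: number = 1 is dependent, the rest independent.
depData   := cells(number = 1);
indepData := cells(number > 1);

OUTPUT(depData, NAMED('Dependent'));
OUTPUT(indepData, NAMED('Independent'));
```

With real data, the ML.ToField call shown in the example code does this reshaping, so the sketch is only meant to show what the resulting rows look like.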

Tue Aug 23, 2016 8:49 pm

I downloaded them as CSV files, and I believe you will still be able to load them into a dataset and use them for ML with the required transformation. Do you still need the ECL code?

Regards,
Subbu
kps_mani
 
Posts: 24
Joined: Wed Mar 04, 2015 3:42 pm

Wed Aug 24, 2016 12:25 pm

Yes.
tlhumphrey2
 
Posts: 250
Joined: Mon May 07, 2012 6:23 pm

