Gravitational Clustering

This forum is for topics related to the Google Summer of Code (GSoC) projects and the HPCC Systems Intern program.

Thu Mar 26, 2015 5:50 pm

Posts: 2
Joined: Tue Sep 17, 2013 12:27 pm

Tue Apr 14, 2015 6:41 pm

Hi Jamie :P
My name is Rishab, and I have applied to GSoC 2015 to implement the G-clustering algorithm. In the comments section you asked me to think over certain points, and I have done so. You said that we need to do some pathological testing of the random sampling of points and of the calculation of the force between the current point and the randomly selected point. I think the algorithm can go wrong, but if we run it many times while varying the force parameters, we can find an optimum value. However, we should be careful not to tinker with the values to the extent that the algorithm's unsupervised behaviour is affected. So we will need to create a dummy dataset to find the range of parameters over which the algorithm works.
Please reply and let me know if I am right.
Posts: 2
Joined: Tue Apr 07, 2015 4:39 pm

Thu Apr 16, 2015 11:31 am

Hi Rishab,

Iterating the entire procedure over G-deltaG space is a possible solution and would be regarded as a Monte Carlo approach. In fact, such computational 'runs' could be done in parallel. You could then look for the boundaries of cluster convergence/divergence in this G-deltaG space, which would be a really interesting analysis of the algorithm itself, almost a must. However, using a Monte Carlo method as an actual implementation detail may not be desirable for this particular algorithm. I guess an incredibly low-resolution implementation of this phase space may be doable/useful. Another solution could be to compute a rough estimate of the fractal dimension of the data space, which would give you an estimate of the clustering/sparseness of the sample. This could then be used in a trivial calculation to determine adequate values for the actual algorithm's parameters.
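To make the G-deltaG sweep a bit more concrete, here is a rough Python sketch on a dummy dataset. Everything in it is an assumption for illustration only: the thread does not spell out the update rule, so gravitational_step uses a toy inverse-square pull toward one randomly sampled partner, G is decayed as G * (1 - deltaG) per iteration, and the names sweep, count_clusters and the radius threshold are hypothetical helpers, not part of any existing implementation.

```python
import numpy as np

def gravitational_step(points, G, rng):
    """One toy update: each point is pulled toward one randomly sampled partner."""
    n = len(points)
    partners = rng.integers(0, n - 1, size=n)
    partners[partners >= np.arange(n)] += 1  # shift so no point picks itself
    diff = points[partners] - points
    dist = np.maximum(np.linalg.norm(diff, axis=1, keepdims=True), 1e-9)
    # Assumed force law: inverse-square displacement, capped at half the gap
    step = np.minimum(G / dist**2, dist / 2)
    return points + step * diff / dist

def count_clusters(points, radius):
    """Count connected components where points closer than `radius` are linked."""
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    close = d < radius
    labels = np.arange(n)
    while True:
        # Each point takes the smallest label among its neighbours (and itself)
        new = np.array([labels[close[i]].min() for i in range(n)])
        if np.array_equal(new, labels):
            return len(np.unique(labels))
        labels = new

def sweep(points, G_values, delta_G_values, n_iter=100, radius=0.05, seed=0):
    """Run the toy procedure once per (G, deltaG) cell; record the cluster count."""
    grid = np.zeros((len(G_values), len(delta_G_values)), dtype=int)
    for i, G in enumerate(G_values):
        for j, delta_G in enumerate(delta_G_values):
            rng = np.random.default_rng(seed)
            p, g = points.copy(), G
            for _ in range(n_iter):
                p = gravitational_step(p, g, rng)
                g *= 1 - delta_G  # assumed decay schedule for G
            grid[i, j] = count_clusters(p, radius)
    return grid

# Dummy dataset: two well-separated Gaussian blobs in 2D
data_rng = np.random.default_rng(42)
dummy = np.vstack([data_rng.normal(0.0, 0.05, (20, 2)),
                   data_rng.normal(1.0, 0.05, (20, 2))])
grid = sweep(dummy, G_values=[0.0, 1e-4, 1e-2], delta_G_values=[0.0, 0.01, 0.1])
print(grid)  # rows: G values, columns: deltaG values
```

Each cell of the grid is independent of the others, so the cells could be farmed out in parallel. Regions where the cluster count collapses to 1, or stays close to the number of points, would hint at where the convergence/divergence boundaries lie.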

None of the above methods would be considered 'tinkering'. 'Tinkering' refers to the need to tailor such parameters on a sample-by-sample basis and by hand, i.e. having the actual user test for a good set of parameters based on their particular dataset.
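As a rough illustration of the fractal-dimension idea, one common estimator is box counting: cover the data with grids of shrinking cell size and fit the slope of log(occupied cells) against log(1 / cell size). The sketch below is only one possible choice of estimator, and the function name box_counting_dimension and the chosen scales are assumptions for illustration. How the estimate would then map onto values for G and deltaG is left open here, as it is in the thread.

```python
import numpy as np

def box_counting_dimension(points, scales):
    """Estimate the box-counting dimension of a point set (rough sketch)."""
    points = np.asarray(points, dtype=float)
    scales = np.asarray(scales, dtype=float)
    # Rescale every coordinate to [0, 1] so the cell sizes are comparable
    mins = points.min(axis=0)
    span = points.max(axis=0) - mins
    span[span == 0] = 1.0
    unit = (points - mins) / span
    counts = []
    for eps in scales:
        # Integer cell index per coordinate; clip so the value 1.0 stays in the last cell
        cells = np.minimum(np.floor(unit / eps), np.ceil(1.0 / eps) - 1).astype(np.int64)
        counts.append(len(np.unique(cells, axis=0)))
    # Slope of log N(eps) versus log(1 / eps) is the dimension estimate
    slope, _ = np.polyfit(np.log(1.0 / scales), np.log(counts), 1)
    return slope

# Dummy dataset: two tight blobs, which should look much sparser than a filled square
rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal(0.0, 0.02, (500, 2)),
                   rng.normal(1.0, 0.02, (500, 2))])
print(box_counting_dimension(blobs, [1/2, 1/4, 1/8, 1/16]))
```

For points in 2D, an estimate close to 2 would suggest the data fill the plane fairly evenly, while a noticeably lower value would suggest the sample is concentrated on a sparser set, which is the kind of clustering/sparseness signal described above.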

Kind regards,
Posts: 2
Joined: Tue Sep 17, 2013 12:27 pm
