HPCC Systems blog contributors are engineers, data scientists, and fellow community members who want to share knowledge, tips, and other helpful information from across the HPCC Systems community. Check this blog regularly for insights into how HPCC Systems technology can put big data analytics to work for your own needs.

Lili Xu on 06/15/2022
If you are a PC user and want to run the HPCC Systems Platform, the simplest and most natural environment may be a Hyper-V virtual machine. Hyper-V is standard on many versions of Windows and tends to work better than most add-on virtualization systems. It also has good support for multiple CPUs, and is quite reliable and performant.
Lili Xu on 01/26/2022

Gaussian process regression is a powerful machine learning method for solving non-linear regression problems. However, because it is computationally intensive, Gaussian process regression is not suitable for large-scale machine learning problems. Fortunately, researchers have developed approximation methods that reach a solution arbitrarily close to that of the original Gaussian process, more rapidly and with better scaling.

Lili Xu on 11/14/2019
In this blog, I will introduce another clustering bundle: the DBSCAN Bundle, a highly scalable and parallelized implementation of the DBSCAN algorithm. DBSCAN is a density-based unsupervised machine learning algorithm that automatically clusters data into subclasses or groups.
Lili Xu on 03/04/2019
Imagine you are sitting in front of thousands of articles and trying to organize them into different folders. How would you accomplish it, and how long would you expect it to take? Would you read all the articles one by one and spend days or even months finishing the task? If you have some sort of data but no clue how to cluster it efficiently, then this article is a good place to start.