
HPCC Systems blog contributors are engineers, data scientists, and fellow community members who want to share knowledge, tips, and other helpful information from the HPCC Systems community. Check this blog regularly for insights into how HPCC Systems technology can put big data analytics to work for your own needs.

Charles Kaminski on 01/20/2016

Prefix trees can be an important addition to a big-data toolbox. In two prior posts, I showed how to combine a prefix tree and an edit-distance algorithm on a big-data platform for a significant performance boost. In this post, I show how to improve performance further by layering on additional pruning strategies. A reasonable expectation is an additional performance improvement of 10% to 50%, depending on your real-world data and the pruning strategies you choose.

Charles Kaminski on 12/01/2015

In this blog post, I will walk you through using prefix trees and a big-data platform to build fast edit-distance queries. You can use the examples here to begin processing large volumes of data with an edit-distance algorithm, or to build queries fast enough to search very large datasets interactively. This blog post builds on a previous blog post.

Charles Kaminski on 09/18/2015

In this blog post, I will show you how to use a big-data platform to build a fast prefix tree and explain why such a prefix tree can enable very fast edit-distance queries. In a follow-up post, I offer different query examples, performance metrics, and code examples.
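
To give a rough idea of why a prefix tree helps with edit-distance queries, here is a minimal single-machine sketch in Python. It is not the code from these posts and does not run on a big-data platform; the names `TrieNode`, `build_trie`, and `search` are hypothetical and chosen only for illustration. The sketch assumes plain Levenshtein distance and relies on two ideas: words that share a prefix share the same rows of the edit-distance table, and a whole subtree can be skipped once every entry in the current row exceeds the allowed distance.

```python
class TrieNode:
    """One node of the prefix tree (hypothetical helper for this sketch)."""

    def __init__(self):
        self.children = {}   # maps a character to the child TrieNode
        self.word = None     # set to the full word when a word ends here


def build_trie(words):
    """Insert every word into a prefix tree and return the root node."""
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.word = w
    return root


def search(root, query, max_dist):
    """Return sorted (word, distance) pairs within max_dist edits of query.

    Empty words stored at the root are ignored in this sketch.
    """
    # Row for the empty prefix: turning "" into query[:i] costs i inserts.
    first_row = list(range(len(query) + 1))
    results = []
    for ch, child in root.children.items():
        _walk(child, ch, query, first_row, max_dist, results)
    return sorted(results)


def _walk(node, ch, query, prev_row, max_dist, results):
    # Compute one new row of the Levenshtein table for this tree node.
    # Every word below this node reuses the row, so shared prefixes are
    # only processed once.
    row = [prev_row[0] + 1]
    for i in range(1, len(query) + 1):
        insert_cost = row[i - 1] + 1
        delete_cost = prev_row[i] + 1
        replace_cost = prev_row[i - 1] + (query[i - 1] != ch)
        row.append(min(insert_cost, delete_cost, replace_cost))

    # The last cell is the distance between the query and this node's prefix.
    if node.word is not None and row[-1] <= max_dist:
        results.append((node.word, row[-1]))

    # Prune: if no cell is within max_dist, no descendant can match.
    if min(row) <= max_dist:
        for next_ch, child in node.children.items():
            _walk(child, next_ch, query, row, max_dist, results)


trie = build_trie(["kaminski", "kaminsky", "kamin", "camino", "smith"])
matches = search(trie, "kaminski", 1)
```

In this example, `matches` holds the two words within one edit of the query. The subtree for "smith" is abandoned after a few characters because no cell in its row stays within the limit, which is the kind of early exit that makes the approach fast on large word lists.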