Thor (the Data Refinery Cluster) is responsible for consuming vast amounts of data and then transforming, linking, and indexing that data. It functions as a distributed file system with parallel processing power spread across its nodes. A cluster can scale from a single node to thousands of nodes.
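The idea of spreading data and processing across nodes can be illustrated with a minimal Python sketch. This is not Thor's implementation or API: the names `partition`, `transform`, and `run_cluster` are hypothetical, hashing keys with CRC32 is an assumed placement scheme, and threads stand in for cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def partition(records, num_nodes):
    """Assign each (key, value) record to a node by hashing its key.

    Hypothetical placement scheme, chosen only for illustration.
    """
    parts = [[] for _ in range(num_nodes)]
    for key, value in records:
        node = crc32(key.encode("utf-8")) % num_nodes
        parts[node].append((key, value))
    return parts


def transform(part):
    """Per-node work: clean each value, then sort by key as a local index."""
    return sorted((key, value.strip().lower()) for key, value in part)


def run_cluster(records, num_nodes):
    """Partition the records and process every partition in parallel."""
    parts = partition(records, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        return list(pool.map(transform, parts))


records = [("u1", " Alice "), ("u2", "BOB"), ("u3", "Carol ")]
results = run_cluster(records, num_nodes=2)
```

Each element of `results` holds the cleaned, sorted records owned by one simulated node. Raising `num_nodes` changes only how the records are divided, not the per-node logic.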
Thor does the heavy lifting of big data: cleaning the data, merging and transforming it, profiling and analyzing it, and preparing it for use by end-user queries. This refinery engine handles petabytes of data extremely efficiently. Depending on the data, a cluster of 500 nodes can crunch through a petabyte of data in less than ten seconds.
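To express the 500-node figure in per-node terms, the arithmetic can be sketched as follows. The sketch assumes a decimal petabyte (10^15 bytes) and an even split of the data across nodes; neither assumption is stated in the text, and `per_node_throughput` is a hypothetical helper.

```python
PETABYTE = 10**15  # bytes; decimal definition (assumption)


def per_node_throughput(total_bytes, nodes, seconds):
    """Bytes per second each node must handle if the data is split evenly."""
    return total_bytes / nodes / seconds


rate = per_node_throughput(PETABYTE, nodes=500, seconds=10)
print(f"{rate / 10**9:.0f} GB/s per node")  # prints "200 GB/s per node"
```

Because the stated time is "less than ten seconds", this value is a lower bound on the per-node rate implied by the claim.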