What hardware should I use for my next HPCC?

I get asked frequently about the optimal configuration for a Thor or Roxie system. Beyond the pro-forma “it depends” and “how will you use the system?” answers, I think it would be useful to describe the guiding principles for defining the architecture of a given HPCC Systems platform implementation.

When it comes to Thor, one of the first variables to consider is the size of the input data that will be ingested into the system over a reasonable time frame. For the majority of applications, Thor ends up becoming a live archive for all data ever ingested, or at least until the data becomes useless and/or must be destroyed due to regulatory or legal requirements. In addition to the size of the input data, the size of the output data and any space required for intermediate and temporary files need to be accounted for. A good rule of thumb is that the total amount of space needed is usually about 2.5 to 3 times the size of the aggregated input data over time. If redundancy is required, as is usually recommended, the total amount of space needs to be doubled to account for mirroring the data slices on other nodes (although the use of RAID 5 or RAID 6 may reduce or eliminate the need for this extra redundancy). But deciding the amount of space needed in Thor is just the beginning: the next area to cover is processing time and overall system performance.
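
To make that rule of thumb concrete, here is a minimal sizing sketch in Python; the function name, the 2.5–3x multiplier and the example figures are illustrative assumptions, not measurements from any particular cluster:

```python
# Back-of-the-envelope Thor storage estimate, per the rule of thumb above.
# All example figures are illustrative assumptions; plug in your own.

def thor_storage_tb(input_tb_per_year, years, multiplier=3.0, mirrored=True):
    """Estimate total Thor storage needed, in TB.

    multiplier (~2.5-3x) covers output, intermediate and temporary files;
    mirrored doubles the total for replica slices on other nodes
    (RAID 5/6 may reduce or eliminate the need for this).
    """
    total = input_tb_per_year * years * multiplier
    return total * 2 if mirrored else total

# Example: 20 TB of new input data per year, retained for 3 years.
print(thor_storage_tb(20, 3))               # ~360 TB with mirroring
print(thor_storage_tb(20, 3, 2.5, False))   # ~150 TB, 2.5x rule, no mirroring
```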

Thor has been designed as a massively parallel, data-intensive workflow processing system and, as such, is relatively tolerant of reasonable I/O latencies. Moreover, the majority of operations require reading a certain amount of data from the drives (almost always sequentially, from beginning to end of the logical file slice residing on a particular node), performing some processing (which could be either entirely contained in memory or require mostly sequential spills to the drives) and eventually writing the output sequentially (again) to the drives. With this particular behavior in mind, when running a single Thor worker per node it makes perfect sense to use SATA drives (even 7200 RPM 3.5″ drives will do) and pick the largest capacity available at the time. If more than a couple of workers are running on each node (which could be used, in certain cases, to take advantage of special hardware configurations), the additional concurrency could turn the drive I/O into a more random pattern, which would substantially impair the performance of SATA drives; such configurations would benefit significantly from SAS drives (particularly 2.5″ 10,000 RPM drives). SSDs, due to their high cost and limited capacity, and to the fact that their sequential read/write performance is not significantly better than that of mechanical drives, are not recommended for this environment.
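
As a rough illustration of why sequential throughput matters more than seek time for Thor, the sketch below estimates how long a full scan of a node’s data slice would take; the slice size and the MB/s figures are ballpark assumptions for illustration, not benchmarks of any specific drive:

```python
# Time to read one node's data slice sequentially at assumed throughputs.
# Slice size and MB/s figures are ballpark assumptions, not benchmarks.

def scan_minutes(slice_gb, mb_per_s):
    """Minutes to read slice_gb from the drive at mb_per_s."""
    return slice_gb * 1024 / mb_per_s / 60

slice_gb = 500  # hypothetical data slice per node

for label, mbps in [("7200 RPM SATA, sequential", 120),
                    ("10,000 RPM SAS, sequential", 160),
                    ("same SATA drive, mostly random I/O", 20)]:
    print(f"{label}: ~{scan_minutes(slice_gb, mbps):.0f} min")
```

The last line hints at why multiple workers per node shift the balance toward SAS: once the access pattern turns random, effective throughput drops sharply.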

The number of CPU cores in Thor is not one of the most critical aspects of the configuration, since processing time is usually overshadowed by I/O read/write times, so even a single-socket motherboard with a quad- or six-core CPU should be fine for the majority of applications.

The total amount of Random Access Memory across a Thor cluster determines whether a particular task (subgraph) can be kept entirely in memory or needs to be spilled to the drives. For this reason, having enough memory to handle entire subgraphs in memory can speed up workunits considerably, as memory access is orders of magnitude faster than drive I/O.
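
A quick way to reason about spilling is to compare a subgraph’s working set against the aggregate memory the cluster can realistically dedicate to it; the node count, sizes and the 75% headroom factor below are purely hypothetical assumptions, not HPCC settings:

```python
# Will a subgraph's working set fit in cluster memory, or spill to disk?
# Node count, RAM sizes and headroom factor are hypothetical assumptions.

def fits_in_memory(working_set_gb, nodes, ram_per_node_gb, usable_fraction=0.75):
    """True if the working set fits in the memory Thor can realistically use.

    usable_fraction leaves headroom for the OS and other processes
    (an assumed factor, not an HPCC configuration value).
    """
    usable_gb = nodes * ram_per_node_gb * usable_fraction
    return working_set_gb <= usable_gb

print(fits_in_memory(800, 20, 64))    # True:  ~960 GB usable across 20 nodes
print(fits_in_memory(2000, 20, 64))   # False: this subgraph will spill to disk
```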

Roxie, on the other hand, has a very different performance profile. For one thing, Roxie workflows (queries) usually require a high degree of indirection due to indexed data access, which has the side effect of introducing a large number of drive head seeks. For another, Roxie tends to be quite sensitive to I/O latencies, particularly when the total query time needs to be kept to a minimum. For these reasons, it’s desirable to consider only SAS drives (2.5″ 10,000 RPM drives) or even SSDs. The total amount of space required by Roxie is usually much smaller than for Thor, because primary indices carry a compressed payload (and the data has already been subject to some reduction at this point), and secondary keys are just references to the payloads present in the primary indices, so SSDs, even with their limited capacity, could be sufficient.

Roxie is noticeably more demanding in terms of CPU utilization, and it’s recommended to have as many CPU cores as possible to minimize latencies (Roxie is multithreaded and spreads the load across as many cores as the node has).

Memory in Roxie is not as critical to overall performance; however, certain in-memory operations can improve overall query times, for example in cases where keeping a small and frequently used dataset in RAM is desirable.

And last, but not least, the network interconnect for both systems needs to accommodate the bandwidth required to transfer data efficiently between nodes, to avoid bottlenecks. While Gigabit Ethernet tends to be sufficient for the majority of systems, high-performance nodes could benefit from 10GbE or even InfiniBand (the latter offering roughly 5 times as much bandwidth as 10GbE, at an equivalent cost).
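
To get a feel for when the interconnect becomes the bottleneck, the sketch below estimates how long a node would take to ship its entire data slice to its peers (as happens during a global redistribution); the slice size and link-efficiency factor are rough assumptions, and the InfiniBand figure simply follows the ~5x ratio mentioned above:

```python
# Rough time for one node to send its whole data slice over the interconnect,
# e.g. during a global redistribution. Slice size and the 80% efficiency
# factor are rough assumptions for protocol overhead, not measurements.

def transfer_minutes(slice_gb, link_gbps, efficiency=0.8):
    """Minutes to move slice_gb over a link rated at link_gbps (gigabits/s)."""
    effective_gb_per_s = link_gbps * efficiency / 8  # bits -> bytes
    return slice_gb / effective_gb_per_s / 60

slice_gb = 500  # hypothetical data slice per node

for label, gbps in [("Gigabit Ethernet", 1),
                    ("10GbE", 10),
                    ("InfiniBand (~5x 10GbE, per above)", 50)]:
    print(f"{label}: ~{transfer_minutes(slice_gb, gbps):.0f} min")
```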

While the intent of this blog post is not to replace an expert analysis and architectural design phase for a particular application, it should serve as a sufficient guideline to estimate the overall hardware cost of a given HPCC Systems project, and provide some good rules of thumb and best practices for people implementing these platforms.

Flavio Villanustre