Thu Oct 28, 2021 2:21 am
Login Register Lost Password? Contact Us


Multinode Roxie Setup

Topics related to recommendations or questions on the design for HPCC Systems clusters

Tue Dec 08, 2020 3:35 pm Change Time Zone

Hi There,

I am looking for some help configuring a multi Node Roxie, we have been using HPCC for many years now and have always deployed multiple single node Roxies and used our Firewall as the load balancer.

I am now looking at configuring Roxies more efficiently but have a few queries based on the documentation and working through the config manager.

Set-up :

THOR Cluster :

16x Physical Servers configured with multiple Thor Instances and queues :

THOR_L1 – 144 Nodes (12 Physical machines – configured with 12 Slaves Per Node)
THOR_L2 - 144 Nodes (12 Physical machines– configured with 12 Slaves Per Node))

THOR_S1 – 36 Nodes (3 physical Machines– configured with 12 Slaves Per Node))
THOR_S2 – 36 Nodes (3 physical Machines– configured with 12 Slaves Per Node))

Thor Master – one Physical machine for the above.

These Thor Clusters will publish to the new Roxies once configured :

Multi-Node Roxie (plan)

3 x Physical Machines
3 x Physical Machines

Behind a load balancer using Round Robin.

Questions :

Roxie Cluster :

Server – what is the definition of a Roxie Server – my understanding is it will be the server that hosts the relevant services and endpoints for applications to talk to e.g. ECL Watch, WsECL, Dali etc? is this correct?

Slave – what is the definition of a Roxie Slave – my understanding is these are physical computers that host the data, perform lookups etc (similar to a THOR slave) is this correct?

Redundancy – can you explain in more detail what each of these redundancy modes do? Do you have some real world examples where each would be applicable?

• Simple Redundancy - One channel per slave. Most commonly used for a single node Roxie.
• Full Redundancy - More slaves than the number of channels. Multiple slaves host each channel.
• Overloaded Redundancy - There are multiple channels per slave.
• Cyclic Redundancy - Each node hosts multiple channels in rotation. The most commonly used configuration.


After reading through all of the documentation and cannot seem to see where you would configure the following :

For maximum performance, you should configure your cluster so slave nodes perform most jobs in memory.

where in the config manager are these settings?

For maximum performance, you should configure your cluster so slave nodes perform most jobs in memory. For example, if a query uses three data files with a combined file size of 60 GB, a 40-channel cluster is a good size, while a 60-channel is probably better.

– how do I configure this to suit my set-up? Am I right in thinking a “Channel” is a disk on the Physical Server? So in order to get 60 channels I would need 30 machines configured with two hard disks in each? How do are those hard disks then configured for Roxie?

Another consideration is the size of the Thor cluster creating the data files and index files to be loaded. Your target Roxie cluster should be the same size as the Thor on which the data and index files are created or a number evenly divisible by the size of your Roxie cluster. For example, a 100-way Thor to a 20-way Roxie would be acceptable.

- We have a 144 Node Thor, and 6 Physical machines to use as Roxies, the plan here would be to set-up 2 X 3 node Roxies ( 1 Server + 2 Slaves) and put these behind an external load balancer - is the Divisible number the total number of nodes e.g. 3? Or the total number of Slave nodes? E.g 2? - I appreciate this needs to be correct so each node gets an equal distribution of data.

The final consideration is the number of Server processes in a cluster. Each slave must also be a Server, but you can dedicate additional nodes to be only Server processes. This is useful for queries that require processing on the Server after results are returned from slaves. Those Server-intensive queries could be sent only to dedicated Server IP addresses so the load is removed from nodes acting as both Server and slave.

- I have set all three in Config manager as Servers under Roxie Cluster – myroxie – Servers tab. but how do a set a server as a server only? Or a slave only?

node020025
192.168.20.25
mytoposerver myroxie mydali mydfuserver myeclccserver myesp myeclagent myftslave mysasha mydafilesrv myeclscheduler

node020026
192.168.20.26
myroxie myftslave mydafilesrv

node020027
192.168.20.27
myroxie myftslave mydafilesrv

The documentation also goes on to say :

The most typical scenario for HPCC Systems is utilizing it with a high volume of data. This suggested sample sizing would be appropriate for a site with large volumes of data. A good policy is to set the Thor size to 4 times the source data on your HPCC Systems. Typically, Roxie would be about 1/4 the size of Thor. This is because the data is compressed and the system does not hold any transient data in Roxie. Remember that you do not want the number of Roxie nodes to exceed the number of Thor nodes.

- What is the best practice? ¼ size of Thor? Equal to THOR? Or Divisible by the size of the Roxie Cluster?

Roxie keeps most of its data in memory, so you should allocate plenty of memory for Roxie. Calculate the approximate size of your data, and allocate appropriately. You should either increase the number of nodes, or increase the amount of memory.

- How do I allocate memory to Roxie? Is this physically installing more memory or is there a setting? Is HPCC’s definition of a NODE a Physical Server?

Config Manager

Lazy Open - what is the difference between True, False, and Smart?
Local Slave – FALSE – is this similar to the Thor Master not running slaves? If this is set to FALSE will the slaves only run on the Roxies not running the other HPCC services?

CPU Affinity & Cores Per Query - these are set to default, does that mean the system will use all available cores per query? Would you only set this if you had many large queries and wanted to reserve CPU?

Any help or advice would be greatly appreciated.

Thanks

Antony
amillar
 
Posts: 34
Joined: Fri Oct 16, 2015 7:32 am

Wed Dec 09, 2020 7:56 pm Change Time Zone

Antony,

Let me start with the general questions:
Server – what is the definition of a Roxie Server
In general, a multi-node ROXIE has n nodes (physical or virtual) where each node runs both a "Server" process and an "Agent" (slave) process. The Server process handles the queries themselves (each node in the ROXIE can handle all queries published to that ROXIE). The Agent (slave) process handles all the data access for the queries.
Slave – what is the definition of a Roxie Slave
The Agent (slave) process handles all the data access for the queries. In a multi-node ROXIE, the data is distributed across all the nodes in "channels" so that a single piece of data exists only in a single channel. So each node of the ROXIE contains only a portion of all the data, and access to that portion is controlled by the Agent (slave) processes.
Redundancy – can you explain in more detail what each of these redundancy modes do?
Each data channel contains a portion of the data. The data for a single channel is duplicated on two (or more) nodes, and the Agent (slave) processes on those nodes determines which actual node delivers the data for each request (the less busy node usually wins that battle). The most common form of Redundancy is "Cyclic" wherein a single piece of data goes to one channel -- and the channel stores its data on both nodes n and n+1.

So, here's an example, assuming I have a 12-node Thor:
  • I would probably configure a 3-node ROXIE for data delivery (4 to 1 is our most common production ratio between Thor and ROXIE nodes).
  • Publishing a simple query to the ROXIE means that the executable .SO for that query goes to all 3 ROXIE nodes for use by the Server process on each node.
  • The data for that query stays in the same physical parts as created on Thor, so for a 12-node Thor to 3-node ROXIE, each channel receives 4 parts.
  • So node 1 would receive file parts 1, 4, 7, & 10 -- which would go on its "C" drive. And that first channel's data is duplicated on node 2's "D" drive (for redundancy) to finish channel one. The rest of the parts are distributed and duplicated likewise.
  • Any node can handle any query, so your load balancer just needs to round-robin to each ROXIE node.
  • Each query is handled by that node's Server process, which requests the data from whichever channel(s) is appropriate, and the Agent (slave) processes on the channel's nodes negotiate which one will fulfill the data request and send that data back to the requesting Server process.
That is a brief overview of how a multi-node ROXIE operates. I'll let others chime in with responses to your configuration questions.

HTH,

Richard
rtaylor
Community Advisory Board Member
Community Advisory Board Member
 
Posts: 1601
Joined: Wed Oct 26, 2011 7:40 pm


Return to Clustering

Who is online

Users browsing this forum: No registered users and 1 guest

cron