There are several ECL functions that are designed specifically to help optimize queries for execution on Roxie. These include PRELOAD, ALLNODES, THISNODE, LOCAL, and NOLOCAL. Understanding how all these functions work together can make a big difference in the performance of your Roxie queries.
Writing efficient queries for Roxie or Thor can require an understanding of how the different clusters operate. This brings up three questions:
How does the graph execute, on a single node, or on all nodes in parallel?
How are datasets accessed by each node executing the graph, only the parts that are local to the node, or all parts on all nodes?
Does an operation coordinate with the same operation on other nodes, or does each node operate independently?
Here's how queries "normally" execute on each type of cluster:
Thor | Graphs execute on multiple worker nodes in parallel. |
Index/disk reads are done locally by each worker node. | |
All other disk access (FETCH, keyed JOIN, etc.) are effectively accessed across all nodes. | |
Coordination with operations on other nodes is controlled by the presence or absence of the LOCAL option on the operation. | |
No support for child queries (this may change in future releases). | |
hthor | Graphs execute on the single ECL Agent node. |
All parts of the dataset/index are accessed by directly accessing the disk drive of the node with the data--no other interaction with the other nodes. | |
Child queries always execute on same node as parent. | |
Roxie | Graphs execute on a single (Roxie server) node. |
All parts of the dataset/index are accessed by directly accessing the disk drive of the node with the data--no other interaction with the other nodes. | |
Child queries might execute on a single agent node instead of a Roxie server node. |