

Having the code that runs on Roxie on Thor

Comments and questions related to the Enterprise Control Language

Tue Aug 10, 2021 8:53 am

Hello Everyone,

I am working on a project that requires running ROXIE code / logic on Thor.
What rules or principles should I keep in mind when taking code written to run on ROXIE and running it on Thor?

Thanks and regards,
Akhilesh Badhri.
akhileshbadhri
 

Tue Aug 10, 2021 1:12 pm

Akhilesh,

The general rule is that anything that runs on Thor should also run on ROXIE. But the reverse is not true, because there are several functions that are ROXIE-only. Here's a (possibly incomplete) list of those I'm aware of:
  • PRELOAD()
  • ALLNODES()
  • THISNODE()
  • LOCAL()
  • NOLOCAL()
Compiling code that uses any of these functions for a Thor cluster should just result in a no-op for that part of the code. That should not be a problem, because most of them (except PRELOAD()) were designed to make ROXIE operate more like Thor does natively.
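
As a rough illustration of the point above, here is a minimal sketch using PRELOAD(). The record layout and the logical file name are made up for this example; only the PRELOAD() wrapper relates to the list above.

```ecl
// Hypothetical layout and logical file name, for illustration only
NameRec := RECORD
  STRING20 lastname;
  STRING20 firstname;
END;

names := DATASET('~example::names', NameRec, THOR);

// PRELOAD asks ROXIE to hold the file in memory.
// Per the note above, on Thor this wrapper is expected to be ignored.
OUTPUT(COUNT(PRELOAD(names)));
```

The same source is meant to compile for either target, and the count should be the same on both; only the memory behaviour would differ on ROXIE.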

HTH,

Richard
rtaylor
Community Advisory Board Member
 

Thu Aug 12, 2021 8:22 am

Hello Richard,

Thanks for the response. To be specific, I would like to know whether JOINs behave differently on Thor and ROXIE with respect to:

1. Use of KEYED for indexes.
2. Use of DISTRIBUTE for the left / right dataset.
3. Use of KEEP and ATMOST.
4. Would it be better to sort a dataset before a DISTRIBUTE or a JOIN?

Thanks and regards,
Akhilesh Badhri
akhileshbadhri
 

Thu Aug 12, 2021 4:55 pm

Akhilesh,
1. Use of keyed for indexes.
ROXIE is designed to use INDEXes, so the use of KEYED in the JOIN condition operates like using KEYED/WILD in an INDEX filter. This functionality is the same on both Thor and ROXIE.
2. Use of distribute for the left / right dataset.
If you're writing ROXIE query code, then it's best if you use INDEXes for both the LEFT and RIGHT "datasets" for your JOINs (usually payload INDEXes), since those will be most efficient. The DISTRIBUTE function is meant to re-distribute DATASET records (not INDEXes) on Thor, so it's not really applicable to most ROXIE code. Therefore, your ROXIE code (using INDEXes) should function the same way on Thor.
3. Use of keep and atmost.
These JOIN options were added specifically to limit the number of matches returned from the JOIN. ROXIE queries are targeted at end users, so you always want to limit the total number of results returned; too many results stop being meaningful to them. Therefore, your ROXIE code (using KEEP and ATMOST) should function the same way on Thor.
4. Would it be better to sort a dataset before a distribute or a join?
Once again, ROXIE queries should almost always be built using INDEXes, which are already sorted on the search terms of the INDEX, so SORT is unnecessary (and DISTRIBUTE is discussed above).
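
The points above can be sketched roughly as follows. This is only an illustration under stated assumptions: the record layouts, logical file names, and the KEEP/ATMOST values are all invented for the example, and the INDEX is assumed to have been built already.

```ecl
// Hypothetical layouts and logical file names, for illustration only
PersonRec := RECORD
  UNSIGNED4 personid;
  STRING20  lastname;
END;

AcctRec := RECORD
  UNSIGNED4 personid;
  STRING10  acctno;
  UNSIGNED4 balance;
END;

people   := DATASET([{1, 'Smith'}, {2, 'Jones'}], PersonRec);
acctFile := DATASET('~example::accounts', AcctRec, THOR);

// Payload INDEX: personid is the search key, acctno and balance are payload
acctIdx := INDEX(acctFile, {personid}, {acctno, balance},
                 '~example::key::accounts_by_person');

OutRec := RECORD
  PersonRec;
  STRING10  acctno;
  UNSIGNED4 balance;
END;

OutRec addAcct(PersonRec L, RECORDOF(acctIdx) R) := TRANSFORM
  SELF := L;  // personid and lastname come from the LEFT record
  SELF := R;  // remaining fields (acctno, balance) come from the INDEX
END;

// Half-keyed JOIN: KEYED marks the condition resolved through the INDEX.
// KEEP(5) returns at most 5 matches per LEFT record.
kept := JOIN(people, acctIdx,
             KEYED(LEFT.personid = RIGHT.personid),
             addAcct(LEFT, RIGHT),
             KEEP(5));

// ATMOST(100) returns nothing for a LEFT record with more than 100 matches.
capped := JOIN(people, acctIdx,
               KEYED(LEFT.personid = RIGHT.personid),
               addAcct(LEFT, RIGHT),
               ATMOST(100));

OUTPUT(kept);
OUTPUT(capped);
```

No SORT or DISTRIBUTE appears because the RIGHT side is an INDEX, which is the case described in points 2 and 4.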

Remember, Thor is a back-office tool: the query runs on all nodes at once (usually using DATASETs) to produce all possible results. ROXIE is an end-user-facing tool: your code runs only on the one ROXIE Server node that handles the individual query instance, and that node pulls the data for the query (usually from INDEXes) from multiple ROXIE Agent nodes to produce the result for that single query (unless you're using the ALLNODES() function to force ROXIE to operate like Thor).

HTH,

Richard
rtaylor
Community Advisory Board Member
 

