Data synchronization and querying
Hello,
As per my understanding, the ECL queries submitted to a Roxie cluster can be:
1. Executed on a remote Thor cluster which has all the BigData (in TB/PB/ZP)
2. Executed on the same Roxie cluster itself, first by referring to the remote data while it is being copied onto Roxie, and then locally
I have a few questions here:
1. Assuming 1. is happening, the query processing takes several seconds, probably minutes, given the large data. While these queries are in progress, some new data is sprayed onto this Thor cluster. Will the running query consider this new data set, or will it continue on the 'older' data set and give results accordingly?
2. Assuming 2. is happening,
i. Again, the query is identical to 1.
ii. Suppose the query processing is complete at t1 and some new data has already been added to the Thor cluster before t1. How and when does this new data come to Roxie (synchronization)? Again, at t2, if the same or a similar query comes in, will it be run on the 'latest' data set? In simple words, how is the data between Thor and Roxie 'synchronized'?
Thanks and regards!
- kaliyugantagonist
- Posts: 43
- Joined: Mon Jul 23, 2012 11:23 am
kaliyugantagonist wrote:
As per my understanding, the ECL queries submitted to a Roxie cluster can be:
1. Executed on a remote Thor cluster which has all the BigData (in TB/PB/ZP)

Sorry, but that is not correct.
Queries sent to a Roxie are executed on that Roxie -- they may either:
- access data locally on the Roxie (the "normal" way things are done in a production environment)
- or remotely access the data on a Thor cluster (usually done from a 1-node Roxie used just for query development/testing)
kaliyugantagonist wrote:
2. Executed on the same Roxie cluster itself, first by referring the remote data till it is getting copied on to Roxie and then, locally

Yes. That scenario is possible. You can have Roxie configured to access data remotely while the data is in the process of being copied from Thor to Roxie.
kaliyugantagonist wrote:
2. Assuming 2. is happening,
i. Again, the query is identical to 1.
ii. Suppose the query processing is complete at t1 and there is already some new data added to the Thor cluster before t1. Now, how and when does this new data come to Roxie (synchronization)? Again, at t2, if the same/similar query comes in, will it be run on the 'latest' data set? In simple words, how is the data between Thor and Roxie 'synchronized'?

This question presumes that HPCC operates like an RDBMS and can do OLTP -- this is not the case. HPCC is a batch-processing type of environment. Data files read in a job are never written to, therefore there is no "update" functionality. There are techniques that can be used to make an HPCC environment closely emulate an OLTP system, but accomplishing that requires a fairly complex design and implementation.
Thor and Roxie serve very different purposes:
- Thor does one job at a time and is used to prepare massive amounts of data for delivery to customers.
- Roxie delivers final result data to each query as it comes in, using the data that has been pre-built, pre-linked, pre-whatevered by Thor so that Roxie can deliver the individual goods as quickly as possible, handling literally thousands of separate query results per second.
- The only "normal" direct interaction between Thor and Roxie comes when a query is published to Roxie and Roxie copies the necessary data over from Thor.
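To make the points above concrete, here is a toy model in Python of the behaviour described: files are never updated in place, and Roxie only receives data when a query is published. It is only a sketch. The classes `ThorCluster` and `RoxieCluster`, their methods, and the file names are hypothetical illustrations, not HPCC APIs, and how a real cluster refreshes its data is not covered here.

```python
# Toy model only: these classes are hypothetical and are NOT HPCC APIs.

class ThorCluster:
    """Holds logical files; a file is never modified once written."""

    def __init__(self):
        self.files = {}

    def spray(self, name, records):
        # New data always arrives as a new file, never as an update.
        if name in self.files:
            raise ValueError(f"{name} already exists; files are not updated in place")
        self.files[name] = tuple(records)  # a tuple cannot be mutated


class RoxieCluster:
    """Serves queries from its own local copy of the data."""

    def __init__(self):
        self.local_files = {}
        self.queries = {}

    def publish(self, query_name, file_name, thor):
        # In this model, publishing is the only point where data moves
        # from Thor to Roxie.
        self.local_files[file_name] = thor.files[file_name]
        self.queries[query_name] = file_name

    def run(self, query_name, predicate):
        records = self.local_files[self.queries[query_name]]
        return [r for r in records if predicate(r)]


thor = ThorCluster()
roxie = RoxieCluster()

thor.spray("people_v1", ["alice", "bob"])
roxie.publish("find_people", "people_v1", thor)
before = roxie.run("find_people", lambda r: True)

# New data lands on Thor as a new file; Roxie's copy is untouched.
thor.spray("people_v2", ["alice", "bob", "carol"])
after_spray = roxie.run("find_people", lambda r: True)

# In this model, the new data becomes visible only after republishing.
roxie.publish("find_people", "people_v2", thor)
after_republish = roxie.run("find_people", lambda r: True)

print(before, after_spray, after_republish)
```

In this model the query returns the same records before and after the second spray, and only the republish step changes what it sees.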
Richard
- rtaylor
- Community Advisory Board Member
- Posts: 1619
- Joined: Wed Oct 26, 2011 7:40 pm