How Roxie channels affect data/partition distribution?
Hi,
I am a PhD student, and we are working on adding elasticity to Roxie. I am trying to understand how Roxie manage data. I setup a 4-node Thor cluster (not including the Thor master) and a 4-node Roxie cluster. To understand How Roxie works, I tried different channel mode (slaveConfig in environment.xml), including cyclic, overloaded and simple. I also tried different combinations of the following parameters.
Configurations
Cyclic mode: numDataCopies=[1, 2, 4]
Overload mode: channelsPerNode=[1, 2, 4]
Simple mode: numDataCopies[1, 2, 4]
Roxie Application
Six Degree from the official tutorial
Experiment Steps
1. Setup Roxie against one configuration combination from above
2. Upload data to landing zone
3. Spray data to Thor (observed two replicas per data partition) - see Figure 1
4. Publish the query to Roxie (observed only one replica per index partition, but metadata file is replicated to all Roxie nodes) - see Figure 2
5. Repeat the same experiment with remaining configurations
Questions
1. Does the channel mode (as described in Figure 3) affect how Roxie stores data? Our results do not indicate this relationship.
2. How a Roxie server process picks the slave process to run the Roxie query? It seems partition layout is fixed (as shown in Figure 2). The multicast channel does not help in this case?
3. Are the index files loaded to memory already? Does this mean I look the wrong place in my experiments?
4. Is it possible to know which slave nodes handle a certain query? Is the statistics (taken from the output of a query) used for the server or slave process?
<Statistic c="roxie"
count="1"
creator="[email protected]"
desc="Graph graph1"
kind="TimeElapsed"
s="graph"
scope="graph1"
ts="1457679003666831"
unit="ns"
value="296060166"/>
<Statistic c="roxie"
count="1"
creator="[email protected]"
kind="TimeElapsed"
s="global"
scope="workunit"
ts="1457679003666855"
unit="ns"
value="296060166"/>
<Statistic c="summary"
count="1"
creator="roxie"
desc="Total cluster time"
kind="TimeElapsed"
s="global"
scope="workunit"
ts="1457679003666871"
unit="ns"
value="296060166"/>
I would appreciate any feedback. Sorry for the very long questions.
-chin
I am a PhD student, and we are working on adding elasticity to Roxie. I am trying to understand how Roxie manage data. I setup a 4-node Thor cluster (not including the Thor master) and a 4-node Roxie cluster. To understand How Roxie works, I tried different channel mode (slaveConfig in environment.xml), including cyclic, overloaded and simple. I also tried different combinations of the following parameters.
Configurations
Cyclic mode: numDataCopies=[1, 2, 4]
Overload mode: channelsPerNode=[1, 2, 4]
Simple mode: numDataCopies[1, 2, 4]
Roxie Application
Six Degree from the official tutorial
Experiment Steps
1. Setup Roxie against one configuration combination from above
2. Upload data to landing zone
3. Spray data to Thor (observed two replicas per data partition) - see Figure 1
4. Publish the query to Roxie (observed only one replica per index partition, but metadata file is replicated to all Roxie nodes) - see Figure 2
5. Repeat the same experiment with remaining configurations
Questions
1. Does the channel mode (as described in Figure 3) affect how Roxie stores data? Our results do not indicate this relationship.
2. How a Roxie server process picks the slave process to run the Roxie query? It seems partition layout is fixed (as shown in Figure 2). The multicast channel does not help in this case?
3. Are the index files loaded to memory already? Does this mean I look the wrong place in my experiments?
4. Is it possible to know which slave nodes handle a certain query? Is the statistics (taken from the output of a query) used for the server or slave process?
<Statistic c="roxie"
count="1"
creator="[email protected]"
desc="Graph graph1"
kind="TimeElapsed"
s="graph"
scope="graph1"
ts="1457679003666831"
unit="ns"
value="296060166"/>
<Statistic c="roxie"
count="1"
creator="[email protected]"
kind="TimeElapsed"
s="global"
scope="workunit"
ts="1457679003666855"
unit="ns"
value="296060166"/>
<Statistic c="summary"
count="1"
creator="roxie"
desc="Total cluster time"
kind="TimeElapsed"
s="global"
scope="workunit"
ts="1457679003666871"
unit="ns"
value="296060166"/>
I would appreciate any feedback. Sorry for the very long questions.
-chin
- Attachments
-
s1_index_on_roxie.png
- Figure 2 - How index file is distributed in Roxie
- (57.22 KiB) Downloaded 1015 times
-
s1_data_on_thor.png
- Figure 1 - Two replicas per data partition on Thor
- (58.47 KiB) Downloaded 1015 times
-
Screen Shot 2016-03-11 at 1.24.31 AM.png
- Figure 3 - cyclic mode configured in Roxie
- (56.98 KiB) Downloaded 1015 times
- chsu6
- Posts: 8
- Joined: Fri Feb 19, 2016 9:58 pm
Hi Chin,
Let me answer each question individually.
What you are looking at is the actual INDEX in Figure 2. Each ROXIE node divides the INDEX into four pieces, and also stores a meta-key (that's the 32K piece) on each node. Since your INDEX was a non-payload, or standard index, the DATA part that was copied to ROXIE is shown in Figure 1, and reflects what is described in Figure 3.
A copy of the query is stored on each ROXIE node. The load balancer built in to the ROXIE software decides which Farmer will process the query, and then the Farmer will decide which Agent to use to retrieve the result. If one is currently busy, the replicated channel can sometimes be used.
The way that I understand it, the index files indeed are loaded into RAM at the start of the query, and remain there for its life cycle.
To be honest, I have never needed to dig this deep into the specific nodes, but I would imagine that the Ganglia monitoring tool can give you that information.
Regards,
Bob
Let me answer each question individually.
1. Does the channel mode (as described in Figure 3) affect how Roxie stores data? Our results do not indicate this relationship.
What you are looking at is the actual INDEX in Figure 2. Each ROXIE node divides the INDEX into four pieces, and also stores a meta-key (that's the 32K piece) on each node. Since your INDEX was a non-payload, or standard index, the DATA part that was copied to ROXIE is shown in Figure 1, and reflects what is described in Figure 3.
2. How a Roxie server process picks the slave process to run the Roxie query? It seems partition layout is fixed (as shown in Figure 2). The multicast channel does not help in this case?
A copy of the query is stored on each ROXIE node. The load balancer built in to the ROXIE software decides which Farmer will process the query, and then the Farmer will decide which Agent to use to retrieve the result. If one is currently busy, the replicated channel can sometimes be used.
3. Are the index files loaded to memory already? Does this mean I look the wrong place in my experiments?
The way that I understand it, the index files indeed are loaded into RAM at the start of the query, and remain there for its life cycle.
4. Is it possible to know which slave nodes handle a certain query? Is the statistics (taken from the output of a query) used for the server or slave process?
To be honest, I have never needed to dig this deep into the specific nodes, but I would imagine that the Ganglia monitoring tool can give you that information.
Regards,
Bob
- bforeman
- Community Advisory Board Member
- Posts: 1006
- Joined: Wed Jun 29, 2011 7:13 pm
Hi Bob,
Thank you for quick reply, and detailed description. I made a mistake to upload the wrong Figure 1 and the correct one should be Figure 4 (as attached here). In Figure 1, I manually upload the DATA part to verify data distribution. In our case, DATA is not copied to the Roxie cluster as shown in Figure 4. Does this mean the INDEX has payload?
With the following configuration:
1. 4-node cluster
2. channel mode: cyclic (as in Figure 3 in my previous post)
3. numChannels: 5
4. numDataCopies: 2
INDEX partition and distribution
Questions
Given a particular case in the following, how does Roxie select the slave process to handle a query? For example, when the server process on node 1 (n1) receives a Roxie query, the node n1 forwards this query to other slaves (via multicast channel in default setting). In this case, node 1 can communicate with only node 2 (via channel 1) and node 4 (via channel 4). What if the query involves INDEX partition i3 on node 3?
It is still not clear to me about how it works: the multicast channels and data management in Roxie. I really appreciate your response.
Thank you for quick reply, and detailed description. I made a mistake to upload the wrong Figure 1 and the correct one should be Figure 4 (as attached here). In Figure 1, I manually upload the DATA part to verify data distribution. In our case, DATA is not copied to the Roxie cluster as shown in Figure 4. Does this mean the INDEX has payload?
With the following configuration:
1. 4-node cluster
2. channel mode: cyclic (as in Figure 3 in my previous post)
3. numChannels: 5
4. numDataCopies: 2
INDEX partition and distribution
Questions
Given a particular case in the following, how does Roxie select the slave process to handle a query? For example, when the server process on node 1 (n1) receives a Roxie query, the node n1 forwards this query to other slaves (via multicast channel in default setting). In this case, node 1 can communicate with only node 2 (via channel 1) and node 4 (via channel 4). What if the query involves INDEX partition i3 on node 3?
It is still not clear to me about how it works: the multicast channels and data management in Roxie. I really appreciate your response.
- chsu6
- Posts: 8
- Joined: Fri Feb 19, 2016 9:58 pm
Data partitioning is determined by the thor used to build the data - Roxie uses the partitioning information stored in the top level key to decide which channel(s) to communicate with to retrieve data. Each roxie channel will handle a number of index file parts, and each channel may be implemented by several roxie slave nodes.
Indexes are not loaded into RAM on the slaves (they are too large) but are heavily cached.
You can get some insights into which slaves are retrieving what data using roxie control queries such as control:indexmetrics.
Richard
Indexes are not loaded into RAM on the slaves (they are too large) but are heavily cached.
You can get some insights into which slaves are retrieving what data using roxie control queries such as control:indexmetrics.
Richard
- richardkchapman
- Community Advisory Board Member
- Posts: 110
- Joined: Fri Jun 17, 2011 8:59 am
Thank you for your responses. It makes some things clear. But we are still unclear about a basic concept. Maybe we are asking a much simpler question than you think.
The data presented above was for Roxie cluster cyclic with number_of_data_copies = 2 and a 4-node Thor cluster with default config.
Figure 4 shows the data distribution on a 4-node Thor cluster. It has 2 replicas (and appears to be cyclic).
Table 1 shows the channel assignments (extracted from the logs) for our 4-node Roxie cluster. This is what we expected for cyclic with 2 copies.
Figure 1, which shows the index distribution of our Roxie cluster, has no replicas of the index parts. We didn't expect this because it is contrary to the config.
We do not understand the relationship between index parts and channel assignments. Specifically:
Q1: Are channels related to index parts? We thought that there was some correspondence between channels and index parts.
Q2: Consider n1, which has index part 1 and channels 1 and 4. What queries can the slave process on n1 service?
Really appreciate your time answering our questions.
Thanks,
Chin-Jung
The data presented above was for Roxie cluster cyclic with number_of_data_copies = 2 and a 4-node Thor cluster with default config.
Figure 4 shows the data distribution on a 4-node Thor cluster. It has 2 replicas (and appears to be cyclic).
Table 1 shows the channel assignments (extracted from the logs) for our 4-node Roxie cluster. This is what we expected for cyclic with 2 copies.
Figure 1, which shows the index distribution of our Roxie cluster, has no replicas of the index parts. We didn't expect this because it is contrary to the config.
We do not understand the relationship between index parts and channel assignments. Specifically:
Q1: Are channels related to index parts? We thought that there was some correspondence between channels and index parts.
Q2: Consider n1, which has index part 1 and channels 1 and 4. What queries can the slave process on n1 service?
Really appreciate your time answering our questions.
Thanks,
Chin-Jung
- chsu6
- Posts: 8
- Joined: Fri Feb 19, 2016 9:58 pm
richardkchapman wrote:You can get some insights into which slaves are retrieving what data using roxie control queries such as control:indexmetrics.
BTW, the information obtained from that command seems identical to that from ECL watch.
Thanks,
Chin-Jung
- chsu6
- Posts: 8
- Joined: Fri Feb 19, 2016 9:58 pm
Hi Chin-Jung,
I am wondering if your configuration experiments might have changed the configuration to not create replication on your ROXIE. I just ran a test of my 2-node ROXIE training cluster and for a given index, this is what I see:

Looking at my configuration file on each node, I see:
cyclicOffset="1"
numChannels="2"
numDataCopies="2"
So my cluster seems to be working as documented regarding replication. My server version is the latest 5.4.10-1
Regards,
Bob
I am wondering if your configuration experiments might have changed the configuration to not create replication on your ROXIE. I just ran a test of my 2-node ROXIE training cluster and for a given index, this is what I see:
Looking at my configuration file on each node, I see:
cyclicOffset="1"
numChannels="2"
numDataCopies="2"
So my cluster seems to be working as documented regarding replication. My server version is the latest 5.4.10-1
Regards,
Bob
- Attachments
-
ROXIEIndexParts.jpg
- (97.65 KiB) Downloaded 956 times
- bforeman
- Community Advisory Board Member
- Posts: 1006
- Joined: Wed Jun 29, 2011 7:13 pm
Hi Bob and Richard,
I have the chance to rerun the test and do a fresh install on the our cluster. I carefully follow the instruction and now the replication mechanism works as expected, as shown in Figure A and Figure B. Thanks a lot.
I compare the new configuration (which works) and the old configuration. I found the major difference comes from channelsPerSlave in Thor. I attached the two configurations in another post due to attachment limitation.
Thanks,
Chin-Jung
I have the chance to rerun the test and do a fresh install on the our cluster. I carefully follow the instruction and now the replication mechanism works as expected, as shown in Figure A and Figure B. Thanks a lot.
I compare the new configuration (which works) and the old configuration. I found the major difference comes from channelsPerSlave in Thor. I attached the two configurations in another post due to attachment limitation.
Thanks,
Chin-Jung
- Attachments
-
Screen Shot 2016-03-20 at 3.55.10 PM.png
- Figure A: mode=cyclic, numDataCopies=2
- (119.7 KiB) Downloaded 921 times
-
Screen Shot 2016-03-20 at 3.59.59 PM.png
- Figure 5: mode=cyclic, numDataCopies=5
- (155.02 KiB) Downloaded 921 times
- chsu6
- Posts: 8
- Joined: Fri Feb 19, 2016 9:58 pm
As mentioned before, here attached the two configuration files. The major difference is the channelsPerSlave parameter in Thor. Can this affect the replication behavior in Roxie?
Here is the diff result (diff new.xml old.xml):
2c2
< <!-- Edited with ConfigMgr on ip xx.xx.xx.xx on 2016-03-20T17:48:03 -->
---
> <!-- Edited with ConfigMgr on ip xx.xx.xx.xx on 2016-02-17T13:36:03 -->
218a219,222
> <AuthenticateFeature description="Access to ESDL configuration service"
> path="ESDLConfigAccess"
> resource="ESDLConfigAccess"
> service="ws_esdlconfig"/>
644a649,653
> <AuthenticateFeature authenticate="Yes"
> description="Access to ESDL configuration service"
> path="ESDLConfigAccess"
> resource="ESDLConfigAccess"
> service="ws_esdlconfig"/>
800a810,813
> <AuthenticateFeature description="Access to ESDL configuration service"
> path="ESDLConfigAccess"
> resource="ESDLConfigAccess"
> service="ws_esdlconfig"/>
989c1002
< slaveConfig="cyclic redundancy"
---
> slaveConfig="cyclic"
1105a1119
> channelsPerSlave="1"
1114a1129
> localThorPortInc="200"
1120a1136
> slaveport="20100"
1130,1131c1146
< <Storage/>
< <SwapNode/>
---
> <SwapNode AutoSwapNode="false"/>
Here is the diff result (diff new.xml old.xml):
2c2
< <!-- Edited with ConfigMgr on ip xx.xx.xx.xx on 2016-03-20T17:48:03 -->
---
> <!-- Edited with ConfigMgr on ip xx.xx.xx.xx on 2016-02-17T13:36:03 -->
218a219,222
> <AuthenticateFeature description="Access to ESDL configuration service"
> path="ESDLConfigAccess"
> resource="ESDLConfigAccess"
> service="ws_esdlconfig"/>
644a649,653
> <AuthenticateFeature authenticate="Yes"
> description="Access to ESDL configuration service"
> path="ESDLConfigAccess"
> resource="ESDLConfigAccess"
> service="ws_esdlconfig"/>
800a810,813
> <AuthenticateFeature description="Access to ESDL configuration service"
> path="ESDLConfigAccess"
> resource="ESDLConfigAccess"
> service="ws_esdlconfig"/>
989c1002
< slaveConfig="cyclic redundancy"
---
> slaveConfig="cyclic"
1105a1119
> channelsPerSlave="1"
1114a1129
> localThorPortInc="200"
1120a1136
> slaveport="20100"
1130,1131c1146
< <Storage/>
< <SwapNode/>
---
> <SwapNode AutoSwapNode="false"/>
- Attachments
-
6_hpcc_cyclic_5.txt
- Configuration: new and works
- (47.91 KiB) Downloaded 410 times
-
hpcc2_cyclic.txt
- Configuration: old and did not work
- (48.73 KiB) Downloaded 410 times
- chsu6
- Posts: 8
- Joined: Fri Feb 19, 2016 9:58 pm
A quick update. The parameter channelsPerSlave has nothing to do with the replication mechanism. I find out the following configuration can work. n is an integer greater than 0.
[Cyclic mode]
slaveConfig="cyclic redundancy"
numDataCopies=n
[Full mode]
slaveConfig="full redundancy"
numDataCopies=n
However, the overloaded configuration did not produce replication.
[Overloaded mode]
slaveConfig="overloaded"
channelsPerNode=n
numDataCopies=n
[Cyclic mode]
slaveConfig="cyclic redundancy"
numDataCopies=n
[Full mode]
slaveConfig="full redundancy"
numDataCopies=n
However, the overloaded configuration did not produce replication.
[Overloaded mode]
slaveConfig="overloaded"
channelsPerNode=n
numDataCopies=n
- chsu6
- Posts: 8
- Joined: Fri Feb 19, 2016 9:58 pm
10 posts
• Page 1 of 1
Who is online
Users browsing this forum: No registered users and 1 guest