

Setting up an HPCC System on AWS that runs fast

Questions or comments related to Cloud Computing and the HPCC Systems Instant Cloud for AWS

Thu Feb 19, 2015 9:00 pm

The GitHub repository https://github.com/tlhumphrey2/BestHPCCoAWS contains software that will configure and deploy an HPCC System on AWS that runs fast.

The repository also contains a document, SetupBestHPCCoAWS.pdf, that describes in detail how to use this software to set up a fast HPCC System on AWS.
tlhumphrey2
 
Posts: 250
Joined: Mon May 07, 2012 6:23 pm

Fri Feb 20, 2015 8:03 pm

Did you perform any benchmarks to define 'fast'? Such as http://sortbenchmark.org/

So that image has 16 CPUs; did you run only 1 Thor slave per node?

Thanks for the PDF; very nice to have such a thorough walkthrough.
Lee_Meadows
 
Posts: 16
Joined: Mon Jul 21, 2014 1:43 pm

Sun Mar 01, 2015 7:17 pm

I did run the terasort described at http://www.ordinal.com/gensort.html, but I didn't record the times. The ECL code for creating the dataset (as written, 10^9 records of 100 bytes each; scale numrecs for other sizes) and distributing it across all slave nodes of the Thor follows:
Code: Select all
// Generate standard terasort datafile

unsigned8 numrecs := 1000000000/CLUSTERSIZE : stored('numrecs');   // rows per node

rec := record
      string10  key;
      string10  seq;
      string80  fill;
       end;

seed := dataset([{'0', '0', '0'}], rec);

rec addNodeNum(rec L, unsigned4 c) := transform
    SELF.seq := (string) (c-1);
    SELF := L;
  END;

one_per_node := distribute(normalize(seed, CLUSTERSIZE, addNodeNum(LEFT, COUNTER)), (unsigned) seq);

rec fillRow(rec L, unsigned4 c) := transform

    SELF.key := (>string1<)(RANDOM()%95+32)+
                (>string1<)(RANDOM()%95+32)+
                (>string1<)(RANDOM()%95+32)+
                (>string1<)(RANDOM()%95+32)+
                (>string1<)(RANDOM()%95+32)+
                (>string1<)(RANDOM()%95+32)+
                (>string1<)(RANDOM()%95+32)+
                (>string1<)(RANDOM()%95+32)+
                (>string1<)(RANDOM()%95+32)+
                (>string1<)(RANDOM()%95+32);

    unsigned8 n := ((unsigned8)L.seq)*numrecs+c;
    SELF.seq := (string10)n;
    unsigned8 cc := (n-1)*8;  // unsigned8 avoids 32-bit overflow for large n
    string1 c1 := (>string1<)((cc)%26+65);
    string1 c2 := (>string1<)((cc+1)%26+65);
    string1 c3 := (>string1<)((cc+2)%26+65);
    string1 c4 := (>string1<)((cc+3)%26+65);
    string1 c5 := (>string1<)((cc+4)%26+65);
    string1 c6 := (>string1<)((cc+5)%26+65);
    string1 c7 := (>string1<)((cc+6)%26+65);
    string1 c8 := (>string1<)((cc+7)%26+65);
    SELF.fill := c1+c1+c1+c1+c1+c1+c1+c1+c1+c1+
             c2+c2+c2+c2+c2+c2+c2+c2+c2+c2+
             c3+c3+c3+c3+c3+c3+c3+c3+c3+c3+
             c4+c4+c4+c4+c4+c4+c4+c4+c4+c4+
             c5+c5+c5+c5+c5+c5+c5+c5+c5+c5+
             c6+c6+c6+c6+c6+c6+c6+c6+c6+c6+
             c7+c7+c7+c7+c7+c7+c7+c7+c7+c7+
             c8+c8+c8+c8+c8+c8+c8+c8+c8+c8;
  END;

outdata := NORMALIZE(one_per_node, numrecs, fillRow(LEFT, COUNTER));

OUTPUT(outdata,,'nhtest::terasort1',overwrite);
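For readers without a cluster handy, the record layout the ECL above produces can be sketched in plain Python. This is a hypothetical single-node equivalent, assuming ECL's integer-to-STRING10 cast left-justifies with space padding:

```python
import random

def make_record(n: int) -> str:
    """Build one 100-byte terasort-style record, mirroring the ECL fillRow transform."""
    # key: 10 random printable ASCII characters (codes 32..126), like (>string1<)(RANDOM()%95+32)
    key = ''.join(chr(random.randrange(95) + 32) for _ in range(10))
    # seq: the record number as text, space-padded to 10 characters
    seq = str(n).ljust(10)
    # fill: eight runs of ten identical capital letters, stepping through the
    # alphabet from an offset derived from the record number, as in the ECL
    cc = (n - 1) * 8
    fill = ''.join(chr((cc + i) % 26 + 65) * 10 for i in range(8))
    return key + seq + fill

print(len(make_record(1)))  # 100
```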
And the ecl code for doing the sort is here:
Code: Select all
// Perform global terasort

#option('THOR_ROWCRC', 0); // don't need individual row CRCs

rec := record
      string10  key;
      string10  seq;
      string80  fill;
       end;

in := DATASET('nhtest::terasort1',rec,FLAT);
OUTPUT(SORT(in,key,UNSTABLE),,'nhtest::terasort1out',overwrite);
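The global sort orders records on the first 10 bytes (the key field). A minimal Python sketch of checking that property on the output, assuming the same 100-byte record layout:

```python
def is_sorted_on_key(records):
    """Check that records are ordered on the 10-byte key prefix, as SORT(in, key) guarantees."""
    keys = [r[:10] for r in records]  # first 10 bytes are the sort key
    return all(a <= b for a, b in zip(keys, keys[1:]))

sample = ["bbb".ljust(100), "aaa".ljust(100)]
print(is_sorted_on_key(sorted(sample, key=lambda r: r[:10])))  # True
```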

So that image has 16 CPUs; did you run only 1 Thor slave per node?

The Perl code and bash scripts in the repository put 16 Thor slaves per EC2 instance.
tlhumphrey2
 
Posts: 250
Joined: Mon May 07, 2012 6:23 pm

Mon Mar 02, 2015 3:03 pm

OK, great; that's the same code I'm using for the terasort.

I actually have some interesting findings on timings from an 8-node cluster, each node with 32 cores: 7 Thor slave nodes, tested with [1, 4, 8, 16, 27, 30, 32] slaves per node. The fastest was 27 per node (for a 1TB file).

Using Ganglia, it was really easy to see the CPU I/O-wait stats across the cluster, along with all the other CPU, I/O, and memory utilization metrics.

Of course, this was just a test of terasort, so YMMV; other job profiles could behave differently.
Lee_Meadows
 
Posts: 16
Joined: Mon Jul 21, 2014 1:43 pm

Mon Mar 02, 2015 8:26 pm

Lee,

Can you post the results you got when you ran terasort? I know others would appreciate it. A table with 1) Thor execution time, 2) instance type, 3) number of slave instances, and 4) number of slaves per instance would be ideal.

By the way, here is one result I got when executing the terasort with 1BG of data.

It ran in 2 minutes and 29 seconds on a Thor that had 2 EC2 i2.8xlarge instances for slave nodes, each with 16 slaves per instance. This was better than our internal cloud, REIL100, which had 100 slave nodes.

Tim
tlhumphrey2
 
Posts: 250
Joined: Mon May 07, 2012 6:23 pm

Sun Mar 22, 2015 9:12 pm

Hi Tim,
I would like to understand your findings better. You mentioned that you used 1BG of data. I am a little confused about what you used to get the result of 2 minutes and 29 seconds. Is it 1 TB of data or 1 GB of data?

Also, you said that you had 2 EC2 instances (i2.8xlarge, 244 GB RAM each, 488 GB in total) with 32 Thor slave nodes configured in total (16 on each instance). What was the slave node configuration? Are we missing something in the sort timings, given the 488 GB of RAM used?

My basic question:
I was under the impression that you could configure only one slave on a given instance. How do we configure multiple slave nodes on an instance? Any pointers for this would be a great help.

Regards,
Subbu
kps_mani
 
Posts: 24
Joined: Wed Mar 04, 2015 3:42 pm

Wed Jun 10, 2015 4:24 pm

Subbu,

Sorry for the really late response. I just saw your post.

I would like to understand your findings better. You mentioned that you used 1BG of data. I am a little confused about what you used to get the result of 2 minutes and 29 seconds. Is it 1 TB of data or 1 GB of data?

I wasn't very clear with my numbers, was I? The total size of the data was 1TB. The size of each record was 100 bytes. So the number of records was 10 billion (10^10).
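The arithmetic behind that record count, spelled out:

```python
total_bytes = 10**12            # 1 TB of terasort input
record_size = 100               # bytes per record (10 key + 10 seq + 80 fill)
num_records = total_bytes // record_size
print(num_records)              # 10000000000, i.e. 10 billion records
```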
Also, you said that you had 2 EC2 instances (i2.8xlarge, 244 GB RAM each, 488 GB in total) with 32 Thor slave nodes configured in total (16 on each instance). What was the slave node configuration? Are we missing something in the sort timings, given the 488 GB of RAM used?

There are 8 x 800GB SSD drives on the i2.8xlarge, which were RAIDed to make one large volume. So, during the sort, there had to be some spilling to disk.
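A quick sanity check on why spilling is expected, using the numbers from this thread (1 TB of input against two 244 GB instances):

```python
ram_gb_per_instance = 244        # i2.8xlarge RAM, per the post
instances = 2
total_ram_gb = ram_gb_per_instance * instances   # 488 GB across the cluster
data_gb = 1000                   # ~1 TB of terasort input
print(data_gb > total_ram_gb)    # True: the input cannot fit in RAM, so the sort spills
```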
My basic question:
I was under the impression that you could configure only one slave on a given instance. How do we configure multiple slave nodes on an instance? Any pointers for this would be a great help.

You can configure more than one slave node per instance. I've been using envgen to do so. Here is my envgen command:
Code: Select all
/opt/HPCCSystems/sbin/envgen  -env $created_environment_file -ipfile $private_ips \
  -supportnodes $supportnodes \
  -thornodes $thornodes \
  -roxienodes $roxienodes  \
  -slavesPerNode $slavesPerNode \
  -roxieondemand 1

where:
- $created_environment_file is the name of the new environment.xml file that this command creates;
- $private_ips is the name of the file containing the private IP addresses of the instances you launched, where the first IP in the file is for the master and other support functions, the next $thornodes IPs are the instances that will have Thor slave nodes, and the final $roxienodes IPs are the instances that will have Roxie nodes.

You will notice that one of the parameters for this command is -slavesPerNode; $slavesPerNode is the number of Thor slave nodes you want per instance.
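To make the IP-file layout concrete, here is a hypothetical example (the IPs and the small node counts are made up): with -supportnodes 1, -thornodes 2, and -roxienodes 1, the file would be one private IP per line, ordered support first, then Thor, then Roxie:

```python
# Hypothetical $private_ips contents for: -supportnodes 1 -thornodes 2 -roxienodes 1
# (the real file is just one private IP per line, in this order)
ipfile = "\n".join([
    "10.0.0.10",   # line 1: master / other support functions
    "10.0.0.11",   # lines 2-3: instances that get Thor slave nodes
    "10.0.0.12",
    "10.0.0.13",   # line 4: instance that gets Roxie
])
print(len(ipfile.splitlines()))  # 4 lines, one per instance
```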

When you execute this command, make sure you have first stopped the HPCC System. And after you have executed it, use hpcc-push.sh to push the new environment file to every instance of your HPCC System. My hpcc-push.sh call looks like:
Code: Select all
/opt/HPCCSystems/sbin/hpcc-push.sh \
  -s $created_environment_file \
  -t /etc/HPCCSystems/environment.xml 

where $created_environment_file is the same as in the envgen command.

Once this is all done, make sure you restart the HPCC System.
tlhumphrey2
 
Posts: 250
Joined: Mon May 07, 2012 6:23 pm

Thu Jun 11, 2015 11:30 am

@tlhumphrey2

Tim, sorry I missed your earlier post directed at me. Do you work at LN/RE?

If so, Flavio has my email; you can get it from him and we can talk offline. I don't want to post my email on here.

Lee
Lee_Meadows
 
Posts: 16
Joined: Mon Jul 21, 2014 1:43 pm

