Inexplicable error when submitting a job to thor.

Comments and questions related to the Enterprise Control Language

Wed Oct 19, 2011 3:35 pm

Hi,

I've been attempting to use multiple input files.
I have four sprayed logical files:
~thor::niv::genesis
~thor::niv::exodus
~thor::niv::levit
~thor::niv::numbers

The code below passes the syntax checker but fails once submitted to thor.
Code:
IMPORT * from STD.str;
IMPORT * from STD;

Layout_Book := RECORD
   STRING Text;
END;

base:= IF(System.Job.platform()='standalone', '', '~thor::niv::') : GLOBAL;
SetBooks := ['genesis', 'exodus', 'levit','numbers'];

SET OF DATASET(Layout_Book) Raw := [DATASET(base+SetBooks[1],Layout_Book,CSV(HEADING(4),SEPARATOR(''))),
                                    DATASET(base+SetBooks[2],Layout_Book,CSV(HEADING(4),SEPARATOR(''))),
                                    DATASET(base+SetBooks[3],Layout_Book,CSV(HEADING(4),SEPARATOR(''))),
                                    DATASET(base+SetBooks[4],Layout_Book,CSV(HEADING(4),SEPARATOR('')))];
OUTPUT(Raw[2]);

/*
Raw1 := DATASET(base+SetBooks[1],Layout_Book,CSV(HEADING(4),SEPARATOR('')));
Output(Raw1);
Raw2 := DATASET(base+SetBooks[2],Layout_Book,CSV(HEADING(4),SEPARATOR('')));
Output(Raw2);
Raw3 := DATASET(base+SetBooks[3],Layout_Book,CSV(HEADING(4),SEPARATOR('')));
Output(Raw3);
Raw4 := DATASET(base+SetBooks[4],Layout_Book,CSV(HEADING(4),SEPARATOR('')));
Output(Raw4);
*/


The eclagent.log file is:
Code:
00000000 2011-10-19 15:14:23 14448 14448 ECLAGENT build community_3.2.2-1
00000001 2011-10-19 15:14:23 14448 14448 Waiting for workunit lock
00000002 2011-10-19 15:14:23 14448 14448 Obtained workunit lock
00000003 2011-10-19 15:14:23 14448 14448 Loading dll (libW20111019-151422.so) from location /var/lib/HPCCSystems/myeclccserver/libW20111019-151422.so
00000004 2011-10-19 15:14:23 14448 14448 Starting process
00000005 2011-10-19 15:14:23 14448 14448 RoxieMemMgr: Setting memory limit to 314572800 bytes (300 pages)
00000006 2011-10-19 15:14:23 14448 14448 RoxieMemMgr: 320 Pages successfully allocated for the pool - memsize=335544320 base=0x9d800000 alignment=1048576 bitmapSize=10
00000007 2011-10-19 15:14:23 14448 14448 Waiting for run lock
00000008 2011-10-19 15:14:23 14448 14448 Obtained run lock
00000009 2011-10-19 15:14:23 14448 14448 setResultString(gl2,-3,'~thor::niv::')
0000000A 2011-10-19 15:14:23 14448 14448 setResultString(gl4,-3,'~thor::niv::genesis')
0000000B 2011-10-19 15:14:23 14448 14448 setResultString(gl6,-3,'~thor::niv::exodus')
0000000C 2011-10-19 15:14:23 14448 14448 setResultString(gl8,-3,'~thor::niv::levit')
0000000D 2011-10-19 15:14:23 14448 14448 setResultString(glA,-3,'~thor::niv::numbers')
0000000E 2011-10-19 15:14:23 14448 14448 Enqueuing on thor.thor to run wuid=W20111019-151422, graph=graph1, timelimit=600 seconds, priority=0
0000000F 2011-10-19 15:14:23 14448 14448 Thor on 192.168.65.128:6500 running W20111019-151422
00000010 2011-10-19 15:14:23 14448 14448 ERROR: 4: Graph[1], workunitwrite[8]: MP link closed (192.168.65.128:6600), Master exception (in item 1)
00000011 2011-10-19 15:14:23 14448 14448 Releasing run lock
00000012 2011-10-19 15:14:23 14448 14448 System error: 4: Graph[1], workunitwrite[8]: MP link closed (192.168.65.128:6600), Master exception
00000013 2011-10-19 15:14:23 14448 14448 4: System error: 4: Graph[1], workunitwrite[8]: MP link closed (192.168.65.128:6600), Master exception
00000014 2011-10-19 15:14:23 14448 14448 Process complete
00000015 2011-10-19 15:14:23 14448 14448 Workunit written complete


I also have the 'thormaster' log if required but it seems to say much the same thing.

I know the files themselves, and the references to them, are fine, because if I uncomment the commented-out code (and comment out the earlier code), the submitted workunit works fine.

By the way, if anyone could suggest a better way to load multiple datasets into one set
(by calling a TRANSFORM from a PROJECT, I expect), I would be very grateful.

Yours

Allan
Allan
 
Posts: 442
Joined: Sat Oct 01, 2011 7:26 pm

Wed Oct 19, 2011 5:00 pm

Hi,

OUTPUT(Raw[2]);

Does it also fail if you read 'exodus' directly, instead of via Raw[2]?
i.e. OUTPUT( DATASET(base+SetBooks[2],Layout_Book,CSV(HEADING(4),SEPARATOR('')) ))

I may need to see the slave logs; could you post them here?
jsmith
Community Advisory Board Member
 
Posts: 81
Joined: Tue Jul 19, 2011 12:58 pm

Wed Oct 19, 2011 6:27 pm

Hi Jsmith,

The:

Code:
OUTPUT( DATASET(base+SetBooks[2],Layout_Book,CSV(HEADING(4),SEPARATOR('')) ));

instead of:
Code:
OUTPUT(Raw[2]);


Works fine.
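
Since reading each file directly works, one way to avoid indexing into a SET OF DATASET might be to concatenate the four reads into a single dataset and tag each line with its book. This is only a sketch under assumptions: ReadBook, Layout_Tagged and the Book field are hypothetical names introduced for illustration, the platform test on base is dropped for brevity, and it has not been run on the 3.2.2 build discussed in this thread.

```ecl
Layout_Book := RECORD
   STRING Text;
END;

// Hypothetical layout: the original line plus the name of the book it came from.
Layout_Tagged := RECORD
   STRING Book;
   STRING Text;
END;

base := '~thor::niv::';

// Hypothetical helper: read one sprayed file and tag every record with its book name.
ReadBook(STRING name) := PROJECT(
   DATASET(base + name, Layout_Book, CSV(HEADING(4), SEPARATOR(''))),
   TRANSFORM(Layout_Tagged, SELF.Book := name, SELF := LEFT));

// '+' concatenates datasets that share a record layout.
AllBooks := ReadBook('genesis') + ReadBook('exodus') + ReadBook('levit') + ReadBook('numbers');

// A record filter on the tag stands in for Raw[2].
OUTPUT(AllBooks(Book = 'exodus'));
```

The filter AllBooks(Book = 'exodus') selects one book without a set index, so the graph should not need the "Select Nway Input" activity at all.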

Err - what and where are the 'slave' logs?
Yours
Allan
Allan
 
Posts: 442
Joined: Sat Oct 01, 2011 7:26 pm

Wed Oct 19, 2011 7:51 pm

>>>Err - what and where are the 'slave' logs?

JSmith is looking for the slave log(s), which are only available (as far as I know) by going to the appropriate cluster, selecting the appropriate slave, then clicking on the disk icon.

Regards,

Bob
bforeman
Community Advisory Board Member
 
Posts: 1005
Joined: Wed Jun 29, 2011 7:13 pm

Wed Oct 19, 2011 11:52 pm

Are they in the same format?
If so, join them into the same superfile; then you can read them as one file.
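
A minimal sketch of what building such a superfile could look like with the Standard Library superfile functions. The superfile name '~thor::niv::allbooks' is a made-up example, and the sub-file names are the ones listed in the first post.

```ecl
IMPORT STD;

// Hypothetical superfile name; not from the thread.
SuperName := '~thor::niv::allbooks';

// Run once: create the superfile if needed, then attach the four sprayed files
// inside a single superfile transaction.
BuildSuper := SEQUENTIAL(
   IF(NOT STD.File.SuperFileExists(SuperName), STD.File.CreateSuperFile(SuperName)),
   STD.File.StartSuperFileTransaction(),
   STD.File.ClearSuperFile(SuperName),
   STD.File.AddSuperFile(SuperName, '~thor::niv::genesis'),
   STD.File.AddSuperFile(SuperName, '~thor::niv::exodus'),
   STD.File.AddSuperFile(SuperName, '~thor::niv::levit'),
   STD.File.AddSuperFile(SuperName, '~thor::niv::numbers'),
   STD.File.FinishSuperFileTransaction()
);

BuildSuper;

// In a later workunit, read the superfile like any other logical file
// (Layout_Book as defined in the first post):
// AllBooks := DATASET('~thor::niv::allbooks', Layout_Book, CSV(HEADING(4), SEPARATOR('')));
// OUTPUT(AllBooks);
```

How HEADING(4) is applied across the sub-files of a superfile may depend on the platform version, so it is worth checking that the header lines of every book are actually skipped.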

David
dabayliss
Community Advisory Board Member
 
Posts: 109
Joined: Fri Apr 29, 2011 1:35 pm

Thu Oct 20, 2011 12:25 am

There's also an implicit superfile syntax, e.g.:

d := DATASET(base+'{'+SetBooks[1]+','+SetBooks[2]+','+SetBooks[3]+','+SetBooks[4]+'}', Layout_Book,CSV(HEADING(4),SEPARATOR('')));
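
For comparison, the form I remember from the Language Reference section on temporary superfiles puts the full logical names inside the braces, with the ~ outside. A sketch with the names written out, assuming the four files from the first post and not tested on this build:

```ecl
Layout_Book := RECORD
   STRING Text;
END;

// Temporary (implicit) superfile: comma-separated logical file names inside braces,
// with the leading ~ placed outside the braces.
AllBooks := DATASET('~{thor::niv::genesis,thor::niv::exodus,thor::niv::levit,thor::niv::numbers}',
                    Layout_Book, CSV(HEADING(4), SEPARATOR('')));

OUTPUT(AllBooks);
```

Nothing is registered in the distributed file system with this form; the grouping exists only for the duration of the query.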


>what and where are the 'slave' logs?

For Thor, there is a master log plus one slave log per Thor node in the cluster.
The path to the master log is listed in the workunit under Helpers; the slave logs will have a very similar path on each Thor node.
e.g. master log:
//192.168.16.101/var/log/HPCCSystems/mythor/10_18_2011_17_00_30/THORMASTER.log

slave logs will be, e.g.:
//192.168.16.101/var/log/HPCCSystems/mythor/10_18_2011_17_00_30_6600/THORSLAVE.192.168.16.101_6600.log
jsmith
Community Advisory Board Member
 
Posts: 81
Joined: Tue Jul 19, 2011 12:58 pm

Thu Oct 20, 2011 2:47 pm

Hi,

This is the thormaster log from the failed run.

I cannot find any slave log files.
Code:
00000026 2011-10-19 15:14:23 14243 14243 Started wuid=W20111019-151422, user=hpccdemo, graph=graph1
**
00000027 2011-10-19 15:14:23 14243 14243 Query /var/lib/HPCCSystems/queries/mythor/V2664623750_libW20111019-151422.so loaded
00000028 2011-10-19 15:14:23 14243 14243 allocateMPTag: tag = 65537
00000029 2011-10-19 15:14:23 14243 14243 allocateMPTag: tag = 65538
0000002A 2011-10-19 15:14:23 14243 14243 allocateMPTag: tag = 65539
0000002B 2011-10-19 15:14:23 14243 14243 allocateMPTag: tag = 65540
0000002C 2011-10-19 15:14:23 14243 14243 allocateMPTag: tag = 65541
0000002D 2011-10-19 15:14:23 14243 14243 Graph graph1 created
0000002E 2011-10-19 15:14:23 14243 14243 Running graph=graph1
0000002F 2011-10-19 15:14:23 14243 14243 temp directory cleared
00000030 2011-10-19 15:14:23 14243 14243 Add: Launching graph thread for graphId=1
00000031 2011-10-19 15:14:23 14243 14464 Running graph [global] :   <graph>
   <node id="2" label="Csv Read">
    <att name="definition" value="Examples\ExProject.ecl(12,37)"/>
    <att name="_kind" value="99"/>
    <att name="ecl" value="DATASET(INTERNAL(&apos;gl4&apos;), layout_book, CSV(header(4), separator(&apos;&apos;)));&#10;"/>
    <att name="recordSize" value="4..4096(260)"/>
    <att name="recordCount" value="0..?[disk]"/>
   </node>
   <node id="3" label="Csv Read">
    <att name="definition" value="Examples\ExProject.ecl(13,37)"/>
    <att name="_kind" value="99"/>
    <att name="ecl" value="DATASET(INTERNAL(&apos;gl6&apos;), layout_book, CSV(header(4), separator(&apos;&apos;)));&#10;"/>
    <att name="recordSize" value="4..4096(260)"/>
    <att name="recordCount" value="0..?[disk]"/>
   </node>
   <node id="4" label="Csv Read">
    <att name="definition" value="Examples\ExProject.ecl(14,19)"/>
    <att name="_kind" value="99"/>
    <att name="ecl" value="DATASET(INTERNAL(&apos;gl8&apos;), layout_book, CSV(header(4), separator(&apos;&apos;)));&#10;"/>
    <att name="recordSize" value="4..4096(260)"/>
    <att name="recordCount" value="0..?[disk]"/>
   </node>
   <node id="5" label="Csv Read">
    <att name="definition" value="Examples\ExProject.ecl(15,19)"/>
    <att name="_kind" value="99"/>
    <att name="ecl" value="DATASET(INTERNAL(&apos;glA&apos;), layout_book, CSV(header(4), separator(&apos;&apos;)));&#10;"/>
    <att name="recordSize" value="4..4096(260)"/>
    <att name="recordCount" value="0..?[disk]"/>
   </node>
   <node id="6" label="Select Nway Input">
    <att name="definition" value="Examples\ExProject.ecl(17,8)"/>
    <att name="_kind" value="137"/>
    <att name="ecl" value="no_rowsetindex(raw, 2);&#10;"/>
    <att name="recordSize" value="4..4096(260)"/>
    <att name="recordCount" value="0..?[memory]"/>
   </node>
   <node id="7" label="Firstn">
    <att name="_kind" value="12"/>
    <att name="ecl" value="CHOOSEN(999);&#10;"/>
    <att name="recordSize" value="4..4096(260)"/>
    <att name="recordCount" value="0..999[group]"/>
   </node>
   <node id="8" label="Output&#10;Result #1">
    <att name="definition" value="Examples\ExProject.ecl(1,1)"/>
    <att name="name" value="exproject"/>
    <att name="definition" value="Examples\ExProject.ecl(17,1)"/>
    <att name="_kind" value="21"/>
    <att name="ecl" value="OUTPUT(..., workunit);&#10;"/>
    <att name="recordSize" value="4..4096(260)"/>
   </node>
   <att name="rootGraph" value="1"/>
   <edge id="2_0" source="2" target="6"/>
   <edge id="3_0" source="3" target="6">
    <att name="_targetIndex" value="1"/>
   </edge>
   <edge id="4_0" source="4" target="6">
    <att name="_targetIndex" value="2"/>
   </edge>
   <edge id="5_0" source="5" target="6">
    <att name="_targetIndex" value="3"/>
   </edge>
   <edge id="6_0" source="6" target="7"/>
   <edge id="7_0" source="7" target="8"/>
  </graph>
- graph(graph1, 1)
00000032 2011-10-19 15:14:23 14243 14464 getResultString(gl4,-3)
00000033 2011-10-19 15:14:23 14243 14464 ,FileAccess,Thor,READ,mythor,hpccdemo,thor::niv::genesis,W20111019-151422,graph1,207327,1,mythor
00000034 2011-10-19 15:14:23 14243 14464 getResultString(gl6,-3)
00000035 2011-10-19 15:14:23 14243 14464 ,FileAccess,Thor,READ,mythor,hpccdemo,thor::niv::exodus,W20111019-151422,graph1,177564,1,mythor
00000036 2011-10-19 15:14:23 14243 14464 getResultString(gl8,-3)
00000037 2011-10-19 15:14:23 14243 14464 ,FileAccess,Thor,READ,mythor,hpccdemo,thor::niv::levit,W20111019-151422,graph1,132582,1,mythor
00000038 2011-10-19 15:14:23 14243 14464 getResultString(glA,-3)
00000039 2011-10-19 15:14:23 14243 14464 ,FileAccess,Thor,READ,mythor,hpccdemo,thor::niv::numbers,W20111019-151422,graph1,184240,1,mythor
0000003A 2011-10-19 15:14:23 14243 14464 CONNECTING (id=2, idx=0) to (id=6, idx=0) - activity(nwayselect, 6)
0000003B 2011-10-19 15:14:23 14243 14464 CONNECTING (id=3, idx=0) to (id=6, idx=1) - activity(nwayselect, 6)
0000003C 2011-10-19 15:14:23 14243 14464 CONNECTING (id=4, idx=0) to (id=6, idx=2) - activity(nwayselect, 6)
0000003D 2011-10-19 15:14:23 14243 14464 CONNECTING (id=5, idx=0) to (id=6, idx=3) - activity(nwayselect, 6)
0000003E 2011-10-19 15:14:23 14243 14464 allocateMPTag: tag = 65542
0000003F 2011-10-19 15:14:23 14243 14464 CONNECTING (id=6, idx=0) to (id=7, idx=0) - activity(firstn, 7)
00000040 2011-10-19 15:14:23 14243 14464 allocateMPTag: tag = 65543
00000041 2011-10-19 15:14:23 14243 14464 CONNECTING (id=7, idx=0) to (id=8, idx=0) - activity(workunitwrite, 8)
00000042 2011-10-19 15:14:23 14243 14464 Query dll: /var/lib/HPCCSystems/queries/mythor/V2664623750_libW20111019-151422.so
00000043 2011-10-19 15:14:23 14243 14464 ,Progress,Thor,StartSubgraph,mythor,W20111019-151422,1,1,mythor,mythor.thor
00000044 2011-10-19 15:14:23 14243 14464 allocateMPTag: tag = 65544
00000045 2011-10-19 15:14:23 14243 14464 sendGraph took 5 ms - graph(graph1, 1)
00000046 2011-10-19 15:14:23 14243 14464 Processing graph - graph(graph1, 1)
00000047 2011-10-19 15:14:23 14243 14471 activity(firstn, 7) : Graph[1], firstn[7]: MP link closed (192.168.65.128:6600), Master exception
00000048 2011-10-19 15:14:23 14243 14469 activity(workunitwrite, 8) : Graph[1], workunitwrite[8]: MP link closed (192.168.65.128:6600), Master exception
00000049 2011-10-19 15:14:23 14243 14469 4: Graph[1], workunitwrite[8]: MP link closed (192.168.65.128:6600), Master exception
0000004A 2011-10-19 15:14:23 14243 14469 INFORM [EXCEPTION]
0000004B 2011-10-19 15:14:23 14243 14469 4: Graph[1], workunitwrite[8]: MP link closed (192.168.65.128:6600), Master exception
0000004C 2011-10-19 15:14:23 14243 14469 Posting exception: Graph[1], workunitwrite[8]: MP link closed (192.168.65.128:6600), Master exception to agent 192.168.65.128 for workunit(W20111019-151422)
0000004D 2011-10-19 15:14:23 14243 14469 INFORM [EXCEPTION]
0000004E 2011-10-19 15:14:24 14243 14469 Abort condition set - activity(workunitwrite, 8)
0000004F 2011-10-19 15:14:24 14243 14469 Abort condition set - activity(firstn, 7)
00000050 2011-10-19 15:14:24 14243 14469 Abort condition set - activity(nwayselect, 6)
00000051 2011-10-19 15:14:24 14243 14469 Abort condition set - activity(csvread, 2)
00000052 2011-10-19 15:14:24 14243 14469 Abort condition set - activity(csvread, 5)
00000053 2011-10-19 15:14:24 14243 14469 Abort condition set - activity(csvread, 4)
00000054 2011-10-19 15:14:24 14243 14469 Abort condition set - activity(csvread, 3)
00000055 2011-10-19 15:14:24 14243 14469 Aborting master graph - graph(graph1, 1) : MP link closed (192.168.65.128:6600)
00000056 2011-10-19 15:14:25 14243 14469 Aborting slave graph - graph(graph1, 1) : MP link closed (192.168.65.128:6600)
00000057 2011-10-19 15:14:25 14243 14469 4: Reporting exception to WU : 4, Graph[1], workunitwrite[8]: MP link closed (192.168.65.128:6600), Master exception : Error aborting job, will cause thor restart
00000058 2011-10-19 15:14:25 14243 14469 Stopping jobManager
00000059 2011-10-19 15:14:25 14243 14471 4: Graph[1], firstn[7]: MP link closed (192.168.65.128:6600), Master exception
0000005A 2011-10-19 15:14:25 14243 14471 INFORM [EXCEPTION]
0000005B 2011-10-19 15:14:25 14243 14471 4: Graph[1], firstn[7]: MP link closed (192.168.65.128:6600), Master exception
0000005C 2011-10-19 15:14:25 14243 14471 INFORM [EXCEPTION]
0000005D 2011-10-19 15:14:28 14243 14257 SYS: PU=  6% MU=  7% MAL=304032 MMP=0 SBK=304032 TOT=556K RAM=189552K SWP=32K
0000005E 2011-10-19 15:14:28 14243 14257 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=3.7 kw/s=59.4 bsy=0 NIC: rxp/s=2.4 rxk/s=1.0 txp/s=3.0 txk/s=0.8 CPU: usr=0 sys=4 iow=0 idle=95
0000005F 2011-10-19 15:14:28 14243 14257 KERN_INFO: [68519.077420] thorslave_6600[14470]: segfault at 4a8d0855 ip 009f7e68 sp 024e0ff0 error 4 in libactivityslaves_lcr.so[8c6000+171000]
00000060 2011-10-19 15:14:53 14243 14243 Waiting on executing graphs to complete.
00000061 2011-10-19 15:14:53 14243 14243 Currently running graphId = 1
00000062 2011-10-19 15:15:23 14243 14243 Waiting on executing graphs to complete.
00000063 2011-10-19 15:15:23 14243 14243 Currently running graphId = 1
00000064 2011-10-19 15:15:23 14243 14465 4: /var/jenkins/workspace/Release-3.2.2/src/thorlcr/graph/thgraphmaster.cpp(73) : FAILED TO RECOVER FROM EXCEPTION, STOPPING THOR : Graph[1], workunitwrite[8]: MP link closed (192.168.65.128:6600), Master exception
00000065 2011-10-19 15:15:23 14243 14461 4: /var/jenkins/workspace/Release-3.2.2/src/thorlcr/graph/thgraphmaster.cpp(73) : FAILED TO RECOVER FROM EXCEPTION, STOPPING THOR : Graph[1], workunitwrite[8]: MP link closed (192.168.65.128:6600), Master exception
00000066 2011-10-19 15:15:23 14243 14465 ,Timing,ThorGraph,mythor,W20111019-151422,1,1,1,60261,FAILED,mythor,mythor.thor
00000067 2011-10-19 15:15:23 14243 14461 ,Progress,Thor,Terminate,mythor,mythor,mythor.thor,exception


Yours

Allan
Allan
 
Posts: 442
Joined: Sat Oct 01, 2011 7:26 pm

Thu Oct 20, 2011 3:39 pm

You'll need access to the file system to get to them; there are no links to them in the IDE/workunit.

They'll be on the Thor cluster nodes, under /var/log/HPCCSystems/mythor/<logdir*>
The logdir prefix is visible under Helpers, i.e. it is part of the name of the link that takes you to the master log.
You'll need to note that down, log in to the node that's hosting the Thor slave(s), and get the logs that way.

Hope that helps.
jsmith
Community Advisory Board Member
 
Posts: 81
Joined: Tue Jul 19, 2011 12:58 pm

Thu Oct 20, 2011 5:30 pm

In reply to dabayliss,

Sure, I expect there are other ways to kill a cat.

But I'm attempting to learn ECL and need to understand errors when they occur.

Yours

Allan
Allan
 
Posts: 442
Joined: Sat Oct 01, 2011 7:26 pm

Thu Oct 20, 2011 6:17 pm

This is the THORMASTER.log from a run that failed.
Code:
00000001 2011-10-20 17:34:49 16545 16545 Opened log file //192.168.65.128/var/log/HPCCSystems/mythor/10_20_2011_17_34_08/THORMASTER.log
00000002 2011-10-20 17:34:49 16545 16545 Build community_3.2.2-1
00000003 2011-10-20 17:34:49 16545 16545 calling initClientProcess Port 6500
00000004 2011-10-20 17:34:49 16545 16545 Found file 'thorgroup', using to form thor group
00000005 2011-10-20 17:34:49 16545 16545 Starting watchdog
00000006 2011-10-20 17:34:49 16545 16545 ThorMaster version 4.0, Started on 192.168.65.128:6500
00000007 2011-10-20 17:34:49 16545 16545 CThorRowManager initialized, memlimit = 2147483648
00000008 2011-10-20 17:34:49 16545 16545 Thor name = mythor, queue = mythor.thor, nodeGroup = mythor
00000009 2011-10-20 17:34:49 16545 16545 Creating sentinel file thor.sentinel for rerun from script
0000000A 2011-10-20 17:34:49 16545 16545 Waiting for 1 slaves to register
0000000B 2011-10-20 17:34:49 16545 16545 Verifying connection to slave 1
0000000C 2011-10-20 17:34:49 16545 16545 verified connection with 192.168.65.128:6600
0000000D 2011-10-20 17:34:49 16545 16545 Slaves connected, initializing..
0000000E 2011-10-20 17:34:49 16545 16545 Initialization sent to slave group
0000000F 2011-10-20 17:34:49 16545 16545 Registration confirmation from 192.168.65.128:6600
00000010 2011-10-20 17:34:49 16545 16545 Slave 1 (192.168.65.128:6600) registered
00000011 2011-10-20 17:34:49 16545 16545 Slaves initialized
00000012 2011-10-20 17:34:49 16545 16560 Started watchdog
00000013 2011-10-20 17:34:49 16545 16545 verifying mp connection to rest of cluster
00000014 2011-10-20 17:34:49 16545 16545 verified mp connection to rest of cluster
00000015 2011-10-20 17:34:49 16545 16545 ,Progress,Thor,Startup,mythor,mythor,mythor.thor,//192.168.65.128/var/log/HPCCSystems/mythor/10_20_2011_17_34_08/THORMASTER.log
00000016 2011-10-20 17:34:49 16545 16545 Listening for graph
00000017 2011-10-20 17:34:49 16545 16545 ThorLCR(192.168.65.128:6500) available, waiting on queue thor.thor
00000018 2011-10-20 17:35:49 16545 16559 SYS: PU=  6% MU=  8% MAL=226552 MMP=0 SBK=226552 TOT=276K RAM=220548K SWP=32K
00000019 2011-10-20 17:36:49 16545 16559 SYS: PU=  4% MU=  8% MAL=255336 MMP=0 SBK=255336 TOT=368K RAM=221340K SWP=32K
0000001A 2011-10-20 17:36:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.1 w/s=1.4 kw/s=15.4 bsy=1 NIC: rxp/s=0.9 rxk/s=0.1 txp/s=1.0 txk/s=0.4 CPU: usr=0 sys=2 iow=0 idle=96
0000001B 2011-10-20 17:37:49 16545 16559 SYS: PU=  3% MU=  8% MAL=255336 MMP=0 SBK=255336 TOT=368K RAM=221236K SWP=32K
0000001C 2011-10-20 17:37:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.6 kw/s=17.4 bsy=1 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=1 iow=1 idle=96
0000001D 2011-10-20 17:38:49 16545 16559 SYS: PU=  3% MU=  8% MAL=255344 MMP=0 SBK=255344 TOT=368K RAM=221388K SWP=32K
0000001E 2011-10-20 17:38:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.3 kw/s=16.2 bsy=1 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=1 iow=1 idle=96
0000001F 2011-10-20 17:39:49 16545 16559 SYS: PU=  3% MU=  8% MAL=255336 MMP=0 SBK=255336 TOT=368K RAM=221804K SWP=32K
00000020 2011-10-20 17:39:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.5 kw/s=16.2 bsy=1 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=2 iow=0 idle=96
00000021 2011-10-20 17:40:49 16545 16559 SYS: PU=  3% MU=  8% MAL=255344 MMP=0 SBK=255344 TOT=368K RAM=221688K SWP=32K
00000022 2011-10-20 17:40:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.5 kw/s=17.0 bsy=1 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=2 iow=1 idle=96
00000023 2011-10-20 17:41:49 16545 16559 SYS: PU=  3% MU=  8% MAL=255336 MMP=0 SBK=255336 TOT=368K RAM=222092K SWP=32K
00000024 2011-10-20 17:41:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.5 kw/s=16.9 bsy=1 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=1 iow=1 idle=96
00000025 2011-10-20 17:42:49 16545 16559 SYS: PU=  3% MU=  8% MAL=255344 MMP=0 SBK=255344 TOT=368K RAM=221988K SWP=32K
00000026 2011-10-20 17:42:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.3 kw/s=14.8 bsy=0 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=1 iow=0 idle=96
00000027 2011-10-20 17:43:49 16545 16559 SYS: PU=  4% MU=  8% MAL=255336 MMP=0 SBK=255336 TOT=368K RAM=222012K SWP=32K
00000028 2011-10-20 17:43:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.4 kw/s=16.5 bsy=1 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=2 iow=0 idle=96
00000029 2011-10-20 17:44:49 16545 16559 SYS: PU=  3% MU=  8% MAL=255344 MMP=0 SBK=255344 TOT=368K RAM=221920K SWP=32K
0000002A 2011-10-20 17:44:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.4 kw/s=16.0 bsy=1 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=1 iow=1 idle=96
0000002B 2011-10-20 17:45:49 16545 16559 SYS: PU=  3% MU=  8% MAL=255336 MMP=0 SBK=255336 TOT=368K RAM=221816K SWP=32K
0000002C 2011-10-20 17:45:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.2 kw/s=14.8 bsy=0 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=1 iow=0 idle=97
0000002D 2011-10-20 17:46:49 16545 16559 SYS: PU=  3% MU=  8% MAL=255344 MMP=0 SBK=255344 TOT=368K RAM=221720K SWP=32K
0000002E 2011-10-20 17:46:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.4 kw/s=16.1 bsy=1 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=1 iow=0 idle=97
0000002F 2011-10-20 17:47:49 16545 16559 SYS: PU=  3% MU=  8% MAL=255336 MMP=0 SBK=255336 TOT=368K RAM=221752K SWP=32K
00000030 2011-10-20 17:47:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.4 kw/s=16.0 bsy=1 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=1 iow=0 idle=97
00000031 2011-10-20 17:48:49 16545 16559 SYS: PU=  3% MU=  8% MAL=255344 MMP=0 SBK=255344 TOT=368K RAM=221776K SWP=32K
00000032 2011-10-20 17:48:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.2 kw/s=14.6 bsy=1 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=1 iow=1 idle=97
00000033 2011-10-20 17:49:49 16545 16559 SYS: PU=  3% MU=  8% MAL=255336 MMP=0 SBK=255336 TOT=368K RAM=221928K SWP=32K
00000034 2011-10-20 17:49:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.4 kw/s=16.0 bsy=1 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=1 iow=1 idle=96
00000035 2011-10-20 17:50:49 16545 16559 SYS: PU=  3% MU=  8% MAL=255344 MMP=0 SBK=255344 TOT=368K RAM=221952K SWP=32K
00000036 2011-10-20 17:50:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.5 kw/s=15.2 bsy=1 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=1 iow=0 idle=97
00000037 2011-10-20 17:51:49 16545 16559 SYS: PU=  3% MU=  8% MAL=255336 MMP=0 SBK=255336 TOT=368K RAM=221980K SWP=32K
00000038 2011-10-20 17:51:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.4 kw/s=16.3 bsy=1 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=1 iow=0 idle=97
00000039 2011-10-20 17:52:49 16545 16559 SYS: PU=  3% MU=  8% MAL=255344 MMP=0 SBK=255344 TOT=368K RAM=222008K SWP=32K
0000003A 2011-10-20 17:52:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.4 kw/s=16.0 bsy=1 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=1 iow=0 idle=97
0000003B 2011-10-20 17:53:49 16545 16559 SYS: PU=  3% MU=  8% MAL=255336 MMP=0 SBK=255336 TOT=368K RAM=222540K SWP=32K
0000003C 2011-10-20 17:53:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.5 kw/s=16.3 bsy=0 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=1 iow=0 idle=97
0000003D 2011-10-20 17:54:49 16545 16559 SYS: PU=  3% MU=  8% MAL=255344 MMP=0 SBK=255344 TOT=368K RAM=222556K SWP=32K
0000003E 2011-10-20 17:54:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.4 kw/s=16.0 bsy=1 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=1 iow=1 idle=97
0000003F 2011-10-20 17:55:49 16545 16559 SYS: PU=  4% MU=  8% MAL=255336 MMP=0 SBK=255336 TOT=368K RAM=222460K SWP=32K
00000040 2011-10-20 17:55:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.3 kw/s=16.1 bsy=1 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=1 iow=1 idle=96
00000041 2011-10-20 17:56:49 16545 16559 SYS: PU=  6% MU=  8% MAL=255344 MMP=0 SBK=255344 TOT=368K RAM=222364K SWP=32K
00000042 2011-10-20 17:56:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.3 kw/s=15.1 bsy=4 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=1 iow=4 idle=94
00000043 2011-10-20 17:57:49 16545 16559 SYS: PU=  3% MU=  8% MAL=255336 MMP=0 SBK=255336 TOT=368K RAM=222392K SWP=32K
00000044 2011-10-20 17:57:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.3 kw/s=16.0 bsy=0 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=1 iow=0 idle=97
00000045 2011-10-20 17:58:49 16545 16559 SYS: PU=  3% MU=  8% MAL=255344 MMP=0 SBK=255344 TOT=368K RAM=222416K SWP=32K
00000046 2011-10-20 17:58:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.3 kw/s=15.0 bsy=0 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=1 iow=0 idle=97
00000047 2011-10-20 17:59:49 16545 16559 SYS: PU=  4% MU=  8% MAL=255336 MMP=0 SBK=255336 TOT=368K RAM=222564K SWP=32K
00000048 2011-10-20 17:59:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.3 kw/s=16.1 bsy=1 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=1 iow=1 idle=96
00000049 2011-10-20 18:00:49 16545 16559 SYS: PU=  3% MU=  8% MAL=255344 MMP=0 SBK=255344 TOT=368K RAM=222592K SWP=32K
0000004A 2011-10-20 18:00:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.4 kw/s=16.2 bsy=1 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=1 iow=1 idle=96
0000004B 2011-10-20 18:01:49 16545 16559 SYS: PU=  3% MU=  8% MAL=255336 MMP=0 SBK=255336 TOT=368K RAM=222492K SWP=32K
0000004C 2011-10-20 18:01:49 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.4 kw/s=15.2 bsy=1 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=1 iow=0 idle=97
0000004D 2011-10-20 18:11:13 16545 16559 SYS: PU=  0% MU=  8% MAL=255344 MMP=0 SBK=255344 TOT=368K RAM=222684K SWP=32K
0000004E 2011-10-20 18:11:13 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=0.1 kw/s=1.1 bsy=0 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=0 iow=0 idle=99
0000004F 2011-10-20 18:12:13 16545 16559 SYS: PU= 15% MU=  8% MAL=255336 MMP=0 SBK=255336 TOT=368K RAM=223740K SWP=32K
00000050 2011-10-20 18:12:13 16545 16559 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=1.7 kw/s=19.2 bsy=13 NIC: rxp/s=2.9 rxk/s=0.3 txp/s=0.8 txk/s=0.2 CPU: usr=0 sys=2 iow=4 idle=91


This is the THORSLAVE log from the same run.
Code:
00000000 2011-10-20 17:34:49 16542 16542 Opened log file //192.168.65.128/var/log/HPCCSystems/mythor/10_20_2011_17_34_08_6600/THORSLAVE.192.168.65.128_6600.log
00000001 2011-10-20 17:34:49 16542 16542 Build community_3.2.2-1
00000002 2011-10-20 17:34:49 16542 16542 calling initClientProcess
00000003 2011-10-20 17:34:49 16542 16542 registering 192.168.65.128:6600 - master 192.168.65.128:6500
00000004 2011-10-20 17:34:49 16542 16542 Initialization received
00000005 2011-10-20 17:34:49 16542 16542 Registration confirmation sent
00000006 2011-10-20 17:34:49 16542 16542 verifying mp connection to rest of cluster
00000007 2011-10-20 17:34:49 16542 16542 verified mp connection to rest of cluster
00000008 2011-10-20 17:34:49 16542 16542 registered 192.168.65.128:6600
00000009 2011-10-20 17:34:49 16542 16542 CThorRowManager initialized, memlimit = 2147483648
0000000A 2011-10-20 17:34:49 16542 16542 ThorSlave Version LCR - 4.0 started
0000000B 2011-10-20 17:34:49 16542 16542 Slave 192.168.65.128:6600 - thor_tmp_dir set to : /var/lib/HPCCSystems/mythor/temp/
0000000C 2011-10-20 17:34:49 16542 16542 Using querySo directory: /var/lib/HPCCSystems/queries/mythor
0000000D 2011-10-20 17:34:49 16542 16542 FileCache: limit = 1800, purgeN = 10
0000000E 2011-10-20 17:34:49 16542 16562 Watchdog: thread running


I hope this helps.

Yours
Allan
Allan
 
Posts: 442
Joined: Sat Oct 01, 2011 7:26 pm
