Installing Ganglia Ubuntu 14.04

Questions related to node architecture, redundancy and system monitoring

Sun Aug 14, 2016 10:27 am

Hi There,

I am having problems installing Ganglia on my test cluster, which is probably down to me and my limited Ubuntu skills. I have followed the HPCC Monitoring and Reporting document from the HPCC website as best I can, and was hoping someone could look over my process and give me some help.

Here is how I have installed Ganglia on the master node in my cluster :

sudo apt-get update && sudo apt-get -y upgrade
sudo apt-get install -y ganglia-monitor rrdtool gmetad ganglia-webfrontend
sudo cp /etc/ganglia-webfrontend/apache.conf /etc/apache2/sites-enabled/ganglia.conf

I then edit /etc/ganglia/gmetad.conf (sudo vi /etc/ganglia/gmetad.conf), adding my cluster name and the polling interval in seconds:

data_source "my cluster" 60 localhost

I then edit /etc/ganglia/gmond.conf (sudo vi /etc/ganglia/gmond.conf), commenting out the mcast_join lines and setting the host to localhost, to match the gmetad.conf file:

udp_send_channel {
  #mcast_join = 239.2.11.71 ## comment out
  host = localhost
  port = 8649
  ttl = 1
}

udp_recv_channel {
  #mcast_join = 239.2.11.71 ## comment out
  port = 8649
  #bind = 239.2.11.71 ## comment out
}

Once complete I then restart the services : sudo service ganglia-monitor restart && sudo service gmetad restart && sudo service apache2 restart

I can connect to Ganglia no problem once I have completed these steps by going to : http://IPADDRESS/Ganglia

The part I am getting stuck on is this bit of the document:

If you have a Ganglia monitoring server running in your environment, you already have the required components and
prerequisites. Verify that you have /etc/ganglia/conf.d and /etc/ganglia/.pyconf files in place and then add the Roxie
nodes you wish to monitor. You can do that by installing the Ganglia components and HPCC Monitoring components
on to each Roxie node.
If you do not have Ganglia, or want to install it, read the Ganglia documentation provided at the above link, and install
it and any system dependencies. You will then need to download and install the HPCC Monitoring component.

These two files : /etc/ganglia/conf.d and /etc/ganglia/.pyconf - do not exist on my test box, so have I maybe missed something out?

I have downloaded the files from : http://sourceforge.net/apps/trac/gangli ... on_modules

but I am unsure how to proceed from here, step by step.

Our HPCC version is : community_5.4.8-1 but I can only find a Ganglia monitoring tool for 5.2.0, 5.2.2, 5.6 or 6+ - will any of these work ok?

I ended up installing 5.2.2, as it was the closest I could get before 5.6, using these steps:

1. sudo dpkg -i hpccsystems-ganglia-monitoring-5.2.2-1trusty_amd64.deb
2. sudo apt-get update
3. sudo apt-get install -f
4. sudo dpkg -i hpccsystems-ganglia-monitoring-5.2.2-1trusty_amd64.deb
5. sudo service ganglia-monitor restart && sudo service gmetad restart && sudo service apache2 restart

When I go back to Ganglia I can see there are now more metrics added for things like roxie - which is great and looks promising.

I then ran install_graphs_helper.sh after modifying it. I commented out the following line:

# echo "Alias /ganglia /usr/share/ganglia-webfrontend" >> /etc/apache2/apache2.conf; \

as the alias was already specified in : /etc/apache2/sites-enabled/ganglia.conf

and I also commented out this part, as I had already configured gmetad.conf:

#sed 's/my cluster\" localhost/VM Cluster\" localhost/g' < /etc/ganglia/gmetad.conf > /tmp/gmetad.conf; mv /tmp/gmetad.conf /etc/ganglia/gmetad.conf; \

I then restarted the cluster : sudo bash start-hpcc.sh restart.

When I connect to ECL Watch on port 8010 I can see the plugin, but I get the following error three times:

could not find rrd file for /var/lib/ganglia/rrds/__SummaryInfo__/disk_total.rrd

It looks as though I am nearly there, but I have clearly gotten a few steps wrong. Any advice would be greatly appreciated. Once I have this configured correctly I will move on to adding more nodes to Ganglia.

Thanks in Advance.
amillar
 
Posts: 16
Joined: Fri Oct 16, 2015 7:32 am

Mon Aug 15, 2016 8:13 pm

It looks like you are fairly close.

The short answer is that the specific version of ganglia you are running on Ubuntu 14.04 doesn’t support the disk_total metric. You can remove it or replace it with other metric(s) that provide the information that you want.

In more detail…

On the node running ECLWatch with ganglia, there is a file called ganglia.json located in /opt/HPCCSystems/componentfiles/files/ganglia/. It contains the graph and metric definitions that are used by the ECLWatch plugin. The default install contains a suggested predefined set of metrics, but it can be modified to meet your specific needs. Not all distros and versions of ganglia support all non-HPCC specific metrics. There are a number of 3rd party ganglia plugins available to allow for the monitoring of various metrics and parameters. The hpcc-ganglia-monitoring package includes plugins to monitor Roxie metrics and to surface those metrics (and possibly others) in EclWatch.

The file disk_total.rrd is the round robin database file that stores the disk_total metric. The error is correct in that the file can’t be opened by the EclWatch ganglia plugin. Typically this is caused by something as simple as not waiting long enough for the metric to be populated. Allow some time to pass for all the metrics to be populated and updated. It is expected that not all metrics would be visible when initially starting ganglia monitoring.
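One way to check whether the rrd files have been created yet is to look for them on disk. The sketch below is only an illustration: missing_rrds is a hypothetical helper name, and it assumes gmetad is writing to its default /var/lib/ganglia/rrds location, with the cluster-wide summaries in the __SummaryInfo__ subdirectory.

```shell
#!/usr/bin/env bash
# Print each metric whose <metric>.rrd file is not yet present in a directory.
# $1 = rrd directory, remaining arguments = metric names to look for.
missing_rrds() {
    local dir="$1"; shift
    local metric
    for metric in "$@"; do
        [ -f "$dir/$metric.rrd" ] || echo "$metric"
    done
}
```

For example, missing_rrds /var/lib/ganglia/rrds/__SummaryInfo__ disk_total mem_free prints only the metrics that have no file yet. If a name is still listed well after ganglia has been running, that metric is probably not being collected at all.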

Alternatively, the ganglia plugin responsible for collecting the metric data could be failing or not configured properly. In the case of disk_total, that metric is provided by the ganglia package, and is not an HPCC specific metric. You can examine gmond.conf, and in there you should see the entry for disk_total. In 14.04 it does not appear that disk_total is supported. If you drill down through the configs you will find that the python code is missing for this metric.

In general, to debug gmond you may want to telnet to port 8649 (the gmond default) to see which metrics are being gathered by the gmond process running on the node of interest.

To debug the gmetad aggregation service you can telnet to port 8651 (default) on the EclWatch node running gmetad.
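If the raw XML is awkward to read in telnet, it can be filtered down to just the metric names. This is a rough sketch that assumes netcat (nc) is installed and that gmond and gmetad answer on their default ports; list_metric_names and ganglia_metrics are hypothetical helper names, not part of Ganglia.

```shell
#!/usr/bin/env bash
# Print the unique metric names found in a Ganglia XML dump read from stdin.
list_metric_names() {
    grep -o '<METRIC NAME="[^"]*"' | sed 's/.*NAME="\([^"]*\)"/\1/' | sort -u
}

# Hypothetical wrapper: fetch the XML dump from host/port and list the metrics.
# $1 = host (default localhost), $2 = port (8649 for gmond, 8651 for gmetad).
ganglia_metrics() {
    local host="${1:-localhost}" port="${2:-8649}"
    nc "$host" "$port" | list_metric_names
}
```

For example, ganglia_metrics localhost 8649 | grep disk_total shows whether gmond on that node is reporting disk_total at all, and the same call with port 8651 checks what gmetad has aggregated.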

If you simply don’t care about a particular metric, you can remove it from the ganglia.json and gmond.conf files.

As far as the install_graphs_helper.sh script goes, that file is used as part of our internal VM build process. Some of the steps in that script may not be applicable to all users, while others may want to use their own customizations. It is intended primarily as a sample or template for users to use or examine.

Let me know if you have any other questions.
Gleb Aronsky
 
Posts: 22
Joined: Fri Feb 10, 2012 1:49 pm

Tue Aug 16, 2016 2:11 pm

Hi Gleb,

Thanks for the quick reply. That’s been a great help for me in understanding how all these components work together.

This morning I completely removed and purged all of the ganglia components and folders, one to see if I can safely remove everything ok, and two to start the setup from fresh so I am more familiar with it.

I have now set up the Ganglia cluster again by installing Ganglia monitor, RRDtool, gmetad and the Ganglia web frontend on the Ganglia master node, and just the Ganglia monitor (gmond) on the other nodes. After a quick config of gmetad.conf and gmond.conf on the master, and gmond.conf on the other nodes, all seems to be working OK.

I then proceeded to install the HPCC monitoring component ( hpccsystems-ganglia-monitoring-5.2.2-1trusty_amd64.deb), on all the nodes and then ran sudo bash start-hpcc.sh restart at the end of the process, and I can see all the nodes and metrics via http://IPADDRESS/Ganglia, the ECL watch plugin as well as the /etc/ganglia/conf.d and /etc/ganglia/.pyconf files :D

The only part of ECL Watch that does not seem to be working is the “custom monitoring”. I can use the drop-down to select the cluster and metrics, e.g. free mem over the last hour, but when I hit “generate graph” nothing seems to happen, with no errors or anything. Is this just a case of waiting for metrics to be generated over the next few hours?

Also, this time around I have completely bypassed the install_graphs_helper.sh script, as I think it was complicating my set-up and I don’t think it’s needed, but do let me know if I am wrong on this.

I think I may have jumped the gun a bit with these errors (reported three times):

could not find rrd file for /var/lib/ganglia/rrds/__SummaryInfo__/disk_total.rrd

After around 10 minutes (thanks to your post) these errors do go away. ;)

The next stage for me is keeping tabs on the rrds folder. After only a few hours on a test cluster that is not doing very much, the space used is already over 1GB.

Do you have any steps I can follow to limit the size of this database? E.g. keep the last three months of data? Or possibly a CRON job to delete files older than a certain date (if possible)?

I took a look at the gmetad.conf about the RR archives, but am having trouble working out how to set it to limit the database size :

#
# Round-Robin Archives
# You can specify custom Round-Robin archives here (defaults are listed below)
#
# Old Default RRA: Keep 1 hour of metrics at 15 second resolution. 1 day at 6 minute
# RRAs "RRA:AVERAGE:0.5:1:244" "RRA:AVERAGE:0.5:24:244" "RRA:AVERAGE:0.5:168:244" "RRA:AVERAGE:0.5:672:244" \
# "RRA:AVERAGE:0.5:5760:374"
# New Default RRA
# Keep 5856 data points at 15 second resolution assuming 15 second (default) polling. That's 1 day
# Two weeks of data points at 1 minute resolution (average)
#RRAs "RRA:AVERAGE:0.5:1:5856" "RRA:AVERAGE:0.5:4:20160" "RRA:AVERAGE:0.5:40:52704"


The next stage after this is to tackle Nagios!

Thanks again for your help.

Best Regards

Antony
amillar
 

Tue Aug 16, 2016 9:17 pm

I installed hpccsystems-ganglia-monitoring 5.2.2 on Ubuntu 14.04 with HPCC 5.4.8-1, and I was not able to reproduce your issue with custom monitoring. The fact that you get some graphs and not custom graphs is a bit strange. Btw there is a ganglia-monitoring 5.4.2 package on the portal, though I don’t think that is the issue you are seeing.

Can you please provide a portion of the esp log (/var/log/HPCCSystems/myesp/esp.log) that deals with the custom graph call? You can tail the log and you will see entries starting with “RRDTOOL GRAPH CMD -->” for every graph that is generated. If there is an error I would expect to see it after that entry. You can also try copying the command from the log and running it directly on the command line to see if you get an error. In general, if the rrd file is there and the ESP can access the file for reading, then I would expect the command to generate a graph.

In regards to size, the round robin database stores data with progressively less resolution. So data that is 6 months old is less granular than data from the last hour. The RRD file should reach a fixed size and write over itself, so you won’t have to worry about cycling the file out. I am not sure how to map the defined resolution, the number of metrics, and the types of data stored (such as avg and running count) to a fixed file size. You may need to research RR databases further to get a concrete answer, but some experimentation will probably give some insight to the max file size you could expect.
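A first approximation, though, is that an RRD file stores one 8-byte value per data source for every row of every archive, so the size can be estimated from the RRAs line in gmetad.conf. The sketch below is a rule of thumb rather than anything official: it assumes the "New Default RRA" values from the gmetad.conf you quoted, ignores the small file header, and estimate_rrd_bytes is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Rough RRD size estimate: total rows * data sources * 8 bytes (header ignored).
# $1 = number of data sources in the file; remaining arguments are RRA
# definitions such as "RRA:AVERAGE:0.5:1:5856", where the last field is the row count.
estimate_rrd_bytes() {
    local ds_count="$1"; shift
    local rows=0 rra
    for rra in "$@"; do
        rows=$(( rows + ${rra##*:} ))
    done
    echo $(( rows * ds_count * 8 ))
}

# One data source per file, using the default RRAs listed in gmetad.conf.
estimate_rrd_bytes 1 "RRA:AVERAGE:0.5:1:5856" "RRA:AVERAGE:0.5:4:20160" "RRA:AVERAGE:0.5:40:52704"
```

With those defaults this prints 629760, i.e. roughly 615 KiB per metric per host, so around 1,700 rrd files would already account for about 1 GB. Because the files are preallocated at that size, the total should only grow when new hosts or metrics appear. As far as I know, lowering the row counts on the RRAs line only affects newly created files, so existing rrd files would need to be deleted and recreated to shrink them.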
Gleb Aronsky
 

Wed Aug 17, 2016 3:05 pm

Hi Gleb,

Thanks for your continued help on this.

I have just had a look for the 5.4.2 package on https://hpccsystems.com/download/Monitoring but I can’t seem to find it. Do you have a location I can download it from?

Great news on the RRD files, I will keep an eye on this as we do have limited space on our cluster.

To get things moving forward, I have just added the GMOND and the HPCC Ganglia monitor to one of our live roxies. This is running on platform 5.2.0-1 so I installed that version of the HPCC monitor also.

Currently this live Roxie is logging directly to my test cluster, which is on a 5.4.8-1 – I imagine that won’t make any difference though as the Ganglia versions are the same, please let me know if that is incorrect.

However currently no graphs are showing on this one, I have attached a screen shot of the errors, which I still get 30mins or so later.

Here is the sample of the ESP log from my test cluster that you asked for:

00000B44 2016-08-17 13:58:41.276 19165 17958 "RRDTOOL GRAPH CMD --> /usr/bin/rrdtool graph /tmp/hpcc_ws_rrd_graphsJs0Sfx/graphhlXVNz -a SVG --start 1471350866 --end 1471354466 DEF:ds11=/var/lib/ganglia/rrds/TEST HPCC/localhost/mem_free.rrd:sum:AVERAGE LINE1:ds11#0000FF:mem_free -w 300 -h 120 -t 'TEST HPCC:localhost:mem_free' <--"
00000B45 2016-08-17 13:58:41.281 19165 17958 "================================================"
00000B46 2016-08-17 13:58:41.281 19165 17958 "Signal: 11 Segmentation fault"
00000B47 2016-08-17 13:58:41.281 19165 17958 "Fault IP: 00007FA465B3BD63"
00000B48 2016-08-17 13:58:41.281 19165 17958 "Accessing: 0000000000000004"
00000B49 2016-08-17 13:58:41.281 19165 17958 "Registers:"
00000B4A 2016-08-17 13:58:41.281 19165 17958 "EAX:0000000000000004 EBX:0000000000000001 ECX:00007FA465B52620 EDX:0000000000000064 ESI:0000000000000022 EDI:0000000000000004"
00000B4B 2016-08-17 13:58:41.281 19165 17958 "CS:EIP:0033:00007FA465B3BD63"
00000B4C 2016-08-17 13:58:41.281 19165 17958 " ESP:00007FA43B5A1DF8 EBP:00007FA42C0036A0"
00000B4D 2016-08-17 13:58:41.281 19165 17958 "Stack[00007FA43B5A1DF8]: 00007FA43BDEBA68 3BDEE06F00007FA4 00007FA43BDEE06F 2C0029E000007FA4 00007FA42C0029E0 0000000000007FA4 0000000000000000 0000000200000000"
00000B4E 2016-08-17 13:58:41.281 19165 17958 "Stack[00007FA43B5A1E18]: 0000000000000002 3B5A1EC000000000 00007FA43B5A1EC0 2C0029E000007FA4 00007FA42C0029E0 3B5A1F2000007FA4 00007FA43B5A1F20 0000000000007FA4"
00000B4F 2016-08-17 13:58:41.281 19165 17958 "Stack[00007FA43B5A1E38]: 0000000000000000 0000000000000000 0000000000000000 0000000100000000 0000000000000001 0000000100000000 0000000000000001 0000000400000000"
00000B50 2016-08-17 13:58:41.281 19165 17958 "Stack[00007FA43B5A1E58]: 0000000000000004 2C00275000000000 00007FA42C002750 2C0036A000007FA4 00007FA42C0036A0 2C00277800007FA4 00007FA42C002778 0000000100007FA4"
00000B51 2016-08-17 13:58:41.281 19165 17958 "Stack[00007FA43B5A1E78]: 0000000000000001 0000000100000000 0000000000000001 3B5A1EE000000000 00007FA43B5A1EE0 3B5A294000007FA4 00007FA43B5A2940 2C00293000007FA4"
00000B52 2016-08-17 13:58:41.281 19165 17958 "Stack[00007FA43B5A1E98]: 00007FA42C002930 0000000000007FA4 0000000000000000 0000000000000000 0000000000000000 2C00473000000000 00007FA42C004730 0000003600007FA4"
00000B53 2016-08-17 13:58:41.281 19165 17958 "Stack[00007FA43B5A1EB8]: 0000080000000036 0000000000000800 0000000000000000 0000000000000000 0000000000000000 2C003F2000000000 00007FA42C003F20 0000010200007FA4"
00000B54 2016-08-17 13:58:41.281 19165 17958 "Stack[00007FA43B5A1ED8]: 0000080000000102 0000000000000800 0000000000000000 0000000000000000 0000000000000000 2C0036D000000000 00007FA42C0036D0 0000002900007FA4"
00000B55 2016-08-17 13:58:41.281 19165 17958 "Backtrace:"
00000B56 2016-08-17 13:58:41.282 19165 17958 " /opt/HPCCSystems/lib/libjlib.so(+0xe2ff8) [0x7fa466898ff8]"
00000B57 2016-08-17 13:58:41.282 19165 17958 " /opt/HPCCSystems/lib/libjlib.so(_Z13excsighandleriP9siginfo_tPv+0x22c) [0x7fa46689aa7c]"
00000B58 2016-08-17 13:58:41.282 19165 17958 " /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330) [0x7fa465e8a330]"
00000B59 2016-08-17 13:58:41.282 19165 17958 " /lib/x86_64-linux-gnu/libc.so.6(+0x86d63) [0x7fa465b3bd63]"
00000B5A 2016-08-17 13:58:41.282 19165 17958 " /opt/HPCCSystems/lib/libws_rrd.so(_ZN16CRRDGraphWrapper8getGraphEP12MemoryBufferRK11StringArrayS4_S4_lliiPKcbS6_+0x798) [0x7fa43bdeba68]"
00000B5B 2016-08-17 13:58:41.282 19165 17958 " /opt/HPCCSystems/lib/libws_rrd.so(_ZN9Cws_rrdEx13ongetGraphSVGER11IEspContextR23IEspGraphSVGDataRequestR24IEspGraphSVGDataResponse+0xc4) [0x7fa43bdea474]"
00000B5C 2016-08-17 13:58:41.282 19165 17958 " /opt/HPCCSystems/lib/libws_rrd.so(_ZN6ws_rrd18Cws_rrdSoapBinding17onGetInstantQueryER11IEspContextP12CHttpRequestP13CHttpResponsePKcS8_+0x6fd) [0x7fa43bddfeed]"
00000B5D 2016-08-17 13:58:41.282 19165 17958 " /opt/HPCCSystems/lib/libesphttp.so(_ZN14EspHttpBinding5onGetEP12CHttpRequestP13CHttpResponse+0x1f8) [0x7fa46766f1d8]"
00000B5E 2016-08-17 13:58:41.282 19165 17958 " /opt/HPCCSystems/lib/libesphttp.so(_ZN14CEspHttpServer14processRequestEv+0x5e7) [0x7fa467679fd7]"
00000B5F 2016-08-17 13:58:41.282 19165 17958 " /opt/HPCCSystems/lib/libesphttp.so(_ZN11CHttpThread9onRequestEv+0x164) [0x7fa467675ab4]"
00000B60 2016-08-17 13:58:41.282 19165 17958 " /opt/HPCCSystems/lib/libesphttp.so(_ZN18CEspProtocolThread3runEv+0x31) [0x7fa4676a7fb1]"
00000B61 2016-08-17 13:58:41.282 19165 17958 " /opt/HPCCSystems/lib/libjlib.so(_ZN6Thread5beginEv+0x2d) [0x7fa46693e1ad]"
00000B62 2016-08-17 13:58:41.282 19165 17958 " /opt/HPCCSystems/lib/libjlib.so(_ZN6Thread11_threadmainEPv+0x1e) [0x7fa46693f97e]"
00000B63 2016-08-17 13:58:41.282 19165 17958 " /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184) [0x7fa465e82184]"
00000B64 2016-08-17 13:58:41.282 19165 17958 " /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fa465baf37d]"
00000B65 2016-08-17 13:58:41.282 19165 17958 "ThreadList:
7FA46384F700 140344021022464 19166: CMPNotifyClosedThread
7FA46304E700 140344012629760 19167: CSocketBaseThread
7FA46284D700 140344004237056 19168: MP Connection Thread
7FA46204C700 140343995844352 19170: CMemoryUsageReporter
7FA4595B6700 140343850526464 19171: unknown
7FA44F780700 140343684630272 19172: unknown
7FA44B647700 140343616239360 19173: CDaliPublisherClient
7FA4427E7700 140343466948352 19174: unknown
7FA4411F5700 140343443937024 19175: unknown
7FA43BDA4700 140343355524864 19176: CSocketBaseThread
7FA43B5A3700 140343347132160 17958: CEspProtocolThread

I also tried running this directly via putty on the host master (hopefully I have done this correctly)

@HPCC-T1:~$ RRDTOOL GRAPH CMD --> /usr/bin/rrdtool graph /tmp/hpcc_ws_rrd_graphsJs0Sfx/graphhlXVNz -a SVG --start 1471350866 --end 1471354466 DEF:ds11=/var/lib/ganglia/rrds/TEST HPCC/localhost/mem_free.rrd:sum:AVERAGE
-bash: /usr/bin/rrdtool: Permission denied

Then again with Sudo :

@HPCC-T1:~$ sudo RRDTOOL GRAPH CMD --> /usr/bin/rrdtool graph /tmp/hpcc_ws_rrd_graphsJs0Sfx/graphhlXVNz -a SVG --start 1471350866 --end 1471354466 DEF:ds11=/var/lib/ganglia/rrds/TEST HPCC/localhost/mem_free.rrd:sum:AVERAGE
-bash: /usr/bin/rrdtool: Permission denied

Let me know if you need any more information.

Thanks in advance

Antony
Attachments
Screen Shot 2016-08-17 at 15.54.36.png
amillar
 

Wed Aug 17, 2016 9:07 pm

To find hpcc-ganglia-monitoring 5.4.2-1 on the download page please select Previous for the version type. Here is the direct link: http://wpc.423a.rhocdn.net/00423A/relea ... _amd64.deb

It looks like the issue has to do with the space in the name of your cluster. Please change the cluster name from "TEST HPCC" to something like "TEST_HPCC". That should fix your issue with the missing graphs.

As far as the graph generation on the command line goes, the first part, "RRDTOOL GRAPH CMD -->", should not be included as part of the command line. The command should start with /usr/bin/rrdtool. Sorry, I wasn't clear enough previously. However, I think that once you change your cluster name, the issue will be resolved.
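The "Permission denied" message you saw is most likely the shell treating the ">" in "-->" as a redirect into /usr/bin/rrdtool, rather than a problem with rrdtool itself. To avoid copying the markers by hand, something like the sketch below could pull just the command out of the log; extract_rrdtool_cmds is a hypothetical helper name and it assumes the log lines keep the "RRDTOOL GRAPH CMD --> ... <--" shape shown in your sample.

```shell
#!/usr/bin/env bash
# Print only the rrdtool command from ESP log lines read on stdin, dropping
# everything up to 'RRDTOOL GRAPH CMD --> ' and the trailing ' <--"' marker.
# Lines without the marker are skipped.
extract_rrdtool_cmds() {
    sed -n 's/.*RRDTOOL GRAPH CMD --> \(.*\) <--"\{0,1\}$/\1/p'
}
```

For example, extract_rrdtool_cmds < /var/log/HPCCSystems/myesp/esp.log | tail -n 1 prints the most recent graph command. When running it by hand, any argument that contains a space (such as a DEF: path under a cluster directory with a space in its name) has to be quoted.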

It would also be a good idea to delete all the existing .rrd files before restarting ganglia with the new cluster name.
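A minimal sketch of that clean-up, assuming the rrd root is the gmetad default of /var/lib/ganglia/rrds and that gmetad is stopped first so nothing is rewritten while the files are removed. purge_rrds is a hypothetical helper name, and the directory has to be passed explicitly so nothing is deleted by accident.

```shell
#!/usr/bin/env bash
# Delete every .rrd file below the given directory and print how many were removed.
# $1 = rrd root directory (required). Empty directories are left in place.
purge_rrds() {
    local root="${1:?usage: purge_rrds <rrd_root>}"
    local count
    count="$(find "$root" -type f -name '*.rrd' | wc -l)"
    find "$root" -type f -name '*.rrd' -delete
    echo $(( count ))
}
```

On the gmetad node this would be run from a root shell between sudo service gmetad stop and sudo service gmetad start, for example purge_rrds /var/lib/ganglia/rrds. The old cluster's empty directories can then be removed by hand.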
Gleb Aronsky
 

Mon Aug 22, 2016 1:53 pm

Hi Gleb,

Thanks for the link for the HPCC Monitoring agent ;)

This morning I have re-configured Ganglia to use the name "TEST_HPCC" and purged the rrds folder. This seems to have solved the problem with the custom graphs so thanks for that.

I am still having problems capturing information from one of our live Roxies though. It is on a different subnet from my test cluster, but both can communicate OK at the network level. However, when I go to port 8010 on my Roxie and click the plugin icon, no graphs display and I get around 15 errors, which all seem to relate to opening files in /var/lib/ganglia/rrds/__SummaryInfo__/. My thought here is that ECL Watch is trying to open these locally, rather than from the Ganglia master which has the rrds files. Do you know where the IP / host information is configured for this? Should it not get this configuration from the gmond.conf file?

I thought the different subnet might be an issue, so I set up a new Ganglia monitor on another test machine on the same subnet, but I can’t get ESP to start when the monitor is installed. When I remove it, ESP starts OK. I can confirm that the HPCC agent and the HPCC platform version are 5.2.0.1.

I did have a good look around the internet, and I found mentions that the way the hostname is configured in /etc/hosts can cause issues with Ganglia, so I have now entered this value on all the hosts configured for Ganglia: "127.0.0.1 *sysname* localhost.localdomain localhost". I gave them all a restart, but I am still getting the errors opening graphs on the live Roxie. However, I can see metrics coming in on http://IPADDRESS/Ganglia, so I think communication between all the nodes is OK. It is just the ECL Watch plugin I am having problems with.

Any help would be greatly appreciated.

Best Regards

Antony
amillar
 

Tue Aug 23, 2016 2:08 pm

Hi Antony,

I am not sure I totally understand your exact configuration. In a typical small environment you should have an external monitoring node located outside the cluster you intend to monitor. The monitored cluster would have gmond running on each of the nodes. The external monitoring node would have the gmetad daemon, EclWatch, and/or the Apache Ganglia web interface running to view the graphs.

EclWatch looks locally for the rrds files, therefore an instance of gmetad or a remote mount of the rrds files is needed. An ECLWatch instance running in the cluster being monitored would need local access to the rrd files to view the graphs. Though if the ESP goes down in the cluster you wouldn't be able to access the graphs, so the external monitoring node is needed.
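For the remote mount option, one common approach on Ubuntu is a read-only NFS export of the rrds directory from the node running gmetad. The sketch below only prints the two configuration lines involved; the addresses are placeholders, exports_line and fstab_line are hypothetical helper names, and it assumes nfs-kernel-server is installed on the gmetad node and nfs-common on the node running ECL Watch.

```shell
#!/usr/bin/env bash
# Directory where gmetad keeps its round-robin databases (the default location).
RRD_DIR=/var/lib/ganglia/rrds

# Line for /etc/exports on the node running gmetad.
# $1 = address (or subnet) of the client allowed to mount, read-only.
exports_line() {
    printf '%s %s(ro,sync,no_subtree_check)\n' "$RRD_DIR" "$1"
}

# Line for /etc/fstab on the node running ECL Watch.
# $1 = address of the node running gmetad.
fstab_line() {
    printf '%s:%s %s nfs ro 0 0\n' "$1" "$RRD_DIR" "$RRD_DIR"
}

# Placeholder addresses: 192.0.2.20 = ECL Watch node, 192.0.2.10 = gmetad node.
exports_line 192.0.2.20
fstab_line 192.0.2.10
```

After adding the first line to /etc/exports, sudo exportfs -ra on the gmetad node publishes it. On the ECL Watch node, sudo mkdir -p /var/lib/ganglia/rrds followed by sudo mount /var/lib/ganglia/rrds picks up the fstab entry. The user that the ESP process runs as still needs read access to the mounted files.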

Can you provide ESP logs for the case where it fails to start with the ganglia monitoring plugin installed?

-Gleb
Gleb Aronsky
 

Tue Aug 30, 2016 2:21 pm

Hi Gleb,

Sorry for the late reply.

We have a test HPCC cluster here which is currently not doing anything data-wise, so I have made that the “external monitoring node”. It has gmetad, RRDtool, Ganglia monitor and the Ganglia web frontend installed, by running this command:

sudo apt-get install -y ganglia-monitor rrdtool gmetad ganglia-webfrontend

The monitored Roxie cluster includes two test Roxies from my test cluster (I did this to practice the steps for getting this set up) and one live Roxie from my live cluster (to test that we are receiving some real metrics). Each of these monitored nodes only has Ganglia monitor installed, by running this command:

sudo apt-get install -y ganglia-monitor

However, all machines have the HPCC monitoring agent installed by running these commands:

1. sudo dpkg -i hpccsystems-ganglia-monitoring-5.2.2-1trusty_amd64.deb
2. sudo apt-get update
3. sudo apt-get install -f
4. sudo dpkg -i hpccsystems-ganglia-monitoring-5.2.2-1trusty_amd64.deb
5. sudo service ganglia-monitor restart && sudo service gmetad restart && sudo service apache2 restart (on master node)
6. sudo service ganglia-monitor restart (on monitored nodes)


I was unaware that ECL Watch looks locally for the files, but that would explain all the graph errors. Do you know the steps involved to remote mount a directory from another machine on Ubuntu 14.04?

Or is the preferred method to install Gmetad and change the config to point to the new RRDS file location?

# Where gmetad stores its round-robin databases
# default: "/var/lib/ganglia/rrds"
# rrd_rootdir "/some/other/place"

If so could you send over an example?

Below is some of the ESP log file from the 22nd. I couldn't upload it all, with it being close to 1 MB. You will see a few calls from IP 192.168.20.72, which is the PC I am using to connect to ECL Watch.

00000356 2016-08-22 14:26:49.907 12301 12306 "SYS: PU= 0% MU= 1% MAL=5406816 MMP=1564672 SBK=3842144 TOT=7316K RAM=707236K SWP=0K"
00000357 2016-08-22 14:26:49.907 12301 12306 "DSK: [sda] r/s=0.1 kr/s=1.5 w/s=6.4 kw/s=43.1 bsy=0 NIC: rxp/s=0.0 rxk/s=0.0 txp/s=0.0 txk/s=0.0 CPU: usr=0 sys=0 iow=0 idle=99"
00000358 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.897389] init: tty4 main process (1227) killed by TERM signal"
00000359 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.897661] init: tty5 main process (1230) killed by TERM signal"
0000035A 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.897909] init: tty2 main process (1235) killed by TERM signal"
0000035B 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.898168] init: tty3 main process (1236) killed by TERM signal"
0000035C 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.898419] init: tty6 main process (1238) killed by TERM signal"
0000035D 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.898665] init: cron main process (1279) killed by TERM signal"
0000035E 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.899106] init: tty1 main process (3088) killed by TERM signal"
0000035F 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.899985] init: irqbalance main process (27775) killed by TERM signal"
00000360 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.900647] init: ganglia-monitor main process (25128) killed by TERM signal"
00000361 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.900837] init: gmetad main process (25144) killed by TERM signal"
00000362 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.901206] init: plymouth-upstart-bridge main process (14247) terminated with status 1"
00000363 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.901219] init: plymouth-upstart-bridge main process ended, respawning"
00000364 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.945201] init: plymouth-upstart-bridge main process (14272) terminated with status 1"
00000365 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.945211] init: plymouth-upstart-bridge main process ended, respawning"
00000366 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.958314] init: plymouth-upstart-bridge main process (14276) terminated with status 1"
00000367 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.958327] init: plymouth-upstart-bridge main process ended, respawning"
00000368 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.965898] init: plymouth-upstart-bridge main process (14279) terminated with status 1"
00000369 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.965910] init: plymouth-upstart-bridge main process ended, respawning"
0000036A 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.970521] init: plymouth-upstart-bridge main process (14281) terminated with status 1"
0000036B 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.970533] init: plymouth-upstart-bridge main process ended, respawning"
0000036C 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.973068] init: plymouth-upstart-bridge main process (14283) terminated with status 1"
0000036D 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.973080] init: plymouth-upstart-bridge main process ended, respawning"
0000036E 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.977678] init: plymouth-upstart-bridge main process (14285) terminated with status 1"
0000036F 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.977692] init: plymouth-upstart-bridge main process ended, respawning"
00000370 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.983208] init: plymouth-upstart-bridge main process (14288) terminated with status 1"
00000371 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.983222] init: plymouth-upstart-bridge main process ended, respawning"
00000372 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.990099] init: plymouth-upstart-bridge main process (14291) terminated with status 1"
00000373 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.990112] init: plymouth-upstart-bridge main process ended, respawning"
00000374 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.994984] init: plymouth-upstart-bridge main process (14293) terminated with status 1"
00000375 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10295.994998] init: plymouth-upstart-bridge main process ended, respawning"
00000376 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10296.000381] init: plymouth-upstart-bridge main process (14296) terminated with status 1"
00000377 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10296.000395] init: plymouth-upstart-bridge respawning too fast, stopped"
00000378 2016-08-22 14:26:49.908 12301 12306 "KERN_UNKNOWN: <12>[10296.063247] init: wait-for-state (rcplymouth-shutdown) main process (14295) killed by TERM signal"
00000379 2016-08-22 14:26:50.785 12301 12301 "ESP Abort Handler..."
0000037A 2016-08-22 14:26:50.785 12301 12301 "select handler stopped."
00000001 2016-08-22 14:31:31.898 2280 2280 "Esp starting community_5.0.2-1"
00000002 2016-08-22 14:31:31.907 2280 2280 "componentfiles are under /opt/HPCCSystems/componentfiles"
00000003 2016-08-22 14:31:31.907 2280 2280 "ESP process name [myesp]"
00000004 2016-08-22 14:31:31.907 2280 2280 "Initializing DALI client [servers = 192.168.20.125:7070]"
00000005 2016-08-22 14:31:31.913 2280 2280 "Configuring Esp Platform..."
00000006 2016-08-22 14:31:31.913 2280 2280 "loadServices"
00000007 2016-08-22 14:31:32.139 2280 2280 "queueLabel=dfuserver_queue"
00000008 2016-08-22 14:31:32.139 2280 2280 "monitorQueueLabel=dfuserver_monitor_queue"
00000009 2016-08-22 14:31:32.139 2280 2280 "rootFolder=/c$/thordata"
0000000A 2016-08-22 14:31:32.292 2280 2280 "Initializing WsDfuXRef_EclWatch_myesp service [process = myesp]"
0000000B 2016-08-22 14:31:32.292 2280 2280 "Initializing WsDfu_EclWatch_myesp service [process = myesp]"
0000000C 2016-08-22 14:31:32.312 2280 2280 "Loaded DLL /opt/HPCCSystems/plugins/libpyembed.so"
0000000D 2016-08-22 14:31:32.312 2280 2280 "Current reported version is Python2.7 Embed Helper 1.0.0"
0000000E 2016-08-22 14:31:32.312 2280 2280 "Compatible version Python2.7 Embed Helper 1.0.0"
0000000F 2016-08-22 14:31:32.319 2280 2280 "Error loading /opt/HPCCSystems/plugins/libv8embed.so: libv8.so.3.14.5: cannot open shared object file: No such file or directory"
00000010 2016-08-22 14:31:32.319 2280 2280 "ERROR: 0: /var/lib/jenkins/workspace/CE-Candidate-with-plugins-5.0.2-1/CE/ubuntu-14.04-amd64/HPCC-Platform/common/dllserver/thorplugin.cpp(487) : Loading plugin : Failed to load plugin /opt/HPCCSystems/plugins/libv8embed.so"
00000011 2016-08-22 14:31:32.321 2280 2280 "Loaded DLL /opt/HPCCSystems/plugins/libworkunitservices.so"
00000012 2016-08-22 14:31:32.321 2280 2280 "Current reported version is WORKUNITSERVICES 1.0.1"
00000013 2016-08-22 14:31:32.321 2280 2280 "Compatible version WORKUNITSERVICES 1.0 "
00000014 2016-08-22 14:31:32.321 2280 2280 "Compatible version WORKUNITSERVICES 1.0.1"
00000015 2016-08-22 14:31:32.329 2280 2280 "Loaded DLL /opt/HPCCSystems/plugins/libauditlib.so"
00000016 2016-08-22 14:31:32.329 2280 2280 "Current reported version is AUDITLIB 1.0.1"
00000017 2016-08-22 14:31:32.329 2280 2280 "Compatible version AUDITLIB 1.0.0 [29933bc38c1f07bcf70f938ad18775c1]"
00000018 2016-08-22 14:31:32.329 2280 2280 "Compatible version AUDITLIB 1.0.1"
00000019 2016-08-22 14:31:32.424 2280 2280 "Loaded DLL /opt/HPCCSystems/plugins/libfileservices.so"
0000001A 2016-08-22 14:31:32.424 2280 2280 "Current reported version is FILESERVICES 2.1.3"
0000001B 2016-08-22 14:31:32.431 2280 2280 "Compatible version FILESERVICES 2.1 [a68789cfb01d00ef6dc362e52d5eac0e]"
0000001C 2016-08-22 14:31:32.431 2280 2280 "Compatible version FILESERVICES 2.1.1"
0000001D 2016-08-22 14:31:32.431 2280 2280 "Compatible version FILESERVICES 2.1.2"
0000001E 2016-08-22 14:31:32.431 2280 2280 "Compatible version FILESERVICES 2.1.3"
0000001F 2016-08-22 14:31:32.436 2280 2280 "Loaded DLL /opt/HPCCSystems/plugins/liblogging.so"
00000020 2016-08-22 14:31:32.436 2280 2280 "Current reported version is LOGGING 1.0.1"
00000021 2016-08-22 14:31:32.436 2280 2280 "Compatible version LOGGING 1.0.0 [66aec3fb4911ceda247c99d6a2a5944c]"
00000022 2016-08-22 14:31:32.436 2280 2280 "Compatible version LOGGING 1.0.1"
00000023 2016-08-22 14:31:32.444 2280 2280 "Error loading /opt/HPCCSystems/plugins/libRembed.so: libR.so: cannot open shared object file: No such file or directory"
00000024 2016-08-22 14:31:32.444 2280 2280 "ERROR: 0: /var/lib/jenkins/workspace/CE-Candidate-with-plugins-5.0.2-1/CE/ubuntu-14.04-amd64/HPCC-Platform/common/dllserver/thorplugin.cpp(487) : Loading plugin : Failed to load plugin /opt/HPCCSystems/plugins/libRembed.so"
00000025 2016-08-22 14:31:32.448 2280 2280 "Loaded DLL /opt/HPCCSystems/plugins/libdebugservices.so"
00000026 2016-08-22 14:31:32.448 2280 2280 "Current reported version is DEBUGSERVICES 1.0.1"
00000027 2016-08-22 14:31:32.449 2280 2280 "Loaded DLL /opt/HPCCSystems/plugins/libparselib.so"
00000028 2016-08-22 14:31:32.449 2280 2280 "Current reported version is PARSELIB 1.0.1"
00000029 2016-08-22 14:31:32.449 2280 2280 "Compatible version PARSELIB 1.0.0 [fa9b3ab8fad8e46d8c926015cbd39f06]"
0000002A 2016-08-22 14:31:32.449 2280 2280 "Compatible version PARSELIB 1.0.1"
0000002B 2016-08-22 14:31:32.452 2280 2280 "Loaded DLL /opt/HPCCSystems/plugins/libunicodelib.so"
0000002C 2016-08-22 14:31:32.452 2280 2280 "Current reported version is UNICODELIB 1.1.06"
0000002D 2016-08-22 14:31:32.452 2280 2280 "Compatible version UNICODELIB 1.1.01 [64d78857c1cecae15bd238cd7767b3c1]"
0000002E 2016-08-22 14:31:32.452 2280 2280 "Compatible version UNICODELIB 1.1.01 [e8790fe30d9627997749c3c4839b5957]"
0000002F 2016-08-22 14:31:32.452 2280 2280 "Compatible version UNICODELIB 1.1.02"
00000030 2016-08-22 14:31:32.452 2280 2280 "Compatible version UNICODELIB 1.1.03"
00000031 2016-08-22 14:31:32.452 2280 2280 "Compatible version UNICODELIB 1.1.04"
00000032 2016-08-22 14:31:32.452 2280 2280 "Compatible version UNICODELIB 1.1.05"
00000033 2016-08-22 14:31:32.453 2280 2280 "Loaded DLL /opt/HPCCSystems/plugins/libsqlite3embed.so"
00000034 2016-08-22 14:31:32.453 2280 2280 "Current reported version is SqLite3 Embed Helper 1.0.0"
00000035 2016-08-22 14:31:32.453 2280 2280 "Compatible version SqLite3 Embed Helper 1.0.0"
00000036 2016-08-22 14:31:32.461 2280 2280 "Loaded DLL /opt/HPCCSystems/plugins/libstringlib.so"
00000037 2016-08-22 14:31:32.461 2280 2280 "Current reported version is STRINGLIB 1.1.14"
00000038 2016-08-22 14:31:32.461 2280 2280 "Compatible version STRINGLIB 1.1.06 [fd997dc3feb4ca385d59a12b9dc4beab]"
00000039 2016-08-22 14:31:32.461 2280 2280 "Compatible version STRINGLIB 1.1.06 [f8305e66ca26a1447dee66d4a36d88dc]"
0000003A 2016-08-22 14:31:32.461 2280 2280 "Compatible version STRINGLIB 1.1.07"
0000003B 2016-08-22 14:31:32.461 2280 2280 "Compatible version STRINGLIB 1.1.08"
0000003C 2016-08-22 14:31:32.461 2280 2280 "Compatible version STRINGLIB 1.1.09"
0000003D 2016-08-22 14:31:32.461 2280 2280 "Compatible version STRINGLIB 1.1.10"
0000003E 2016-08-22 14:31:32.461 2280 2280 "Compatible version STRINGLIB 1.1.11"
0000003F 2016-08-22 14:31:32.461 2280 2280 "Compatible version STRINGLIB 1.1.12"
00000040 2016-08-22 14:31:32.461 2280 2280 "Compatible version STRINGLIB 1.1.13"
00000041 2016-08-22 14:31:32.470 2280 2280 "Error loading /opt/HPCCSystems/plugins/libjavaembed.so: libjvm.so: cannot open shared object file: No such file or directory"
00000042 2016-08-22 14:31:32.471 2280 2280 "ERROR: 0: /var/lib/jenkins/workspace/CE-Candidate-with-plugins-5.0.2-1/CE/ubuntu-14.04-amd64/HPCC-Platform/common/dllserver/thorplugin.cpp(487) : Loading plugin : Failed to load plugin /opt/HPCCSystems/plugins/libjavaembed.so"
00000043 2016-08-22 14:31:32.471 2280 2280 "Plugin /opt/HPCCSystems/plugins/libpyembed.so exports getECLPluginDefinition but does not export ECL - not loading"
00000044 2016-08-22 14:31:32.471 2280 2280 "Error loading /opt/HPCCSystems/plugins/libv8embed.so: libv8.so.3.14.5: cannot open shared object file: No such file or directory"
00000045 2016-08-22 14:31:32.471 2280 2280 "Loading plugin /opt/HPCCSystems/plugins/libworkunitservices.so[lib_WORKUNITSERVICES] version = WORKUNITSERVICES 1.0.1"
00000046 2016-08-22 14:31:32.471 2280 2280 "Loading plugin /opt/HPCCSystems/plugins/libauditlib.so[lib_auditlib] version = AUDITLIB 1.0.1"
00000047 2016-08-22 14:31:32.471 2280 2280 "Loading plugin /opt/HPCCSystems/plugins/libfileservices.so[lib_fileservices] version = FILESERVICES 2.1.3"
00000048 2016-08-22 14:31:32.471 2280 2280 "Loading plugin /opt/HPCCSystems/plugins/liblogging.so[lib_logging] version = LOGGING 1.0.1"
00000049 2016-08-22 14:31:32.471 2280 2280 "Error loading /opt/HPCCSystems/plugins/libRembed.so: libR.so: cannot open shared object file: No such file or directory"
0000004A 2016-08-22 14:31:32.471 2280 2280 "Loading plugin /opt/HPCCSystems/plugins/libdebugservices.so[lib_debugservices] version = DEBUGSERVICES 1.0.1"
0000004B 2016-08-22 14:31:32.471 2280 2280 "Loading plugin /opt/HPCCSystems/plugins/libparselib.so[lib_parselib] version = PARSELIB 1.0.1"
0000004C 2016-08-22 14:31:32.471 2280 2280 "Loading plugin /opt/HPCCSystems/plugins/libunicodelib.so[lib_unicodelib] version = UNICODELIB 1.1.06"
0000004D 2016-08-22 14:31:32.471 2280 2280 "Plugin /opt/HPCCSystems/plugins/libsqlite3embed.so exports getECLPluginDefinition but does not export ECL - not loading"
0000004E 2016-08-22 14:31:32.471 2280 2280 "Loading plugin /opt/HPCCSystems/plugins/libstringlib.so[lib_stringlib] version = STRINGLIB 1.1.14"
0000004F 2016-08-22 14:31:32.471 2280 2280 "Error loading /opt/HPCCSystems/plugins/libjavaembed.so: libjvm.so: cannot open shared object file: No such file or directory"
00000050 2016-08-22 14:31:33.015 2280 2280 "Initializing WsWorkunits_EclWatch_myesp service [process = myesp]"
00000051 2016-08-22 14:31:33.207 2280 2280 "CSmartSocketFactory::CSmartSocketFactory(192.168.20.125:9876)"
00000052 2016-08-22 14:31:33.254 2280 2280 "Load binding WsSMC_smc_myesp (type: ws_smcSoapBinding, process: myesp) succeeded"
00000053 2016-08-22 14:31:33.266 2280 2280 "Load binding WsWorkunits_smc_myesp (type: ws_workunitsSoapBinding, process: myesp) succeeded"
00000054 2016-08-22 14:31:33.278 2280 2280 "Load binding WsTopology_smc_myesp (type: ws_topologySoapBinding, process: myesp) succeeded"
00000055 2016-08-22 14:31:33.283 2280 2280 "Load binding WsDfu_smc_myesp (type: ws_dfuSoapBinding, process: myesp) succeeded"
00000056 2016-08-22 14:31:33.286 2280 2280 "Load binding WsDfuXRef_smc_myesp (type: ws_dfuxrefSoapBinding, process: myesp) succeeded"
00000057 2016-08-22 14:31:33.287 2280 2280 "Load binding ecldirect_smc_myesp (type: EclDirectSoapBinding, process: myesp) succeeded"
00000058 2016-08-22 14:31:33.296 2280 2280 "Load binding FileSpray_Serv_smc_myesp (type: FileSpray_Bind, process: myesp) succeeded"
00000059 2016-08-22 14:31:33.297 2280 2280 "Load binding WsFileIO_smc_myesp (type: WsFileIO, process: myesp) succeeded"
0000005A 2016-08-22 14:31:33.310 2280 2280 "Load binding WsPackageProcess_smc_myesp (type: WsPackageProcessSoapBinding, process: myesp) succeeded"
0000005B 2016-08-22 14:31:33.312 2280 2280 "Load binding ws_machine_smc_myesp (type: ws_machineSoapBinding, process: myesp) succeeded"
0000005C 2016-08-22 14:31:33.313 2280 2280 "Load binding ws_account_smc_myesp (type: ws_accountSoapBinding, process: myesp) succeeded"
0000005D 2016-08-22 14:31:33.314 2280 2280 "Load binding ws_access_smc_myesp (type: ws_accessSoapBinding, process: myesp) succeeded"
0000005E 2016-08-22 14:31:33.314 2280 2280 "Load binding ws_config_smc_myesp (type: ws_configSoapBinding, process: myesp) succeeded"
0000005F 2016-08-22 14:31:33.315 2280 2280 "Load binding ws_ecl_ws_ecl_myesp (type: ws_eclSoapBinding, process: myesp) succeeded"
00000060 2016-08-22 14:31:33.333 2280 2280 "binding WsSMC_smc_myesp, on 0.0.0.0:8010"
00000061 2016-08-22 14:31:33.333 2280 2280 " created server socket(14)"
00000062 2016-08-22 14:31:33.335 2280 2280 " Socket(14) listening."
00000063 2016-08-22 14:31:33.340 2280 2280 "binding WsWorkunits_smc_myesp, on 0.0.0.0:8010"
00000064 2016-08-22 14:31:33.340 2280 2280 "binding WsTopology_smc_myesp, on 0.0.0.0:8010"
00000065 2016-08-22 14:31:33.340 2280 2280 "binding WsDfu_smc_myesp, on 0.0.0.0:8010"
00000066 2016-08-22 14:31:33.340 2280 2280 "binding WsDfuXRef_smc_myesp, on 0.0.0.0:8010"
00000067 2016-08-22 14:31:33.340 2280 2280 "binding ecldirect_smc_myesp, on 0.0.0.0:8010"
00000068 2016-08-22 14:31:33.340 2280 2280 "binding FileSpray_Serv_smc_myesp, on 0.0.0.0:8010"
00000069 2016-08-22 14:31:33.340 2280 2280 "binding WsFileIO_smc_myesp, on 0.0.0.0:8010"
0000006A 2016-08-22 14:31:33.340 2280 2280 "binding WsPackageProcess_smc_myesp, on 0.0.0.0:8010"
0000006B 2016-08-22 14:31:33.340 2280 2280 "binding ws_machine_smc_myesp, on 0.0.0.0:8010"
0000006C 2016-08-22 14:31:33.340 2280 2280 "binding ws_account_smc_myesp, on 0.0.0.0:8010"
0000006D 2016-08-22 14:31:33.340 2280 2280 "binding ws_access_smc_myesp, on 0.0.0.0:8010"
0000006E 2016-08-22 14:31:33.340 2280 2280 "binding ws_config_smc_myesp, on 0.0.0.0:8010"
0000006F 2016-08-22 14:31:33.340 2280 2280 "binding ws_ecl_ws_ecl_myesp, on 0.0.0.0:8002"
00000070 2016-08-22 14:31:33.340 2280 2280 " created server socket(15)"
00000071 2016-08-22 14:31:33.340 2280 2280 " Socket(15) listening."
00000072 2016-08-22 14:31:33.340 2280 2280 "Creating sentinel file esp.sentinel for rerun from script"
00000073 2016-08-22 14:31:33.340 2280 2280 "ESP server started."
00000074 2016-08-22 14:31:43.992 2280 3041 "HTTP First Line: POST /WsSMC/Activity.json HTTP/1.1"
00000075 2016-08-22 14:31:43.992 2280 3041 "POST /WsSMC/Activity.json, from unknown@192.168.20.72"
00000076 2016-08-22 14:31:44.004 2280 3041 "CWsSMCEx::getActivityInfo - rebuild cached information"
00000077 2016-08-22 14:31:44.024 2280 3041 "Time taken for createActivityInfo: 46570230 cycles (46M) =

Let me know if you need any more information.

Thanks again

Best Regards

Antony
amillar
 

Wed Aug 31, 2016 2:50 pm

Hi Antony,

I would recommend installing gmetad on any EclWatch server inside the cluster on which you want to view graphs. In larger environments you would expect to have multiple gmetad services running, often arranged in a hierarchical structure. The gmetad service running inside your cluster can be viewed as a convenient way to make metrics viewable from within EclWatch. The external monitoring node(s) likely wouldn’t have HPCC installed, and would use the Ganglia web interface to surface metrics to the user, including the HPCC node and Roxie metrics. In our HPCC VM download you can see some basic HPCC customizations to the Ganglia web interface.

If you take the remote mount approach, you will have to mount it at the default path so that EclWatch can pull up the graphs (a helpful link I found on remote mounts: https://www.digitalocean.com/community/ ... untu-14-04).
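
Before digging into EclWatch itself, it can help to confirm that the directory really is a mount point and that RRD files are visible under it. The snippet below is only a rough sketch of such a check. The path /var/lib/ganglia/rrds is an assumption (it is the usual gmetad default, but confirm it against rrd_rootdir in your gmetad.conf and against whatever path your EclWatch is set up to read), and check_rrd_dir is a made-up helper name.

```python
import os

# Assumed location, not taken from this thread: confirm it against
# rrd_rootdir in gmetad.conf and the path EclWatch is configured to read.
ASSUMED_RRD_DIR = "/var/lib/ganglia/rrds"


def check_rrd_dir(path=ASSUMED_RRD_DIR):
    """Return a dict describing whether `path` looks usable for graphs."""
    report = {
        "path": path,
        "exists": os.path.isdir(path),
        "is_mount": os.path.ismount(path),
        "rrd_files": 0,
    }
    if report["exists"]:
        # Count .rrd files anywhere below the directory.
        for _root, _dirs, files in os.walk(path):
            report["rrd_files"] += sum(1 for f in files if f.endswith(".rrd"))
    return report


# Example use on the EclWatch node:
#   for key, value in check_rrd_dir().items():
#       print(key, value)
```

If "is_mount" comes back False on a node that is supposed to use the remote mount, or "rrd_files" is 0, the problem is probably with the mount rather than with EclWatch.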

The esp.log would only contain entries relating to Ganglia when the ws_rrd service is initially bound at startup and when the Ganglia graphs are generated for users in EclWatch. ws_rrd is the ESP service responsible for displaying Ganglia graphs in EclWatch. gmond and gmetad have their own log files, but I found that simply examining the gmond and gmetad traffic was helpful in debugging connectivity issues.
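
Both checks can be scripted roughly as follows. This is a sketch under assumptions, not part of HPCC or Ganglia: the function names are made up, and the gmond part assumes gmond is listening on its stock TCP accept channel (port 8649, the same port used in the gmond.conf earlier in this thread), where it writes an XML dump of everything it knows and then closes the connection.

```python
import socket
import xml.etree.ElementTree as ET


def ws_rrd_lines(log_text):
    """Return the esp.log lines that mention the ws_rrd service."""
    return [line for line in log_text.splitlines() if "ws_rrd" in line.lower()]


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the XML dump gmond writes to its TCP accept channel."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the socket once the dump is done
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_by_cluster(xml_bytes):
    """Map each CLUSTER name to the sorted HOST names reporting into it."""
    root = ET.fromstring(xml_bytes)
    return {
        cluster.get("NAME"): sorted(h.get("NAME") for h in cluster.iter("HOST"))
        for cluster in root.iter("CLUSTER")
    }


# Example use (the log path is whatever your ESP writes to):
#   with open("esp.log") as f:
#       print(ws_rrd_lines(f.read()))
#   print(hosts_by_cluster(fetch_gmond_xml("localhost")))
```

An empty list from ws_rrd_lines on a freshly started ESP suggests the service was never bound, and a node missing from hosts_by_cluster suggests its gmond is not reaching the collector.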
Gleb Aronsky
 
