From VM to separate Linux box


Wed Sep 28, 2011 1:35 pm

Hi, all:

I was using the VM version of HPCC and got it to work.
I have now installed HPCC on a separate Linux box.
Questions related to this:

1. How do I tell the new box to use the large disk on /dev/hdb1
for the DFU?
2. How do I know which system the ECL IDE environment is pointed at
and repoint it?

Thanks,

Vic Kovacs
kovacsbv
 
Posts: 33
Joined: Fri Aug 05, 2011 12:53 pm

Fri Sep 30, 2011 6:31 pm

Vic,

There are a couple of different ways to tell HPCC to use the other drive for the data store once you have formatted the device and mounted it under, let's say, /mnt/large_drive:

1. You can run configmgr (/opt/HPCCSystems/sbin/configmgr) and use its web interface (http://IP_address:8015/) to change the data and temp directories (under software->directories) to point at the new location (/mnt/large_drive/ in this case). configmgr saves its output to /etc/HPCCSystems/source/environment.xml, so don't forget to copy that file to /etc/HPCCSystems/environment.xml and restart the platform (/etc/init.d/hpcc-init restart);

2. or, with the platform stopped, you could just move your /var/lib/HPCCSystems directory tree to the new drive and leave a symbolic link at the old path (ln -s /mnt/large_drive/HPCCSystems /var/lib/HPCCSystems) so that HPCC uses the new drive instead (you'll need to restart the HPCC environment too).
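For the second option, the order of the arguments to ln -s is easy to get backwards (it is target first, link name second). Here is a minimal sketch of the move-and-link step as a shell function. The function name relocate_tree and its safety checks are my own assumptions, not part of HPCC; it only wraps mv and ln -s.

```shell
#!/bin/sh
# Sketch only: relocate_tree is a hypothetical helper, not an HPCC tool.
# Usage: relocate_tree OLD_DIR NEW_PARENT
# Moves OLD_DIR under NEW_PARENT and leaves a symlink at the old path.
relocate_tree() {
    old_dir=$1
    new_parent=$2
    new_dir="$new_parent/$(basename "$old_dir")"

    # Refuse to run twice: the old path is already a link.
    if [ -L "$old_dir" ]; then
        echo "already a symlink: $old_dir" >&2
        return 1
    fi
    if [ ! -d "$old_dir" ] || [ ! -d "$new_parent" ]; then
        echo "missing directory: $old_dir or $new_parent" >&2
        return 1
    fi
    # Refuse to clobber an existing copy on the new drive.
    if [ -e "$new_dir" ]; then
        echo "target already exists: $new_dir" >&2
        return 1
    fi

    mv "$old_dir" "$new_dir" || return 1
    # ln -s TARGET LINK_NAME: the link is created at the old path.
    ln -s "$new_dir" "$old_dir"
}
```

With the paths from this thread you would stop the platform, run relocate_tree /var/lib/HPCCSystems /mnt/large_drive as root, and then restart with /etc/init.d/hpcc-init restart. Check afterwards that the moved tree is still owned by the hpcc user.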

ECL IDE stores the IP address (or hostname) of the cluster in its preferences window, and lets you define several clusters so you can quickly switch between them as needed. The preferences section can be accessed from the ribbon bar (icon in the top left corner) or when starting ECL IDE (button at the bottom of the login window).

I hope this helps,

Flavio
flavio
Community Advisory Board Member
 
Posts: 73
Joined: Wed Apr 27, 2011 8:59 pm

Tue Oct 04, 2011 1:30 pm

Thanks, Flavio. It helped a lot.

After fixing a few goofs, like having the system clock 3 hours off and creating a mount point that only root had access to (it's owned by hpcc:hpcc now), I ended up getting SOAP errors when trying to log in.

Looking at the Thor log (the only log file I could find that had errors), it seems Thor is shutting down.

Code:
root@LAB-HPCC-01:/var/log/HPCCSystems# tail ./mythor/10_03_2011_06_44_58/THORMASTER.log
000000E0 2011-10-03 12:52:05 24575  3019 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=0.6 kw/s=2.4 bsy=0 [sdb] r/s=0.0 kr/s=0.0 w/s=0.0 kw/s=0.0 bsy=0 NIC: rxp/s=2.4 rxk/s=0.3 txp/s=0.7 txk/s=
000000E1 2011-10-03 12:53:05 24575  3019 SYS: PU=  0% MU=  1% MAL=253824 MMP=0 SBK=253824 TOT=364K RAM=275344K SWP=0K
000000E2 2011-10-03 12:53:05 24575  3019 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=0.5 kw/s=2.1 bsy=0 [sdb] r/s=0.0 kr/s=0.0 w/s=0.0 kw/s=0.0 bsy=0 NIC: rxp/s=2.1 rxk/s=0.3 txp/s=0.7 txk/s=
000000E3 2011-10-03 12:54:05 24575  3019 SYS: PU=  0% MU=  1% MAL=253824 MMP=0 SBK=253824 TOT=364K RAM=275564K SWP=0K
000000E4 2011-10-03 12:54:05 24575  3019 DSK: [sda] r/s=0.0 kr/s=0.0 w/s=0.7 kw/s=2.7 bsy=0 [sdb] r/s=0.0 kr/s=0.0 w/s=0.0 kw/s=0.0 bsy=0 NIC: rxp/s=4.1 rxk/s=0.7 txp/s=0.7 txk/s=
000000E5 2011-10-03 12:54:06 24575 24575 1: /var/jenkins/workspace/Release-3.2.0/src/dali/base/daclient.cpp(201) : CSDSServerStatus::stop : MP connect failed (138.12.249.27:7070)
000000E6 2011-10-03 12:54:06 24575 24575 Thor closing down 6
000000E7 2011-10-03 12:54:06 24575 24575 Thor closing down 5
000000E8 2011-10-03 12:54:06 24575 24575 Thor closing down 4
000000E9 2011-10-03 12:54:06 24575 24575 Thor closing down 3
kovacsbv

Tue Oct 04, 2011 2:07 pm

Vic,

It's hard to say from the log fragment that you posted above, but it seems that Dali is not running (and that's why Thor cannot start). My guess is that either a mount point is missing or read-only, or a required directory within that mount point is missing.

Can you please double check that?
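One way to double check is a small shell function like the one below, run as the hpcc user against the data directory. This is only a sketch: check_data_dir and the probe file name are made up for illustration, and it tests a single directory rather than everything Dali needs.

```shell
# Sketch only: check_data_dir is a hypothetical helper, not an HPCC tool.
# Prints "ok: DIR" or a short reason, and returns non-zero on any problem.
check_data_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
        echo "not writable by $(id -un): $dir"
        return 1
    fi
    # Confirm with a real write rather than trusting the permission check alone.
    probe="$dir/.write_probe.$$"
    if ! ( : > "$probe" ) 2>/dev/null; then
        echo "write failed: $dir"
        return 1
    fi
    rm -f "$probe"
    echo "ok: $dir"
}
```

For example, check_data_dir /mnt/large_drive (or wherever the drive is actually mounted). Running it as root proves little, because root can write almost anywhere.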

Thanks,

Flavio
flavio

Tue Oct 04, 2011 3:16 pm

Ok,

So here's the short of the story:
I looked at the Dali log and saw that it can't open port 7070 because it's in use.
Then I shut everything down with /etc/init.d/hpcc-init stop.
Then I ran netstat -pln and found that a number of ports, including Dali's 7070, are still open.
The Dali log gives an error that the port is still in use (no surprise).

So, the question is do I kill all the remaining processes that have ports open?
The only thing I have open other than hpcc-related things is ipv4-ssh; could you take a look at the netstat below and determine how I shut down remaining processes?

Thanks,

Vic

Code:
root:/var/log/HPCCSystems/mydali/server# tail DaServer.log
00000006 2011-10-04 10:05:20 11279 11279 "loading store 1, storedCrc=343383a"
00000007 2011-10-04 10:05:20 11279 11279 "Loading delta: /mnt/hpcc_storage/HPCCSystems/hpcc-data/dali/daliinc1.xml"
00000008 2011-10-04 10:05:20 11279 11279 "store loaded"
00000009 2011-10-04 10:05:20 11279 11279 "loading external Environment from: /etc/HPCCSystems/environment.xml"
0000000A 2011-10-04 10:05:20 11279 11279 "Scanning store for external references"
0000000B 2011-10-04 10:05:20 11279 11279 "External reference count = 0"
0000000C 2011-10-04 10:05:20 11279 11279 "DASERVER[0] starting - listening to port 7070"
0000000D 2011-10-04 10:05:20 11279 11279 "ERROR: -7: /var/jenkins/workspace/Release-3.2.0/src/dali/server/daserver.cpp(465) : Exception : port in use
Target: S>138.12.249.27, port = 7070, Raised in: /var/jenkins/workspace/Release-3.2.0/src/system/jlib/jsocket.cpp, line 869"
0000000E 2011-10-04 10:05:20 11279 11286 "BackupHandler stopped"



root:/var/log/HPCCSystems/mydali/server# netstat -pln
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:7360            0.0.0.0:*               LISTEN      19448/dfuserver
tcp        0      0 0.0.0.0:6500            0.0.0.0:*               LISTEN      12178/thormaster_65
tcp        0      0 0.0.0.0:7205            0.0.0.0:*               LISTEN      11628/eclscheduler
tcp        0      0 0.0.0.0:6600            0.0.0.0:*               LISTEN      12175/thorslave_660
tcp        0      0 0.0.0.0:8877            0.0.0.0:*               LISTEN      11871/saserver
tcp        0      0 0.0.0.0:7245            0.0.0.0:*               LISTEN      11391/dfuserver
tcp        0      0 0.0.0.0:7118            0.0.0.0:*               LISTEN      11470/agentexec
tcp        0      0 0.0.0.0:7409            0.0.0.0:*               LISTEN      17339/eclccserver
tcp        0      0 0.0.0.0:9876            0.0.0.0:*               LISTEN      11785/roxie
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      849/sshd
tcp        0      0 0.0.0.0:7288            0.0.0.0:*               LISTEN      11549/eclccserver
tcp        0      0 0.0.0.0:7418            0.0.0.0:*               LISTEN      11706/esp
tcp        0      0 0.0.0.0:7164            0.0.0.0:*               LISTEN      11785/roxie
tcp        0      0 0.0.0.0:7070            0.0.0.0:*               LISTEN      23686/daserver
Active UNIX domain sockets (only servers)
Proto RefCnt Flags       Type       State         I-Node   PID/Program name    Path
unix  2      [ ACC ]     STREAM     LISTENING     3154     1/init              @/com/ubuntu/upstart




root:/var/log/HPCCSystems/mydali/server# /etc/init.d/hpcc-init stop
Stopping mythor...             [  OK  ]
Stopping mysasha...            [  OK  ]
Stopping myroxie...            [  OK  ]
Stopping myesp...              [  OK  ]
Stopping myeclscheduler...     [  OK  ]
Stopping myeclccserver...      [  OK  ]
Stopping myeclagent...         [  OK  ]
Stopping mydfuserver...        [  OK  ]
Stopping mydali...             [FAILED]
Already Stopped



root:/var/log/HPCCSystems/mydali/server# netstat -pln
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:7360            0.0.0.0:*               LISTEN      19448/dfuserver
tcp        0      0 0.0.0.0:7245            0.0.0.0:*               LISTEN      11391/dfuserver
tcp        0      0 0.0.0.0:7409            0.0.0.0:*               LISTEN      17339/eclccserver
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      849/sshd
tcp        0      0 0.0.0.0:7288            0.0.0.0:*               LISTEN      11549/eclccserver
tcp        0      0 0.0.0.0:7070            0.0.0.0:*               LISTEN      23686/daserver
Active UNIX domain sockets (only servers)
Proto RefCnt Flags       Type       State         I-Node   PID/Program name    Path
unix  2      [ ACC ]     STREAM     LISTENING     3154     1/init              @/com/ubuntu/upstart
kovacsbv

Tue Oct 04, 2011 3:21 pm

Can you run a "netstat -tanp | grep 7070" as root to see which process is bound to 7070?

It may just be that the port is in the TIME_WAIT or FIN_WAIT state and that you need to wait for a minute or so before trying to restart Dali.

The netstat above will tell you for sure if something is still bound to that port.
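If you want only the owner of the listening socket rather than every line that mentions the port, the netstat output can be filtered with awk. A rough sketch, assuming the usual net-tools column layout (state in column 6, PID/program in column 7); listeners_on_port is just a name I picked:

```shell
# Sketch only: listeners_on_port is a hypothetical helper.
# Reads `netstat -tanp` (or `netstat -pln`) output on stdin and prints
# the "PID/program" field of every TCP socket in LISTEN state on PORT.
listeners_on_port() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            n = split($4, parts, ":")   # local address is HOST:PORT
            if (parts[n] == port) print $7
        }'
}
```

Usage would be netstat -tanp | listeners_on_port 7070, run as root so the PID column is filled in. Comparing the PID it prints with the PID in the latest DaServer.log entries may show whether an older daserver instance is still holding the port.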

Flavio
flavio

Tue Oct 04, 2011 3:38 pm

Here it is.

I did wait about 20 minutes and re-ran netstat, but I didn't notice any ports going away.

Code:
root:/var/log/HPCCSystems/mydali/server# netstat -tanp | grep 7070
tcp        0      0 0.0.0.0:7070            0.0.0.0:*               LISTEN      23686/daserver
tcp        0      1 138.12.249.27:56151     138.12.249.27:7070      SYN_SENT    17339/eclccserver
tcp        0      1 138.12.249.27:56150     138.12.249.27:7070      SYN_SENT    19448/dfuserver
tcp        0      1 138.12.249.27:56149     138.12.249.27:7070      SYN_SENT    11549/eclccserver
tcp        0      1 138.12.249.27:56148     138.12.249.27:7070      SYN_SENT    11391/dfuserver
kovacsbv

Tue Oct 04, 2011 4:54 pm

It seems that daserver is already listening on 7070. Is Thor still unable to start? There may be something else going on...

Do you have space in that filesystem?

Thanks,

Flavio
flavio

Tue Oct 04, 2011 5:32 pm

It looks like there is room. This was a newly installed
empty drive (see the df -h below).

Keep in mind that this is a partially shut down HPCC.
I did the "/etc/init.d/hpcc-init stop" and haven't restarted
it. It's curious that HPCC is shut down but still
has ports open and processes running.


The question was whether I should kill the existing processes,
or if there is a particular order, or if there is a "warm
start" I can do to keep the pid/lock files intact, etc.

If you want, I can do a start too. Dali always seems to wait
about one IP timeout before starting.

I also su'd to hpcc and made/deleted a directory to make
sure it could write in the filesystem.

Another question: our system was tightened down to the
point that the ethernet interface can't reach itself.
This has been opened up in iptables, but pinging the
ethernet interface from itself still doesn't work.

Code:
root@LAB-HPCC-01:/var/log/HPCCSystems/mydali/server# su hpcc
$ cd /mnt/
$ ls -l
total 4
drwxr-xr-x 4 hpcc hpcc 4096 2011-09-30 12:48 hpcc_storage
$ cd hpcc_storage
$ ls -l
total 20
drwxr-xr-x 7 hpcc hpcc  4096 2011-10-03 07:27 HPCCSystems
drwx------ 2 hpcc hpcc 16384 2011-09-26 09:59 lost+found
$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/LAB--HPCC--01-root
                      220G  1.2G  208G   1% /
none                  1.9G  260K  1.9G   1% /dev
none                  1.9G  4.0K  1.9G   1% /dev/shm
none                  1.9G  116K  1.9G   1% /var/run
none                  1.9G     0  1.9G   0% /var/lock
none                  1.9G     0  1.9G   0% /lib/init/rw
/dev/sda1             228M   20M  197M   9% /boot
/home/kovacsvx/.Private
                      220G  1.2G  208G   1% /home/kovacsvx
/dev/sdb1             1.8T  204M  1.7T   1% /mnt/hpcc_storage
$ mkdir bleck
$ ls -l
total 24
drwxr-xr-x 2 hpcc hpcc  4096 2011-10-04 13:12 bleck
drwxr-xr-x 7 hpcc hpcc  4096 2011-10-03 07:27 HPCCSystems
drwx------ 2 hpcc hpcc 16384 2011-09-26 09:59 lost+found
$ rmdir bleck
$

kovacsbv

Tue Oct 04, 2011 6:01 pm

Vic,

Just feel free to kill the existing processes (or try issuing another "/etc/init.d/hpcc-init stop") and restart it.
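If you go the kill route, it is usually gentler to send SIGTERM first and only fall back to SIGKILL for processes that ignore it. A sketch of that pattern follows; stop_pids is a hypothetical helper and the timeout is arbitrary, so treat it as an illustration rather than an HPCC-approved procedure.

```shell
# Sketch only: stop_pids is a hypothetical helper, not an HPCC tool.
# Usage: stop_pids TIMEOUT_SECONDS PID...
# Sends SIGTERM to each PID, waits up to TIMEOUT_SECONDS for it to exit,
# then sends SIGKILL if it is still there.
stop_pids() {
    timeout=$1
    shift
    for pid in "$@"; do
        # If the signal cannot be delivered, the process is already gone.
        kill -TERM "$pid" 2>/dev/null || continue
        waited=0
        while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$timeout" ]; do
            sleep 1
            waited=$((waited + 1))
        done
        if kill -0 "$pid" 2>/dev/null; then
            kill -KILL "$pid" 2>/dev/null
        fi
    done
    return 0
}
```

With the PIDs from the second netstat listing above, that would be stop_pids 10 23686 19448 11391 17339 11549 as root, followed by /etc/init.d/hpcc-init restart. Take the PIDs from a fresh netstat -pln, since they change on every start.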

Thanks,

Flavio
flavio
