HPCC on Chromebook

Topics related to alternative architectures including ARM and similar

Tue Jan 22, 2013 6:09 pm

Hi Folks,

I've been trying to compile HPCC on a Chromebook. Just FYI, the Chromebook is a dual core A15 1.7GHz (Exynos 5) with 2GB RAM, which is about 4x faster than the Panda's dual A9 chip.

So, I got Ubuntu running on it (ChrUbuntu) and downloaded the HPCC's sources to it. Since it's Ubuntu, all packages are available via apt-get and getting it to configure is not hard.

The only package that was unavailable on ChrUbuntu was Xalan/Xerces, but I got them from the Panda's Ubuntu. Richard told me I could have used libxslt, which IS available in ChrUbuntu. If you're trying this at home, it might be easier, since HPCC will eventually use it by default on all platforms.

Typing "make" in the build directory gets you to about 13% (according to CMake), at which point two errors occur:

In src/system/jlib/jexcept.cpp:
In function 'void excsighandler(int, siginfo_t*, void*)':
error: 'mcontext_t' has no member named 'gregs'
error: 'REG_EIP' was not declared in this scope
error: 'mcontext_t' has no member named 'gregs'
error: 'REG_ESP' was not declared in this scope
...
REG_EAX, REG_EBX, REG_ECX etc.

Clearly, x86 register declarations.

In src/system/jlib/jdebug.cpp:
In function 'void calibrate_timing()':
error: impossible constraint in 'asm'

Clearly, x86 assembly on timing functions.

There are two ways to go, here:
1. Replicate each instance that assumes x86 and add ARM assembly/directives, with appropriate #ifdef __arm__ wrapping around.
2. Find libraries or function calls on the kernel that do the same without having to delve into specific code for specific architectures.

The way the code is structured today, solution 1 would be the easiest to add but the hardest to maintain. Remember that 64-bit ARM (AArch64) is just around the corner, and it brings a lot of changes.

However, finding an exact match across all architectures and all OSs is not an easy task, and there might not be a generic solution anyway.
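
To make the trade-off concrete, here is a minimal sketch of option 1 applied to the jexcept case: reading the program counter out of the signal context, with one #if branch per architecture. It is not HPCC's code. The names faultingPc, recordPc and installPcRecorder are made up for illustration, and the field names (gregs[REG_EIP], arm_pc, pc) are the ones glibc exposes on Linux, so other C libraries or OSs may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <signal.h>
#if defined(__linux__)
#include <ucontext.h>
#endif

// Program counter at the point the signal was delivered, or 0 when this
// sketch does not know the context layout for the current target.
uintptr_t faultingPc(void *context) {
#if defined(__linux__)
    const ucontext_t *uc = static_cast<const ucontext_t *>(context);
#if defined(__x86_64__)
    return static_cast<uintptr_t>(uc->uc_mcontext.gregs[REG_RIP]);
#elif defined(__i386__)
    return static_cast<uintptr_t>(uc->uc_mcontext.gregs[REG_EIP]);
#elif defined(__arm__)
    return static_cast<uintptr_t>(uc->uc_mcontext.arm_pc);
#elif defined(__aarch64__)
    return static_cast<uintptr_t>(uc->uc_mcontext.pc);
#else
    (void)uc;
    return 0;
#endif
#else
    (void)context;
    return 0;
#endif
}

// Hypothetical bookkeeping so the handler has something observable to do.
volatile sig_atomic_t g_handlerCalls = 0;
volatile uintptr_t g_lastPc = 0;

// SA_SIGINFO-style handler: same signature as excsighandler in the errors above.
void recordPc(int, siginfo_t *, void *context) {
    g_lastPc = faultingPc(context);
    g_handlerCalls = g_handlerCalls + 1;
}

// Installs recordPc for one signal; returns false if sigaction() fails.
bool installPcRecorder(int signo) {
    assert(signo > 0);
    struct sigaction sa = {};
    sa.sa_sigaction = recordPc;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, nullptr) == 0;
}
```

Every new target needs another branch here, which is the maintenance cost of option 1. Calling installPcRecorder(SIGUSR1) and then raise(SIGUSR1) is enough to exercise the handler.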

The jexcept code seems to register an exception handler that prints pretty much what the standard library already does, so it looks redundant. Can this be removed?

The jdebug issue can possibly be fixed by skipping that section altogether and proceeding directly to the calibration.

I'll have a look at removing those parts of the code. Stay tuned! ;)
rengolin
Community Advisory Board Member
Posts: 12
Joined: Thu Aug 18, 2011 2:43 pm

Tue Jan 22, 2013 8:18 pm

Another thing to remember when compiling on ARM: there is a va_list change in the ABI, and GCC warns about it in every single file that includes a header using va_list. In HPCC's case, that's pretty much all files.

So, configure your CMake with:
Code:
-DCMAKE_CXX_FLAGS:String=-Wno-psabi


Regarding the two issues raised:
  • Timing Accuracy
    I've been reading a bit, and the RDTSC-based calibration used for timing was important in the '90s but not so much nowadays. On ARM, interrupt latency won't be more than a few microseconds, and on x86 even less, so the timing really doesn't need to account for it anymore.

    I've changed the code to return zero on ARM. One could try to use the configuration register CP15 to fiddle with some values and time the vector catch, but I guess it's irrelevant to most operations HPCC will do.
  • Exception Handler
    Since the exception handler is very close to what the standard handler already does (print stack and registers), and that version relies heavily on the x86 architecture (register banks, macros, etc.), I decided to play it safe and compiled out the whole section on ARM (with #ifndef __arm__).
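
As an alternative to returning zero, a portable clock could stand in for the RDTSC read. The following is only a sketch under that assumption, not the change described above: portableNowNs and elapsedNs are hypothetical names, and std::chrono::steady_clock trades cycle-level resolution for something that compiles unchanged on x86 and ARM.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Monotonic timestamp in nanoseconds, taken from the OS clock rather than
// from a CPU cycle counter. No inline asm, so no per-architecture branch.
uint64_t portableNowNs() {
    using namespace std::chrono;
    const auto sinceEpoch = steady_clock::now().time_since_epoch();
    return static_cast<uint64_t>(duration_cast<nanoseconds>(sinceEpoch).count());
}

// Times one call of `work`. With a nanosecond clock there is no
// cycles-to-time ratio to calibrate; the difference is already in ns.
template <typename Fn>
uint64_t elapsedNs(Fn &&work) {
    const uint64_t start = portableNowNs();
    work();
    const uint64_t end = portableNowNs();
    assert(end >= start); // steady_clock never goes backwards
    return end - start;
}
```

The resolution depends on the platform's steady_clock, so this would not suit code that really needs cycle counts.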

After these two fixes, the compilation goes on until 39%, when the hqlfold meets another x86 asm region. This is the one I was waiting for, and the one that will cause serious problems.

This is the part of the code that folds external calls (IValue * foldExternalCall()), where the external function is called using the base C Procedure Call Standard (PCS), that is, by populating the stack with the function arguments.

Intel's x86 architecture used that extensively, and most architectures use it for variadic functions, but newer architectures (such as x86_64, AArch32, AArch64) use registers to pass the parameters.

Passing in registers is much faster for functions with a small number of parameters, but all of these architectures fall back to the stack once a function has more than a handful of them. This means following the ABI is not trivial.
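
To illustrate the register-versus-stack point without any assembly, the sketch below makes the call through a typed function pointer, so the compiler applies whatever PCS the target uses. It is not how hqlfold works and not a general replacement for it: callWithIntArgs, sampleAdd and sampleNegate are hypothetical, and only functions taking zero to two 64-bit integers are covered.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Type-erased function pointer, as one might get back from a symbol lookup.
using GenericFn = void (*)();

// Calls `fn` with the given integer arguments. Each case casts back to the
// function's real type, so the compiler decides which arguments travel in
// registers and which on the stack for the current architecture.
int64_t callWithIntArgs(GenericFn fn, const std::vector<int64_t> &args) {
    assert(fn != nullptr);
    switch (args.size()) {
    case 0:
        return reinterpret_cast<int64_t (*)()>(fn)();
    case 1:
        return reinterpret_cast<int64_t (*)(int64_t)>(fn)(args[0]);
    case 2:
        return reinterpret_cast<int64_t (*)(int64_t, int64_t)>(fn)(args[0], args[1]);
    default:
        throw std::invalid_argument("arity not covered by this sketch");
    }
}

// Stand-ins for external library functions.
int64_t sampleAdd(int64_t a, int64_t b) { return a + b; }
int64_t sampleNegate(int64_t a) { return -a; }
```

The caller must know the real signature in advance. Folding a call whose argument list is only known at run time is exactly the case this does not handle, and the reason the original code builds the call by hand.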

There are two ways of fixing this:

  • Implement ARM PCSs
    It looks like the quick way out, but ARM has multiple PCSs to choose from. APCS is the old one and could be ignored by HPCC (not sure we want to run on a 7TDMI); AAPCS is the main one, in use by the Cortex-A* family; and there's also the new 64-bit version, which HPCC ought to support if it wants to stay relevant on ARM over the next 5 years. ;)

    Not sure how relevant a Windows build would be for ARM, and they may very well have their own PCSs. The AAPCS is defined for bare-metal applications and GNU EABI follows it into the Linux world.
  • Use a standard RPC library
    This looks like the clean way out, but most library functions would have to be rewritten to support the particular flavour of RPC chosen.

    Also, RPC is OS-dependent. Windows uses DCOM or, on .NET, whatever they have invented since. Linux has RPC, which is standard on most Unices, I think. HPCC would have to support both (or ask developers to stop using Visual Studio ;)).

Both solutions are hard work, and both will take a while to complete. If I could choose, I'd choose standard RPC.

For the moment, I'll comment out the external folding and continue...
rengolin

Tue Jan 22, 2013 9:09 pm

After those changes, the system compiled to completion, and a package was generated and installed correctly.

I haven't had a chance to test, since I upgraded a few packages and rebooted, only to find that I had left the lid closed, in the server room, at the office. :(

When I open the lid again and restart, I'll report on the usability.
rengolin

Wed Jan 23, 2013 12:35 pm

Renato, this is great work and quite a milestone, as it would be the first platform other than 32/64-bit x86 that HPCC is ported to.

Unfortunately, there are no 64-bit ARM CPUs out there yet, but with the announcements of the Cortex-A53 and A57, and a number of hardware manufacturers planning to support the A64 instruction set in upcoming silicon, 64-bit capable parts can't be that far off.

Keep up the good work, and let us know how the port effort goes!

Flavio
flavio
Community Advisory Board Member
Posts: 73
Joined: Wed Apr 27, 2011 8:59 pm

Wed Jan 23, 2013 3:39 pm

Flavio, ARM 64 is just around the corner. With AMD and Dell actively working on it, it might not be too long until we have a board to play with. I'd call it Apple Pie (PoPeye, the strong sailor). :)

Anyway, back to business. The ECL compiler segfaults, as Calxeda reported, probably due to the hack on external folding. I need debug symbols to know what's wrong, though.

Starting the services goes well until it reaches Thor, which fails in the Roxie memory manager:

Code:
Build community_3.11.0-1trunk[heads/arm-hack-0-g33f624-dirty]
calling initClientProcess Port 20000
Found file 'thorgroup', using to form thor group
Global memory size = 1514 MB
RoxieMemMgr: Setting memory limit to 1587544064 bytes (1514 pages)
RoxieMemMgr: posix_memalign (alignment=1048576, size=1610612736) failed - ret=12
/home/user/devel/hpcc/src/thorlcr/master/thmastermain.cpp(714) : ThorMaster : RoxieMemMgr: Unable to create heap


Funny, since the Chromebook has 2GB of RAM, of which 1.2GB is free. I don't know where this 1514MB number comes from...

Code:
$ free
             total       used       free     shared    buffers     cached
Mem:       2067736     831660    1236076          0      54164     491552
-/+ buffers/cache:     285944    1781792
Swap:            0          0          0


That line (thmastermain.cpp:714) is the last catch in main(), around a large block of code, so it's hard to trace without debug symbols. I'll recompile in debug mode so I can run the unit tests, which I think will catch the bug too.

However, oddly, restarting again works!

Code:
Global memory size = 1514 MB
RoxieMemMgr: Setting memory limit to 1587544064 bytes (1514 pages)
RoxieMemMgr: 1536 Pages successfully allocated for the pool - memsize=1610612736 base=0x14d00000 alignment=1048576 bitmapSize=48


Unfortunately, I can't run any job on it now, since the compiler is not producing any binary output (though, it does generate C++ and the sources look correct, which indicates it's not the lack of external folding).

Next step: Debug symbols!

On a side note, when trying to run the regression suite, some of the Perl modules were not available as Ubuntu packages, but installing them via CPAN was a breeze.
rengolin

Wed Jan 23, 2013 11:20 pm

Wow! You're quickly making good progress!

I'll ping Jake so that he can take a look at the 1.5GB number (I believe it comes from the auto-detection code and/or config file).

Flavio
flavio

Thu Jan 24, 2013 12:05 am

Oh, well... That can't be right... ;)

Code:
at src/ecl/hqlcpp/hqlres.cpp:423


Code:
        bfd_init ();
        bfd_set_default_target(target64bit ? "x86_64-unknown-linux-gnu" : "x86_32-unknown-linux-gnu");
        const bfd_arch_info_type *temp_arch_info = bfd_scan_arch ("i386");
#if defined __APPLE__
        file = bfd_openw(filename, NULL);//MORE: Quick fix to get working on OSX
#else
        file = bfd_openw(filename, target64bit ? "elf64-x86-64" : NULL);//MORE: Test on 64 bit to see if we can always pass NULL
#endif
        verifyex(file);
        verifyex(bfd_set_arch_mach(file, temp_arch_info->arch, temp_arch_info->mach));


Here's the debug trace...

Code:
(gdb) run
Starting program: /usr/bin/eclcc superfile5.ecl
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x7694d8ec in ResourceManager::flush (this=0xb9cd8, filename=0x116668 "liba.out.res.o.so", flushText=false, target64bit=false)
    at /home/user/devel/hpcc/src/ecl/hqlcpp/hqlres.cpp:423
423           verifyex(bfd_set_arch_mach(file, temp_arch_info->arch, temp_arch_info->mach));
(gdb) p temp_arch_info->arch
Cannot access memory at address 0xc
(gdb) p temp_arch_info
$1 = (const bfd_arch_info_type *) 0x0
(gdb) bt
#0  0x7694d8ec in ResourceManager::flush (this=0xb9cd8, filename=0x116668 "liba.out.res.o.so", flushText=false, target64bit=false)
    at /home/user/devel/hpcc/src/ecl/hqlcpp/hqlres.cpp:423
#1  0x7687849c in HqlCppInstance::flushResources (this=0xb9c90, filename=0x24a308 "a.out.res.o", ctxCallback=0x115528)
    at /home/user/devel/hpcc/src/ecl/hqlcpp/hqlcpp.cpp:1326
#2  0x768cf0ec in HqlDllGenerator::flushResources (this=0xa1ee8) at /home/user/devel/hpcc/src/ecl/hqlcpp/hqlecl.cpp:562
#3  0x768cd506 in HqlDllGenerator::processQuery (this=0xa1ee8, parsedQuery=..., _generateTarget=EclGenerateExe) at /home/user/devel/hpcc/src/ecl/hqlcpp/hqlecl.cpp:216
#4  0x00015396 in EclCC::instantECL (this=0x7efff20c, instance=..., wu=0x51648, queryFullName=0x0, errs=0x5db58, outputFile=0x46470 "a.out")
    at /home/user/devel/hpcc/src/ecl/eclcc/eclcc.cpp:635
#5  0x00016670 in EclCC::processSingleQuery (this=0x7efff20c, instance=..., queryContents=0x7fd58, queryAttributePath=0x5d6b8 "superfile5")
    at /home/user/devel/hpcc/src/ecl/eclcc/eclcc.cpp:965
#6  0x0001746e in EclCC::processFile (this=0x7efff20c, instance=...) at /home/user/devel/hpcc/src/ecl/eclcc/eclcc.cpp:1165
#7  0x0001855e in EclCC::processFiles (this=0x7efff20c) at /home/user/devel/hpcc/src/ecl/eclcc/eclcc.cpp:1403
#8  0x00013e8c in doMain (argc=2, argv=0x7efff4a4) at /home/user/devel/hpcc/src/ecl/eclcc/eclcc.cpp:358
#9  0x00013ff8 in main (argc=2, argv=0x7efff4a4) at /home/user/devel/hpcc/src/ecl/eclcc/eclcc.cpp:387
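
The gdb session shows temp_arch_info is NULL when it is dereferenced, which is what one would expect if the libbfd on this system does not know an architecture called "i386". A minimal sketch of a guard follows, assuming the architecture name can be picked at compile time. hostBfdArchName and requireNonNull are hypothetical helpers, the strings "arm" and "aarch64" are assumptions about what bfd_scan_arch accepts and should be checked against the installed binutils, and the libbfd calls appear only in a comment so that the block builds without the library.

```cpp
#include <cassert>
#include <cstring>
#include <stdexcept>

// Architecture name to pass to bfd_scan_arch(), chosen from the compiler's
// predefined macros. Returns nullptr for targets this sketch does not cover.
const char *hostBfdArchName() {
#if defined(__x86_64__) || defined(__i386__)
    return "i386";    // the name the existing code passes
#elif defined(__arm__)
    return "arm";     // assumption: verify against the installed libbfd
#elif defined(__aarch64__)
    return "aarch64"; // assumption: verify against the installed libbfd
#else
    return nullptr;
#endif
}

// Turns a NULL result into an exception instead of a later segfault.
template <typename T>
const T *requireNonNull(const T *ptr, const char *what) {
    if (ptr == nullptr)
        throw std::runtime_error(what);
    return ptr;
}

// Intended use inside ResourceManager::flush (untested, needs libbfd):
//   const char *name = requireNonNull(hostBfdArchName(), "no BFD arch for this host");
//   const bfd_arch_info_type *info =
//       requireNonNull(bfd_scan_arch(name), "bfd_scan_arch failed");
//   verifyex(bfd_set_arch_mach(file, info->arch, info->mach));
```

The hard-coded x86 target strings passed to bfd_set_default_target and bfd_openw would need the same treatment; this only covers the pointer that crashed.
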
rengolin

Thu Jan 24, 2013 2:46 pm

The libbfd code is used to attach the workunit info (graph) to the workunit .so.

Unfortunately the library it relies on seems to be very variable from distro to distro - I've not been able to get it fully working on OSX either, though I really should go back and revisit that sometime.

If you can think of a more portable way to achieve the same result, I'd be interested.

I wouldn't get too hung up on the constant-folding of plugin calls issue. The system can be used quite happily without that ability, so long as you don't want to be able to do things like #if (stringlib.trim(myoption) = 'I'). In fact, I'm quite tempted to deprecate that ability in general.
richardkchapman
Community Advisory Board Member
Posts: 109
Joined: Fri Jun 17, 2011 8:59 am

Thu Jan 24, 2013 2:49 pm

rengolin wrote:Starting the services go well until it reaches Thor, when it fails on the Roxie memory manager:

However, oddly, restarting again works!


I think it will need 1.5GB of contiguous address space. You can set the amount it uses in the environment.xml. I think it defaults to 75% of total RAM (on the expectation that you are running Thor on dedicated servers...).
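
The numbers in the logs are consistent with that 75% figure. As a worked check, using the total from the free output (2067736 kB) and the 1 MB page size from the log: the function names below are hypothetical, and rounding the page count up to a multiple of 32 is an inference from bitmapSize=48 (48 x 32 = 1536), not something confirmed from the source.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kPageBytes = 1048576;    // alignment reported in the log
constexpr uint64_t kPagesPerBitmapWord = 32; // assumption inferred from bitmapSize=48

// 75% of total RAM, in whole MB, from a total given in kB (as free prints it).
constexpr uint64_t defaultGlobalMemMB(uint64_t totalKB) {
    return totalKB * 3 / 4 / 1024;
}

// Page count rounded up to a whole number of bitmap words.
constexpr uint64_t roundUpPages(uint64_t pages) {
    return (pages + kPagesPerBitmapWord - 1) / kPagesPerBitmapWord * kPagesPerBitmapWord;
}

// Bytes actually requested from posix_memalign for a machine with `totalKB` of RAM.
inline uint64_t heapRequestBytes(uint64_t totalKB) {
    assert(totalKB > 0);
    return roundUpPages(defaultGlobalMemMB(totalKB)) * kPageBytes;
}

static_assert(defaultGlobalMemMB(2067736) == 1514, "matches 'Global memory size = 1514 MB'");
static_assert(roundUpPages(1514) == 1536, "matches '1536 Pages successfully allocated'");
```

So 2067736 kB x 0.75 is about 1514 MB, which is the memory limit (1587544064 bytes), and 1536 pages of 1 MB is the 1610612736 bytes that posix_memalign was asked for.
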
richardkchapman

