Special Structures

BEGINC++ Structure

resulttype funcname ( parameterlist ) := BEGINC++

code

ENDC++;

resulttype      The ECL return value type of the C++ function.
funcname        The ECL definition name of the function.
parameterlist   A comma separated list of the parameters to pass to the function.
code            The C++ function source code.

The BEGINC++ structure makes it possible to add in-line C++ code to your ECL. This is useful where string or bit processing would be complicated in ECL, and would be more easily done in C++, typically for a one-off use. For more commonly used C++ code, writing a plugin would be a better solution (see the External Service Implementation discussion).

The implementation must be thread-safe, and any calls to external libraries must use thread-safe versions of those libraries.

You can use EMBED instead of BEGINC++ to embed C++ code and specify additional options (for example, DISTRIBUTED) using this form:

myFunction(string name) := EMBED(C++ [: options]) 
  ... text 
ENDEMBED

WARNING: This feature could create memory corruption and/or security issues, so great care and forethought are advised--consult with Technical Support before using.

ECL to C++ Mapping

Types are passed as follows:

//The following typedefs are used below:
typedef unsigned size32_t;
typedef wchar_t UChar; [ unsigned short in linux ]

The following list describes the mappings from ECL to C++. For embedded C++, parameter names are always converted to lower case, and their first letter is capitalized when a prefix is attached (for example, xyz becomes lenXyz; see below).

ECL                  C++ [Linux in brackets]
BOOLEAN xyz          bool xyz
INTEGER1 xyz         signed char xyz
INTEGER2 xyz         int16_t xyz
INTEGER4 xyz         int32_t xyz
INTEGER8 xyz         signed __int64 xyz [ long long ]
UNSIGNED1 xyz        unsigned char xyz
UNSIGNED2 xyz        uint16_t xyz
UNSIGNED4 xyz        uint32_t xyz
UNSIGNED8 xyz        unsigned __int64 xyz [ unsigned long long xyz ]
REAL4 xyz            float xyz
REAL/REAL8 xyz       double xyz
DATA xyz             size32_t lenXyz, void * xyz
STRING xyz           size32_t lenXyz, char * xyz
VARSTRING xyz        char * xyz
QSTRING xyz          size32_t lenXyz, char * xyz
UNICODE xyz          size32_t lenXyz, UChar * xyz
VARUNICODE xyz       UChar * xyz
DATA<nn> xyz         void * xyz
STRING<nn> xyz       char * xyz
QSTRING<nn> xyz      char * xyz
UNICODE<nn> xyz      UChar * xyz
SET OF ... xyz       bool isAllXyz, size32_t lenXyz, void *  xyz

Note that strings of unknown length are passed differently from those with a known length. A variable length input string is passed as a number of characters, not the size (i.e. qstring/unicode), followed by a pointer to the data, like this (size32_t is an UNSIGNED4):

STRING  ABC -> size32_t lenAbc, const char * abc
UNICODE ABC -> size32_t lenAbc, const UChar * abc
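
As a rough illustration of the length/pointer convention above, the following standalone sketch handles such a pair outside the platform. The helper names toStdString and countUpper are hypothetical, and size32_t is redefined locally as an assumption so the snippet compiles on its own. The point it tries to show is that the data is not null-terminated, so the length is the only safe bound.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

typedef uint32_t size32_t;   // assumption: local stand-in for the typedef above

// Hypothetical helper (not a platform API): copies a (length, pointer)
// string into a std::string. The data is not null-terminated, so the
// length is the only safe bound.
std::string toStdString(size32_t lenAbc, const char * abc)
{
    assert(abc != nullptr || lenAbc == 0);
    return lenAbc ? std::string(abc, lenAbc) : std::string();
}

// Hypothetical helper: counts upper-case ASCII letters without reading
// past lenAbc characters.
size32_t countUpper(size32_t lenAbc, const char * abc)
{
    size32_t count = 0;
    for (size32_t i = 0; i < lenAbc; i++)
    {
        if (abc[i] >= 'A' && abc[i] <= 'Z')
            count++;
    }
    return count;
}
```

Inside a BEGINC++ body the same loop would use the generated names (lenAbc and abc for a parameter called ABC) directly.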

A dataset is passed as a size/pointer pair. The length gives the size of the following dataset in bytes. The same naming convention is used:

DATASET(r)              ABC -> size32_t lenAbc, const void * abc
  The rows are accessed as x+0, x + length(row1), x + length(row1) + length(row2)

LINKCOUNTED DATASET(r)  ABC -> size32_t countAbc, const byte * * abc
  The rows are accessed as x[0], x[1], x[2]

NOTE: variable length strings within a record are stored as a 4 byte number of characters, followed by the string data.
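
To make the size/pointer layout more concrete, here is a minimal standalone sketch that walks a serialized dataset whose rows have the assumed layout { UNSIGNED8 id; STRING name }. The record layout, the function collectNames, and the helper appendRow are all hypothetical and exist only for illustration; native byte order is assumed. Each row is an 8-byte id, a 4-byte character count, then the string data, and the next row starts immediately after.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

typedef uint32_t size32_t;   // assumption: local stand-in for the typedef above

// Walks a buffer passed as (lenAbc, abc) and collects the name of each row.
std::vector<std::string> collectNames(size32_t lenAbc, const void * abc)
{
    std::vector<std::string> names;
    const uint8_t * cur = static_cast<const uint8_t *>(abc);
    const uint8_t * end = cur + lenAbc;
    while (cur < end)
    {
        assert(static_cast<size_t>(end - cur) >= sizeof(uint64_t) + sizeof(size32_t));
        cur += sizeof(uint64_t);                      // skip the 8-byte id
        size32_t lenName;
        std::memcpy(&lenName, cur, sizeof(lenName));  // 4-byte character count
        cur += sizeof(lenName);
        assert(static_cast<size_t>(end - cur) >= lenName);
        names.emplace_back(reinterpret_cast<const char *>(cur), lenName);
        cur += lenName;                               // start of the next row
    }
    return names;
}

// Hypothetical helper that appends one row in the layout described above.
void appendRow(std::vector<uint8_t> & buffer, uint64_t id, const std::string & name)
{
    size32_t lenName = static_cast<size32_t>(name.size());
    const uint8_t * idBytes = reinterpret_cast<const uint8_t *>(&id);
    const uint8_t * lenBytes = reinterpret_cast<const uint8_t *>(&lenName);
    buffer.insert(buffer.end(), idBytes, idBytes + sizeof(id));
    buffer.insert(buffer.end(), lenBytes, lenBytes + sizeof(lenName));
    buffer.insert(buffer.end(), name.begin(), name.end());
}
```

With a LINKCOUNTED DATASET the outer walk is simpler, because each row is reached through its own pointer, but the field layout inside a row is read the same way.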

Sets are passed as a set of parameters (all, size, pointer):

SET OF UNSIGNED4 ABC -> bool isAllAbc, size32_t lenAbc, const void * abc
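
The three set parameters might be consumed along the following lines. This is a standalone sketch rather than platform code: containsValue is a hypothetical name, size32_t is redefined locally, and the elements are assumed to be tightly packed 4-byte values in native byte order. The "all" flag is checked first because a set marked as ALL matches every value regardless of its contents.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

typedef uint32_t size32_t;   // assumption: local stand-in for the typedef above

// Hypothetical helper: tests membership in a SET OF UNSIGNED4 passed as
// (isAllAbc, lenAbc, abc), where lenAbc is the size of the data in bytes.
bool containsValue(bool isAllAbc, size32_t lenAbc, const void * abc, uint32_t wanted)
{
    if (isAllAbc)
        return true;                       // ALL contains every value
    assert(abc != nullptr || lenAbc == 0);
    const uint8_t * cur = static_cast<const uint8_t *>(abc);
    const size32_t count = lenAbc / sizeof(uint32_t);
    for (size32_t i = 0; i < count; i++)
    {
        uint32_t value;
        std::memcpy(&value, cur + i * sizeof(uint32_t), sizeof(value));
        if (value == wanted)
            return true;
    }
    return false;
}
```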

Return types are handled as C++ functions returning the same types with some exceptions. The exceptions have some extra initial parameters to return the results in:

ECL                C++ [Linux in brackets]
DATA xyz           size32_t & __lenResult, void * & __result
STRING xyz         size32_t & __lenResult, char * & __result
CONST STRING xyz   size32_t lenXyz, const char * xyz
QSTRING xyz        size32_t & __lenResult, char * & __result
UNICODE xyz        size32_t & __lenResult, UChar * & __result
CONST UNICODE xyz  size32_t & __lenResult, const UChar * & __result
DATA<nn> xyz       void * __result
STRING<nn> xyz     char * __result
QSTRING<nn> xyz    char * __result
UNICODE<nn> xyz    UChar * __result
SET OF ... xyz     bool & __isAllResult, size32_t & __lenResult, void * & __result

DATASET(r)         size32_t & __lenResult, void * & __result

LINKCOUNTED DATASET(r)
                   size32_t & __countResult, byte * * & __result

STREAMED DATASET(r) 
                   returns a pointer to an IRowStream interface 
                   (see the eclhelper.hpp include file for the definition)

For example,

STRING process(STRING value, INTEGER4 len)

has the prototype:

void process(size32_t & __lenResult, char * & __result,
             size32_t lenValue, char * value, int len);
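
A body for a prototype of this shape could look like the sketch below, which is only an illustration under stated assumptions. The behaviour (returning the first len characters of value) is invented for the example, std::malloc stands in for the rtlMalloc call used elsewhere in this section, and the result parameters are named lenResult and result because names with a leading double underscore are reserved in standalone C++.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

typedef uint32_t size32_t;   // assumption: local stand-in for the typedef above

// Hypothetical body for: STRING process(STRING value, INTEGER4 len)
// Returns the first len characters of value through the reference parameters.
void process(size32_t & lenResult, char * & result,
             size32_t lenValue, const char * value, int32_t len)
{
    size32_t keep = 0;
    if (len > 0)
        keep = std::min(lenValue, static_cast<size32_t>(len));

    result = nullptr;
    if (keep)
    {
        // std::malloc is a stand-in here; embedded code would use rtlMalloc.
        result = static_cast<char *>(std::malloc(keep));
        assert(result != nullptr);
        std::memcpy(result, value, keep);
    }
    lenResult = keep;            // length of the result; no terminator is added
}
```

The function itself returns void; the caller receives both the length and the newly allocated buffer through the two leading reference parameters.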

A function that takes a string parameter should also have the type prefixed by const in the ECL code so that modern compilers don't report errors when constant strings are passed to the function.

BOOLEAN isUpper(const string mystring) := BEGINC++
  size_t i=0;
  while (i < lenMystring)
  {
    if (!isupper((byte)mystring[i]))
        return false;
    i++;
  }
  return true;
ENDC++;
isUpper('JIM');

Parameters can also include streamed datasets.

If STREAMED is specified on the dataset, the parameter is passed as an IRowStream. The next row from the dataset is obtained by calling:

dataset->nextRow(); 

After the row has been processed, it must be freed by calling:

rtlReleaseRow(next); 

For example:

traceDataset(STREAMED DATASET(r) ds, BOOLEAN isLocal = FALSE) := EMBED(C++)
#include <stdio.h>
#body
  for(;;)
  {
    const byte * next = (const byte *)ds->nextRow();
    if (!next)
      return;
    unsigned __int64 id = *(__uint64 *)(next);
    size32_t lenName = *(size32_t *)(next + sizeof(__uint64));
    const char * name = (char *)(next + sizeof(__uint64) + sizeof(size32_t));
    printf("id(%u) name(%.*s)\n", (unsigned)id, lenName, name);
    rtlReleaseRow(next);
  }
ENDEMBED;
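
The pull, process, release cycle used for streamed inputs can also be tried outside the platform with mock types. In the sketch below, SketchRowStream, CountingStream, sketchReleaseRow, and sumIds are all hypothetical stand-ins written for this illustration; they are not the real IRowStream or rtlReleaseRow and only model the behaviour described above (a null row ends the stream, and every row obtained must be released exactly once).

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for IRowStream; only nextRow() is modelled.
struct SketchRowStream
{
    virtual const void * nextRow() = 0;
    virtual ~SketchRowStream() {}
};

// Hypothetical stand-in for rtlReleaseRow(): frees a row and counts the call.
static unsigned releasedRows = 0;
void sketchReleaseRow(const void * row)
{
    std::free(const_cast<void *>(row));
    ++releasedRows;
}

// Produces `total` rows, each holding a single 8-byte id (1, 2, 3, ...).
class CountingStream : public SketchRowStream
{
public:
    explicit CountingStream(unsigned _total) : total(_total), produced(0) {}
    virtual const void * nextRow() override
    {
        if (produced >= total)
            return nullptr;                  // end of stream
        uint64_t id = ++produced;
        void * row = std::malloc(sizeof(id));
        assert(row != nullptr);
        std::memcpy(row, &id, sizeof(id));
        return row;
    }
private:
    unsigned total;
    unsigned produced;
};

// Consumer loop: pull a row, use it, release it, stop on a null row.
uint64_t sumIds(SketchRowStream * ds)
{
    uint64_t sum = 0;
    for (;;)
    {
        const void * next = ds->nextRow();
        if (!next)
            return sum;
        uint64_t id;
        std::memcpy(&id, next, sizeof(id));
        sum += id;
        sketchReleaseRow(next);              // every row is released once
    }
}
```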

If the result of a C++ function is a streamed dataset, then it needs to return an instance of an IRowStream interface. The function will also be passed an extra implicit parameter:

IEngineRowAllocator * _resultAllocator

which is used to allocate the rows that are returned from the function.

For example:

// This function takes two streamed inputs and outputs the result of two values 
// from the left multiplied together and added to a row from the right.

STREAMED DATASET(r) myDataset(STREAMED DATASET(r) ds1, STREAMED DATASET(r) ds2)
  := EMBED(C++ : activity)
#include <stdio.h>
#body
    class MyStreamInlineDataset : public RtlCInterface, implements IRowStream
    {
    public:

        MyStreamInlineDataset(IEngineRowAllocator * _resultAllocator, IRowStream * _ds1, 
                              IRowStream * _ds2)
          : resultAllocator(_resultAllocator), ds1(_ds1), ds2(_ds2)
        {
        }
        RTLIMPLEMENT_IINTERFACE

        virtual const void *nextRow() override
        {
            const byte * next1a = (const byte *)ds1->nextRow();
            if (!next1a)
                return nullptr;
            const byte * next1b = (const byte *)ds1->nextRow();
            const byte * next2 = (const byte *)ds2->nextRow();
            if (!next1b || !next2)
                rtlFailUnexpected();
            unsigned __int64 value1a = *(const unsigned __int64 *)next1a;
            unsigned __int64 value1b = *(const unsigned __int64 *)next1b;
            unsigned __int64 value2 = *(const unsigned __int64 *)next2;
            rtlReleaseRow(next1a);
            rtlReleaseRow(next1b);
            rtlReleaseRow(next2);
            
            unsigned __int64 result = value1a * value1b + value2;
            RtlDynamicRowBuilder rowBuilder(resultAllocator);
            byte * row = rowBuilder.getSelf();
            *(__uint64 *)(row) = result;
            return rowBuilder.finalizeRowClear(sizeof(unsigned __int64));
        }
        virtual void stop() override
        {
            ds1->stop();
            ds2->stop();
        }
    protected:
        Linked<IEngineRowAllocator> resultAllocator;
        IRowStream * ds1;
        IRowStream * ds2;
    };
    return new MyStreamInlineDataset(_resultAllocator, ds1, ds2);
ENDEMBED;

Note: If the resulting row does not have a fixed size, you should call:

byte * row = rowBuilder.ensureCapacity(<totalSize>, nullptr); 

instead of:

byte * row = rowBuilder.getSelf(); 

This code uses a RtlDynamicRowBuilder which is a class used by the code generator. Instead of using the RtlDynamicRowBuilder class, you could directly call resultAllocator->createRow().

When a data type is included in an input row, rather than being passed as a parameter, the format is the same as the parameters, except that instead of having a pointer to the string etc., the string follows the 4-byte length. The data in the row is not aligned; that is, it has packing of 1.
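
Because fields inside a row have a packing of 1, they can sit at offsets that a normally aligned C++ struct would never use. One cautious way to read them is to copy each field out with memcpy instead of casting the pointer. The sketch below assumes a hypothetical record { UNSIGNED1 flag; UNSIGNED4 value; REAL8 score } stored in native byte order; readField, DecodedRow, and decodeRow are illustrative names, not platform APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Copies a field of type T from an arbitrary (possibly unaligned) offset.
template <typename T>
T readField(const uint8_t * row, size_t offset)
{
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    T value;
    std::memcpy(&value, row + offset, sizeof(T));
    return value;
}

// Naturally aligned in-memory copy of the assumed record.
struct DecodedRow
{
    uint8_t flag;
    uint32_t value;
    double score;
};

// Size of the row as stored: 1 + 4 + 8 bytes with no padding between fields.
constexpr size_t packedRowSize = sizeof(uint8_t) + sizeof(uint32_t) + sizeof(double);

DecodedRow decodeRow(const uint8_t * row)
{
    assert(row != nullptr);
    DecodedRow decoded;
    decoded.flag = readField<uint8_t>(row, 0);
    decoded.value = readField<uint32_t>(row, 1);   // offset 1, not 4
    decoded.score = readField<double>(row, 5);     // offset 5, not 8
    return decoded;
}
```

On most platforms sizeof(DecodedRow) is larger than packedRowSize, which is why overlaying a plain struct on the row data would read the wrong bytes.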

Available Options

#option pure     By default, embedded C++ functions are assumed to have side-effects, which means the generated code won't be as efficient as it might be since the calls can't be shared. Adding #option pure inside the embedded C++ code causes it to be treated as a pure function without side effects.
#option once     Indicates the function has no side effects and is evaluated at query execution time, even if the parameters are constant, allowing the optimizer to make more efficient calls to the function in some cases.
#option action   Indicates side effects, requiring the optimizer to keep all calls to the function.
#body            Delimits the beginning of executable code. All code that precedes #body (such as #include) is generated outside the function definition; all code that follows it is generated inside the function definition.

Example:

//static int add(int x,int y) {
INTEGER4 add(INTEGER4 x, INTEGER4 y) := BEGINC++
  #option pure
  return x + y;
ENDC++;
         
OUTPUT(add(10,20));
          
//static void reverseString(size32_t & __lenResult,char *  & __result,
// size32_t lenValue,char * value) {
STRING reverseString(STRING value) := BEGINC++
   size32_t len = lenValue;
   char * out = (char *)rtlMalloc(len);
   for (unsigned i= 0; i < len; i++)
        out[i] = value[len-1-i];
   __lenResult = len;
   __result = out;
ENDC++;
OUTPUT(reverseString('Kevin'));
// This is a function returning an unknown length string via the
// special reference parameters __lenResult and  __result
         
//this function demonstrates #body, allowing #include to be  used
BOOLEAN nocaseInList(STRING search,
          SET OF STRING values) := BEGINC++
#include <string.h>
#body 
  if (isAllValues)
    return true; 
  const byte * cur = (const byte *)values; 
  const byte * end = cur + lenValues; 
  while (cur != end)
  {
    unsigned len = *(unsigned *)cur;
    cur += sizeof(unsigned);
    if (lenSearch == len && memicmp(search, cur, len) == 0)
      return true;
    cur += len;
  }
  return false;
ENDC++;
          
//and another example, generating a variable number of Xes
STRING buildString(INTEGER4 value) := BEGINC++
   char * out = (char *)rtlMalloc(value);
   for (unsigned i= 0; i < value; i++)
     out[i] = 'X';
   __lenResult = value;
   __result = out;
ENDC++;

//examples of embedded, LINKCOUNTED, and STREAMED DATASETs
inRec := { unsigned id };
doneRec := { unsigned4 execid };
out1rec := { unsigned id; };
out2rec := { real id; };

DATASET(doneRec) doSomethingNasty(DATASET(inRec) input) := BEGINC++
  __lenResult = 4;
  __result = rtlMalloc(__lenResult);
  *(unsigned *)__result = 91823;
ENDC++;

DATASET(out1Rec) extractResult1(doneRec done) := BEGINC++
   const unsigned id = *(unsigned *)done;
   const unsigned cnt = 10;
   __lenResult = cnt * sizeof(unsigned __int64);
   __result = rtlMalloc(__lenResult);
   for (unsigned i=0; i < cnt; i++)
       ((unsigned __int64 *)__result)[i] = id + i + 1;
ENDC++;

LINKCOUNTED DATASET(out2Rec) extractResult2(doneRec done) := BEGINC++
   const unsigned id = *(unsigned *)done;   
   const unsigned cnt = 10;
   __countResult = cnt;
   __result = _resultAllocator->createRowset(cnt);
   for (unsigned i=0; i < cnt; i++)
   {
        size32_t allocSize;
        void * row = _resultAllocator->createRow(allocSize);
        *(double *)row = id + i + 1;
        __result[i] =  (byte *)_resultAllocator->finalizeRow(allocSize, row, allocSize);
   }
ENDC++;

STREAMED DATASET(out1Rec) extractResult3(doneRec done) := BEGINC++
   class myStream : public IRowStream, public RtlCInterface
   {
    public:
        myStream(IEngineRowAllocator * _allocator, unsigned _id) : allocator(_allocator), id(_id), idx(0) {}
        RTLIMPLEMENT_IINTERFACE

        virtual const void *nextRow()
        {
            if (idx >= 10)
               return NULL;
            size32_t allocSize;
            void * row = allocator->createRow(allocSize);
            *(unsigned __int64 *)row = id + ++idx;
            return allocator->finalizeRow(allocSize, row, allocSize);
        }
        virtual void stop() {}
    private:
        Linked<IEngineRowAllocator> allocator;
        unsigned id;
        unsigned idx;
        
    };
    #body
    const unsigned id = *(unsigned *)done;
    return new myStream(_resultAllocator, id);
ENDC++;

ds := DATASET([1,2,3,4], inRec);

processed := doSomethingNasty(ds);

out1 := NORMALIZE(processed, extractResult1(LEFT), TRANSFORM(RIGHT));
out2 := NORMALIZE(processed, extractResult2(LEFT), TRANSFORM(RIGHT));
out3 := NORMALIZE(processed, extractResult3(LEFT), TRANSFORM(RIGHT));

SEQUENTIAL(OUTPUT(out1),OUTPUT(out2),OUTPUT(out3));

See Also: External Service Implementation, EMBED Structure