Once you've sprayed the data into the HPCC Systems cluster, you must define the RECORD structure and DATASET. The following RECORD structure defines the result of the spray above:
imageRecord := RECORD
  STRING    filename;
  DATA      image;   // first 4 bytes contain the length of the image data
  UNSIGNED8 RecPos{virtual(fileposition)};
END;
imageData := DATASET('LE::imagedb',imageRecord,FLAT);
The key to this structure is the use of variable-length STRING and DATA value types. The filename field receives the complete name of the original .JPG or .BMP file that is now contained within the image field. The first four bytes of the image field contain an integer value specifying the number of bytes in the original file that are now in the image field.
The DATA value type is used here for the BLOB field because the JPG and BMP formats are essentially binary data. However, if the BLOB were to contain XML data from multiple files, then it could be defined as a STRING value type. In that case, the first four bytes of the field would still contain an integer value specifying the number of bytes in the original file, followed by the XML data from the file.
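As a rough sketch of the XML case described above, the same layout could be declared with a STRING in place of DATA. The names xmlRecord, xmlText, and xmlData, and the logical file name 'LE::xmldb', are hypothetical and chosen only for illustration; they do not come from the spray shown earlier.

```ecl
// Hypothetical layout for a sprayed file whose BLOB field holds XML text.
// All names and the logical file name below are illustrative assumptions.
xmlRecord := RECORD
  STRING    filename;  // complete name of the original XML file
  STRING    xmlText;   // first 4 bytes still contain the length of the data
  UNSIGNED8 RecPos{virtual(fileposition)};
END;
xmlData := DATASET('LE::xmldb',xmlRecord,FLAT);
```

Only the value type of the BLOB field changes; the filename and RecPos fields play the same roles as in imageRecord.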
The upper size limit for any STRING or DATA value is 4GB.
The addition of the RecPos field (a standard ECL "record pointer" field) allows us to create an INDEX, like this:
imageKey := INDEX(imageData,{filename,RecPos},'LE::imageKey');
BUILDINDEX(imageKey);
Having an INDEX allows you to work with the imageData file in keyed JOIN or FETCH operations. Of course, you can also perform any operation on the BLOB data files that you would do with any other file in ECL.
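A minimal sketch of such a FETCH is shown here, assuming the index has already been built on the filename and RecPos fields of imageRecord. The file name 'example.jpg' and the definition names copyImage and oneImage are hypothetical; substitute a name that was actually sprayed.

```ecl
// Definitions repeated so the sketch stands on its own.
imageRecord := RECORD
  STRING    filename;
  DATA      image;
  UNSIGNED8 RecPos{virtual(fileposition)};
END;
imageData := DATASET('LE::imagedb',imageRecord,FLAT);
imageKey  := INDEX(imageData,{filename,RecPos},'LE::imageKey');

// Copy every field of the fetched base record into the result.
imageRecord copyImage(imageData L) := TRANSFORM
  SELF := L;
END;

// Filter the index by file name (hypothetical value), then use the
// RecPos stored in each matching index entry to read the full record.
oneImage := FETCH(imageData,
                  imageKey(filename = 'example.jpg'),
                  RIGHT.RecPos,
                  copyImage(LEFT));
OUTPUT(oneImage);
```

In the FETCH, LEFT refers to the imageData record and RIGHT to the matching index entry, so RIGHT.RecPos supplies the file position of the record to retrieve.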