HIPIE 101 – Anatomy of a DUDE file

Note: This entry pertains to HIPIE, which is an upcoming paid module for the HPCC Systems Enterprise Edition.

I was once introduced at an international programmers conference as a “nerd’s nerd”, so it was very interesting to leave a conference with the business people looking happy and the technical folks looking confused. Yet this was the feat that we achieved at the recent LexisNexis Visual Analytics Symposium. We were able to show that we have transformed a kick-ass big data platform into a high productivity visual analytics platform; but for those who know how HPCC Systems works, it was not at all clear how we had done it. The purpose of this blog series is to close some of that gap.

The hidden secret behind all of the magical looking visual tools is HIPIE – the HPCC Integrated Plug In Engine. At the most fundamental level, HIPIE exists to allow non-programmers to assemble a series of ECL Macros into a working program. Critical to this is that it requires the writer of the macro to describe the macro’s behavior in a detailed and specific manner. This description is referred to as a ‘contract’. This contract is written in DUDE. ‘Dude’ is not an acronym; dude is ‘hippie-speak’.

The rest of this blog entry is a primer on the structure of a DUDE file. Information on some of the individual pieces will come later.

The first part of a DUDE file is the meta-information; what the plugin is called, what it does and who can do what to it. The plugin writer has total control over what they expose – and total responsibility for exposing it.

Next comes the INPUTS section:

The INPUTS section gets translated into an HTML form which will be used to extract information from the user. There are lots of opportunities for prompting, validation etc. The ‘FIELD’ verb above specifies that after the user has selected a DATASET – one of the data fields should be selected too. The INPUTS section will typically be used to gather the input parameters to the macro being called.

After INPUTS come OUTPUTS – and this is where the magic starts:

This particular plugin provides THREE OUTPUTS. The first of these (dsOutput) is the ‘real’ output and a number of things need to be observed:

dsOutput(dsInput) means that the output is going to be the input file + all of the following fields

,APPEND says the macro appends columns to the file, but does not delete any rows or columns and does not change any existing values. HIPIE verifies that this is true and raises an error if it is not.

PREFIX(INPUTS.Prefix) allows the user to specify an EXTRA prefix before parse_email. This allows a plugin to be used multiple times on the same underlying file.
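To make the APPEND promise concrete, here is a minimal sketch in Python of what "appends columns but changes nothing else" means for a pair of tables. This is not how HIPIE performs its check (the post does not say), and it assumes rows stay in the same order. The function name, the sample rows and the column name `clean_parse_email_domain` are all hypothetical, invented for illustration; the `clean_` part stands in for a user-supplied prefix.

```python
def is_append_only(input_rows, output_rows):
    """Return True if output_rows is input_rows plus extra columns only.

    Rows are plain dicts. Assumption for this sketch: row order is preserved.
    """
    # No rows may be added or deleted.
    if len(input_rows) != len(output_rows):
        return False
    for before, after in zip(input_rows, output_rows):
        # Every original column must survive with its original value.
        for column, value in before.items():
            if column not in after or after[column] != value:
                return False
    return True


# Hypothetical sample data: the output keeps both input columns and adds one.
rows_in = [
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": "bob@example.org"},
]
rows_out = [
    {"id": 1, "email": "ann@example.com", "clean_parse_email_domain": "example.com"},
    {"id": 2, "email": "bob@example.org", "clean_parse_email_domain": "example.org"},
]

print(is_append_only(rows_in, rows_out))
```

Because the added column carries a user-chosen prefix, running the same plugin a second time with a different prefix would add a differently named column rather than colliding with the first.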

The other two have the ‘: SIDE’ indicator. A major part of HIPIEism is the notion that even a process that is wrapped inside a contract ought to give out LOTS of aggregate information to show HOW well the black box performed. SIDE outputs can be thought of as ‘diagnostic tables’.

Next comes the piece that has everyone most excited:

Any output (although usually it will be a side effect) can have a visualization defined. A single visualization corresponds to a page of a dashboard. Each line of the VISUALIZE corresponds to one widget on the screen. The definition names the side effect being visualized and says how the visualization should look in the dashboard. The same definition also specifies how the widgets on the dashboard should interact (see the SELECTS option).

Finally comes the GENERATES section – this may be a little intimidating – although really it is mainly ECL:

The way to think of this is:

It all eventually has to be ECL

%blarg% means that ‘blarg’ was a variable used in the input section, and whatever was filled in there is substituted for %blarg% before the ECL is produced.

%^ means ‘HIPIE is going to do something a little weird with this label’. In the case of ^e, it generates an ‘EXPORT’ but also ensures that the label provided is unique between plugins.
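The substitution rules above can be sketched as a toy template expander in Python. This is only an illustration of the idea, not HIPIE’s implementation. The placeholder spelling `%^eLabel%`, the way uniqueness is achieved (prepending a plugin id), and the names `expand_template`, `inputs` and `plugin_id` are all assumptions made for this sketch.

```python
import re


def expand_template(template, inputs, plugin_id):
    """Expand a GENERATES-style template into ECL-like text (toy version).

    Assumptions for this sketch:
      %^eLabel%  becomes  EXPORT <plugin_id>_Label  (unique per plugin)
      %name%     becomes  whatever the user supplied for 'name'
    """
    def unique_export(match):
        # Prefix the label with the plugin id so two plugins cannot clash.
        return f"EXPORT {plugin_id}_{match.group(1)}"

    # Handle the special %^e...% labels first so the plain rule never sees them.
    expanded = re.sub(r"%\^e(\w+)%", unique_export, template)

    def fill(match):
        # A missing input raises KeyError rather than silently emitting bad ECL.
        return inputs[match.group(1)]

    return re.sub(r"%(\w+)%", fill, expanded)


# Hypothetical template and inputs, for illustration only.
template = "%^eResult% := SORT(%dsInput%, %field%);"
inputs = {"dsInput": "people", "field": "email"}

print(expand_template(template, inputs, plugin_id="p1"))
# prints: EXPORT p1_Result := SORT(people, email);
```

The point of the two-pass design is simply that everything ends up as plain ECL text: ordinary placeholders are filled from the INPUTS form, and the special labels are rewritten so that several plugins can be combined into one program without name collisions.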

In summary – a HIPIE plugin is defined by a DUDE file. The DUDE file has five sections:

META DATA – what does the plugin do / who can use it

INPUTS – what the plugin user must tell me to enable me to execute

OUTPUTS – information about the data I put out (including side-effects)

VISUALIZE – what is a good way to view my side effects

GENERATES – a template to generate the ECL that constitutes the ‘guts’ of the plugin

In the next blog entry, we will answer the question: how do you drive the interactive dashboard (VISUALIZE)?