

Redundant data in raw files


Mon Aug 01, 2011 6:14 pm

I have about 10 files that are related by a three-field composite key (roughly 45 bytes in total). In the RDBMS world I would be inclined to convert the natural key into a numeric surrogate key to reduce the footprint and, hopefully, speed up sorts and joins. The three key fields make up 20-30% of the total data size.

From a performance perspective, does it make sense to do any sort of speculative pre-processing in HPCC?


Mon Aug 01, 2011 6:38 pm

Well, I don't know if I would describe it as 'speculative pre-processing', but essentially yes. Whilst HPCC is probably the fastest thing out there, we are still bound by the laws of physics. In general you should get your data model correct and TIGHT as early in your processing as possible.

By TIGHT I mean:
a) Fixed-length fields where possible (and as small as possible)
b) Values converted into their 'correct' types where possible (numbers as UNSIGNED/INTEGER etc.)
c) Linking fields as UNSIGNED
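
To make point (c) concrete, here is a minimal sketch of swapping a composite natural key for a numeric linking field. It is written in Python purely to show the idea and the byte arithmetic; it is not ECL, and the field widths (20 + 15 + 10 = 45 bytes), the sample rows and the helper name `assign_surrogates` are all assumptions made up for illustration.

```python
# Illustrative sketch only (Python, not ECL). Field widths, sample rows and
# helper names are hypothetical; the 45-byte total mirrors the question above.
import struct

KEY_WIDTHS = (20, 15, 10)  # assumed fixed widths of the three key fields


def assign_surrogates(keys):
    """Map each distinct composite key to a dense unsigned integer id."""
    lookup = {}
    for key in sorted(set(keys)):  # sorted so the ids are deterministic
        lookup[key] = len(lookup) + 1
    return lookup


rows = [
    ("EAST", "ACCT-0001", "BR01"),
    ("EAST", "ACCT-0001", "BR01"),
    ("WEST", "ACCT-0002", "BR07"),
]

lookup = assign_surrogates(rows)
ids = [lookup[r] for r in rows]

natural_bytes = sum(KEY_WIDTHS)          # bytes per row for the text key
surrogate_bytes = struct.calcsize("<Q")  # one 8-byte unsigned integer

print(ids, natural_bytes, surrogate_bytes)
```

With these assumed widths, each row carries an 8-byte id instead of 45 bytes of key text, and the lookup table only needs to be kept once rather than repeated in every related file.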

Now, there is a slightly 'greyer' trade-off with some of the more exotic but compressed types such as QSTRING and UNSIGNED3. It costs more cycles to get data in and out of those types, but they are smaller (which means they come off disk faster, cross network links faster and consume less memory). My general rule of thumb: fields I use 'all the time' are allowed a fatter type that is natural to the system (UNSIGNED4, UNSIGNED8 etc.), while fields that are just carried around for occasional use get squeezed down.
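
As a rough sizing aid for that trade-off, the sketch below estimates how many bytes a squeezed field would need. It is Python rather than ECL, the function names are hypothetical, and it assumes that a QSTRING-style type packs 6 bits per character and that an UNSIGNED3 occupies 3 bytes; check both against the ECL language reference before relying on the numbers.

```python
# Illustrative sizing arithmetic only (Python, not ECL). Function names are
# hypothetical; the 6-bits-per-character packing is an assumption about
# QSTRING-style storage.


def unsigned_bytes_needed(max_value):
    """Smallest whole number of bytes that holds max_value as an unsigned int."""
    return max(1, (max_value.bit_length() + 7) // 8)


def six_bit_packed_bytes(n_chars):
    """Bytes for n_chars at 6 bits per character, rounded up to whole bytes."""
    return (n_chars * 6 + 7) // 8


small_id = unsigned_bytes_needed(10_000_000)  # fits a 3-byte unsigned
large_id = unsigned_bytes_needed(20_000_000)  # needs a 4-byte unsigned
packed_20 = six_bit_packed_bytes(20)          # versus 20 bytes at 8 bits/char

print(small_id, large_id, packed_20)
```

The saving per field is small, which is why it only pays off on fields that are mostly carried along rather than constantly unpacked and compared.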


Community Advisory Board Member

Mon Aug 01, 2011 6:41 pm

That definitely helps, thanks for the quick reply.
