Having blogged about being a fly on the wall at a meeting of the European contingent of the HPCC Systems development team, it’s time to look through a window into the world of the developer offsite meeting, which is attended by the wider HPCC Systems core development team. The Europeans are joined by a number of US-based colleagues for these biannual, week-long, residential gatherings.
This retreat narrowly missed being completely disrupted by a major snow event here in the UK, which brought up to 56cm of snow to some areas. Now I know to those of you who may have experienced the often extreme weather events of the north eastern seaboard in the US this may seem like child’s play, but for some of us here, it’s the most snow we’ve seen in decades and we aren’t used to coping with it. Airports were closed, roads were impassable, and major snow drifts cut off some villages for days. So, it was touch and go right up until the intended start date, given that the chosen location was in the north east of England, west of Darlington, with views over the North Pennine Moors. However, with a few adjustments, we were able to go ahead as planned and everyone made it.
Horsley Hall is a rather lovely, if somewhat gothic looking, former hunting lodge. My overactive imagination has me visualizing my arrival as a detective about to solve a mystery straight out of The Moonstone (by Wilkie Collins), the effect further compounded by creaking doors and floorboards!
I walk into the large living room to be greeted by the team who are gathered around a long table and immediately see my arrival as an opportunity to break for lunch. I fancy a coffee after my journey but apparently, they have drowned themselves in it this morning and can’t face another cup.
I always join these meetings for a day at the end. It’s a good time to arrive because the developers have had all week to discuss the big issues and collaborate on coding sessions which often lead to features being completed before they part company.
The purpose of these meetings changes depending on where we are in the development cycle. Often they involve deep dive strategic discussions about the future direction of the project. However, this time, we are firmly focused on wrapping up HPCC Systems 7.0.0. We intend to release a beta version quite soon and we want all major new features finished in time to make the beta release. If it doesn’t make the beta, it’s unlikely to make the gold version.
I’ve created a filter in JIRA to help us keep track of the features to be included. My first aim is to compare my list with theirs, allowing me to make adjustments needed as a result of any decisions made during the week. A major contributing factor is the fixing of the code complete date at the end of April, which means taking a realistic look at what is achievable. The list of major new features is pretty impressive and here is a preview:
- eclcc indexer
Speeds up syntax checks by avoiding the reparsing of ECL when compiling with a local repository
- Remote projection and filtering
When reading data remotely, brings back only the rows and columns which are needed, rather than bringing back everything and then discarding what is not needed
- Record translation
Decouples the declaration of record layouts from the code, which means code does not have to be updated when the layout changes, making it easier to manage data upgrades.
- Delayed spilling
Makes the process of sharing datasets between sub graphs available without spilling to disk in Thor
Major DESDL service management process improvements, new ESDL definition management features in ECL Watch and ECLIDE, along with various usability improvements in the UI and command line tools.
- Round robin spray
Non-partitioning spray implementation which streams the source file to the target nodes in turn.
- ECL for VS Code
An extension which adds rich language support for the ECL language to VS Code.
- Spark connector
Ability to read/write Thor files natively from Spark
- Memcached improvements
Caching added to ECL Watch for better performance
- Session management
ECL Watch logout, plus screen and tab locking which, when unlocked, takes you back to where you were
- Keyed joins improvements
Significantly better performance on Thor
- Persistent connections
Performance improvement for clients of ESP and ROXIE, which is particularly significant for secure connections
The ability for ECL code to access record fields by name, retrieve type information etc
- WS-SQL integration
From 7.0.0, this will be integrated into the platform download rather than available as a separate add on
- Default SSL to on
Automatically generate the self signed security certificate
- New Unicode implementations for standard ECL functions
These were added by a student from our intern program last summer (David Skaff), who was also the first high school student to join our intern program
- New EMBED activity
A new flag on the EMBED function allowing you to execute in parallel and perform a bulk update on, for example, an SQL database
- XREF improvements
Front end XREF support for ROXIE and some general usability enhancements
Click on each title to see the associated JIRA issue for each feature and keep checking back for blogposts providing additional details and usage information.
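To give a feel for the round robin spray item in the list above, here is a minimal Python sketch of the general idea of handing out a source file's records to target nodes in turn. It is only an illustration and not the HPCC Systems implementation: the function name `round_robin_spray`, the use of in-memory lists to stand in for nodes, and the assumption that distribution happens record by record are all my own.

```python
from typing import Iterable, List


def round_robin_spray(records: Iterable[str], node_count: int) -> List[List[str]]:
    """Hand out records to nodes in turn, without partitioning the source first.

    Hypothetical helper: each inner list stands in for one target node.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    nodes: List[List[str]] = [[] for _ in range(node_count)]
    for index, record in enumerate(records):
        # The next record always goes to the next node, wrapping around.
        nodes[index % node_count].append(record)
    return nodes


# Five records streamed across three pretend nodes.
sprayed = round_robin_spray(["r1", "r2", "r3", "r4", "r5"], 3)
```

Because the source is consumed as a stream, nothing needs to know the size of the file up front, which is the appeal of a non-partitioning approach in this sketch.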
Having updated JIRA on the run, we now know what we’re all aiming for and my thoughts start wandering. I’m wondering what they have been doing all week. I’m always genuinely intrigued by the answer to this question, which I ask every time. I find out that they have been putting their heads together to make some serious progress on how to implement remote projection and filtering, a major feature of the 7.0.0 release.
The coding session that followed their discussion meant that by the end of the week, they had a pull request for supporting remote projection and filtering from hThor disk read operations. This involved making changes in both dafilesrv and hThor, moving this new feature significantly closer towards completion.
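For anyone wondering why remote projection and filtering matters, the idea can be sketched in a few lines of Python. This is a toy model under my own assumptions, not the dafilesrv or hThor code: the names `read_everything`, `read_projected` and `cells_transferred` are hypothetical, and a plain list of dictionaries stands in for a remote file.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def read_everything(rows: Iterable[Row]) -> Iterator[Row]:
    """Naive remote read: every row and every column is sent to the reader."""
    for row in rows:
        yield dict(row)


def read_projected(
    rows: Iterable[Row],
    predicate: Callable[[Row], bool],
    columns: List[str],
) -> Iterator[Row]:
    """Filter and project at the source, so only the needed data is sent."""
    for row in rows:
        if predicate(row):
            yield {name: row[name] for name in columns}


def cells_transferred(rows: Iterable[Row]) -> int:
    """Rough stand-in for network cost: the number of field values sent."""
    return sum(len(row) for row in rows)


PEOPLE: List[Row] = [
    {"id": 1, "name": "Ada", "city": "London", "age": 36},
    {"id": 2, "name": "Grace", "city": "New York", "age": 45},
    {"id": 3, "name": "Alan", "city": "London", "age": 41},
]

# Without the feature: fetch everything, then filter and project locally.
naive = list(read_everything(PEOPLE))
wanted = [{"id": r["id"], "name": r["name"]} for r in naive if r["city"] == "London"]

# With the feature: the filter and the column list travel to the data.
pushed = list(read_projected(PEOPLE, lambda r: r["city"] == "London", ["id", "name"]))
```

Both routes end up with the same rows, but in this toy example the naive read moves all twelve field values while the pushed-down read moves only four.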
Looking back over the week always throws up something a little surprising. This time I discover they have had a blue sky discussion about future development on the HPCC Systems open source project. They have been thinking about what HPCC Systems may look like in 5 years’ time. This prompted them to reflect on what HPCC Systems looked like 5 years ago, helping them to get a handle on the level of progress it is possible to achieve within a 5 year timespan, as well as the technology advances we have followed and responded to at the same time. I smile to myself because I think this is pretty interesting and worth sharing. So, what didn’t we have 5 years ago? The list is quite long, so I’m just going to mention some of the highlights as seen through the eyes of this team:
- Embedded languages
- Local compilation
- Multi-channel Thor
- Thor child queries
- Significantly improved performance - multi-processor aware, multiple processes per node
- Security improvements
- Dynamic ESDL
- Cassandra workunit support
- ECL language improvements - Remote read/filter/project and smart joins
- Support for JSON, Rest, Spark read
- Machine Learning Library
- More powerful ECL Watch
These days, we most likely take all these features for granted and probably can’t imagine life without them. So, let’s now look at some features the team would like to implement in the next 5 years:
- Thor to fully use 1000 cores (ROXIE does this already)
- Bi-directional file exchange with other ecosystems
- Dynamic Thor sizing/multi job Thor/Thor using published services
- GPU support
- Parallel R/Python
- Multi version support
- Intelligent IDE
- Multi-tenant support
- Remove/retire/replace: hThor, ECLServer (remote repository), DALI (for DFS and workunits), filespray, ECL IDE
- Improve programmer productivity, get more user feedback and receive more contributions from outside the platform team
The technology industry moves on constantly, meaning we don’t always know what we might want or need to respond to and so we remain focused on making sure we are following and driving the leading edge. This wish list was captured at one point in time and should we look back in 5 years, the list of actual changes made could look very different. I suspect that even if I ask the same question next year, the list will have moved on. But nonetheless, I found it fascinating to capture the thoughts and aspirations of this team who are constantly thinking about how we can improve HPCC Systems.
During the week long retreat, each developer presents a 10 minute session about a topic of their choice. These sessions can be quite varied. Sometimes it’s a demo to show off an idea, while others are used to bounce ideas around the group to help clear a stumbling block. The presentations may only last 10 minutes, but the ensuing discussions go on much longer and some result in a coding session which pushes the feature closer to completion. Given that this group work remotely most of the time, it’s a great opportunity to get heads together in a room working to get some immediate results. I’ve often observed in the past that while this is a team made up of individuals, when they get together, it’s like watching an organic machine kick into action with many heads acting as one with a common purpose.
While they achieve a lot in the week, it’s not all about the heads down grafting that goes on.
In the middle of the week, having been cooped up inside for a few days, they all went for a bracing walk in the snow. Just what was needed to clear the mind.
And the prize for hiking in the snow? This lovely view of the High Force waterfall:
A regular feature of this retreat is when the group divides into smaller teams, each providing dinner for the rest one evening during the week. There are no exceptions, so it’s my turn to join in, preparing a Mexican feast with Rodrigo and Attila. This is necessary to use up the rather alarming number of avocados that appear to have bred in the house during the week.
The end result sampled by the Chief Taster - Richard Chapman
While some of us are cooking, others may take the opportunity to spend some quality time with a colleague, as they wait for dinner to be ready:
The evening is finished off with the ‘name game’. We all put a place, name, event and action in the hat and work through four different rounds in our teams to guess what’s on the paper, scoring points as we go. It gets quite competitive with bits of paper flying all over the place in the rush to win as many as possible! The funniest are the actions only round and the one where only facial expressions are allowed, although by then we have heard most of the answers already and can guess them.
Rodrigo Pastrana is the newest member to join the core platform team and I think we can definitely say we welcomed him on board in style!
The next morning there is a wrap up session before we all leave to travel home. It’s clear that this has been both a productive and enjoyable week.
Who’s who and what they do…
Lorraine Chapman – New release and intern program co-ordinator
Richard Chapman – Team leader and ROXIE guru
Gavin Halliday – Code generator guru
Jake Smith – Anything related to Thor and Dali
Attila Vamos – Expert on DFU spraying and keeps us honest with his automated testing
Shamser Ahmed – Code generator and statistics improvements
Gordon Smith – Anything UI related (e.g. ECL Watch, Visualizations and ECL IDE)
Mark Kelly – Thor, networking and performance tweaks
Tony Fishbeck – All things ESP and ROXIE packages
Yanrui Ma – ESP, especially ESDL
Rodrigo Pastrana – All things WS-SQL, DESDL, JAPI and connectors