
Our links and collaborations with a variety of academic institutions got us thinking about how great it would be to invite students working on HPCC Systems related projects to showcase their work in the form of a poster presentation. As a result, the HPCC Systems Poster Presentation Competition was born and the first one was held at our Community Day Summit in 2016. It was such a success that we decided to hold the competition again at our Community Day Summit in 2017, and we expect to make it a regular feature of this annual event in the future.

This year’s students and poster entries were so impressive that I simply cannot leave it there. I want to take you on a virtual tour around the room so that you can meet the judges and all ten of the students, see their posters and learn about their projects.


Poster Presenters 2017
Back row: Zhe Yu, Cerise Walker, George Mathew, Yuheng Du, Vivek Nair, Chin-Yung Hsu
Front row: Nusrat Asrafi, Lili Zhang, Lily Xu, David Skaff

Before we take our tour, I want to draw your attention to a few interesting features. Firstly, we had a high school student present a poster this year. David Skaff is an 11th grader at NSU University School in Florida, who joined the HPCC Systems Intern Program this summer. Why is this so impressive? Well, until this year, our intern program and poster competition students were all undergraduate, master’s and PhD students. I hope this encourages other high school students to follow in David’s footsteps.

Secondly, if you look down the list of poster presenters, you will notice that four out of the ten students are female. The gender gap in industry is widely talked about these days (it was on the agenda at the World Economic Forum in Davos earlier in 2017), so this is great to see.

Lastly, I was surprised when I saw the title of George Mathew’s poster because I had assumed he would present about the intern project he completed with HPCC Systems in the summer of 2017, which involved implementing the Gradient Boosting Trees machine learning algorithm in ECL. In fact, the poster he presented was based on an idea that he wanted to share with us, which came to him as a result of working on his intern project. One of the speakers at our conference said that real data science is not about the eureka moments, it’s about those times when you say ‘hmmmm’, as you discover something you didn’t expect or know before. I guess George had one of those moments!

So let’s take that tour now to meet the judges and students and see the posters.

The judging of the posters was carried out by Victor Herrera (a visiting judge from Florida Atlantic University who presented at our summit in 2016 and was also a judge for our 2016 Poster Competition) and four LexisNexis employees who have knowledge and experience of using HPCC Systems, ECL and data science:


Poster Competition Judges 2017
From the left, Victor Herrera, Tasha Reid, Becky Champion, Mark Kelly, Jo Prichard

The judging criteria focused on the level of interest the idea may have for our community, the poster content and the presentation of the information, both visual and verbal. For the content, the judges were particularly looking for original ideas and relevance to HPCC Systems and to the overall theme of the Community Day Summit, which was ‘Smart Data’. The design of the poster and the presentation of the ideas needed to be well organized and clear, and to flow well from one idea to the next.

The poster projects were so diverse and the quality so high that I know the judges found it hard to choose the winners. They did a great job of spending time with each student, listening to their explanations and asking questions to help them make their decisions. They took a little extra time at the end to consolidate their thoughts. I thank them for their thorough and fair scoring.

Let’s walk around the room, visiting the students in the same order in which they were positioned on the day. To see any poster in more detail, click on the image.

Chin-Yung Hsu – PhD Student, North Carolina State University
HaaS – HPCC Systems as a Service

This poster presents the tools Chin created and explains and justifies the reference architecture and many of the configuration options for managing HPCC Systems clusters in AWS.

Chin-Yung Hsu Poster

Notes from the judges’ scorecards:
Chin provided a live demo which one of the judges said was ‘amazing’, while another commented ‘and it works!’. Overall, the judges thought this was a great idea and something that would be of immense help to those who are not expert HPCC Systems users. This was reflected in the across-the-board high scores Chin received for his idea.

Nusrat Asrafi – PhD Student, Kennesaw State University
Malware detection using frequency based graph mining
Nusrat’s poster focuses on the very topical area of malicious software. It presents a method for extracting statistically malicious behavior from system call graphs (obtained by running malware in a sandbox).

Nusrat Asrafi Poster

Notes from the judges’ scorecards:
Nusrat’s work rated highly as an original idea with our judges, who also scored her highly for presenting her ideas clearly.

George Mathew – PhD Student, North Carolina State University
Cohesive framework for legislative documents and research papers
George’s idea focuses on finding answers to questions such as:
What is the connection between the laws we write and the papers generated by researchers? Do government directives guide research? Does government legislation respond appropriately to new research results? How can we check?
To answer these questions, his project explores text mining and LDA for legislative and research documents.

George Mathew Poster

Notes from the judges’ scorecards:
The judges thought George had come up with a very original idea that was well presented. One of the judges commented that it was ‘the first time I have seen this’.

Cerise Walker – Undergraduate, Wayne State University
Is the secret to longevity eating chocolate?

In this study, Cerise used correlation and regression analysis to determine whether a strong correlation exists between amounts of chocolate consumption worldwide, average life expectancy and the happiness index of the corresponding countries.

Cerise Walker's Poster Entry

The judges noted the use of the HPCC Systems Machine Learning Library and Visualizer Bundle. Cerise presented her idea well and communicated it clearly.
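For readers unfamiliar with the kind of analysis Cerise describes, here is a minimal sketch of a correlation and simple linear regression calculation. It is written in plain Python rather than ECL, it is not Cerise’s code, and it does not use the HPCC Systems Machine Learning Library. The function names and the per-country figures are invented purely for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


# Hypothetical values, NOT real data: chocolate consumption (kg per person
# per year) and average life expectancy (years) for five made-up countries.
chocolate = [1.2, 3.5, 5.0, 7.4, 8.8]
life_expectancy = [71.0, 76.5, 79.0, 81.2, 82.4]

r = pearson_r(chocolate, life_expectancy)
slope, intercept = linear_fit(chocolate, life_expectancy)
print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```

A value of r close to 1 or -1 indicates a strong linear relationship, but on its own it says nothing about whether one quantity causes the other.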

Zhe Yu – PhD Student, North Carolina State University
Fast retrieval of relevant information through HPCC Systems
Zhe’s study leverages the scalability of HPCC Systems and its fast, distributed data storage to look at the implementation of active learning solutions which allow multiple human experts to work on one project collaboratively. His poster demonstrated the use of his FASTREAD_ECL tool with the HPCC Systems Machine Learning Library to support human experts in identifying “useful” information at a reduced cost in time and effort. His results suggest that a large portion of manual work can be eliminated with the help of this tool.

Zhe Yu Poster

Notes from the judges’ scorecards:
It was clear from the scoring by the judges on this poster that Zhe presented his ideas extremely well and is a very good communicator. Some of the judges were particularly impressed by the relevance of this idea to HPCC Systems and our summit theme of smart data.

Lily Xu – PhD Student, Clemson University
Optimizing performance of ECL-ML YinYang K-Means clustering algorithm

Lily’s poster idea builds on the work she completed during her internship with HPCC Systems in 2016. In her poster, she shows how she has worked to improve the performance of the Yinyang K-Means algorithm. The results she obtained show that its performance is greatly improved compared to the previous work.

Lily Xu Poster

Notes from the judges’ scorecards:
The judges noted that this piece of work is extremely relevant to HPCC Systems and the summit theme of smart data, which was reflected in the high scores she received for these judging criteria. She also presented her ideas to the judges very clearly through her well organized poster.

Vivek Nair – PhD Student, North Carolina State University
Spark and HPCC Systems; Strangers no more?

This solution seamlessly integrates two big data systems, HPCC Systems and Spark. Vivek has implemented a solution (as part of his 2017 HPCC Systems internship) which allows Spark users to use HPCC Systems as a data store and ECL users to use subroutines written in (Py)Spark.

Vivek Nair Poster

Notes from the judges’ scorecards:
Vivek’s scores for the originality criterion stood out here, falling just short of full marks in this category. One judge remarked that this was extremely ‘relevant to today’. The scores in the presentation section also show that the judges enjoyed hearing about his project and that the delivery of his presentation was very professional.

David Skaff – 11th Grade High School Student, NSU University School, Florida
Unicode implementations for HPCC Systems Standard Library Functions

David’s poster showcases the work he completed as part of his 2017 HPCC Systems internship.

The capability to effectively manipulate unstructured text continues to gain importance as HPCC Systems faces increasing variations across the documents that it works with. These variations present challenges in maintaining efficiency despite accounting for the different possibilities; however, implementing Unicode helps to universalize the range of input of the HPCC Systems standard library functions.

David Skaff Poster

Notes from the judges’ scorecards:
This was a very well received presentation. David scored highly for the design of his poster and the delivery of his explanation. One judge scored David’s poster so highly that it dropped only one point from the total number of marks possible. Two judges gave him full marks for his poster design.
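To illustrate why Unicode-aware versions of string functions matter, here is a minimal sketch in Python. It is not ECL and it is not David’s implementation. The helper names are hypothetical, and the ASCII-only routine is a deliberately naive stand-in for any function that only understands the letters a to z.

```python
import unicodedata


def ascii_only_upper(text: str) -> str:
    """Naive uppercase that only handles a-z (hypothetical ASCII-only routine)."""
    return "".join(
        chr(ord(c) - 32) if "a" <= c <= "z" else c
        for c in text
    )


def unicode_equal(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization, so that precomposed and
    combining-character spellings of the same text are treated as equal."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


word = "caf\u00e9"           # 'café' using the precomposed character U+00E9
decomposed = "cafe\u0301"    # 'café' using 'e' plus combining acute U+0301

print(ascii_only_upper(word))           # the accented letter is left untouched
print(word.upper())                     # Unicode-aware uppercase
print(word == decomposed)               # raw comparison of code points
print(unicode_equal(word, decomposed))  # comparison after normalization
```

The two spellings of the word look identical on screen but differ at the code point level, which is the kind of variation across documents that Unicode-aware library functions need to account for.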

Lili Zhang – PhD Student, Kennesaw State University
Implementing a sentiment-change-driven event discovery system on HPCC Systems

The emergence and prevalence of social sites provides public platforms for people to exchange information and express their opinions, which makes a huge amount of data available for study. Lili presented a system, built on the HPCC Systems platform, that uses data from Twitter to automatically discover important events that have significantly driven changes in people’s sentiment towards a target.

Lili Zhang Poster

Notes from the judges’ scorecards:
The judges noted the relevance of Lili’s idea to our smart data theme. She scored highly for the organization of her poster, which presented her ideas very clearly.

Yuheng Du – PhD Student, Clemson University
Representativeness of latent Dirichlet allocation topics estimated from data samples with application to Common Crawl

Common Crawl is a massive multi-petabyte dataset hosted by Amazon which has been widely used for text mining purposes. Yuheng’s poster shows how he performed a systematic test on the representativeness of topics estimated from Common Crawl compared to topics estimated from the full data of online forums. His research will be of interest to analysts who wish to use Common Crawl and HPCC Systems to study topics of interest in user forum data.

Yuheng Du Poster

Notes from the judges’ scorecards:
Yuheng’s poster was rated highly by the judges for its clarity and flow. The judges also indicated that he presented very well, clearly explaining his project in detail.

And the winners...
Trish McCall and I spent time tucked away in secret that evening, tallying the scores. The results were announced during our Community Day Summit the following day as follows:

1st place winner
Chin-Jung Hsu – HaaS: HPCC Systems as a Service
North Carolina State University

2nd place winner
David Skaff – Provide Unicode implementations for HPCC Systems standard library functions
NSU University School, Florida

3rd place winner
George Mathew – Cohesive framework for legislative documents and research papers
North Carolina State University

Congratulations to our three winners of the 2017 HPCC Systems Poster Presentation Competition!

All posters were available for viewing by delegates attending our 2017 Community Day Summit. I’m sure anyone who saw the posters and spoke to the students at this event would agree that the work they presented was of an extremely high standard. It is clear that all our poster presenters are incredibly hard working and have extremely bright futures ahead, and I want to thank them all for taking part.

One last thing: while this was a competition, this group of students supported each other throughout our summit. I noticed how they took an active interest in each other’s work, going to look at the other posters and asking for an explanation of the project ideas. They also hung out together during Community Day and, although there had to be winners, they all congratulated each other on their work regardless. This was a wonderful group of young, talented people and it was a privilege to have them present at the 2017 HPCC Systems Poster Presentation Competition.

Find out more about our 2017 Technical Poster Presentations, including more detailed extracts about each poster project. More information is also available about the HPCC Systems Intern Program and our Academic Program.