Data for Social Good – KSU Hackathon and The ADAM Program
In March 2021, HPCC Systems took part in the Kennesaw State University College of Computing and Software Engineering Hackathon. We have been sponsors of this event since 2017. The aim of the March Hackathon event was to demonstrate to students how big data analytics can be used for social good, solving problems in society and making a real difference in people’s lives. HPCC Systems provided a challenge giving KSU students the opportunity to learn more about The ADAM Program and take a look at missing children trends.
The National Center for Missing & Exploited Children and The ADAM™ Program
There is a tragic story at the heart of this partnership. In 1984, John Walsh and his wife, Revé, founded the National Center for Missing & Exploited Children (NCMEC) after their son, Adam, was abducted and murdered in 1981. At that time, resources to help find missing children were nonexistent in the USA, so they founded NCMEC to fill that gap. Over the years, NCMEC has developed effective and efficient ways to search for missing children, helping to reunite hundreds of thousands of them with their families. In 2000, LexisNexis Risk Solutions partnered with NCMEC to develop the Automated Delivery of Alerts on Missing Children, now known as The ADAM™ Program.
Time is of the essence when searching for abducted and missing children, so the automation of the missing children alerts provides the means to distribute the details as soon as the information becomes available. The ADAM™ Program has also developed over the years, adding new features to enhance the information provided, such as being able to pinpoint locations on a map or isolate a highway corridor, allowing NCMEC to share photos of missing children with people in those specific locations, so they can be on alert.
(1) Pinpoint locations of missing children in the city of Atlanta, GA
(2) Pinpoint locations of missing children along a highway corridor between Atlanta, GA and Birmingham, AL
LexisNexis Risk Solutions also assists NCMEC with complex cases by providing access to the expertise of data analysts and law enforcement officers via the Special Investigations Unit.
The Challenge – Identify sex offenders in an area where a child went missing
Students were challenged to devise a weighted scale to determine possible suspects for each missing child, using three datasets provided:
- A dataset of missing children in January 2021 for the state of Georgia, USA from the NCMEC public feed
- USA geographical data (providing the latitude, longitude and FIPS codes for every county and city)
- Registered sex offenders in the state of Georgia
When a missing child is reported, it is important to establish the risk to that child from offenders in the area with a criminal history involving children. The goals of this challenge were to:
- Identify the offenders in the area where the child was reported missing
- Given a route of travel/location, map the registered sex offenders to investigate as part of the case
- Look at possible inferences that might be made from the data, such as missing children hotspots or locations where there are higher numbers of people on the sex offenders register
- Create a weighting system based on the number of sex offenders and location within a radius, map at county level and use the ECL Visualizations Bundle to display the results
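The county-level weighting in the last goal can be sketched roughly as follows. This is a minimal Python illustration rather than the ECL the teams wrote. The function name `county_weights`, the normalisation by the highest county count, and the sample counts are all assumptions made for the example.

```python
from collections import Counter

def county_weights(offender_fips_codes):
    """Map each county FIPS code to a weight between 0 and 1.

    The weight is the county's offender count divided by the largest
    count seen. This normalisation is an assumption for illustration.
    """
    counts = Counter(offender_fips_codes)
    if not counts:
        return {}
    highest = max(counts.values())
    return {fips: n / highest for fips, n in counts.items()}

# One FIPS code per registered offender. The codes are Georgia counties
# (Fulton, Cobb, DeKalb) but the counts are invented for the example.
sample = ["13121"] * 4 + ["13067"] * 2 + ["13089"]
weights = county_weights(sample)
```

Weights on a common 0 to 1 scale like this could then feed a county-level map, with darker shading for higher values.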
Students worked in teams using HPCC Systems and ECL to produce the required results. Each team approached the challenge in a different way. Here are the methods used by the top three winning teams.
Team Big-O – First Place Winners: Neel Patel and Yagna Patel
Using a 5 mile radius for each missing child, they set out to identify sex offenders within that radius.
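A radius search of this kind can be sketched with the haversine great-circle formula. The Python below is an illustration only, not the team's ECL code. The record layout, the function names and the sample coordinates are assumptions, and the records are made up.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def offenders_within_radius(child, offenders, radius_miles=5.0):
    """Return (offender, distance) pairs inside the radius, nearest first."""
    hits = []
    for offender in offenders:
        d = haversine_miles(child["lat"], child["lon"],
                            offender["lat"], offender["lon"])
        if d <= radius_miles:
            hits.append((offender, d))
    return sorted(hits, key=lambda pair: pair[1])

# Made-up illustrative records, not taken from any real dataset.
child = {"id": "C1", "lat": 33.7490, "lon": -84.3880}
offenders = [
    {"id": "A", "lat": 33.7600, "lon": -84.3900},  # under a mile away
    {"id": "B", "lat": 34.0000, "lon": -84.0000},  # well outside 5 miles
]
nearby = offenders_within_radius(child, offenders)
```

Here `nearby` contains only record `A`, since record `B` lies far beyond the 5 mile radius.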
Their next aim was to establish a ranked list of ‘highly likely’ suspects using a weighted scale, making the following assumptions:
- Older people are less likely to commit crime
- If an offender is incarcerated they are incarcerated for life
- If they committed a crime recently they are more likely to commit a crime again
They created dummy variables called Incarcerated, Predator and Absconder. On the weighted scale, someone who is incarcerated has the lowest ranking, a known predator ranks higher, and an absconder sits at the highest ranking level.
They also looked at the age the offender was when the child went missing and how recently they committed a crime.
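One way such a ranking might be put together is sketched below in Python. Only the ordering of the three flags (incarcerated lowest, predator higher, absconder highest) comes from the description above. The numeric levels, the linear scaling of age and recency, the placement of offenders with no flag set, and all names are hypothetical choices for illustration, not the team's actual scale.

```python
from dataclasses import dataclass

@dataclass
class Offender:
    name: str
    incarcerated: bool
    predator: bool
    absconder: bool
    age_when_child_went_missing: int
    years_since_last_crime: float

def status_level(o):
    """Incarcerated ranks lowest, predator higher, absconder highest."""
    if o.incarcerated:
        return 0
    if o.absconder:
        return 3
    if o.predator:
        return 2
    return 1  # assumption: no flag set sits between incarcerated and predator

def clamp01(x):
    return max(0.0, min(1.0, x))

def suspect_score(o):
    # Hypothetical scaling: recent crimes and younger ages score closer to 1.
    recency = clamp01(1.0 - o.years_since_last_crime / 10.0)
    youth = clamp01(1.0 - (o.age_when_child_went_missing - 18) / 60.0)
    return status_level(o) + recency + youth

def rank_suspects(offenders):
    """Highest score first."""
    return sorted(offenders, key=suspect_score, reverse=True)

# Invented placeholder records.
ranked = rank_suspects([
    Offender("Z", True, False, False, 30, 1.0),
    Offender("Y", False, True, False, 50, 8.0),
    Offender("X", False, False, True, 22, 0.5),
])
```

With these invented records the young, recently active absconder `X` ranks first and the incarcerated offender `Z` ranks last.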
Results and Reflections
When they looked at the results, the number one hit focused on one of the youngest offenders. He had committed a crime in the last 12 months and fell into the highest level ranking on their weighted scale, based on the fact that he was known to be an absconder.
Using this method, they were able to get ranked results for every single missing child.
Team Big-O reflected that with more time and more accurate data, they would be able to optimise their weighted scale and replace their assumptions with facts based on the data in the improved dataset. For example, the assumption that once an offender is incarcerated they stay incarcerated is not accurate. Prison terms are often reduced after offenders have served a certain percentage of their total sentence. A more accurate and up to date dataset containing prison release information would make it possible to factor this into the weighted scale.
Team HPCC2 – Second Place Winner: Lauren Pope
Lauren’s starting point was to establish a weighted risk order, assigning a percentage weighting to each of the following categories:
- Gender (30%)
  An assessment of the data content showed that females were 5 times more likely to go missing than males.
- Age (30%)
  The data showed that the highest risk age group was children between the ages of 15 and 17, with the lowest risk between 9 and 11 years old and another spike for children between the ages of 0 and 8.
- Distance of the child from the sex offender (20%)
  This takes into account the potentially close proximity of a sex offender within the child’s social sphere. Weightings were higher for the city area to reflect the population concentration and higher numbers of sex offenders present.
- Currently or have ever been incarcerated (10%)
  Weighting the risk for those who had been incarcerated at some point in the past, as well as those currently incarcerated, was important in order to reflect that at some point in their lives, the offenders concerned had been considered dangerous enough to warrant a custodial sentence.
- Predator (10%)
  A weighting value was required to reflect whether a person was considered to be a sexual predator.
Lauren noticed that not all the information on the posters was available in the dataset, so she added the information required into the dataset to increase the accuracy of her analysis. Using these weightings, Lauren was able to establish an overall risk score for each child.
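The way these percentages combine into a single score can be sketched as a weighted sum. In the Python below, only the five category weightings come from the description above. The factor values assigned to each gender and age band, the 12 to 14 placeholder, the `max_miles` cut-off and all function names are assumptions for illustration, and the sketch is not Lauren's ECL code.

```python
import math

# Category weightings taken from the list above.
WEIGHTS = {"gender": 0.30, "age": 0.30, "distance": 0.20,
           "incarcerated": 0.10, "predator": 0.10}

def gender_factor(gender):
    # Assumed values loosely reflecting "5 times more likely".
    return 1.0 if gender == "F" else 0.2

def age_factor(age):
    # Assumed values: 15-17 highest, 0-8 a second spike, 9-11 lowest.
    if 15 <= age <= 17:
        return 1.0
    if age <= 8:
        return 0.6
    if 9 <= age <= 11:
        return 0.2
    return 0.5  # ages 12-14 are not described in the text: placeholder

def distance_factor(miles, max_miles=5.0):
    # Closer offenders score higher; max_miles is a hypothetical cut-off.
    return max(0.0, 1.0 - miles / max_miles)

def risk_score(child_gender, child_age, miles, ever_incarcerated, predator):
    """Weighted sum of factors, each scaled to the range 0 to 1."""
    factors = {
        "gender": gender_factor(child_gender),
        "age": age_factor(child_age),
        "distance": distance_factor(miles),
        "incarcerated": 1.0 if ever_incarcerated else 0.0,
        "predator": 1.0 if predator else 0.0,
    }
    return sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)

high = risk_score("F", 16, 0.0, True, True)
low = risk_score("M", 10, 5.0, False, False)
```

Because the weightings sum to 100%, the score stays between 0 and 1, which makes scores for different children directly comparable.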
As well as the risk weighting order shown above, Lauren suggested some other variables to use that might help to establish a more accurate risk score:
- Time since an offender’s last conviction to give more weighting to offenders that have been active recently.
- How long a child has been missing. Potentially, the longer a child has been missing, the greater their risk.
- The number of sexual offenders in a given area. City areas tend to have more offenders per square mile than more rural areas.
- Zip code for a missing child. This could be the child’s home or the zip code provided by law enforcement with knowledge of the last known place the child was seen. In highly populated cities, more granularity at this level may help with targeting specific areas.
Results and Reflections
Lauren did a lot of impressive research to come up with her risk weighting scores and she reflected on how bias might affect these scores. To try to avoid bias, she focused on the maths, specifically looking at what the data was telling her about categories like gender and age. She was concerned about the weighting attached to those who have been incarcerated at some point in their lives and thought about how it might be possible to reduce the potential bias of assuming that because someone once offended, they may offend again. She concluded that the dataset would need to include more variables to give more granularity to that score.
During the challenge she had to learn the ECL language. Lauren is new to the field of technology having started to learn how to code just 2 months before the hackathon. She had to apply her relatively limited knowledge to learn how to code what she needed in ECL. Clearly, she is a natural, given what she has achieved during this hackathon and her second place award is very well deserved. She produced the map below via Google to show the sex offenders in an area for a missing child in the dataset.
Third Place Winners: Taylor Blade, David Sousley and Mia Wimbish
KSU CCSE Hackathons are usually only open to students who are studying within the College of Computing and Software Engineering. This hackathon, however, was open to students of all majors. This team reflected this diversity, with students studying different majors in Computer Science (Taylor Blade), Cybersecurity (David Sousley) and Interactive Design (Mia Wimbish).
They found the latitude and longitude of a missing child and applied a radius circle, looking at points within the circle that matched the latitude and longitude of registered sex offenders in the dataset. Having identified the sex offenders, they then needed to assess them by applying a risk score based on the following criteria:
- Incarceration Status
  They filtered out offenders that were incarcerated when the child went missing, since they could not have perpetrated the crime while in prison.
- Level
  Based on a rating score where, for example, a level 1 offender may have a 13-35% chance of reoffending.
- Crime
  Not all crimes are equal and some may not include offences against children.
- Absconder Status
  Fugitives who are not following the rules of their sentence, which may indicate they pose a greater risk to communities.
- Location and Proximity to the child’s ‘last seen’ location
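The filter-then-score approach described by these criteria might look something like the Python sketch below. It is not the team's ECL solution. The field names, the points given to each level, the equal one-point bonuses and the 5 mile proximity scale are all hypothetical, and the records are invented.

```python
from datetime import date

# Hypothetical points per offender level, for illustration only.
LEVEL_POINTS = {1: 1.0, 2: 2.0, 3: 3.0}

def in_prison_on(offender, day):
    """True if the offender was incarcerated on the given day."""
    start = offender.get("incarcerated_from")
    end = offender.get("released_on")  # None means not yet released
    if start is None:
        return False
    return start <= day and (end is None or day < end)

def risk_points(offender, radius_miles=5.0):
    """Sum points for level, crime type, absconder status and proximity."""
    proximity = max(0.0, 1.0 - offender["miles_from_last_seen"] / radius_miles)
    return (LEVEL_POINTS[offender["level"]]
            + (1.0 if offender["crime_involved_child"] else 0.0)
            + (1.0 if offender["absconder"] else 0.0)
            + proximity)

def assess(offenders, missing_on):
    """Drop offenders in prison on the day, then rank the rest."""
    free = [o for o in offenders if not in_prison_on(o, missing_on)]
    return sorted(free, key=risk_points, reverse=True)

# Invented placeholder records.
records = [
    {"id": "P", "level": 1, "crime_involved_child": True, "absconder": False,
     "miles_from_last_seen": 1.0},
    {"id": "Q", "level": 3, "crime_involved_child": True, "absconder": True,
     "miles_from_last_seen": 4.0, "incarcerated_from": date(2019, 1, 1)},
    {"id": "R", "level": 2, "crime_involved_child": False, "absconder": True,
     "miles_from_last_seen": 2.5, "incarcerated_from": date(2015, 1, 1),
     "released_on": date(2018, 1, 1)},
]
ranked = assess(records, missing_on=date(2021, 1, 15))
```

Record `Q` is excluded because it has no release date before the day the child went missing, while `R` is kept because the release happened earlier.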
Results and Reflections
While focusing on solving the challenge, the team were also aware of potential constraints such as not having all the information needed in the dataset and having to make certain assumptions and inferences.
They also needed to understand and learn ECL in a short period of time and how to use it to perform their analysis. The team took the time to understand the syntax and identify the structures and functions they might need to use and put a lot of work into coding their solution, which they were able to use to get results for a missing child.
Comments from Mentors
This hackathon event lasted for one week and was a huge success. We would like to congratulate all teams who worked on our challenge and produced such well thought out and executed solutions. The ECL language was new to most students and it was great for us to see how easily they adjusted to using ECL to code their solutions in what was a fairly short space of time.
We take great pride in the fact that the ECL language is quick to learn and easy to use. The ECL language was designed specifically for creating queries into big data. Anyone inspired by the progress of these students to learn how to use it can take an online training course, read the documentation and follow a tutorial to get started quickly!
Students were supported during the challenge by three LexisNexis Risk Solutions colleagues.
The HPCC Systems mentors were impressed by the speed and ease with which the students learnt a new platform and language while developing their ideas for solving the problem and executing their solutions in just a few days. While the subject matter focused on sad and worrying real life events, the students channelled their motivations towards trying to help and it was interesting to see the results they found and how they interpreted them.
“The 2021 Spring Hackathon at Kennesaw State University was one of the most memorable because of the focus on The ADAM Program. Helping students to understand The ADAM Program and its relevance to finding missing children was especially fulfilling as it also gave perspective to some of the sadder realities of life. It was very encouraging to see the students immerse themselves in finding solutions to the problem while using their skills for social good. A sincere congratulations to the winners, participants and the KSU organizers for a well coordinated and successful hackathon.” – Arjuna Chala
“Congratulations to all teams that managed to show results in a short space of time, given all the challenges to work through. The results were a testament to their persistence, enthusiasm and passion for the subject while providing the perfect opportunity to showcase potential career opportunities using data analytics for social good.” – Bahar Fardanian
“The KSU Hackathon was a great event. Many projects showed ingenuity and the kinds of innovative thinking we look for at hackathon events. It was wonderful to interact with students while they worked through the problems and learnt how to use HPCC Systems and the ECL language.” – Dan Camper
More about The ADAM Program and how you can get involved too!
Trish McCall, Director, Program Management, LexisNexis Risk Solutions, is the co-founder of The ADAM Program, and Kennesaw State University is her alma mater. As part of the hackathon, Trish presented to students, providing them with some background information, including some statistics about missing children, the powerful story behind the work NCMEC does and the contribution The ADAM Program has made.
Since NCMEC was founded:
- Their national toll-free hotline has taken more than 5 million calls and circulated billions of photos of missing children
- They have assisted law enforcement in the recovery of more than 348,000 children and trained more than 379,000 law enforcement, criminal/juvenile justice and healthcare professionals
- In 2020, they assisted law enforcement and families with more than 29,800 missing children cases
According to the FBI, in 2019, there were 421,394 NCIC entries for missing children. Of the nearly 26,500 runaways reported to NCMEC in 2020, one in six were likely victims of child sex trafficking. Statistics taken from NCMEC Key Facts.
In 2020, The ADAM Program distributed over 1.7 million poster alerts on over 2,100 missing children cases and directly helped in the recovery of almost 200 missing children, as well as assisting in the recovery of countless others.
“It was great to be involved as a sponsor in the KSU CCSE 2021 Hackathon for Social Good. It was impressive to see the results from the competing teams using HPCC Systems in innovative ways to identify possible trends to help in the recovery of missing children. The vast number of missing children in the USA is a serious problem. Working together in partnership with NCMEC, academic institutions and the community is extremely important for raising awareness about the issues involved in missing children cases. We can greatly improve the chance of recovering missing children by increasing participation in The ADAM Program.” – Trish McCall
The ADAM Program is a free service. You can also help bring a missing child home by signing up for alerts to receive posters for missing children in your US based location.
HPCC Systems is Celebrating 10 Years as an Open Source Big Data Analytics Platform
Join us as we mark this anniversary event with users, colleagues, ambassadors and collaborators via a series of video podcasts. It’s great to reflect on how we got to where we are today with the stories shared in this series and look forward to what may lie ahead in the future. View the full list of podcasts on our 10 Year Anniversary Podcast Series Wiki.
Featured Podcast
This podcast features Flavio Villanustre (VP Technology and CISO, LexisNexis Risk Solutions Group) and Hackathon Mentor Dan Camper (Senior Architect, LexisNexis Risk Solutions Group). Find out how Dan stumbled upon HPCC Systems by accident when he was looking for a way to manage very large volumes of data, finding HPCC Systems to be a natural fit. He also talks about the Data Patterns bundle and interoperability between HPCC Systems and other languages and datastores.