In pursuit of perils : Geo-spatial risk analysis through HPCC Systems

I have always wanted to say I was working on big data projects and in my own time I have dabbled with frameworks such as Hadoop on Amazon EC2 but could not make the case for merging my day job with the big data aspiration. Now enter LexisNexis® Risk Solutions, who acquired our company in 2013, who, guess what, excel in big data through HPCC Systems® (http://hpccsystems.com/), their open source big data processing platform that powers their data services business. At last, there was potential to merge the big data aspiration with the day job. Nice.

Our solution is called Map View, which is is used by over 8,000 home and commercial property underwriters worldwide. Map View is a geo-spatial risk intelligence solution for the insurance industry; which for the most part consists of correlating existing and potential policy property locations with commercial and open peril data. This peril data informs the insurer what existing and potential risk exposure exists at the provided locations. Peril data consists of models pertaining to such risks as coastal flood, river flood, surface water, earthquake, subsidence, crime, wind and more. The insurer uses this risk assessment to decide what the suitable premium should be based on the severity of the discovered risks. Insurers also use this information to review clusters of risk exposure at various locations.

So that’s what we have been doing, and it has been going well from a technical point of view, but the bottleneck has always been scale, although we optimised our algorithms to play nice with the spatial relational database platform we were using, it still did not realise the order of magnitude performance improvements we needed.

Now for the crux of what this blog entry is about: leveraging HPCC Systems as our geo-spatial analysis engine and repository of our vector geometry and raster data.

The geo-spatial algorithms and binary geo-reference file handlers that we have been using did not exist in HPCC Systems, but Enterprise Control Language (ECL) the programming language for HPCC Systems, is extendable. It was clear, that to bring our geospatial processes into the realm of big data, we needed to extend ECL and that’s when we reached out to our colleagues in the greater LexisNexis family. We reached out to the HPCC Systems algorithms group within LexisNexis technology, who meet on a regular basis. The group was the perfect sounding board for our first experiments in using bespoke C++ with HPCC Systems. Thanks to these guys we got a C++ plug-in skeleton up and running.

A few of us met in the Dublin office to determine the core spatial capabilities we would need in HPCC Systems. Having established this list, rather than implementing algorithms from scratch, we were familiar with some geo-spatial libraries that include most if not all of what we were looking for, so no need to cherry pick, let’s just integrate the entire library and expose what we need to ECL.

The main capabilities we are looking for fall into 4 categories:

  1. Spatial filtering of vector geometries
  2. Spatial operations using vector geometries
  3. Spatial reference projection and transformation
  4. Reading of compressed geo-raster files

The libraries we focused on were GDAL/OGR, GEOS, and Proj.4; The GDAL/OGR library built to include the use of the GEOS library provides spatial filtering satisfying the use of DE-9IM spatial predicates (e.g. within, touches, intersects etc…) and also provides various spatial operations such as distance and convex-hull etc.; OGR which is attached at the hip to GDAL is a simple feature library and it accepts vector geometries as Well Known Text (WKT) wrapped as a string literal. The Proj.4 library encapsulates functionality for the projection of spatial references e.g. WGS84, UTM, Web Mercator, or NAD83 etc.; it also is used for the transformation of coordinates from one spatial reference system to another. GDAL provides functionality which allows for the reading of GEOTiff binary raster files.

Sample use case: augmenting building data with derived peril fields

The following map shows various buildings (labelled 1-7), a fire station, a police station, and a large flood zone based on a 500m buffer either side of a river.

For this sample use case, the business objective is to augment an existing dataset of buildings with peril information by correlating their physical location with available geographic peril data. The ECL executed within HPCC Systems will use the GDAL library to perform the distance and point-in-polygon spatial operations.

Image showing a sample map showing elements used for augmenting building data with derived peril fields

In particular we want to augment the building data with 3 perils:

  1. Fire risk category based on distance away from fire station
  2. Crime risk category based on distance away from police station
  3. Flood risk flag based on being within a flood zone

We start off using this buildings dataset.

An image illustrating the use of an ECL Transform to augment building data with inferred geo-spatial values

The ECL used to produce the above result:

import $.GeometryLite as Geometry; 

BuildingLayout := RECORD
  STRING geom;
  INTEGER4 id;
END;

PerilAugmentedBuildingLayout := RECORD
  BuildingLayout;
  INTEGER4 distanceToFireStation;
  STRING1 fireRiskCategory;
  INTEGER4 distanceToPoliceStation;
  STRING1 crimeRiskCategory;
  BOOLEAN isAtRiskOfFlood;
END;

// A recordset of points in WKT and using WGS84 as the coordinate system
buildings_RSet := DATASET(
  [
    {'POINT(-84.26180076961152565 34.07911885643056848)',	7},
    {'POINT(-84.28096707737662996 34.07386384848369687)',	6},
    {'POINT(-84.28034594703238724 34.06441871524530995)',	5},
    {'POINT(-84.27511070555951278 34.07632602535307598)',	4},
    {'POINT(-84.26401909226950693 34.07132809899221115)',	3},
    {'POINT(-84.2606472418293464 34.06904953470633046)',	2},
    {'POINT(-84.25332677705793571 34.07173235399688593)',	1}
  ],
  BuildingLayout
);

// SRID = Spatial Reference System Identifier, and in this case correlates to the matching EPSG id (http://www.epsg.org/)
// Universal Transverse Mercator (UTM) Zone 16 North... X,Y in meters, good for showing local distances
UTMZ16N_SRID := 32616; 

// World Geodetic System (WGS) ... Longitude,Latitude good for using as the base coordinate system
WGS84_SRID := 4326; 

// The location of the fire station given as WGS84 and also converted to a local UTM point
fire_station_point := 'POINT(-84.27361647195701266 34.07592838651884648)';
fire_station_point_UTMZ16N := Geometry.toSRID(fire_station_point,WGS84_SRID,UTMZ16N_SRID);

// The location of the police station given as WGS84 and also converted to a local UTM point
police_station_point := 'POINT(-84.28388903577211977 34.06841445050786632)';
police_station_point_UTMZ16N := Geometry.toSRID(police_station_point,WGS84_SRID,UTMZ16N_SRID);

// The large flood zone...pre-generated from a line representing a segment of river path and buffered 500m both sides of that line
river_flood_buffer_polygon := 'POLYGON ((-84.275569480814269 34.053149448508812,-84.275502368216152 34.053622020238336,-84.275461577548469 34.053856279819868,-84.275406062884485 34.054088433147875,-84.275335977314512 34.054317839528537,-84.275251514146206 34.054543865844074,-84.275152906372298 34.054765888300402,-84.275040426028681 34.05498329414894,-84.274914383444596 34.055195483377922,-84.274775126387269 34.055401870368861,-84.274623039102778 34.055601885513035,-84.274458541256465 34.055794976783979,-84.274282086775159 34.055980611261425,-84.274094162595034 34.056158276602602,-84.273895287317842 34.056327482456609,-84.273686009780079 34.056487761818282,-84.27346690753815 34.056638672317504,-84.271467966412999 34.057943350956521,-84.271449849097706 34.057955123901209,-84.269457062666064 34.059244375758553,-84.267464263063459 34.060541242500669,-84.267230385493164 34.060685184060439,-84.266987555911015 34.060818462943388,-84.26673647534858 34.060940694367893,-84.266477868667863 34.061051525446047,-84.266212482468021 34.061150636202917,-84.265941082929231 34.06123774050058,-84.265664453599811 34.061312586864624,-84.265383393133249 34.06137495921039,-84.265341784003652 34.061382226124834,-84.265314901719449 34.061585486783336,-84.265310183323066 34.061619943334122,-84.265111905163181 34.063019950080957,-84.26507118648162 34.063254301019093,-84.265015731142569 34.063486548072767,-84.264945692179822 34.063716049839776,-84.264861262907829 34.063942172494826,-84.264762676389054 34.064164291540052,-84.264650204791437 34.064381793530238,-84.264524158637713 34.064594077767168,-84.264384885948843 34.064800557959067,-84.264232771283574 34.065000663840181,-84.264068234677239 34.065193842746154,-84.263891730482143 34.06537956114078,-84.263703746113279 34.065557306090064,-84.26350480070252 34.06572658667919,-84.263295443665044 34.065886935368937,-84.263076253182007 34.066037909287367,-84.262751452284149 34.066249907042753,-84.262614616529277 34.06728718811172,-84.262609735088645 34.067322879116531,-84.262411456952634 34.068722885868624,-84.262370565581591 34.06895807170757,-84.262314831409327 34.069191136361503,-84.262244409341704 34.069421431556577,-84.262159495144019 34.069648316717327,-84.262060324897817 34.06987116074864,-84.26194717434521 34.070089343791523,-84.261820358123074 34.070302258947599,-84.261680228888764 34.070509313967584,-84.261527176339996 34.070709932899128,-84.261361626131588 34.070903557689206,-84.261184038692122 34.071089649736948,-84.260994907943669 34.071267691392244,-84.260794759928146 34.071437187396057,-84.260584151344403 34.071597666258612,-84.260363667999641 34.071748681571322,-84.258364726197328 34.073045730610005,-84.258134929191982 34.073186884602521,-84.257896506679955 34.073317758170241,-84.257650122125995 34.07343798711355,-84.257396461160667 34.073547236854345,-84.257136229671985 34.073645203367583,-84.257122298651311 34.073649727584275,-84.256948086173011 34.073809590460101,-84.256741016186922 34.073980186060624,-84.256523217046393 34.07414126483485,-84.254624040614956 34.075471446270598,-84.254724400109581 34.075597079684485,-84.254869780406139 34.075801719022742,-84.255001988909029 34.076012453715521,-84.255120657537674 34.076228697222078,-84.255225455889686 34.076449847663952,-84.255316092161522 34.076675289500066,-84.255392313961806 34.076904395239524,-84.255453909015031 34.077136527188117,-84.25550070575369 34.077371039223024,-84.255532573797112 34.077607278591138,-84.255549424315745 34.077844587725856,-84.255551210279833 34.078082306077405,-84.255537926591671 34.078319771951413,-84.255509610101271 34.078556324350842,-84.255466339505119 34.078791304815994,-84.255161247280483 34.080198945384261,-84.255103785139212 34.080429419521792,-84.255031936455268 34.080657082246773,-84.254945897195938 34.08088131225167,-84.254868679322044 34.081051343080176,-84.237240155349298 34.081049738427836,-84.237280208789684 34.080978287916651,-84.237412814623895 34.080770733580913,-84.237558247861628 34.080569215695952,-84.237716112647789 34.080374282655967,-84.237885979306 34.080186464932545,-84.238067385508458 34.080006273631433,-84.238259837534528 34.079834199102123,-84.238462811614468 34.079670709603825,-84.238675755355032 34.07951625003173,-84.238898089242582 34.079371240706635,-84.239129208220191 34.079236076231616,-84.23936848333372 34.079111124418603,-84.239615263443213 34.07899672528783,-84.239868876994265 34.078893190142935,-84.240128633844932 34.07880080072416,-84.240393827143151 34.078719808441974,-84.240663735249441 34.078650433693163,-84.240937623699878 34.078592865261356,-84.241214747203784 34.078547259803486,-84.241494351670951 34.078513741423592,-84.241775676262534 34.078492401335303,-84.242057955460552 34.078483297613602,-84.242276383939895 34.078485739280964,-84.242492265400855 34.078344838751413,-84.242669896687488 34.07823372485575,-84.243425452296762 34.077780983762665,-84.243369611642905 34.077645512126423,-84.243292744497253 34.077422098004902,-84.243229822889347 34.077195683226336,-84.243181014084243 34.0769668700799,-84.243146447802232 34.07673626722972,-84.243126215874952 34.076504488095637,-84.243120372002366 34.076272149221445,-84.24312893161121 34.076039868634815,-84.243151871815115 34.075808264203388,-84.243189131476854 34.075577951991299,-84.243461269006659 34.074171169484238,-84.243661428789082 34.072767093882938,-84.243859572282346 34.071365470820695,-84.243900570306963 34.071129462053882,-84.243956514650279 34.070895588727439,-84.244027248504821 34.070664505863085,-84.244112573646277 34.07043686066212,-84.244212250989861 34.070213290693019,-84.244326001261058 34.06999442210612,-84.244453505778935 34.069780867880262,-84.244594407349553 34.069573226106428,-84.244748311267287 34.069372078313052,-84.244914786420949 34.069177987837776,-84.245093366501834 34.068991498250192,-84.245283551310166 34.068813131829963,-84.24548480815632 34.068643388104555,-84.24569657335293 34.068482742450705,-84.245918253793548 34.06833164476361,-84.246149228613717 34.068190518197326,-84.246388850929605 34.068059757980151,-84.246636449649188 34.067939730308254,-84.246891331351378 34.06783077132043,-84.247152782227332 34.067733186157128,-84.247420070078917 34.067647248106269,-84.247692446368404 34.06757319783808,-84.24796914831397 34.067511242731527,-84.248249401024921 34.067461556293573,-84.248295264997253 34.067455515234712,-84.248322821978107 34.067402432756531,-84.248449719882316 34.067189575141107,-84.248589924187186 34.066982580508565,-84.248743044927195 34.066782024461098,-84.248908656228465 34.066588464693275,-84.249086297493619 34.066402439441589,-84.249275474683074 34.066224465988277,-84.249475661689175 34.066055039223322,-84.249686301799471 34.065894630268701,-84.249906809244763 34.065743685168833,-84.250136570827962 34.065602623650683,-84.250374947629027 34.065471837957027,-84.250621276781317 34.06535169175622,-84.250874873314345 34.065242519131296,-84.25113503205796 34.065144623651378,-84.251401029602519 34.065058277527839,-84.251672126309671 34.064983720857754,-84.251947568368251 34.064921160956473,-84.252035386818392 34.064905301684796,-84.25204925128071 34.064800299544302,-84.252093345968433 34.064537746115043,-84.252398611067974 34.063042402369028,-84.252453765203541 34.06280954236135,-84.252523575220749 34.062579422357807,-84.252607847320334 34.062352680810513,-84.252706347584663 34.062129946793526,-84.252818802627843 34.061911838257828,-84.25294490035526 34.06169896031723,-84.253084290830429 34.061491903569909,-84.253236587246619 34.061291242460207,-84.253401367000805 34.061097533685391,-84.253578172866739 34.060911314651555,-84.253766514263859 34.060733101983146,-84.253965868618735 34.060563390090095,-84.254175682815074 34.0604026497967,-84.254395374728318 34.060251327035772,-84.254617325955067 34.06010647236338,-84.254750577269093 34.059099870797915,-84.254794130832508 34.058840403128563,-84.255099396160134 34.057341244692481,-84.255154469659445 34.05710822660803,-84.255224216885622 34.056877947209394,-84.255308443981519 34.056651046156581,-84.255406916874094 34.056428153720624,-84.255519361925792 34.056209889033113,-84.255645466695697 34.055996858366782,-84.255784880808292 34.055789653451782,-84.255937216927606 34.055588849832475,-84.256102051833835 34.055395005269098,-84.256278927599539 34.055208658188938,-84.25646735286206 34.055030326191243,-84.256666804188725 34.054860504609906,-84.256876727531036 34.054699665138081,-84.257096539763594 34.054548254518501,-84.258948432267829 34.053339603420973,-84.275569480814269 34.053149448508812))';



// A transform used to augment the details of a building with dereived peril information
PerilAugmentedBuildingLayout PerilAugmentTransform(BuildingLayout L) := TRANSFORM
  // convert coordinate reference system of point to UTM 16 N so that 
  // distance is calculated in meters
  STRING geom_UTMZ16N := Geometry.toSRID(L.geom,WGS84_SRID,UTMZ16N_SRID);
  
  // Perform distance calculations
  SELF.distanceToFireStation := Geometry.distanceBetween(geom_UTMZ16N, fire_station_point_UTMZ16N,UTMZ16N_SRID);
  SELF.distanceToPoliceStation := Geometry.distanceBetween(geom_UTMZ16N, police_station_point_UTMZ16N,UTMZ16N_SRID);
  
  // Perform point in polygon assertion
  SELF.isAtRiskOfFlood := Geometry.isWithin(L.GEOM,river_flood_buffer_polygon,WGS84_SRID);
  
  // Apply rules for fire and crime
  SELF.fireRiskCategory := MAP(
              SELF.distanceToFireStation <= 500 => 'D',
              SELF.distanceToFireStation <= 1000 => 'C',
              SELF.distanceToFireStation <= 1500 => 'B',
              'A'
            );
  SELF.crimeRiskCategory := MAP(
              SELF.distanceToPoliceStation <= 600 => 'D',
              SELF.distanceToPoliceStation <= 1200 => 'C',
              SELF.distanceToPoliceStation <= 1900 => 'B',
              'A'
            );
  SELF := L;
END;

// Generate a new recordset with Peril details augmented
buildingsWithPerilsRset := PROJECT(buildings_RSet, PerilAugmentTransform(LEFT));

OUTPUT(buildingsWithPerilsRset,NAMED('Buildings_With_Perils'));

Integrating spatial libraries into HPCC Systems

Integrating the spatial libraries into ECL via Inline C++:

Inline C++ within ECL is a convenient mechanism to extend ECL; it does not require building from the HPCC Systems source code and we do not need any external team to support incremental releases.

Installing the spatial C++ libraries, getting your Ubuntu based HPCC platform ready:

To use the spatial libraries within ECL BEGINC++ functions, you will still need to install the spatial libraries [once] to each node on the HPCC Systems cluster.

We installed this onto the 4.X Ubuntu VM image available for download on the HPCC Systems website (http://hpccsystems.com/download/hpcc-vm-image).

The minimum versions of the spatial libraries used in this article:

  • GEOS >= 3.2.2
  • PROJ.4 >= 4.7.0
  • GDAL >= 1.10.1
File : [ install.sh ]
  • Copy “install.sh” via ssh to a folder on the HPCC Systems Ubuntu VM
  • Uncomment the dependencies line if required
  • Execute “install.sh” to install the spatial libraries
#!/bin/bash

# dependencies : if not already installed
# sudo apt-get -y install g++ gcc make cmake bison flex binutils-dev libldap2-dev libcppunit-dev libicu-dev libxalan110-dev zlib1g-dev libboost-regex-dev libssl-dev libarchive-dev  libapr1-dev libaprutil1-dev subversion

# install PROJ.4
sudo apt-get -y install libproj-dev

# install GEOS
sudo apt-get -y install libgeos-dev

# compile and install GDAL
svn co https://svn.osgeo.org/gdal/branches/1.10/gdal gdal_stable

cd gdal_stable
./configure  --prefix=/usr --with-threads --with-ogr --with-geos --without-libtool --with-libz=internal --with-libtiff=internal --with-geotiff=internal --with-gif --without-pg --without-grass --without-libgrass --without-cfitsio --without-pcraster --without-netcdf --with-png --with-jpeg --without-ogdi --without-fme --without-hdf4 --without-hdf5 --without-jasper --without-ecw --without-kakadu --without-mrsid --without-jp2mrsid --without-bsb --without-grib --without-mysql --without-ingres --with-xerces --without-expat --without-odbc --with-curl --without-sqlite3 --without-dwgdirect --without-panorama --without-idb --without-sde --without-perl --without-php --without-ruby --without-python --without-ogpython --with-hide-internal-symbols
sudo make install
cd ../
File : [GeometryLite.ecl]

Contains a Geometry module “GeometryLite”, truncated for the purposes of demonstration just to include 4 spatial functions : SpatialReferenceForSRID, distanceBetween, tranformToProjection , and hasSpatialRelation.

IMPORT $;

EXPORT GeometryLite := MODULE

      EXPORT ExtLibSpatial := MODULE
        /* SpatialReferenceForSRID

              Given a SRID return the WKT representing the Spatial Projection Details for that SRID
        */
        export STRING SpatialReferenceForSRID(INTEGER4 srid) :=
                BEGINC++
#option library 'geos'
#option library 'proj'
#option library 'gdal'

// #option once : Indicates the function has no side effects and is evaluated at query execution time, even if the parameters are constant, allowing the optimizer to make more efficient calls to the function in some cases.
#option once

#include <iostream>
#include <sstream>
#include <string>
#include "ogrsf_frmts.h" // GDAL
#include "cpl_conv.h"
#include "gdal_priv.h"

using namespace std;

#body
// #body : tell HPCC that everything up to this point is in global scope and that
// the following section is to be encapsulated into a function/procedure

char *wktOut;

// determine the spatial reference details
OGRSpatialReference * poSRS = new OGRSpatialReference(NULL);
poSRS->importFromEPSG(srid);

poSRS->exportToWkt(&wktOut);

// copy string into a char array
unsigned len = strlen(wktOut);
char * out = (char *) malloc(len);
for(unsigned i=0; i < len; i++) {
    out[i] = wktOut[i];
}

// free resources
free(wktOut);
OGRSpatialReference::DestroySpatialReference(poSRS);


//return result to ECL
__result = out;

// set length of return string
__lenResult = len;
ENDC++;
/* tranformToProjection

             Transform a geometry from one SRID projection to another
*/
EXPORT string tranformToProjection(const string  geom,  STRING srs1, STRING srs2):=
        BEGINC++
#option library 'geos'
#option library 'proj'
#option library 'gdal'

// #option once : Indicates the function has no side effects and is evaluated at query execution time, even if the parameters are constant, allowing the optimizer to make more efficient calls to the function in some cases.
#option once

#include <iostream>
#include <sstream>
#include <string>
#include "ogrsf_frmts.h" // GDAL
#include "cpl_conv.h"
#include "gdal_priv.h"

        using namespace std;

#body
// #body : tell HPCC that everything up to this point is in global scope and that
// the following section is to be encapsulated into a function/procedure

OGRGeometry *thisGeom;
char *wkt;
char* wktIn = (char*) geom;


// determine the spatial reference details
char* wktSRSSourceIn = (char*) srs1;
OGRSpatialReference *sourceSRS = new OGRSpatialReference(NULL);
sourceSRS->importFromWkt(&wktSRSSourceIn);

char* wktSRSTargetIn = (char*) srs2;
OGRSpatialReference *targetSRS = new OGRSpatialReference(NULL);
targetSRS->importFromWkt(&wktSRSTargetIn);


// create geometry from given WKT
OGRErr err = OGRGeometryFactory::createFromWkt(&wktIn, sourceSRS, &thisGeom);

thisGeom->transformTo(targetSRS);

thisGeom->exportToWkt(&wkt);

unsigned len = strlen(wkt);

// copy string into a char array
char * out = (char *) malloc(len);
for(unsigned i=0; i < len; i++) {
    out[i] = wkt[i];
}

//return result to ECL
__result = out;

// set length of return string
__lenResult = len;

free(wkt);
OGRSpatialReference::DestroySpatialReference(sourceSRS);
OGRSpatialReference::DestroySpatialReference(targetSRS);
OGRGeometryFactory::destroyGeometry(thisGeom);
ENDC++;


/* distanceBetween

        Get the distance between the 2 given WKT geometries, the distance unit returned depdends on the SRID used
*/
EXPORT REAL8 distanceBetween(const string  geom1, const string  geom2, STRING srs):=
        BEGINC++
#option library 'geos'
#option library 'proj'
#option library 'gdal'

// #option once : Indicates the function has no side effects and is evaluated at query execution time, even if the parameters are constant, allowing the optimizer to make more efficient calls to the function in some cases.
#option once

#include <iostream>
#include <sstream>
#include <string>
#include "ogrsf_frmts.h" // GDAL
#include "cpl_conv.h"
#include "gdal_priv.h"

        using namespace std;

#body

// #body : tell HPCC that everything up to this point is in global scope and that
// the following section is to be encapsulated into a function/procedure


// determine the spatial reference details
char* wktSRSIn = (char*) srs;
OGRSpatialReference * poSRS = new OGRSpatialReference(NULL);
poSRS->importFromWkt(&wktSRSIn);

bool hasAtLeastOneValidRelation = false;

char* wktInLeft = (char*) geom1;
char* wktInRight = (char*) geom2;

OGRGeometry *leftOGRGeom;
OGRGeometry *rightOGRGeom;

bool loadedOK = false;
OGRErr err =  NULL;

err = OGRGeometryFactory::createFromWkt(&wktInLeft, poSRS, &leftOGRGeom);
loadedOK = (err == OGRERR_NONE);

err = OGRGeometryFactory::createFromWkt(&wktInRight, poSRS, &rightOGRGeom);
loadedOK = (err == OGRERR_NONE);

double distance = leftOGRGeom->Distance(rightOGRGeom);

OGRGeometryFactory::destroyGeometry(leftOGRGeom);
OGRGeometryFactory::destroyGeometry(rightOGRGeom);
OGRSpatialReference::DestroySpatialReference(poSRS);

return distance;
ENDC++;

/* hasSpatialRelation

     Do the two given WKT geometries have at least one of the expected relations defined in relationTypeORBits [a single INT containing OR'd bits]

     @see <a href="http://en.wikipedia.org/wiki/DE-9IM">Wikipedia</a>

     usage:
     hasSpatialRelation("POINT(? ?)","POLYGON((? ?,? ?,? ?,? ?,? ?))", rel.WITHIN | rel.OVERLAPS, SRS(4326));


     @param geom1 STRING containing a WKT geometry, left side of predicate assertion
     @param geom2 STRING containing a WKT geometry, right side of predicate assertion
     @param rel INTEGER contains one or more bits representing what spatial relations should be evaluated
     @param srs the WKT Spatial reference details as got from Operation.SRS
*/
EXPORT boolean hasSpatialRelation(const string  geom1, const string  geom2, INTEGER rel, STRING srs):=
        BEGINC++
#option library 'geos'
#option library 'proj'
#option library 'gdal'

// #option once : Indicates the function has no side effects and is evaluated at query execution time, even if the parameters are constant, allowing the optimizer to make more efficient calls to the function in some cases.
#option once

#include <iostream>
#include <sstream>
#include <string>
#include "ogrsf_frmts.h" // GDAL
#include "cpl_conv.h"
#include "gdal_priv.h"

    /**
    Enumeration of all supported relation types
    */
        namespace RelationType {
    enum SpatialPredicate {
        INTERSECTS = 1 << 0,
        TOUCHES = 1 << 1,
        DISJOINT = 1 << 2,
        CROSSES = 1 << 3,
        WITHIN = 1 << 4,
        CONTAINS = 1 << 5,
        OVERLAPS = 1 << 6,
        EQUALS = 1 << 7
    };

    bool isBitwiseSpatialPredicate(int packedInteger, RelationType::SpatialPredicate predicate) {
        return (packedInteger & predicate) == predicate ;
    }
}

using namespace std;

#body

  // #body : tell HPCC that everything up to this point is in global scope and that
// the following section is to be encapsulated into a function/procedure


// determine the spatial reference details
char* wktSRSIn = (char*) srs;
OGRSpatialReference * poSRS = new OGRSpatialReference(NULL);
poSRS->importFromWkt(&wktSRSIn);

bool hasAtLeastOneValidRelation = false;

char* wktInLeft = (char*) geom1;
char* wktInRight = (char*) geom2;

OGRGeometry *leftOGRGeom;
OGRGeometry *rightOGRGeom;

bool loadedOK = false;
OGRErr err =  NULL;

// parse geom 1
err = OGRGeometryFactory::createFromWkt(&wktInLeft, poSRS, &leftOGRGeom);
loadedOK = (err == OGRERR_NONE);

if(loadedOK) {
    // parse geom 2
    err = OGRGeometryFactory::createFromWkt(&wktInRight, poSRS, &rightOGRGeom);
    loadedOK = (err == OGRERR_NONE);

    if(loadedOK) {
        // assert if a relation exists
        int relationTypePackedBitwise = rel;
        if( !hasAtLeastOneValidRelation && RelationType::isBitwiseSpatialPredicate(relationTypePackedBitwise , RelationType::INTERSECTS)) {
            hasAtLeastOneValidRelation = leftOGRGeom->Intersects(rightOGRGeom);
        } 

        if( !hasAtLeastOneValidRelation && RelationType::isBitwiseSpatialPredicate(relationTypePackedBitwise , RelationType::TOUCHES)) {
            hasAtLeastOneValidRelation = leftOGRGeom->Touches(rightOGRGeom);
        } 

        if( !hasAtLeastOneValidRelation && RelationType::isBitwiseSpatialPredicate(relationTypePackedBitwise , RelationType::DISJOINT)) {
            hasAtLeastOneValidRelation = leftOGRGeom->Disjoint(rightOGRGeom);
        } 

        if( !hasAtLeastOneValidRelation && RelationType::isBitwiseSpatialPredicate(relationTypePackedBitwise , RelationType::CROSSES)) {
            hasAtLeastOneValidRelation = leftOGRGeom->Crosses(rightOGRGeom);
        } 

        if( !hasAtLeastOneValidRelation && RelationType::isBitwiseSpatialPredicate(relationTypePackedBitwise , RelationType::WITHIN)) {
            hasAtLeastOneValidRelation = leftOGRGeom->Within(rightOGRGeom);
        } 

        if( !hasAtLeastOneValidRelation && RelationType::isBitwiseSpatialPredicate(relationTypePackedBitwise , RelationType::CONTAINS)) {
            hasAtLeastOneValidRelation = leftOGRGeom->Contains(rightOGRGeom);
        } 

        if( !hasAtLeastOneValidRelation && RelationType::isBitwiseSpatialPredicate(relationTypePackedBitwise , RelationType::OVERLAPS)) {
            hasAtLeastOneValidRelation = leftOGRGeom->Overlaps(rightOGRGeom);
        } 

        if( !hasAtLeastOneValidRelation && RelationType::isBitwiseSpatialPredicate(relationTypePackedBitwise , RelationType::EQUALS)) {
            hasAtLeastOneValidRelation = leftOGRGeom->Equals(rightOGRGeom);
        }
        // clean right
        OGRGeometryFactory::destroyGeometry(rightOGRGeom);
    }
    // clean left
    OGRGeometryFactory::destroyGeometry(leftOGRGeom);
}
// return result
return hasAtLeastOneValidRelation;
ENDC++;

EXPORT SRS :=  SpatialReferenceForSRID;
END;

EXPORT Filter :=  MODULE
      /*
      Bitwise enumeration for all possible Spatial Relations

      Can be combined e.g.

      All WITHIN or TOUCHING OR INTERSECTING = RelationType.WITHIN | RelationType.TOUCHES | RelationType.INTERSECTS
      */
      EXPORT RelationType := ENUM
        (
            INTERSECTS = 1 << 0,
            TOUCHES = 1 << 1,
            DISJOINT = 1 << 2,
            CROSSES = 1 << 3,
            WITHIN = 1 << 4,
            CONTAINS = 1 << 5,
            OVERLAPS = 1 << 6,
            EQUALS = 1 << 7
        );


/*
     hasSpatialRelation

     Does [this] and [thatOther] have one of the bitwise RelationTypes given in [relationTypeORBits] ?
*/
EXPORT BOOLEAN hasSpatialRelation(const string  this, const string  thatOther, INTEGER relationTypeORBits,
                                  INTEGER4 srid) := FUNCTION
                                              STRING srs := ExtLibSpatial.SRS(srid);
return ExtLibSpatial.hasSpatialRelation(this,thatOther,relationTypeORBits, srs);
END;

/*
    isWithin
*/
EXPORT BOOLEAN isWithin(const string  thisGeom, const string  thatOtherGeom, INTEGER4  srid) := FUNCTION
            return hasSpatialRelation(thisGeom,thatOtherGeom,RelationType.WITHIN, srid);
END;
END;

EXPORT Operation :=  MODULE
      EXPORT STRING tranformToProjection(const string geometryWKT, INTEGER4 sourceSRID, INTEGER4 targetSRID) := FUNCTION
                  STRING srs1 :=  ExtLibSpatial.SRS(sourceSRID);
                  STRING srs2 :=  ExtLibSpatial.SRS(targetSRID);

           return ExtLibSpatial.tranformToProjection(geometryWKT,srs1,srs2);
END;

/*
    distanceBetween

    Calculate the distance between 2 points, using the projection given by srid
*/
EXPORT REAL8 distanceBetween(const string  point_A_WKT, const string  point_B_WKT, INTEGER4 srid) := FUNCTION
            STRING srs := ExtLibSpatial.SRS(srid);
return ExtLibSpatial.distanceBetween(point_A_WKT,point_B_WKT,srs);
END;
END;

EXPORT toSRID :=  Operation.tranformToProjection;
EXPORT distanceBetween :=  Operation.distanceBetween;
EXPORT isWithin :=  Filter.isWithin;
END;

What about the binary raster files?

Raster files like the GEOTiff format represent a large matrix of peril values where each pixel correlates to x meters; x relates to a fixed resolution. Using the dfuplus HPCC Systems client tool we were able to spray all raster files contained in the HPCC Systems drop-zone into the data/blob fields of a dataset. We then used GDAL through inline C++ to read the blob data for each record (1 record per raster file) and retrieve the raster values corresponding to a X,Y coordinate.

Representing geometries in datasets, best practice not prescription:

EXPORT Layout := RECORD
  STRING GEOM := '';
END;

All the geometries used in this blog post have been defined as a WKT (Well-Known-Text) string literal. WKT is widely accepted and widely supported. In pursuit of a canonical way of defining geometries I think that a record layout with a GEOM field containing a WKT string literal is a good start.

For XML datasets based on geo reference data like GML (Geography Mark-up Language); the geometry elements could be translated into WKT for the purposes of canonical search and retrieval. Once relevant matching WKT GEOM records are found through a process of spatial filtering, the associated detail required could then be retrieved from the original GML record.

About the author:

Andrew is a software engineer based in the LexisNexis Risk, Dublin, Ireland office. His role primarily consists of developing geo-spatial risk solutions using Java, he is an advocate of clean code, and has a keen interest in working with distributed systems.

Acknowledgements:

Thanks to everyone for the help along the way:

  • Algorithms team:
    • Charles Kaminski
    • Greg Thompson
    • Jack Coleman
  • GIS Modelling Team, Dublin, Ireland:
    • Daniel Bang
  • Product Development Team, Dublin, Ireland:
    • Jason Moore
    • Greg McRandal
  • HPCC Systems Core Team:
    • Richard Chapman
    • Gordon Smith
    • Gavin Halliday
  • Training:
    • Bob Foreman
    • Alan Daniels

More information: