Input with Complex XML Formats

XML data comes in many possible formats, and some of them make use of "child datasets" such that a given tag may contain multiple instances of other tags that contain individual field tags themselves.

Here's an example of such a complex structure using UCC data. An individual Filing may contain one or more Transactions, which in turn may contain multiple Debtor and SecuredParty records:

<UCC>
  <Filing number='5200105'>
    <Transaction ID='5'>
  <StartDate>08/01/2001</StartDate>
  <LapseDate>08/01/2006</LapseDate>
  <FormType>UCC 1 FILING STATEMENT</FormType>
  <AmendType>NONE</AmendType>
  <AmendAction>NONE</AmendAction>
  <EnteredDate>08/02/2002</EnteredDate>
  <ReceivedDate>08/01/2002</ReceivedDate>
  <ApprovedDate>08/02/2002</ApprovedDate>
  <Debtor entityId='19'>
    <IsBusiness>true</IsBusiness>
    <OrgName><![CDATA[BOGUS LABORATORIES, INC.]]></OrgName>
    <Status>ACTIVE</Status>
    <Address1><![CDATA[334 SOUTH 900 WEST]]></Address1>
    <Address4><![CDATA[SALT LAKE CITY 45 84104]]></Address4>
    <City><![CDATA[SALT LAKE CITY]]></City>
    <State>UTAH</State>
    <Zip>84104</Zip>
    <OrgType>CORP</OrgType>
    <OrgJurisdiction><![CDATA[SALT LAKE CITY]]></OrgJurisdiction>
    <OrgID>654245-0142</OrgID>
    <EnteredDate>08/02/2002</EnteredDate>
  </Debtor>
  <Debtor entityId='7'>
    <IsBusiness>false</IsBusiness>
    <FirstName><![CDATA[FRED]]></FirstName>
    <LastName><![CDATA[JONES]]></LastName>
    <Status>ACTIVE</Status>
    <Address1><![CDATA[1038 E. 900 N.]]></Address1>
    <Address4><![CDATA[OGDEN 45 84404]]></Address4>
    <City><![CDATA[OGDEN]]></City>
    <State>UTAH</State>
    <Zip>84404</Zip>
    <OrgType>NONE</OrgType>
    <EnteredDate>08/02/2002</EnteredDate>
  </Debtor>
  <SecuredParty entityId='20'>
    <IsBusiness>true</IsBusiness>
    <OrgName><![CDATA[WELLS FARGO BANK]]></OrgName>
    <Status>ACTIVE</Status>
    <Address1><![CDATA[ATTN: LOAN OPERATIONS CENTER]]></Address1>
    <Address3><![CDATA[P.O. BOX 9120]]></Address3>
    <Address4><![CDATA[BOISE 13 83707-2203]]></Address4>
    <City><![CDATA[BOISE]]></City>
    <State>IDAHO</State>
    <Zip>83707-2203</Zip>
    <Status>ACTIVE</Status>
    <EnteredDate>08/02/2002</EnteredDate>
  </SecuredParty>
  <Collateral>
    <Action>ADD</Action>
    <Description><![CDATA[ALL ACCOUNTS]]></Description>
    <EffectiveDate>08/01/2002</EffectiveDate>
  </Collateral>
    </Transaction>
    <Transaction ID='375799'>
  <StartDate>08/01/2002</StartDate>
  <LapseDate>08/01/2006</LapseDate>
  <FormType>UCC 3 AMENDMENT</FormType>
  <AmendType>TERMINATION BY DEBTOR</AmendType>
  <AmendAction>NONE</AmendAction>
  <EnteredDate>02/23/2004</EnteredDate>
  <ReceivedDate>02/18/2004</ReceivedDate>
  <ApprovedDate>02/23/2004</ApprovedDate>
    </Transaction>
  </Filing>
</UCC>

The key to working with this type of complex XML data are the RECORD structures that define the layout of the XML data.

CollateralRec := RECORD
  STRING Action        {XPATH('Action')}; 
  STRING Description   {XPATH('Description')}; 
  STRING EffectiveDate {XPATH('EffectiveDate')}; 
END;


PartyRec := RECORD
  STRING   PartyID         {XPATH('@entityId')}; 
  STRING   IsBusiness      {XPATH('IsBusiness')};
  STRING   OrgName         {XPATH('OrgName')};
  STRING   FirstName       {XPATH('FirstName')};
  STRING   LastName        {XPATH('LastName')};
  STRING   Status          {XPATH('Status[1]')};
  STRING   Address1        {XPATH('Address1')};
  STRING   Address2        {XPATH('Address2')};
  STRING   Address3        {XPATH('Address3')};
  STRING   Address4        {XPATH('Address4')};
  STRING   City            {XPATH('City')};
  STRING   State           {XPATH('State')};
  STRING   Zip             {XPATH('Zip')};
  STRING   OrgType         {XPATH('OrgType')};
  STRING   OrgJurisdiction {XPATH('OrgJurisdiction')};
  STRING   OrgID           {XPATH('OrgID')};
  STRING10 EnteredDate     {XPATH('EnteredDate')};
END;

TransactionRec := RECORD
  STRING            TransactionID  {XPATH('@ID')}; 
  STRING10          StartDate      {XPATH('StartDate')};   
  STRING10          LapseDate      {XPATH('LapseDate')};   
  STRING            FormType       {XPATH('FormType')};        
  STRING            AmendType      {XPATH('AmendType')};   
  STRING            AmendAction    {XPATH('AmendAction')};
  STRING10          EnteredDate    {XPATH('EnteredDate')};
  STRING10          ReceivedDate   {XPATH('ReceivedDate')};
  STRING10          ApprovedDate   {XPATH('ApprovedDate')};
  DATASET(PartyRec) Debtors        {XPATH('Debtor')};
  DATASET(PartyRec) SecuredParties {XPATH('SecuredParty')};
  CollateralRec     Collateral     {XPATH('Collateral')}
END;

UCC_Rec := RECORD 
  STRING                  FilingNumber {XPATH('@number')}; 
  DATASET(TransactionRec) Transactions {XPATH('Transaction')};
END; 
UCC := DATASET('~PROGGUIDE::EXAMPLEDATA::XML_UCC',UCC_Rec,XML('UCC/Filing')); 

Building from the bottom up, these RECORD structures combine to create the final UCC_Rec layout that defines the entire format of this XML data.

The XML option on the final DATASET declaration specifies the XPATH to the record tag (Filing) then the child DATASET "field" definitions in the RECORD structures handle the multiple instance issues. Because ECL is case insensitive and XML syntax is case sensitive, it is necessary to use the XPATH to define all the field tags. The PartyRec RECORD structure works with both the Debtors and SecuredParties child DATASET fields because both contain the same tags and information.

Once you've defined the layout, how can you extract the data into a normalized relational structure to work with it in the supercomputer? NORMALIZE is the answer. NORMALIZE needs to know how many times to call its TRANSFORM, so you must use the TABLE function to get the counts, like this:

XactTbl := TABLE(UCC,{INTEGER XactCount := COUNT(Transactions), UCC});

OUTPUT(XactTbl);

This TABLE function gets the counts of the multiple Transaction records per Filing so that we can use NORMALIZE to extract them into a table of their own.

Out_Transacts := RECORD
   STRING            FilingNumber;
   STRING            TransactionID; 
   STRING10          StartDate;     
   STRING10          LapseDate;     
   STRING            FormType;         
   STRING            AmendType;     
   STRING            AmendAction;
   STRING10          EnteredDate;
   STRING10          ReceivedDate;
   STRING10          ApprovedDate;
   DATASET(PartyRec) Debtors;
   DATASET(PartyRec) SecuredParties;
   CollateralRec     Collateral;
END;

Out_Transacts Get_Transacts(XactTbl L, INTEGER C) := TRANSFORM
  SELF.FilingNumber := L.FilingNumber;
  SELF              := L.Transactions[C]; 
END;

Transacts := NORMALIZE(XactTbl,LEFT.XactCount,Get_Transacts(LEFT,COUNTER));

OUTPUT(Transacts);

This NORMALIZE extracts all the Transactions into a separate recordset with just one Transaction per record with the parent information (the Filing number) appended. However, each record here still contains multiple Debtor and SecuredParty child records.

PartyCounts := TABLE(Transacts,
                     {INTEGER DebtorCount := COUNT(Debtors), 
                      INTEGER PartyCount := COUNT(SecuredParties),
                      Transacts});

OUTPUT(PartyCounts);

This TABLE function gets the counts of the multiple Debtor and SecuredParty records for each Transaction.

Out_Parties := RECORD
   STRING   FilingNumber;
   STRING   TransactionID;
   PartyRec; 
END;

Out_Parties Get_Debtors(PartyCounts L, INTEGER C) := TRANSFORM
  SELF.FilingNumber  := L.FilingNumber;
  SELF.TransactionID := L.TransactionID; 
  SELF               := L.Debtors[C]; 
END;
TransactDebtors := NORMALIZE( PartyCounts,
                              LEFT.DebtorCount,
                              Get_Debtors(LEFT,COUNTER));

OUTPUT(TransactDebtors);

This NORMALIZE extracts all the Debtors into a separate recordset.

Out_Parties Get_Parties(PartyCounts L, INTEGER C) := TRANSFORM
  SELF.FilingNumber  := L.FilingNumber;
  SELF.TransactionID := L.TransactionID; 
  SELF               := L.SecuredParties[C]; 
END;

TransactParties := NORMALIZE(PartyCounts,
                             LEFT.PartyCount,
                             Get_Parties(LEFT,COUNTER));

OUTPUT(TransactParties);

This NORMALIZE extracts all the SecuredParties into a separate recordset. With this, we've now broken out all the child records into their own normalized relational structure that we can work with easily.