XML data comes in many possible formats, and some of them make use of "child datasets" such that a given tag may contain multiple instances of other tags that contain individual field tags themselves.
Here's an example of such a complex structure using UCC data. An individual Filing may contain one or more Transactions, which in turn may contain multiple Debtor and SecuredParty records:
<UCC> <Filing number='5200105'> <Transaction ID='5'> <StartDate>08/01/2001</StartDate> <LapseDate>08/01/2006</LapseDate> <FormType>UCC 1 FILING STATEMENT</FormType> <AmendType>NONE</AmendType> <AmendAction>NONE</AmendAction> <EnteredDate>08/02/2002</EnteredDate> <ReceivedDate>08/01/2002</ReceivedDate> <ApprovedDate>08/02/2002</ApprovedDate> <Debtor entityId='19'> <IsBusiness>true</IsBusiness> <OrgName><![CDATA[BOGUS LABORATORIES, INC.]]></OrgName> <Status>ACTIVE</Status> <Address1><![CDATA[334 SOUTH 900 WEST]]></Address1> <Address4><![CDATA[SALT LAKE CITY 45 84104]]></Address4> <City><![CDATA[SALT LAKE CITY]]></City> <State>UTAH</State> <Zip>84104</Zip> <OrgType>CORP</OrgType> <OrgJurisdiction><![CDATA[SALT LAKE CITY]]></OrgJurisdiction> <OrgID>654245-0142</OrgID> <EnteredDate>08/02/2002</EnteredDate> </Debtor> <Debtor entityId='7'> <IsBusiness>false</IsBusiness> <FirstName><![CDATA[FRED]]></FirstName> <LastName><![CDATA[JONES]]></LastName> <Status>ACTIVE</Status> <Address1><![CDATA[1038 E. 900 N.]]></Address1> <Address4><![CDATA[OGDEN 45 84404]]></Address4> <City><![CDATA[OGDEN]]></City> <State>UTAH</State> <Zip>84404</Zip> <OrgType>NONE</OrgType> <EnteredDate>08/02/2002</EnteredDate> </Debtor> <SecuredParty entityId='20'> <IsBusiness>true</IsBusiness> <OrgName><![CDATA[WELLS FARGO BANK]]></OrgName> <Status>ACTIVE</Status> <Address1><![CDATA[ATTN: LOAN OPERATIONS CENTER]]></Address1> <Address3><![CDATA[P.O. BOX 9120]]></Address3> <Address4><![CDATA[BOISE 13 83707-2203]]></Address4> <City><![CDATA[BOISE]]></City> <State>IDAHO</State> <Zip>83707-2203</Zip> <Status>ACTIVE</Status> <EnteredDate>08/02/2002</EnteredDate> </SecuredParty> <Collateral> <Action>ADD</Action> <Description><![CDATA[ALL ACCOUNTS]]></Description> <EffectiveDate>08/01/2002</EffectiveDate> </Collateral> </Transaction> <Transaction ID='375799'> <StartDate>08/01/2002</StartDate> <LapseDate>08/01/2006</LapseDate> <FormType>UCC 3 AMENDMENT</FormType> <AmendType>TERMINATION BY DEBTOR</AmendType> <AmendAction>NONE</AmendAction> <EnteredDate>02/23/2004</EnteredDate> <ReceivedDate>02/18/2004</ReceivedDate> <ApprovedDate>02/23/2004</ApprovedDate> </Transaction> </Filing> </UCC>
The key to working with this type of complex XML data are the RECORD structures that define the layout of the XML data.
CollateralRec := RECORD STRING Action {XPATH('Action')}; STRING Description {XPATH('Description')}; STRING EffectiveDate {XPATH('EffectiveDate')}; END; PartyRec := RECORD STRING PartyID {XPATH('@entityId')}; STRING IsBusiness {XPATH('IsBusiness')}; STRING OrgName {XPATH('OrgName')}; STRING FirstName {XPATH('FirstName')}; STRING LastName {XPATH('LastName')}; STRING Status {XPATH('Status[1]')}; STRING Address1 {XPATH('Address1')}; STRING Address2 {XPATH('Address2')}; STRING Address3 {XPATH('Address3')}; STRING Address4 {XPATH('Address4')}; STRING City {XPATH('City')}; STRING State {XPATH('State')}; STRING Zip {XPATH('Zip')}; STRING OrgType {XPATH('OrgType')}; STRING OrgJurisdiction {XPATH('OrgJurisdiction')}; STRING OrgID {XPATH('OrgID')}; STRING10 EnteredDate {XPATH('EnteredDate')}; END; TransactionRec := RECORD STRING TransactionID {XPATH('@ID')}; STRING10 StartDate {XPATH('StartDate')}; STRING10 LapseDate {XPATH('LapseDate')}; STRING FormType {XPATH('FormType')}; STRING AmendType {XPATH('AmendType')}; STRING AmendAction {XPATH('AmendAction')}; STRING10 EnteredDate {XPATH('EnteredDate')}; STRING10 ReceivedDate {XPATH('ReceivedDate')}; STRING10 ApprovedDate {XPATH('ApprovedDate')}; DATASET(PartyRec) Debtors {XPATH('Debtor')}; DATASET(PartyRec) SecuredParties {XPATH('SecuredParty')}; CollateralRec Collateral {XPATH('Collateral')} END; UCC_Rec := RECORD STRING FilingNumber {XPATH('@number')}; DATASET(TransactionRec) Transactions {XPATH('Transaction')}; END; UCC := DATASET('~PROGGUIDE::EXAMPLEDATA::XML_UCC',UCC_Rec,XML('UCC/Filing'));
Building from the bottom up, these RECORD structures combine to create the final UCC_Rec layout that defines the entire format of this XML data.
The XML option on the final DATASET declaration specifies the XPATH to the record tag (Filing) then the child DATASET "field" definitions in the RECORD structures handle the multiple instance issues. Because ECL is case insensitive and XML syntax is case sensitive, it is necessary to use the XPATH to define all the field tags. The PartyRec RECORD structure works with both the Debtors and SecuredParties child DATASET fields because both contain the same tags and information.
Once you've defined the layout, how can you extract the data into a normalized relational structure to work with it in the supercomputer? NORMALIZE is the answer. NORMALIZE needs to know how many times to call its TRANSFORM, so you must use the TABLE function to get the counts, like this:
XactTbl := TABLE(UCC,{INTEGER XactCount := COUNT(Transactions), UCC}); OUTPUT(XactTbl);
This TABLE function gets the counts of the multiple Transaction records per Filing so that we can use NORMALIZE to extract them into a table of their own.
Out_Transacts := RECORD STRING FilingNumber; STRING TransactionID; STRING10 StartDate; STRING10 LapseDate; STRING FormType; STRING AmendType; STRING AmendAction; STRING10 EnteredDate; STRING10 ReceivedDate; STRING10 ApprovedDate; DATASET(PartyRec) Debtors; DATASET(PartyRec) SecuredParties; CollateralRec Collateral; END; Out_Transacts Get_Transacts(XactTbl L, INTEGER C) := TRANSFORM SELF.FilingNumber := L.FilingNumber; SELF := L.Transactions[C]; END; Transacts := NORMALIZE(XactTbl,LEFT.XactCount,Get_Transacts(LEFT,COUNTER)); OUTPUT(Transacts);
This NORMALIZE extracts all the Transactions into a separate recordset with just one Transaction per record with the parent information (the Filing number) appended. However, each record here still contains multiple Debtor and SecuredParty child records.
PartyCounts := TABLE(Transacts, {INTEGER DebtorCount := COUNT(Debtors), INTEGER PartyCount := COUNT(SecuredParties), Transacts}); OUTPUT(PartyCounts);
This TABLE function gets the counts of the multiple Debtor and SecuredParty records for each Transaction.
Out_Parties := RECORD STRING FilingNumber; STRING TransactionID; PartyRec; END; Out_Parties Get_Debtors(PartyCounts L, INTEGER C) := TRANSFORM SELF.FilingNumber := L.FilingNumber; SELF.TransactionID := L.TransactionID; SELF := L.Debtors[C]; END; TransactDebtors := NORMALIZE( PartyCounts, LEFT.DebtorCount, Get_Debtors(LEFT,COUNTER)); OUTPUT(TransactDebtors);
This NORMALIZE extracts all the Debtors into a separate recordset.
Out_Parties Get_Parties(PartyCounts L, INTEGER C) := TRANSFORM SELF.FilingNumber := L.FilingNumber; SELF.TransactionID := L.TransactionID; SELF := L.SecuredParties[C]; END; TransactParties := NORMALIZE(PartyCounts, LEFT.PartyCount, Get_Parties(LEFT,COUNTER)); OUTPUT(TransactParties);
This NORMALIZE extracts all the SecuredParties into a separate recordset. With this, we've now broken out all the child records into their own normalized relational structure that we can work with easily.