

Check if two record structures match

Comments and questions related to the Enterprise Control Language

Tue Oct 08, 2019 11:33 pm

I'm putting together a macro to check whether the record structures of two given datasets match. If I have the dataset, I pass it in so the macro can compare the XML value of the record structures. If I don't have a dataset defined and was given only a file name, I attempt to look up the record structure. If the file name does not exist, it throws a warning and shows as a mismatch, and I'm good with that. My question is: how do I get around file names that are not compile-time constants (they are only resolved at runtime) when I don't have a dataset definition? I need to get this working without having the dataset handy, without being able to create it (because I don't know the layout), and with the file name built in exactly this manner. I'm trying to integrate this into existing common code that everyone uses, without any code changes to the builds.

Thanks for your ideas. I'm so close I can taste it; I just need a way to interrogate the file structure by name at runtime :-)

PS. I tried to use get column mapping and GetLogicalFileAttribute first.

Example: macro defined below

Code: Select all
#CONSTANT('myfileprefix','~thor::tmsn');

prefix := '~thor' : stored('myfileprefix');


Test1 := dataset([{1,'one'},{2,'two'}],{integer id , string desc});
Test2 := dataset([{2,'two'},{3,'three'}],{integer id , string desc});

filename := prefix + '::testfile';
//run this OUTPUT in a first workunit to create the test file, then comment it out
// output(Test2,,filename,thor);

//run the code below a second time after the test file has been created. A syntax check alone reports:
//Error:    LOOKUP attribute requires a constant filename MAC_Check_Rec_Struct_Match.ecl
#IF(MAC_Check_Rec_Struct_Match(Test1,filename))
output('They Match',named('match'));
#ELSE
output('NO LUCK',named('NOPE'));
#END


//Pass in two file names and this code will tell you if the record structures are identical.
//Two datasets will also work, as RECORDOF will use the dataset and extract the known structure.
//Newly added functionality in 6.4:
//https://hpccsystems.com/blog/file-layout-resolution-compile-time
EXPORT MAC_Check_Rec_Struct_Match(file1,file2) := functionmacro
import std;

   #uniquename(typ1);
   #uniquename(typ2);
   #uniquename(r);
   #uniquename(r2);
   #uniquename(out);
   #uniquename(out2);

   //check if the parms are datasets or strings
   %typ1%  := STD.Str.ToLowerCase(#GETDATATYPE(file1)[..6])  = 'string';
   %typ2%  := STD.Str.ToLowerCase(#GETDATATYPE(file2)[..6])  = 'string';
   #IF(%typ1%)
     %r% := RECORDOF(file1,LOOKUP); //if string look up record def assuming file exists
   #ELSE
     %r% := RECORDOF(file1); //assume it is a dataset and has been loaded into memory
   #END
   
   #IF(%typ2%) %r2% := RECORDOF(file2,LOOKUP);
   #ELSE       %r2% := RECORDOF(file2); #END
   #EXPORT(out, %r%);
   #EXPORT(out2, %r2%);

   return %'out'% = %'out2'%;

endmacro;


Tim N
newportm
 
Posts: 13
Joined: Tue Nov 15, 2016 2:48 pm

Wed Oct 09, 2019 2:04 pm

Here is an option: I can pull the data from the DFU with a SOAPCALL at runtime and parse out the record structure. One caveat is that this does not return any information if the file is a superfile.
Code: Select all
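// filename, cluster and pesp (the ESP host used in the URL below) are assumed to be defined elsewhere
DFUInfoRequest := RECORD, MAXLENGTH(100)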
      STRING  Name              {XPATH('Name'               )} := filename;
      STRING  Cluster           {XPATH('Cluster'            )} := cluster;
      STRING  UpdateDescription {XPATH('UpdateDescription'  )} := '0';
      STRING  FileName          {XPATH('FileName'           )} := '';
      STRING  FileDesc          {XPATH('FileDesc'           )} := '';
END;
   
DFUInfoOutRecord := RECORD, MAXLENGTH(100000)
      STRING Ecl                {XPATH('FileDetail/Ecl'              )};   
END;

esp            := pesp + ':8010';
results := SOAPCALL('http://' + esp + '/WsDfu'
                     ,'DFUInfo'
                     ,DFUInfoRequest
                     ,DATASET(DFUInfoOutRecord)
                     ,XPATH('DFUInfoResponse')
                     );
   
results;
newportm
 
Wed Oct 09, 2019 7:07 pm

Tim,

Not a simple problem, but I managed to find a fairly simple way to do it! :)

First I wrote a FUNCTIONMACRO that uses #EXPORTXML to get the structure information from any declared DATASET (inline or on disk) and used Template Language to format the result exactly the same as the GetLogicalFileAttribute function's return result:
Code: Select all
GetStructTxt(ds) := FUNCTIONMACRO
  #DECLARE(Ctr);
  #SET(Ctr,0);
  #DECLARE(OutString);
  #SET(OutString,'{ ');
  #EXPORTXML(Fred,ds);
  #FOR (Fred)
    #FOR (Field)
      #IF(%Ctr%=0)
         #APPEND(OutString,%'{@ecltype}'% + ' ' + %'{@name}'% )
         #SET(Ctr,1);
      #ELSE   
         #APPEND(OutString,', ' + %'{@ecltype}'% + ' ' + %'{@name}'% )
      #END
    #END
  #END
  #APPEND(OutString,' };\n'); //add \n to duplicate GetLogicalFileAttribute() return
  RETURN %'OutString'%;
ENDMACRO; 

Now you can use the GetLogicalFileAttribute function to get the structure when you only have the filename. The "trick" to this function that I learned through hard effort is that it appends a newline character to the end of its return result, so I had to make sure the FUNCTIONMACRO duplicated that format exactly to allow a simple string compare between the two results.

Then you can compare any two dataset structures, like this:
Code: Select all
#CONSTANT('myfileprefix','~thor::test::RT');
prefix := '~thor' : stored('myfileprefix');
filename  := prefix + '::testfile';

Test1 := dataset([{1,'one'},{2,'two'}],{integer id , string desc});
Test2 := dataset([{2,'two'},{3,'three'}],{integer id , string desc}); //disk file
Test3 := dataset([{1,'one'},{2,'two'}],{UNSIGNED id , string10 desc});

IMPORT Std;

recstruct1 := GetStructTxt(Test1);                             
recstruct2 := STD.File.GetLogicalFileAttribute(filename,'ECL');
recstruct3 := GetStructTxt(Test3);                             

OUTPUT(recstruct1,NAMED('recstruct1_raw'));
OUTPUT(recstruct2,NAMED('recstruct2_raw'));
OUTPUT(recstruct3,NAMED('recstruct3_raw'));
OUTPUT(recstruct1 = recstruct2,NAMED('Compare_1_2')); 
OUTPUT(recstruct1 = recstruct3,NAMED('Compare_1_3')); 
OUTPUT(recstruct2 = recstruct3,NAMED('Compare_2_3')); 
Thanks for the interesting problem.

HTH,

Richard
rtaylor
Community Advisory Board Member
 
Posts: 1481
Joined: Wed Oct 26, 2011 7:40 pm

Wed Oct 09, 2019 11:41 pm

Richard,

I really appreciate you taking the time to put this together. I agree it has been a fun thing to work on; I don't get to do much code problem solving these days. Your solution is pretty slick and gets around the one issue I was running into. A note that the macro GetStructTxt only works for inline dataset definitions or data that has been transformed/referenced in some way other than an output. If I instead use a dataset defined like this

Code: Select all
DS := Dataset('~thor::base::test', TestFolder.layouts.sampLayout,thor);


assuming I am going to pass the dataset around and do stuff with it later, it returns { } as the layout.

In other news, consider the same thing but with the file name built as in my example above:

Code: Select all
#CONSTANT('myfileprefix','~thor::tmsn');
prefix := '~thor' : stored('myfileprefix');
filename := prefix + '::testfile';

Test1 := dataset(filename,TestFolder.layouts.sampLayout,thor);


When I call GetStructTxt(Test1), the compiler creates a local workunit (L20191009-123456) and says it completed, but it never actually submits the job.

Doing an output to read the file in a sequential does not change the behavior. Now, if I take an altering action on the dataset, say a project or a sort, the layout format actually changes for a file with a child dataset (or 50)...

I simplified the items in the layout for this post.

Result of NOTHOR(STD.File.GetLogicalFileAttribute(file2,'ECL')):
Code: Select all
coverage_info := RECORD
   string4 child1;
  END;

finance_company_info := RECORD
   string15 child2;
  END;

RECORD
  string6 rec1;
  string20 rec2;
  DATASET(coverage_info) coverages{maxcount(18)};
  DATASET(finance_company_info) finance_info{maxcount(4)};
END;


Result of GetStructTxt:
Code: Select all
{ string6 rec1, string20 rec2,  table of <unnamed> coverages, string4 child1,  coverages, table of <unnamed> finance_info, string15 child2, finance_info };

I guess I can write another wrapper to convert all layouts with child datasets into the {} format.
newportm
 

Thu Oct 10, 2019 1:19 pm

Tim,
"A note that the macro GetStructTxt only works for inline dataset definitions or data that has been transformed/referenced in some way other than an output."
This simple solution makes it work for me:
Code: Select all
IMPORT TrainingYourName;

ds1 := TrainingYourName.File_Persons_Slim.file[1..2];
ds2 := TrainingYourName.Accounts[1..2];

recstruct1a:= (STRING)GetStructTxt(ds1);                             
recstruct2a:= (STRING)GetStructTxt(ds2);                             
OUTPUT(recstruct1a,NAMED('recstruct1a_rawEXPORT'));//OUTPUT file
OUTPUT(recstruct2a,NAMED('recstruct2a_rawEXPORT'));//sprayed file
You just need to make the dataset you pass a subset (like the first 2 recs, as I did here) and then it works correctly.

The problem is that, sometime in the last 20 years, the #EXPORT and #EXPORTXML format was expanded to include file information. Unfortunately, that info was added as a set of enclosing tags (whose info is only in XML attributes) instead of a simple self-contained tag. Worse, the tag name is different for each file type, so I would need to write several separate versions to handle this. Here's what it looks like:
Code: Select all
<Data>
<CsvTable exported="false" name="csv^class::rt::intro::accounts">
  <Field ecltype="unsigned8"
         label="personid"
         name="personid"
         position="0"
         rawtype="524545"
         size="8"
         type="unsigned"/>
</CsvTable>
</Data>

and ...

<Data>
<FlatTable exported="false" name="flat^class::rt::intro::persons" recordLength="155">
  <Field ecltype="unsigned8"
         label="id"
         name="id"
         position="0"
         rawtype="524545"
         size="8"
         type="unsigned"/>
</FlatTable>
</Data>
The fact that GetLogicalFileAttribute returns totally different text for nested child datasets means separate code to match that. :(

I'll see what I can do with that, but I'm traveling now so ... :)

HTH,

Richard
rtaylor

Thu Oct 10, 2019 2:06 pm

Tim,

OK, upon further exploration, it appears that GetLogicalFileAttribute returns whatever text appears in the ECL tab in ECL Watch for that logical file.
"Here is an option: I can pull the data from the DFU with a SOAPCALL at runtime and parse out the record structure. One caveat is that this does not return any information if the file is a superfile."
And that appears to be the same thing that GetLogicalFileAttribute is doing.

For small/simple files, that record structure is expressed as
Code: Select all
{ unsigned4 recid, string10 homephone };

For larger record structures (including nested Child Datasets) that takes the form:
Code: Select all
RECORD
  unsigned4 recid;
  string10 homephone;
  string10 cellphone;
  string20 fname;
  string20 mname;
  string20 lname;
  string10 new_homephone;
  string10 new_cellphone;
  string20 new_fname;
  string20 new_mname;
  string20 new_lname;
END;
So I'll have to reconsider how to duplicate that structure.
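
One possible way to reconcile the two text forms, sketched here only for flat records: collapse all whitespace (which also absorbs the trailing newline), then rewrite the RECORD ... END; form into the brace form before comparing. The names CollapseWs, ToBraceForm and sampleText are hypothetical, and the sketch assumes the returned text looks like the two samples above. It does not handle nested child datasets, which come back as separate named RECORD blocks.

```ecl
// Collapse every run of whitespace (spaces, tabs, newlines) to a single space
// and trim both ends, so the trailing newline no longer affects a compare.
CollapseWs(STRING s) := TRIM(REGEXREPLACE('\\s+', s, ' '), LEFT, RIGHT);

// Rewrite a flat "RECORD f1; f2; END;" text as "{ f1, f2 };".
// Text that does not start with RECORD is returned unchanged (whitespace collapsed).
ToBraceForm(STRING s) := FUNCTION
  collapsed := CollapseWs(s);
  fieldText := REGEXREPLACE('^RECORD (.*); END;$', collapsed, '$1', NOCASE);
  commaText := REGEXREPLACE('; ', fieldText, ', ');
  RETURN IF(REGEXFIND('^RECORD ', collapsed, NOCASE),
            '{ ' + commaText + ' };',
            collapsed);
END;

// Hypothetical sample in the multi-line form shown above
sampleText := 'RECORD\n  unsigned4 recid;\n  string10 homephone;\nEND;\n';
OUTPUT(ToBraceForm(sampleText), NAMED('brace_form'));
```

Running both sides of a comparison through ToBraceForm should make the result independent of which of the two forms each side happens to be in, at least for flat layouts.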

Otherwise, could you simply default to using GetLogicalFileAttribute on both sides of your comparison? That would make it much simpler. :)
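
A minimal sketch of that idea, assuming both logical files already exist. The second file name is hypothetical, and NOTHOR is used the same way as in Tim's earlier post. Because the attribute is fetched at runtime, the file names do not need to be compile-time constants. If GetLogicalFileAttribute really does the same thing as the DFUInfo call, the superfile caveat mentioned earlier may apply here too.

```ecl
IMPORT Std;

// Hypothetical logical file names; any runtime-built STRING should work here
file1 := '~thor::tmsn::testfile';
file2 := '~thor::tmsn::testfile2';

// Fetch the stored ECL layout text for each file at runtime
ecl1 := NOTHOR(Std.File.GetLogicalFileAttribute(file1, 'ECL'));
ecl2 := NOTHOR(Std.File.GetLogicalFileAttribute(file2, 'ECL'));

OUTPUT(ecl1, NAMED('file1_ecl'));
OUTPUT(ecl2, NAMED('file2_ecl'));
OUTPUT(ecl1 = ecl2, NAMED('layouts_match'));
```

Since both strings come from the same source, they should share the same formatting, so a plain string compare is enough and no reformatting step is needed.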

HTH,

Richard
rtaylor