

Accessing subfile names from ecl code

Comments and questions related to the Enterprise Control Language

Mon Jun 01, 2020 11:24 pm

The data transformation I am doing on the superfile uses information from the names of the individual subfiles.

That is, to transform each record, I need to extract information from the name of the subfile that the record belongs to.

Please let me know whether this can be done and, if so, point me to any relevant documentation.

Thanks
Gurman
 
Posts: 6
Joined: Thu May 21, 2020 4:05 pm

Tue Jun 02, 2020 1:15 pm

Gurman,

If each subfile record has an identifier field (UID) that is unique within the superfile, AND each subfile's UID field contains a single contiguous range of values, then you could use that field to infer which subfile a record came from.

Given that, you could code an inline DATASET that maps those ranges back to the subfile names, like this:
Code:
//the dataset of ranges
SubNameDS := DATASET([{'sub1',0,10},{'sub2',11,20},{'sub3',21,30}],
               {STRING filename, UNSIGNED lo, UNSIGNED hi});
//and a function to get the subfilename, passing the UID value:
GetSubName(UNSIGNED val) := SubNameDS(val BETWEEN lo AND hi)[1].filename;
GetSubName(22); //returns "sub3"
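One detail worth handling: in ECL, indexing an empty dataset with [1] returns a row of default values, so GetSubName(99) would quietly return an empty string. The sketch below is a hypothetical variant (GetSubNameSafe and the 'UNKNOWN' marker are illustrative names, not part of the thread) that makes the no-match case explicit.

```ecl
SubNameDS := DATASET([{'sub1',0,10},{'sub2',11,20},{'sub3',21,30}],
                     {STRING filename, UNSIGNED lo, UNSIGNED hi});

// Hypothetical variant of GetSubName: returns 'UNKNOWN' instead of an
// empty string when val falls outside every range.
GetSubNameSafe(UNSIGNED val) := FUNCTION
  hits := SubNameDS(val BETWEEN lo AND hi);
  RETURN IF(EXISTS(hits), hits[1].filename, 'UNKNOWN');
END;

OUTPUT(GetSubNameSafe(22));  // sub3
OUTPUT(GetSubNameSafe(99));  // UNKNOWN
```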

You might be able to build that dataset automatically as a TABLE: use the STD.File.SuperFileContents() function to get the current list of subfiles, then use that list to get the range of UID values for each, something like this:
Code:
//NOT TESTED: assumes Superfile is already defined and has a UID field
IMPORT Std;
SubFiles := NOTHOR(Std.File.SuperFileContents('Superfilename'));
Rec := RECORDOF(Superfile);
ThisDS(STRING name) := DATASET(name,Rec,FLAT);
//SuperFileContents returns names without the leading '~', so add it back
Tbl := TABLE(SubFiles,
             {name,
              UNSIGNED lo := MIN(ThisDS('~' + name),UID),
              UNSIGNED hi := MAX(ThisDS('~' + name),UID)});
GetSubName(UNSIGNED val) := Tbl(val BETWEEN lo AND hi)[1].name;

Of course, if your data doesn't support this scheme, then I think the answer would have to be NO; I have no idea how you could do this in ECL. :)
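Before relying on the range lookup, one way to confirm that the data actually supports this scheme is to check that no two subfiles' [lo, hi] ranges overlap. This is a minimal sketch, assuming a range table with the same shape as Tbl above (name, lo, hi); the inline Ranges values are hypothetical stand-ins.

```ecl
// Hypothetical range table with the same fields as Tbl (name, lo, hi)
RangeRec := {STRING name, UNSIGNED lo, UNSIGNED hi};
Ranges := DATASET([{'sub1',1,100},{'sub2',101,200},{'sub3',201,300}], RangeRec);

// Pair up every two distinct subfiles whose ranges overlap.
// ALL is needed because the join condition has no equality test.
Overlaps := JOIN(Ranges, Ranges,
                 LEFT.name < RIGHT.name AND
                 LEFT.lo <= RIGHT.hi AND RIGHT.lo <= LEFT.hi,
                 TRANSFORM({STRING a, STRING b},
                           SELF.a := LEFT.name,
                           SELF.b := RIGHT.name),
                 ALL);

OUTPUT(Overlaps);
OUTPUT(IF(EXISTS(Overlaps), 'ranges overlap', 'ranges are disjoint'));
```

If Overlaps is non-empty, the lookup can return the wrong subfile name for values in the shared range.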

HTH,

Richard
rtaylor
Community Advisory Board Member
 
Posts: 1560
Joined: Wed Oct 26, 2011 7:40 pm

Tue Jun 02, 2020 8:07 pm

Gurman,

I wanted to test this concept myself, so I created three subfiles (from my training data), like this:
Code:
IMPORT Training.IntroECL_P2 AS T;
ds := T.UID_Persons;
OUTPUT(ds(RecID BETWEEN 1 AND 100),,'~rttest::SF::UID_Persons_1');
OUTPUT(ds(RecID BETWEEN 101 AND 200),,'~rttest::SF::UID_Persons_2');
OUTPUT(ds(RecID BETWEEN 201 AND 300),,'~rttest::SF::UID_Persons_3');

Then I went into ECL Watch and added them to a new SuperFile.
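If you prefer to script that step instead of using ECL Watch, the standard library's superfile functions can do the same thing. This is a sketch assuming the superfile and subfile names from the test above; the subfile additions are wrapped in a superfile transaction so they are applied together.

```ecl
IMPORT Std;

// Superfile name from the test in this thread
SF := '~rttest::sf::superfile';

SEQUENTIAL(
  Std.File.CreateSuperFile(SF),     // by default, fails if SF already exists
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile(SF, '~rttest::SF::UID_Persons_1'),
  Std.File.AddSuperFile(SF, '~rttest::SF::UID_Persons_2'),
  Std.File.AddSuperFile(SF, '~rttest::SF::UID_Persons_3'),
  Std.File.FinishSuperFileTransaction()
);
```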

Next, I defined the SuperFile:
Code:
SF_rec := RECORD
  unsigned4 recid;
  unsigned8 id;
  string15 firstname;
  string25 lastname;
  string15 middlename;
  string2 namesuffix;
  string8 filedate;
  unsigned2 bureaucode;
  string1 maritalstatus;
  string1 gender;
  unsigned1 dependentcount;
  string8 birthdate;
  string42 streetaddress;
  string20 city;
  string2 state;
  string5 zipcode;
END;

Superfile := DATASET('~rttest::sf::superfile',SF_rec,FLAT);

Then I wrote the process I described above:
Code:
IMPORT Std;
SubFiles := NOTHOR(Std.File.SuperFileContents('~rttest::sf::superfile'));
Rec := RECORDOF(Superfile);

ThisDS(STRING name) := DATASET(name,rec,FLAT);
Tbl := TABLE(SubFiles,
             {name,
              UNSIGNED lo := MIN(ThisDS('~' + name),RecID),
              UNSIGNED hi := MAX(ThisDS('~' + name),RecID)});
                                 
GetSubName(UNSIGNED val) := Tbl(val BETWEEN lo AND hi)[1].name;

P := PROJECT(SuperFile,
             TRANSFORM({UNSIGNED RecID,STRING name},
                       SELF.name := GetSubName(LEFT.RecID),
                       SELF := LEFT));

OUTPUT(P,ALL);                                 

I wrapped NOTHOR around the Std.File.SuperFileContents() call because it works only with DFU metadata and doesn't need to run on every node in your Thor cluster.

Anyway, this code works with my training data, so good luck with your project.
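A variation on the same idea, sketched here with hypothetical inline stand-ins for SuperFile and Tbl: instead of calling GetSubName once per record, join the records to the small range table with ALL, which copies the range table to every node and so allows a join condition with no equality test. Note that with an inner join like this, records whose RecID matches no range are dropped.

```ecl
// Hypothetical stand-ins for SuperFile and Tbl
DataRec := {UNSIGNED RecID};
Recs    := DATASET([{5},{15},{25}], DataRec);
Ranges  := DATASET([{'sub1',0,10},{'sub2',11,20},{'sub3',21,30}],
                   {STRING name, UNSIGNED lo, UNSIGNED hi});

// Tag each record with the name of the range it falls into
Tagged := JOIN(Recs, Ranges,
               LEFT.RecID BETWEEN RIGHT.lo AND RIGHT.hi,
               TRANSFORM({UNSIGNED RecID, STRING name},
                         SELF.name := RIGHT.name,
                         SELF := LEFT),
               ALL);

OUTPUT(Tagged);
```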

HTH,

Richard

Wed Jun 03, 2020 1:32 am

Dear Richard,

Thank you for the effort you put in, but unfortunately my data doesn't have any values that could be mapped to the subfiles the way you describe.

I am trying the following flow as a workaround:
  • Loop over all the subfiles, using the SuperFileContents() function to get their names.
  • In the loop body, JOIN each individual filename entry with the corresponding dataset for that file.
  • In the JOIN's transform function, use the filename to alter the records.
Let me know your views on this.

For now, while implementing this approach, I'm getting a simple '2035: Output dataset must match the source dataset type' error on the following line (at the fJoin call). I will update here once that works out.
Code:
loop(namesTable, count(namesTable), fJoin(rows(left)));


Thanks,
Gurman
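Error 2035 comes from a LOOP type rule: the loop body receives ROWS(LEFT) with the record structure of LOOP's input dataset and must return that same structure. Presumably (fJoin itself isn't shown) namesTable holds filename records while fJoin returns data records, which would trigger it. The sketch below, using hypothetical names and values, shows one way to satisfy the rule: start from an empty dataset of the output type and append to it on each iteration, using COUNTER to pick the current name.

```ecl
NameRec := {STRING name};
Names   := DATASET([{'sub1'},{'sub2'},{'sub3'}], NameRec);  // hypothetical names

OutRec  := {STRING name, UNSIGNED iteration};

// The body takes and returns DATASET(OutRec), so the LOOP input must also be
// an OutRec dataset (here an empty one) rather than Names.
Body(DATASET(OutRec) soFar, UNSIGNED c) :=
  soFar + DATASET([{Names[c].name, c}], OutRec);

Result := LOOP(DATASET([], OutRec), COUNT(Names), Body(ROWS(LEFT), COUNTER));
OUTPUT(Result);
```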

Wed Jun 03, 2020 12:58 pm

Gurman,

OK, so your approach gave me an idea. Here's how I would do it (this code goes at the bottom of my previous example, and it works on my small test superfile):
Code:
//if you don't have a UID to use,
// you can create a nested child dataset:
Prj := PROJECT(SubFiles,
               TRANSFORM({STRING name,DATASET(rec) child},
                         ds := ThisDS('~' + LEFT.name);
                         SELF.child := ds;
                         SELF := LEFT));

// then NORMALIZE it to do the "work" with the subfilename
// in this case, just adding it to each record                                 
WrkRec := {STRING name,rec};
WrkRec XF(rec L, STRING name) := TRANSFORM
  SELF.name := name; 
  SELF := L;
END;

Wrk := NORMALIZE(Prj,LEFT.child,XF(RIGHT,LEFT.name));

OUTPUT(Wrk,ALL);

I'm creating a nested child dataset first to attach each subfile's name to its records, then using NORMALIZE to do the "work" you want to do.
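To try the PROJECT-then-NORMALIZE shape without any files on the cluster, here is a self-contained sketch in which a hypothetical inline nested dataset stands in for Prj: each parent row pairs a subfile name with that subfile's records, and NORMALIZE produces one output row per child record, tagged with its parent's name.

```ecl
Rec       := {UNSIGNED RecID};
ParentRec := {STRING name, DATASET(Rec) child};

// Hypothetical stand-in for Prj: subfile name plus its records as a child dataset
Prj := DATASET([{'sub1', [{1},{2}]},
                {'sub2', [{3}]}], ParentRec);

WrkRec := {STRING name, Rec};
WrkRec XF(Rec L, STRING name) := TRANSFORM
  SELF.name := name;
  SELF := L;
END;

// One output row per child record, carrying the parent's subfile name
Wrk := NORMALIZE(Prj, LEFT.child, XF(RIGHT, LEFT.name));
OUTPUT(Wrk);
```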

HTH,

Richard

Wed Jun 03, 2020 8:19 pm

Yes, that works, Richard.
I think it is also more efficient than LOOP.

Thank you for your help.
-Gurman
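Depending on your HPCC Systems version, another option that may be worth checking is the VIRTUAL(logicalfilename) field modifier in a RECORD structure. It fills a field at read time with the logical name of the file each record came from, so records read through a superfile would carry their subfile's name directly, with no UID ranges or child datasets. This is a sketch under assumptions: the layout and the superfile name are hypothetical, and the non-virtual fields must still match the subfiles' real physical layout (SF_rec in the test above).

```ecl
// Hypothetical layout: must match the subfiles' physical layout
Layout := RECORD
  UNSIGNED4 recid;
  STRING15  firstname;
  STRING25  lastname;
END;

// Same layout plus a virtual field; it occupies no space in the file and is
// populated with the logical name of the file each row was read from
LayoutWithName := RECORD
  Layout;
  STRING subfile{VIRTUAL(logicalfilename)};
END;

// Hypothetical superfile name
SF := DATASET('~my::superfile', LayoutWithName, FLAT);

// The subfile name is now an ordinary field any transform can use
P := PROJECT(SF, TRANSFORM({UNSIGNED4 recid, STRING subfile},
                           SELF := LEFT));
OUTPUT(P);
```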

