

Need Incremental Index solution

Comments and questions related to the Enterprise Control Language

Tue Mar 12, 2013 11:59 am

Hi,

We have a scenario in which new data is added to a superfile every few minutes, and each time new data arrives we build an INDEX on the whole superfile.

We then reference the latest INDEX in a package map to update our Roxie queries.

The ECL code that sprays the file, adds it to the superfile, and builds the INDEX on the superfile is:

IMPORT STD;

VARSTRING timeStamp := '' : stored('timeStamp');
VARSTRING fileName := '' : stored('filename');
VARSTRING thorip := '' : stored('thorip');
VARSTRING destinationlogicalname  := '~sprayed::' +fileName + '_' + timeStamp;
VARSTRING sourceIP := '' : stored('roxieip');
VARSTRING sourcepath := '/var/lib/HPCCSystems/mydropzone/buzzmonitoring/' +fileName +'.csv';
VARSTRING srcCSVseparator := ';';
VARSTRING destinationgroup := 'mythor';
VARSTRING espserverIPport := 'http://' +thorip  + ':8010/FileSpray';
VARSTRING subFileDestinationLogicalname := '~sapphire::subfile::buzzmonitoring::' +fileName  + '_' + timeStamp;

VARSTRING superfile_name := '~sapphire::superfile::buzzmonitoring';
VARSTRING indexfile_name := '~sapphire::index::buzzmonitoring::buzzmonitoring_' +timeStamp;

/*Spray the csv file from the dropzone*/
SprayCSVFile :=STD.File.fSprayVariable(sourceIP,sourcepath,,srcCSVseparator,,,
destinationgroup,destinationlogicalname,,espserverIPport,
,TRUE,TRUE,FALSE);

/*Create Dataset of sprayed file*/
Layout_buzzmonitoring := RECORD
STRING100 UserID;
STRING1000 Search_Keyword;
INTEGER8 TwitterUniqueID;
INTEGER8 TwitterUserID;
STRING1000 TwitterUserName;
STRING1000 TwitterProfileName;
INTEGER8 NoOfFollowers;
INTEGER8 NoOfFriends;
STRING1000 Search_Date;
STRING1000 Tweets_Date;
END;

File_Layout_Subfile_Dataset :=
DATASET(destinationlogicalname,Layout_buzzmonitoring,CSV(SEPARATOR(';')));

/*create logical file with record structure*/
subfileCreation := OUTPUT(File_Layout_Subfile_Dataset,,subFileDestinationLogicalname,THOR,OVERWRITE);

/*delete previous logical file without record structure*/
deleteSprayedLogicalFile := STD.File.DeleteLogicalFile(destinationlogicalname);

SuperFile_Dataset := DATASET(superfile_name,{Layout_buzzmonitoring,UNSIGNED8 fpos{virtual(fileposition)}},THOR);

IDX_SuperFile := INDEX(SuperFile_Dataset,{UserID,TwitterUniqueID,Search_Keyword,Tweets_Date},
{TwitterUserID,TwitterUserName,TwitterProfileName,NoOfFollowers,NoOfFriends,Search_Date,fpos},indexfile_name);
idx := BUILDINDEX(IDX_SuperFile,OVERWRITE);

SEQUENTIAL(
SprayCSVFile,
subfileCreation,
deleteSprayedLogicalFile,
Std.File.StartSuperFileTransaction(),
Std.File.AddSuperFile(superfile_name,subFileDestinationLogicalname),
Std.File.FinishSuperFileTransaction(),
idx
);



Each newly built INDEX contains the entire contents of the superfile.
The problem is that every previously built INDEX is now useless, and the number of INDEXes keeps growing. Only the latest INDEX is in use; all the earlier ones are stale. In short, redundant data is accumulating on both Thor and Roxie (on Roxie because we use the package map to update the Roxie queries).

How can we get rid of the stale indexes? (Note: if unpublishing the Roxie queries is the solution, that is not an option for us.)

The core point: instead of building a new INDEX every time a subfile is added (which is slow, because it is built on the entire superfile), is there a way for the superkey to get an 'incremental update', i.e. a new or overwritten INDEX that has the latest data?
prachi
 

Tue Mar 12, 2013 3:19 pm

The main question actually seems to be:

"The core point is that instead of building a new INDEX every time a sub-file is added (which is slow as it is built on the entire super-file), is there a way wherein the super-key can get the 'incremental update', i.e. a new/overwritten INDEX which has the latest data?"


I think the only way to have an incremental superkey over a superfile with multiple subfiles is to create a "payload index", that is, an index that contains all the data you need, rather than one that has to resolve the related records from a data file via the file position. If you can use a payload index, you should be able to index just the new subfile and append that index to the superkey. But if you need to fetch records from the data superfile, you need to re-index the whole superfile every time.
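
As a rough illustration of that suggestion, here is a minimal sketch that builds a payload index on only the newly added subfile and appends it to a superkey. The superkey name is an assumption (the original post does not name one), the record layout is copied from the original post, and the sketch assumes the new subfile has already been written by the spray job. It has not been run against a cluster, so treat it as a starting point only.

```ecl
IMPORT STD;

VARSTRING timeStamp := '' : STORED('timeStamp');
VARSTRING fileName  := '' : STORED('filename');

// Subfile and index names follow the pattern used in the original post.
subfile_name  := '~sapphire::subfile::buzzmonitoring::' + fileName + '_' + timeStamp;
subindex_name := '~sapphire::index::buzzmonitoring::buzzmonitoring_' + timeStamp;
// ASSUMPTION: hypothetical superkey name, not taken from the original post.
superkey_name := '~sapphire::superkey::buzzmonitoring';

Layout_buzzmonitoring := RECORD
  STRING100  UserID;
  STRING1000 Search_Keyword;
  INTEGER8   TwitterUniqueID;
  INTEGER8   TwitterUserID;
  STRING1000 TwitterUserName;
  STRING1000 TwitterProfileName;
  INTEGER8   NoOfFollowers;
  INTEGER8   NoOfFriends;
  STRING1000 Search_Date;
  STRING1000 Tweets_Date;
END;

// Read only the new subfile, not the whole superfile.
NewSubfile := DATASET(subfile_name, Layout_buzzmonitoring, THOR);

// Payload index: all non-key fields are carried in the payload,
// so no fpos field and no FETCH against the data superfile is needed.
IDX_New := INDEX(NewSubfile,
                 {UserID, TwitterUniqueID, Search_Keyword, Tweets_Date},
                 {TwitterUserID, TwitterUserName, TwitterProfileName,
                  NoOfFollowers, NoOfFriends, Search_Date},
                 subindex_name);

SEQUENTIAL(
  BUILDINDEX(IDX_New, OVERWRITE),
  // Create the superkey on first use.
  IF(NOT STD.File.SuperFileExists(superkey_name),
     STD.File.CreateSuperFile(superkey_name)),
  STD.File.StartSuperFileTransaction(),
  STD.File.AddSuperFile(superkey_name, subindex_name),
  STD.File.FinishSuperFileTransaction()
);
```

Queries would then declare an INDEX with the same key and payload fields but with the superkey name as the file name, so they see every sub-index that has been added so far.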

"How to overcome stale indexes? (Note: if unpublishing Roxie queries is the solution, then we can't unpublish queries.)"


I'm not sure exactly what the question is here, but if you do re-index every time and only refer to the index via a superkey, you should be able to delete the old index once the superkey is updated, right? That is, all queries go through the superkey, the superkey no longer refers to the old index, and so the old index can be deleted.
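
A hedged sketch of that swap, for the case where the whole superfile is re-indexed each time: the superkey is emptied and the newest index is added inside one superfile transaction, so the superkey always points at exactly one current index. The superkey name is hypothetical, and the sketch assumes the new index has already been built and that nothing (for example a published Roxie query) holds a lock on the superkey or the old index files.

```ecl
IMPORT STD;

VARSTRING timeStamp := '' : STORED('timeStamp');

// ASSUMPTION: hypothetical superkey name, not taken from the original post.
superkey_name := '~sapphire::superkey::buzzmonitoring';
// Name of the freshly built full index, following the original post's pattern.
new_index     := '~sapphire::index::buzzmonitoring::buzzmonitoring_' + timeStamp;

SEQUENTIAL(
  STD.File.StartSuperFileTransaction(),
  // Second argument TRUE asks for the old sub-indexes to be deleted from
  // disk as well as detached; pass FALSE to detach them only.
  STD.File.ClearSuperFile(superkey_name, TRUE),
  STD.File.AddSuperFile(superkey_name, new_index),
  STD.File.FinishSuperFileTransaction()
);
```

If the old index files need to be kept for a while (for rollback, say), clearing without the delete flag and removing them later with STD.File.DeleteLogicalFile is the more cautious variant.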

Regards,

Bob and Tony
bforeman
Community Advisory Board Member

Wed Mar 13, 2013 4:57 am

Hi Bob and Tony,

To clear up any confusion, here are the facts:

    In our Roxie queries we use superkeys, which in turn contain payload indexes built on superfiles.
    Once such a Roxie query is published, the superkey can no longer be updated: new indexes cannot be added and old ones cannot be removed. This is because Roxie acquires a lock on the superkey and its child components. To get around this, the package map has to be used.
    Every time a subfile is added to a superfile, a new payload index needs to be created. That leaves two problems: the overhead of building a new payload index on the entire superfile every time, and updating the superkey so that it has access to the latest data with the smallest number of payload indexes.

The background and the known issues are already posted on the forums:

http://hpccsystems.com/bb/viewtopic.php ... d028#p3477

To restate my questions as tersely as possible:

    How can we get a single index on a superfile that has the latest data and is built incrementally, i.e. not built on the entire superfile?
    Without using a package map, how can we update a superkey that is already locked by Roxie?

Thanks and regards!
prachi

Wed Mar 13, 2013 12:57 pm

There is a good post here about updating indexes in superkeys:

https://hpccsystems.com/bb/viewtopic.php?f=8&t=837
bforeman
Community Advisory Board Member

