

Roxie TOPN for sort field chosen at query time


Mon Jan 30, 2017 11:29 pm

Hi, what is the best approach for doing a TOPN in Roxie when the result set has more than 1 million records and the user can choose the sort field at run time?

Our current thinking is to use ALLNODES

CHOOSEN( // global choosen
  SORT( // global sort
    ALLNODES(
      LOCAL(
        CHOOSEN( // local choosen
          SORT( // local sort
            inx,
            sortField
          ),
          50
        ) //end choosen
      ) //end local
    ), // end allnodes
    sortField
  ), // end global sort
  50
) // end global choosen


Or is there a smarter way of doing this large sort at query time? In the average case there will always be more than 1 million records to sort.
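The posted code treats `sortField` as a fixed expression, but a field name supplied as a string at run time cannot simply be passed to SORT, because ECL sort keys are fields or expressions fixed when the query is compiled. One common workaround is to compile one branch per allowed sort field and let a STORED parameter pick the branch with MAP; TOPN also folds each SORT/CHOOSEN pair into a single step. This is only a sketch under stated assumptions: the record layout, the `sortChoice` parameter and the fields `price` and `score` are hypothetical, and a tiny inline dataset stands in for the real index `inx`. On Roxie, the same MAP would replace the inner SORT/CHOOSEN pair inside ALLNODES(LOCAL(...)), with the same branch choice applied to the global pass.

```ecl
// Hypothetical layout; the thread does not show the real index definition.
Rec := RECORD
  UNSIGNED4 id;
  UNSIGNED4 price;
  UNSIGNED4 score;
END;

// Small inline stand-in for the index 'inx' so the sketch runs on any target.
inx := DATASET([{1, 30, 7}, {2, 10, 9}, {3, 20, 1}, {4, 40, 5}], Rec);

// Sort field chosen by the caller at query time (hypothetical parameter name).
STRING sortChoice := 'price' : STORED('sortChoice');

// Number of rows to return (50 in the original post; 2 for this tiny demo).
N := 2;

// Each branch has a fixed, compile-time sort key; MAP picks one at run time.
// The trailing 'id' key breaks ties so the output order is deterministic.
topByChoice(DATASET(Rec) ds) := MAP(
    sortChoice = 'price' => TOPN(ds, N, price, id),
    sortChoice = 'score' => TOPN(ds, N, -score, id),
    TOPN(ds, N, id));

topRows := topByChoice(inx);
OUTPUT(topRows);
```

With the default `sortChoice` of `price`, the query returns ids 2 and 3 (prices 10 and 20); calling it with `sortChoice=score` returns ids 2 and 1 (the two highest scores, 9 and 7). Adding a new sortable field means adding a branch and redeploying, which is the price of keeping every sort key known at compile time.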
afarrell

Tue Jan 31, 2017 4:00 pm

afarrell,

How large are the records? IOW, can > 1 million records all fit into memory on a single node? If the answer is no, then your approach seems to me to be pretty reasonable, but I would still do some testing of both against real datasets (if that's at all possible).

The obvious trade-off with using ALLNODES is overall Roxie performance, since each query would then involve all the nodes. Given this solution you might need to dedicate that Roxie to servicing only this single query.

HTH,

Richard
rtaylor
Community Advisory Board Member

Wed Feb 01, 2017 10:39 am

Hi Richard,

RT > How large are the records? IOW, can > 1 million records all fit into memory on a single node?

AF > Depending on what needs to be sorted, upwards of 40GB might need to be retrieved from the slave nodes to be processed on the worker. I am in favour of a strategy that keeps records in place and applies divide-and-conquer thinking, so that only the result, or a substantially reduced part-result, is transmitted across the network.
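The divide-and-conquer idea can be checked on a small scale. The sketch below, assuming a hypothetical two-field layout and generated data, uses DISTRIBUTE plus a LOCAL TOPN as a stand-in for Roxie's ALLNODES(LOCAL(...)): each node keeps only its own top N rows, and the global TOPN then works on at most N rows per node. Any row in the overall top N has fewer than N smaller rows anywhere, so it is also in the top N of whichever node holds it; the two-phase result therefore matches a direct TOPN over the whole dataset (up to the order of tied rows).

```ecl
// Hypothetical two-field layout for the demonstration.
Rec := RECORD
  UNSIGNED4 id;
  UNSIGNED4 price;
END;

// 1,000 generated rows; (COUNTER * 919) % 1000 visits every value 0..999 once,
// so prices are unique and the top-N result has no ties.
ds := DATASET(1000, TRANSFORM(Rec,
                              SELF.id := COUNTER;
                              SELF.price := (COUNTER * 919) % 1000));

N := 5;

// Phase 1: spread the rows across nodes, then keep only the local top N on each.
spread   := DISTRIBUTE(ds, HASH32(id));
localTop := TOPN(spread, N, price, LOCAL);

// Phase 2: a global TOPN over at most N rows per node instead of all 1,000 rows.
twoPhase := TOPN(localTop, N, price);

// Reference answer computed directly over the whole dataset.
direct := TOPN(ds, N, price);

OUTPUT(twoPhase);
```

With 50 rows requested on, say, a 20-node Roxie, the global pass would handle at most 1,000 candidate rows, however large the full result set is.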

RT > If the answer is no, then your approach seems to me to be pretty reasonable, but I would still do some testing of both against real datasets (if that's at all possible).

AF > I suppose not. I think we struggle with the large volumes involved in transporting records onto a single node.

RT > The obvious trade-off with using ALLNODES is overall Roxie performance, since each query would then involve all the nodes. Given this solution you might need to dedicate that Roxie to servicing only this single query.

AF > That is a fair point. As it stands, we already engage all slave nodes to retrieve data from their respective index parts. I think that with ALLNODES we will engage them to greater effect and, hopefully, see reduced latency in responding to end-user requests as a whole.

thanks,

-Andrew
afarrell

