Quantile 2 – Test cases
When adding new features to the system, or changing the code generator, the first step is often to write some ECL test cases. They have proved very useful for several reasons:
- Developing the test cases can help clarify issues and other details that the implementation needs to take into account (e.g., what happens if the input dataset is empty?).
- They provide something concrete to aim towards when implementing the feature.
- They provide a set of milestones to show progress.
- They can be used to check the implementation on the different engines.
As part of the design discussion we also started to create a list of useful test cases (they follow below in the order they were discussed). The tests serve different purposes: some check that the core functionality works correctly, while others check unusual situations and boundary cases. The tests are not exhaustive, but they are a good starting point, and new tests can be added as the implementation progresses.
The following is the list of tests that should be created as part of implementing this activity:
- Compare with values extracted from a SORT. Useful to check the implementation, but also to ensure we clearly define which results we are expecting.
- QUANTILE with a number-of-ranges = 1, 0, and a very large number. Should also test that the number of ranges can be dynamic as well as a constant.
- Empty dataset as input.
- All input entries are duplicates.
- Dataset smaller than number of ranges.
- Input sorted and reverse sorted.
- Normal data with small number of entries.
- Duplicates in the input dataset that cause empty ranges.
- Random distribution of numbers without duplicates.
- Local and grouped cases.
- SKEW that fails.
- Test scoring functions.
- Testing different skews that work on the same dataset.
- An example that uses all the keywords.
- Examples that do and do not have extra fields not included in the sort order. (Check that the unstable flag is correctly deduced.)
- Globally partitioned already (e.g., globally sorted). All partition points on a single node.
- Apply quantile to a dataset, and also to the same dataset that has been reordered/distributed. Check the resulting quantiles are the same.
- Calculate just the 5th and 95th centiles from a dataset.
- Check a non-constant number of splits (and also in a child query where it depends on the parent row).
- A transform that does something interesting to the sort order. (Check any order is tracked correctly.)
- Check the counts are correct for grouped and local operations.
- Call in a child query with options that depend on the parent row (e.g., num partitions).
- Split points that fall in the middle of two items.
- No input rows and DEDUP attribute specified.
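Several of the cases above (comparison with a SORT, empty input, all duplicates, fewer rows than ranges, reordered input) can be pinned down with a small reference model before any ECL is written. The following Python sketch is only an illustration: the function name `quantile_points` is hypothetical, and the rule for choosing a split point (nearest rank, rounding up) is an assumption made for the example, not necessarily the convention the QUANTILE activity uses. Agreeing on that convention is precisely what the first test in the list is meant to force.

```python
import random


def quantile_points(rows, num_ranges, key=lambda r: r):
    """Return the num_ranges - 1 split points taken from a sorted copy of rows.

    Assumed convention (hypothetical, for illustration only): split point i
    is the row at 1-based position ceil(i * n / num_ranges) in sort order.
    """
    if num_ranges < 1 or not rows:
        # Zero ranges or an empty input: no split points at all.
        return []
    ordered = sorted(rows, key=key)
    n = len(ordered)
    points = []
    for i in range(1, num_ranges):
        pos = -(-i * n // num_ranges)  # integer ceiling of i * n / num_ranges
        points.append(ordered[pos - 1])
    return points


data = list(range(1, 101))

# Core functionality: quartiles of 1..100 under the assumed convention.
print(quantile_points(data, 4))        # [25, 50, 75]

# Boundary cases from the list above.
print(quantile_points([], 4))          # [] (empty input)
print(quantile_points(data, 1))        # [] (one range has no split points)
print(quantile_points([7] * 10, 4))    # [7, 7, 7] (all duplicates)
print(quantile_points([1, 2, 3], 5))   # [1, 2, 2, 3] (fewer rows than ranges)

# Reordering the input must not change the result.
shuffled = data[:]
random.shuffle(shuffled)
assert quantile_points(shuffled, 4) == quantile_points(data, 4)
assert quantile_points(data[::-1], 4) == quantile_points(data, 4)
```

A model like this also makes the repeated split points visible when the dataset is smaller than the number of ranges, which is the situation the empty-range and DEDUP tests are meant to exercise.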
Ideally any test cases for features should be included in the runtime regression suite, which is found in the testing/regress directory in the GitHub repository. Tests that check invalid syntax should go in the compiler regression suite (ecl/regress). Commit https://github.com/ghalliday/HPCC-Platform/commit/d75e6b40e3503f85126567… contains the test cases so far. Note that the test examples in that commit do not yet cover all the cases above. Before the final pull request for the feature is merged, the list above should be revisited and the test suite extended to include any missing tests.
In practice it may be easier to write the test cases in parallel with implementing the parser, since that allows you to check their syntax. Some of the examples in the commit were created before work started on the parser, others during it, and some while implementing the feature itself.