Sat Aug 18, 2018 9:50 pm
Login Register Lost Password? Contact Us


Keys usting UTF8 fields

Questions around writing code and queries

Fri Jul 07, 2017 7:52 pm Change Time Zone

AS our ECL code is used in other countries, we are experiencing the problem of needing UTF8 fields in the keys. How can that be done? When I try to compile I get the error message that I cannot use variable length fields.

Any suggestions?
georgeb2d
 
Posts: 93
Joined: Wed Dec 24, 2014 3:36 pm

Fri Jul 07, 2017 8:02 pm Change Time Zone

Don,

Discover the max length of the UTF data you want to use as the key field and define it as that (or larger).

HTH,

Richard
rtaylor
Community Advisory Board Member
Community Advisory Board Member
 
Posts: 1370
Joined: Wed Oct 26, 2011 7:40 pm

Fri Jul 07, 2017 8:11 pm Change Time Zone

Something like this?
UTF8 FieldName { MAXLENGTH (266) };

That did not work..but this did:
UNICODE266 FieldName ;
georgeb2d
 
Posts: 93
Joined: Wed Dec 24, 2014 3:36 pm

Fri Jul 07, 2017 8:41 pm Change Time Zone

Yes, or maybe like this:
Code: Select all
UTF8_266 fieldname;
But a 266-character Unicode string is not going to be a terribly efficient key. You might want to limit it to the first 20 or 30 characters, something like this:
Code: Select all
IDX := INDEX(ds,{UTF8_20 First20 := ds.LongUTF8_field},{ds.UID},'keyfilename');
Caveat: this code is "off the top of my head" so it may need tweaking. :)

HTH,

Richard
rtaylor
Community Advisory Board Member
Community Advisory Board Member
 
Posts: 1370
Joined: Wed Oct 26, 2011 7:40 pm

Fri Jul 07, 2017 8:56 pm Change Time Zone

UTF8_255 Fieldname; is still giving the error message:

Error: Variable size fields (Fieldname) are not supported inside indexes

UNICODE255 Fieldname; does work.
georgeb2d
 
Posts: 93
Joined: Wed Dec 24, 2014 3:36 pm

Fri Jul 07, 2017 9:02 pm Change Time Zone

OK, now I see the problem. That's because UTF8 is an inherently variable-length data type where each character can be expressed as anything from 1 to 6 bytes, depending on the character itself.

But either way, such a long key field is not going to be efficient, so I suggest you think about shortening it for the index.

HTH,

Richard
rtaylor
Community Advisory Board Member
Community Advisory Board Member
 
Posts: 1370
Joined: Wed Oct 26, 2011 7:40 pm

Mon Jul 10, 2017 1:11 pm Change Time Zone

And more about this issue, per the developers:
**********************************************************************************
There are problems with having Unicode in indexes – especially our current implementation of indexes. There are generally two solutions:

1) If you only need exact matches:

In this case the simplest method is to store a HASH64() of the string in the keyed portion, and the original string in the payload. Your search then becomes
Code: Select all
BUILD(myDataset, {hashText = HASH64(text)}, {payloadText = text },'indexfile');

myIndex(KEYED(hashText = HASH64(searchText)) AND (payloadText = searchText))

2) If you need ordering (e.g., for range searches) then you need to use KEYUNICODE.

Unfortunately there is no upper limit to the length that KEYUNICODE returns (even for a single character string!). You have a similar problem that you cannot have variable length strings in the keyed portion. So you need to store the first N bytes in the keyed portion, and the rest in the payload:

Code: Select all
BUILD(myDataset, { data20 keyedText := KEYUNICODE(text)[1..20]}, {payloadText := KEYUNICODE(text)[21..] });

myIndex(KEYED(keyedText = KEYUNICODE(searchText)[1..20]) AND (payloadText = KEYUNICODE(searchText)[21..]))

As far as Unicode vs utf8 go they are different representations of the same information and can contain the same characters. Unicode is implemented as UTR-16LE
**********************************************************************************

HTH,

Richard
rtaylor
Community Advisory Board Member
Community Advisory Board Member
 
Posts: 1370
Joined: Wed Oct 26, 2011 7:40 pm


Return to Programming

Who is online

Users browsing this forum: No registered users and 1 guest

cron