Celestial Software

...better by design

TOPIC: Arabic Script Bidirectionality and Letter Shaping
#1422
Re: Arabic Script Bidirectionality and Letter Shaping 2 Years, 3 Months ago  
"is any way Absolute Telnet could do an internal translation between UTF-8 2-byte characters to 1-byte (0:255) values? We still strongly prefer to be able to store a character in a byte."

I believe you may have a misunderstanding about the terminal's role in your application's design. Absolute doesn't care how you store things, or how you sort things, convert things, etc. It simply takes the data in any of the supported formats (your choice) and displays it.

If your database only supports single-byte characters, then you're stuck with one of the single-byte legacy encodings as we discussed before. This approach will probably require you to create a custom sorting algorithm, as sorting by binary values will give the wrong order. So be it. That may just be work you can't avoid. The downside to this is that as you add additional languages, you'll have to support additional legacy encodings and special sorts, etc. This is the kind of work that Unicode was designed to help you avoid.
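To make the sorting point concrete, here is a rough Python sketch, not anything taken from your application. It assumes the fields are stored as Windows-1256 bytes, and the three-letter ALPHABET table (beh, peh, teh in Persian order) is purely hypothetical; a real table would list every letter you need in the order your users expect.

```python
# Hypothetical collation table for illustration only: beh, peh, teh.
ALPHABET = "\u0628\u067e\u062a"

# Map each letter's single cp1256 byte value to its rank in ALPHABET.
RANK = {ch.encode("cp1256")[0]: i for i, ch in enumerate(ALPHABET)}

def sort_key(field: bytes) -> list:
    """Rank each byte by the table; bytes not in the table sort after it."""
    return [RANK.get(b, len(RANK) + b) for b in field]

# Three one-letter fields, stored as single bytes.
fields = [s.encode("cp1256") for s in ("\u062a", "\u067e", "\u0628")]

binary_order = sorted(fields)                # raw byte values
custom_order = sorted(fields, key=sort_key)  # table-driven order
```

In this sample the plain binary sort puts peh first, because its byte value in Windows-1256 is lower than beh's, while the table-driven sort gives beh, peh, teh.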

There isn't any algorithm that can take 36 arbitrary Unicode characters and store them in 36 bytes unless you convert them to some single-byte legacy encoding. Then you're back to square one.
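A quick way to see this for yourself, as a sketch only: the 36-byte field size comes from your description, and the fits helper is a hypothetical name used just for this check.

```python
# 36 copies of ARABIC LETTER BEH, standing in for a full 36-character field.
text = "\u0628" * 36

utf8_bytes = text.encode("utf-8")     # two bytes per Arabic letter
legacy_bytes = text.encode("cp1256")  # one byte per Arabic letter

def fits(data: bytes, field_size: int = 36) -> bool:
    """Hypothetical check: does the encoded data fit the fixed-size field?"""
    return len(data) <= field_size

print(len(utf8_bytes), fits(utf8_bytes))
print(len(legacy_bytes), fits(legacy_bytes))
```

The same 36 characters need 72 bytes in UTF-8 but only 36 in Windows-1256, which is why the single-byte route forces you back onto a legacy encoding.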

Typically, a Unicode application does not store data internally in UTF-8 format. UTF-8 is *not* a 2-byte encoding. It is a variable-length multi-byte encoding that can store a character in 1, 2, 3, or even 4 bytes! This variability makes it a poor choice for fixed-size data storage, as it becomes very difficult to determine the field length necessary to store a certain number of characters, or how many characters can fit in your 36-byte field. Applications tend to store data in the UCS-2 format, where every Unicode character in the Basic Multilingual Plane (which includes Arabic) takes exactly two bytes. Of course, this requires quite a bit of application modification and extra storage, as you said. However, once this work is done, adding new languages is trivial.
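If it helps, the difference in byte counts can be checked with a few lines of Python. This is only an illustration: the sample characters are my own picks, and UTF-16 stands in for UCS-2 here because Python has no separate UCS-2 codec (the two agree for characters in the Basic Multilingual Plane).

```python
# One sample character for each UTF-8 length class.
samples = {
    "latin_a": "A",           # U+0041
    "arabic_beh": "\u0628",   # U+0628
    "euro_sign": "\u20ac",    # U+20AC
    "emoji": "\U0001F600",    # U+1F600, outside the Basic Multilingual Plane
}

def utf8_len(ch: str) -> int:
    """Bytes needed to store one character in UTF-8."""
    return len(ch.encode("utf-8"))

def utf16_len(ch: str) -> int:
    """Bytes needed to store one character in UTF-16 (no byte-order mark)."""
    return len(ch.encode("utf-16-le"))

for name, ch in samples.items():
    print(name, utf8_len(ch), utf16_len(ch))
```

The UTF-8 lengths come out as 1, 2, 3, and 4 bytes, while the two-byte form stays at 2 bytes for everything except the last sample, which lies outside the Basic Multilingual Plane and takes 4.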

Regardless of how you store it, when you send the data to the terminal, it has to be in one of the encodings the terminal supports (Win1256, ISO8859-6, etc.).
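As a sketch of that last step, assuming the application keeps its strings in a two-byte form internally and converts only on output: the function name to_terminal_bytes and its defaults are hypothetical, and the codec names are Python's, not terminal settings.

```python
def to_terminal_bytes(stored: bytes,
                      stored_encoding: str = "utf-16-le",
                      terminal_encoding: str = "cp1256") -> bytes:
    """Hypothetical helper: convert stored data to the terminal's encoding.

    Characters the terminal encoding cannot represent become '?'.
    """
    text = stored.decode(stored_encoding)
    return text.encode(terminal_encoding, errors="replace")

# Five Arabic letters held internally as two bytes each.
word = "\u0645\u0631\u062d\u0628\u0627"
stored = word.encode("utf-16-le")

win1256_out = to_terminal_bytes(stored)
iso_out = to_terminal_bytes(stored, terminal_encoding="iso8859_6")
```

Either way the terminal receives one byte per letter; only the byte values differ between the two encodings, so the terminal has to be set to the same one the application sends.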
bpence (Admin)
Brian Pence
Celestial Software
SSH , SFTP, and Telnet in a tabbed interface for Windows XP, Vista, Mobile, and others
 