Monday, 18 July 2016

Simple higher level protocols and Unicode Plain Text hieroglyphic writing

Simple (plain) text has limitations in what it can express without additional information. Various 'Higher Level Protocols' (HLPs) are used to enrich all kinds of writing systems. HTML/CSS and the document formats used in 'office' applications are examples of HLPs managed as international standards. To use Unicode Egyptian hieroglyphs with these complex protocols, Egyptologists and others will need to develop conventions for working with them consistently, so that rich data can be shared. That is something for the future.

However, not all protocols need be complex like HTML/CSS.

Consider hieroglyph combinations that are already encoded as characters in Unicode. 𓃁 (ab) is 𓂝 (a) on top of 𓃀 (b). Gardiner calls these combinations monograms, and current Unicode practice is to encode each monogram as a separate character.

As things stand, Unicode has no character for 𓂧 (d) on top of 𓃀 (b), i.e. (db), which is rare as a monogram. For many applications and users this is not a problem, but it may crop up in some situations, such as corpus databases.

The answer is to define or adopt a simple higher level protocol. For instance, use the character '+' to mean 'on top of', so (db) is encoded as 𓂧+𓃀. The same applies to any other monogram not yet encoded. This is still plain text, just not pure hieroglyphic text, so it can be used in applications and databases. Software will need adapting to render such a combination visually, but that work is needed in general for hieroglyphs not yet encoded in Unicode, so the extra impact is small.
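To make the '+' convention concrete, here is a minimal sketch of how an application or database import script might read it. It assumes that '+' appears only as the 'on top of' marker and that whitespace carries no meaning; the function name `parse_groups` is hypothetical. Each returned group lists its signs from top to bottom, which is the information a renderer would need in order to stack them.

```python
from typing import List

# Hypothetical sketch: '+' between two signs means "the first sign sits on top
# of the second". Assumes '+' is used only for this purpose in the text.
STACK = "+"


def parse_groups(text: str) -> List[List[str]]:
    """Split text into groups; each group is a top-to-bottom stack of signs."""
    groups: List[List[str]] = []
    join_next = False
    for ch in text:
        if ch.isspace():
            continue  # assumption: whitespace is not significant
        if ch == STACK:
            if not groups or join_next:
                raise ValueError("'+' must sit between two signs")
            join_next = True
            continue
        if join_next:
            groups[-1].append(ch)  # stack this sign under the previous one
            join_next = False
        else:
            groups.append([ch])  # start a new single-sign group
    if join_next:
        raise ValueError("'+' cannot end a sequence")
    return groups


if __name__ == "__main__":
    # (db) written with the convention, followed by an ordinary sign
    print(parse_groups("𓂧+𓃀𓂝"))
```

Because the input stays ordinary plain text, the same string can be stored, searched and exchanged by software that knows nothing about the convention. Only the rendering step needs to interpret the groups.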

This reasoning applies to other specialist writing features for which there is not sufficiently strong evidence to warrant direct support in plain text at this time. I expect various conventions will evolve as users gain more familiarity with Unicode.

There is an active discussion as to whether the policy of encoding monograms as separate characters should continue, or whether an additional control character should be introduced to take the role that '+' plays here. Either way, the method described here allows work on Unicode solutions to continue.
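To back up the claim that the '+' method works whichever policy is chosen, the sketch below shows that text written with '+' can later be converted mechanically. For illustration it uses U+13430 EGYPTIAN HIEROGLYPH VERTICAL JOINER, a control character that later versions of Unicode (12.0 onwards) did add. Any other control character could be passed in as the `joiner` parameter. The monogram table is a hypothetical one-entry example built from the signs discussed above, not a complete list. The function names are illustrative assumptions.

```python
# Hypothetical sketch: text written with the '+' convention can be converted
# later, whichever encoding policy is eventually adopted.
VERTICAL_JOINER = "\U00013430"  # U+13430 EGYPTIAN HIEROGLYPH VERTICAL JOINER

# Illustrative table: an encoded monogram mapped to its '+' spelling.
MONOGRAMS = {"𓃁": "𓂝+𓃀"}  # (ab) = (a) on top of (b)


def plus_to_joiner(text: str, joiner: str = VERTICAL_JOINER) -> str:
    """Replace the '+' marker with a control character (assumes '+' has no other use)."""
    return text.replace("+", joiner)


def joiner_to_plus(text: str, joiner: str = VERTICAL_JOINER) -> str:
    """Inverse conversion: turn the control character back into '+'."""
    return text.replace(joiner, "+")


def expand_monograms(text: str) -> str:
    """Spell encoded monograms with '+' so that both policies compare equal in searches."""
    return "".join(MONOGRAMS.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(plus_to_joiner("𓂧+𓃀"))
    print(expand_monograms("𓃁"))
```

Since both conversions are plain string substitutions, a corpus encoded today with '+' can be migrated to a later control character, or normalised against encoded monograms, without re-keying any data.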

Bob Richmond

