Tuesday 11 October 2016

Unicode plain text proposal status (October 2016)

Summary

Plain text hieroglyphic writing in Unicode is currently on hold while some technical points are investigated: the use of EGYPTIAN HIEROGLYPH LIGATURE JOINER, extensions for rare forms of writing, and extensions for vertical text. (The initial proposal focused on the forms of horizontal writing that account for the vast majority of hieroglyphic in print and first-generation digital formats.)

This means the earliest that hieroglyphic writing can be released as part of the Unicode Standard is Unicode 11 (2018), a year later than the previously planned Unicode 10. Unfortunately, the delay means the release won't coincide with the 150th anniversary of the first print publication using a hieroglyphic typeface (mentioned in "Some remarks about Unicode Hieroglyphic fonts").

This is a nuisance, but in practical terms there is no reason to hold up work on building an ecosystem for Unicode hieroglyphic writing. It simply means using an approach such as the web font referenced in "Unicode Hieroglyphs in web browsers: Web Fonts" as the basis of fonts and tools, and accepting some limitations in what can be done until the Unicode Standard is updated and implemented by web browsers, system software, and applications such as word processors.


Discussions

This summer featured much discussion of the Proposal to encode three control characters for Egyptian Hieroglyphs (L2/16-018R), both at the Informatique et Égyptologie - Cambridge - 2016 meeting in July and afterwards. In early August, several of us from the I&E meeting took part in a telephone discussion with members of the Unicode Technical Committee (UTC).

There were also discussions about expanding the repertoire of Unicode hieroglyphs. In practical terms, this is the main obstacle to fully encoding a range of ancient sources one might expect to be represented in a plain text writing system. Repertoire is not entirely unconnected with control characters but the two can proceed separately in the standardisation process so I'll treat this topic another time.

The purpose of the proposed three control characters is to enable hieroglyphic writing in Unicode. As things stand, 1071 basic hieroglyph characters were incorporated in the standard in 2009, but there is no way to form quadrats, so there is no authentic writing system as such. Most participants agreed with the principle of enabling the writing system. However, the Thesaurus Linguae Aegyptiae (TLA) and Ramses corpus projects have objected to moving forward with the three characters until additional features are provided.

These extensions were not proposed for the first release because the aim was to focus on the vast majority of modern hieroglyphic in typeset books and digital formats, which use horizontal writing and do not need additions to enable digital encoding.

Interest was expressed in extending the scope of L2/16-018R to deal with vertical writing and the related 'tall quadrat' orthography (used in some horizontal writings). Some are of the opinion that it is important to do this for the first release of support for the writing system rather than as a second stage. In a discussion document, An Extension to the three control characters for Egyptian Hieroglyphs and some additional remarks (L2/16-214), I've described two extended control characters, EGYPTIAN HIEROGLYPH HORIZONTAL GROUP JOINER and EGYPTIAN HIEROGLYPH VERTICAL GROUP JOINER, that I've been using for vertical text evaluation.

Several examples of rare quadrat arrangements were noted that cannot be represented elegantly or unambiguously using L2/16-018R. Analysis so far suggests these account for on the order of 0.01% (sic) of the digitised corpus but may be more common in certain ancient contexts.

Additional controls can be added to deal with rare quadrats, but the issue needs to be better characterised and agreed by Egyptologists before deciding what to do. As with the three basic characters, data needs to be studied and evaluated before a formal proposal is submitted.

Discussions in July yielded a consensus with the TLA and Ramses projects on implementation of two of the three characters, namely EGYPTIAN HIEROGLYPH HORIZONTAL JOINER and EGYPTIAN HIEROGLYPH VERTICAL JOINER, published as L2/16-227.
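To make the idea of joiner-built quadrats concrete, here is a minimal sketch of how a plain text run might be grouped into quadrats. It is not taken from any of the proposal documents. The joiners have no code points in this post, so two private-use characters stand in for them, and the rule that the horizontal joiner binds tighter than the vertical joiner is an assumption made purely for illustration. The function names are hypothetical.

```python
# Hypothetical placeholders: private-use characters stand in for the two
# joiners, which have no assigned code points in this post.
HJ = "\uE000"  # stand-in for EGYPTIAN HIEROGLYPH HORIZONTAL JOINER
VJ = "\uE001"  # stand-in for EGYPTIAN HIEROGLYPH VERTICAL JOINER
JOINERS = (HJ, VJ)


def parse_quadrat(run):
    """Turn one joined run into rows (top to bottom) of signs (left to right).

    Assumption for illustration only: the horizontal joiner binds tighter
    than the vertical joiner, so we split on VJ first, then on HJ.
    """
    return [row.split(HJ) for row in run.split(VJ)]


def split_quadrats(text):
    """Group a plain text string into quadrats.

    A sign stays in the current quadrat only when a joiner separates it
    from the previous sign; two adjacent signs start a new quadrat.
    """
    quadrats, current = [], ""
    for ch in text:
        starts_new = (
            current
            and ch not in JOINERS
            and current[-1] not in JOINERS
        )
        if starts_new:
            quadrats.append(current)
            current = ""
        current += ch
    if current:
        quadrats.append(current)
    return [parse_quadrat(q) for q in quadrats]


# Four signs from the existing Egyptian Hieroglyphs block (U+13000 onwards).
A, B, C, D = (chr(0x13000 + i) for i in range(4))

# A stacked over (B beside C), followed by D as a quadrat of its own.
layout = split_quadrats(A + VJ + B + HJ + C + D)
```

Here `layout` holds two quadrats: the first has a top row containing `A` and a bottom row containing `B` and `C`, and the second contains `D` alone. Rare arrangements of the kind mentioned above would need more than this two-level split.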

M-J Nederhof presented a discussion document, L2/16-177, at the I&E meeting, based on adapting his RES scheme as an alternative approach to control characters for Unicode quadrat sequences without using the horizontal or vertical joiners. This was followed by a revised document, L2/16-210, with addendum L2/16-233, which outline two alternative versions and notations of his system.

There are many alternative ways one might define quadrat sequences, at various levels of complexity, but there would need to be convincing evidence to drop the vertical and horizontal joiners and/or to require complex or hard-to-read sequences for simple quadrats. It is obviously important that any proposed alternatives can be implemented in current technology.

Discussions continue on the Egyptian Hieroglyphs in the UCS mailing list (see archives at http://evertype.com/pipermail/egyptian_evertype.com/). Egyptologists and others with an interest in the topic are encouraged to join and participate in or follow the list.

Status of the L2/16-018R proposal

L2/16-018R was published in January 2016 as a revision to the original May 2015 publication L2/15-123. No objections had been received by UTC between May 2015 and January 2016, so the proposal was put out to international ballot as a UTC recommendation in January 2016. Comments were received by UTC in April 2016 (L2/16-090, my reply at L2/16-104), in which specific objections were made to the EGYPTIAN HIEROGLYPH LIGATURE JOINER.

Following the discussions outlined above, L2/16-018R is on hold until it is determined what additional features are required to obtain consensus. I suspect the earliest this can be reviewed by UTC is January 2017.

Over the next few months it would be really useful if comments, requirements or objections about any additions could be sent to UTC in a timely fashion, to avoid further unnecessary delays.

Bob Richmond
