Friday, 28 October 2016

Analysis of Unicode Egyptian hieroglyphs in a collection of MdC-coded transcriptions

This is a follow-up to last week's post, MdC analysis for Unicode Repertoire Extensions.

I've applied the web app to a collection of 180 MdC files and summarised the results in Analysis of Unicode Egyptian hieroglyphs in a collection of MdC-coded transcriptions [PDF].

There is also a minor update to the MdC analysis for Repertoire Extensions web app itself, fixing a couple of bugs and increasing the number of repertoire candidates to 200.

Bob Richmond


Wednesday, 19 October 2016

MdC analysis for Unicode Repertoire Extensions

As part of discussions on expanding the hieroglyph repertoire in Unicode it is useful to be able to inspect existing digital documents in Manuel de Codage (MdC) format. I've therefore made a web app available for this purpose: MdC analysis for Repertoire Extensions.

Most users of MdC will probably find the app instructive, whether interested in Unicode developments or not.

MdC methods of encoding Egyptian hieroglyphs have been around for over 25 years. MdC has proved by far the most popular method of digitally encoding hieroglyphic for publishing and database-type applications.

One complication is the fact that MdC was never technically defined in detail and work on the system appears to have stopped after the publication of the second edition of Hieroglyphica (2000) and before documentation was made available online. Therefore, several interpretations, extensions, variations and subsets of MdC are in existence (e.g. WinHiero, JSesh 5.5, WinGlyph and InScribe 2004). The web app attempts to be fairly permissive on what variation of MdC is analysed.
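To give a flavour of what permissive handling of MdC might involve, here is a minimal Python sketch. It is not the web app's code. It assumes only the most widely shared MdC separators ('-' between groups, ':' for stacking, '*' for juxtaposition, '&' for ligatures, plus parentheses and whitespace) and treats everything else as a sign code. The function name count_sign_codes and the Gardiner-style pattern are illustrative assumptions; real files also contain cartouches, shading, line breaks and dialect-specific extensions that this ignores.

```python
import re
from collections import Counter

# Assumed common MdC separators: '-', ':', '*', '&', parentheses, whitespace.
SEPARATORS = re.compile(r"[-:*&()\s]+")

# Assumed shape of a Gardiner-style code: category letter (or 'Aa'),
# a number, and an optional variant letter, e.g. A1, N35, Aa1, N35A.
GARDINER = re.compile(r"^(Aa|[A-Z])\d+[A-Za-z]?$")


def count_sign_codes(mdc_text):
    """Return (gardiner_counts, other_counts) for one MdC string."""
    tokens = [t for t in SEPARATORS.split(mdc_text) if t]
    gardiner = Counter(t for t in tokens if GARDINER.match(t))
    # Anything else (phonetic codes such as 'n', or dialect-specific
    # extensions) is counted separately rather than rejected.
    other = Counter(t for t in tokens if not GARDINER.match(t))
    return gardiner, other


if __name__ == "__main__":
    gardiner, other = count_sign_codes("A1-N35:X1*Z1-n-A1")
    print(gardiner.most_common())
    print(other.most_common())
```

Counting unrecognised tokens separately, rather than failing on them, is one simple way to stay tolerant of the different MdC variants.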

There is something of the chicken and the egg about releasing an app before there is a clear vision of the first expansion of the Unicode hieroglyph set. Bear that in mind.

I hope to evolve and improve the app over the next few months so feel free to send feedback via www.egpz.org.

Bob Richmond

Tuesday, 11 October 2016

Unicode plain text proposal status (October 2016)

Summary

Plain text hieroglyphic writing in Unicode is currently on hold while some technical points are investigated. These are the use of EGYPTIAN HIEROGLYPH LIGATURE JOINER, extensions for rare forms of writing and extensions for vertical text (the initial proposal was focused on the forms of horizontal writing that account for the vast majority of hieroglyphic in print and first-generation digital formats).

This means the earliest that hieroglyphic writing can be released as part of the Unicode Standard is Unicode 11 (2018), a year later than the previously planned Unicode 10. Unfortunately, the delay means the release won't coincide with the 150th anniversary of the first print publication using a hieroglyphic typeface (mentioned in Some remarks about Unicode Hieroglyphic fonts).

This is a nuisance but in practical terms there is no reason to hold up work on building an ecosystem for Unicode hieroglyphic writing. It simply means it will be necessary to use an approach such as the web font referenced in Unicode Hieroglyphs in web browsers: Web Fonts as the basis of fonts and tools, accepting some limitations in what can be done until the Unicode standard is updated and implemented by web browsers, system software, and applications such as word processors.


Discussions

This summer featured much discussion of the Proposal to encode three control characters for Egyptian Hieroglyphs (L2/16-018R), both at the Informatique et Égyptologie - Cambridge - 2016 meeting in July and afterwards. In early August, several of us from the I&E meeting participated in a telephone discussion with members of the Unicode Technical Committee (UTC).

There were also discussions about expanding the repertoire of Unicode hieroglyphs. In practical terms, this is the main obstacle to fully encoding a range of ancient sources one might expect to be represented in a plain text writing system. Repertoire is not entirely unconnected with control characters but the two can proceed separately in the standardisation process so I'll treat this topic another time.

The purpose of the proposed three control characters is to enable hieroglyphic writing in Unicode (the current situation is that 1071 basic hieroglyph characters were incorporated in the standard in 2009, but there is no way to form quadrats, so there is no authentic writing system as such). Most participants were in agreement with the principle of enabling the writing system. However, the Thesaurus Linguae Aegyptiae (TLA) and Ramses corpus projects have objected to moving forward with the three until additional features are provided.
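The figure of 1071 characters can be checked with a few lines of Python. This is only a sketch, and it assumes the 2009 characters occupy the contiguous code point range U+13000 to U+1342E in the Egyptian Hieroglyphs block and that the Python build in use ships Unicode data recent enough to name them.

```python
import unicodedata

# Assumed contiguous range of the hieroglyph characters added in 2009.
FIRST = 0x13000
LAST = 0x1342E

# Simple arithmetic on the range bounds.
range_size = LAST - FIRST + 1

# Cross-check against the character names in Python's Unicode database.
named = sum(
    1
    for cp in range(FIRST, LAST + 1)
    if unicodedata.name(chr(cp), "").startswith("EGYPTIAN HIEROGLYPH ")
)

print(range_size, named)
```

Both numbers should come out as 1071. Each of these characters is a single sign; nothing in the range says how signs are arranged within a quadrat, which is the gap the control characters are meant to fill.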

These extensions were not proposed for the first release so as to focus on the vast majority of modern hieroglyphic in typeset books and digital formats, which use horizontal writing and do not need additions to enable digital encoding.

Interest was expressed in extending the scope of L2/16-018R to deal with vertical writing and the related 'tall quadrat' orthography (used in some horizontal writings). Some are of the opinion it is important this is done for the first release of support for the writing system rather than as a second stage. I've described two extended control characters, EGYPTIAN HIEROGLYPH HORIZONTAL GROUP JOINER and EGYPTIAN HIEROGLYPH HORIZONTAL GROUP JOINER, which I've been using for vertical text evaluation, in a discussion document An Extension to the three control characters for Egyptian Hieroglyphs and some additional remarks (L2/16-214).

Several examples of rare quadrat arrangements were noted that cannot be represented elegantly or unambiguously using L2/16-018R. Analysis so far suggests these account for on the order of 0.01% (sic) of the digitised corpus but may be more common in certain ancient contexts.

Additional controls can be added to deal with rare quadrats, but the issue needs to be better characterised and agreed by Egyptologists before deciding what to do. As with the three basic characters, data needs to be studied and evaluated before submitting a formal proposal.

Discussions in July yielded a consensus with the TLA and Ramses projects on implementation of two of the three characters, namely EGYPTIAN HIEROGLYPH HORIZONTAL JOINER and EGYPTIAN HIEROGLYPH VERTICAL JOINER, published as L2/16-227.

M-J Nederhof presented a discussion document L2/16-177 at the I&E meeting based on adapting his RES scheme as an alternative approach to control characters for Unicode quadrat sequences without using the horizontal or vertical joiners. This was followed by a revised document L2/16-210 with addendum L2/16-233 which outline two alternative versions and notations of his system.

There are many alternative ways one might define quadrat sequences using various levels of complexity, but there would need to be convincing evidence to drop the vertical and horizontal joiners and/or require complex or hard-to-read sequences for simple quadrats. It is obviously important that any proposed alternatives are capable of implementation in current technology.

Discussions continue on the Egyptian Hieroglyphs in the UCS mailing list (see archives at http://evertype.com/pipermail/egyptian_evertype.com/). Egyptologists and others with an interest in the topic are encouraged to join and participate in or follow the list.

Status of the L2/16-018R proposal

L2/16-018R was published in January 2016 as a revision to the original May 2015 publication L2/15-123. No objections had been received by UTC between May 2015 and January 2016, so the proposal was put out to international ballot as a UTC recommendation in January 2016. Comments were received by UTC in April 2016 (L2/16-090, my reply at L2/16-104) where specific objections were made to the EGYPTIAN HIEROGLYPH LIGATURE JOINER.

Following discussions outlined above, L2/16-018R is on hold until it is determined what additional features are required to obtain consensus. I suspect the earliest this can be reviewed by UTC is January 2017.

Over the next few months it would be really useful if comments, requirements or objections about any additions could be sent to UTC in a timely fashion, to avoid further unnecessary delays.

Bob Richmond