Tuesday, 17 May 2016

Foundations of a Universal Egyptian Hieroglyphic Writing System in Unicode plain text

A basic collection of 1071 Egyptian hieroglyph characters was added to Unicode in 2009 (Unicode 5.2). This concluded a process that began in 2005, during which it was decided not to release a hieroglyphic plain text writing system at this first stage.

To put it simply, it is impossible at present to use Unicode hieroglyphs as a writing system.

 cannot be written using Unicode alone.

Work is now underway to take Unicode hieroglyphs to the next level and enable a writing system.

One fundamental point to understand is that a Universal Egyptian Hieroglyphic Writing System in Unicode will be accessible to billions of people. This is a dramatic change from the current situation, where the specialist tools available to Egyptologists, students and others are used by at most thousands of individuals, all of whom have a greater or lesser degree of knowledge about how hieroglyphic works as a writing system.

The ancient Egyptians did not write or arrange hieroglyphs randomly; the writing system uses a variety of informal and unwritten rules. Certainly there was much flexibility, and styles of writing were not static, but the overall shape of the writing system was consistent for over 3000 years of everyday use. A Universal writing system must attempt to take these characteristics into account.

Consider the following arrangement of hieroglyphs produced in a traditional Manuel de Codage (MdC) hieroglyph editing application used by Egyptologists:
This sequence of three rectangular arrangements of hieroglyphs contains Egyptian ‘alphabet’ characters spelling out p-a-r-t-y, hidden among some random hieroglyphs added for fun. An Egyptologist would recognize this as inauthentic hieroglyphic. Imagine the billions of other random arrangements of hieroglyphs that could be created by accident or with humorous, mischievous or malicious intent. It is unnecessary to know much about the hieroglyphic writing system to appreciate that if Unicode allowed for arbitrary un-Egyptian arrangements of hieroglyphs like p-a-r-t-y, the situation would quickly become ludicrous.

To mitigate this situation, it became clear while designing a plain text hieroglyphic writing system for Unicode that it is essential to limit the ways hieroglyphs can be arranged. The simplest solution is to construct a list of known valid arrangements of groups of hieroglyphs which make sense in the writing system and are attested in ancient sources, and then to publish this list alongside the Unicode standard so developers and font designers know exactly what is required from implementations. That way plain text writings can only use well-defined features of the writing system. This approach is included as part of Proposal to encode three control characters for Egyptian Hieroglyphs (latest version L2/16-018R [pdf], January 2016), which is currently being reviewed for possible inclusion in Unicode 10 (2017).
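To make the idea of a published list a little more concrete, here is a minimal Python sketch of how software might check groups against such a list. Everything in it is an assumption for illustration only: the entries in VALID_GROUPS are placeholders rather than items from any real list, the group notation borrows MdC-style ':' and '*' separators between Gardiner-style sign codes instead of the control characters in the proposal, and the function names are hypothetical.

```python
from typing import List, Tuple

Group = Tuple[str, ...]

# Hypothetical stand-in for a published list of valid arrangements.
# These entries are placeholders, not taken from any real list.
VALID_GROUPS = frozenset({
    ("Q3",),
    ("X1",),
    ("D21",),
    ("Q3", ":", "X1"),
    ("D21", ":", "X1"),
})


def parse_group(encoded: str) -> Group:
    """Split a group such as 'Q3:X1' into ('Q3', ':', 'X1')."""
    tokens: List[str] = []
    current = ""
    for ch in encoded:
        if ch in ":*":
            # A separator ends the sign code collected so far.
            if current:
                tokens.append(current)
            tokens.append(ch)
            current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tuple(tokens)


def find_invalid_groups(text: str) -> List[str]:
    """Return the space-separated groups in text that are not on the list."""
    return [g for g in text.split() if parse_group(g) not in VALID_GROUPS]


if __name__ == "__main__":
    # Only the last group is absent from the placeholder list.
    print(find_invalid_groups("Q3:X1 D21 G1*M17"))
```

The point of the sketch is only that validity becomes a simple lookup: an arrangement is either on the list or it is not, so implementations need no knowledge of Egyptian to reject an un-Egyptian group.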

The initial release of such a list will inevitably be missing some perfectly valid arrangements, so it can be expected to grow over time as experts make increasing use of the hieroglyphic plain text writing system and additional valid but less commonplace arrangements for plain text are identified.

Experts using hieroglyphic will encounter the obvious problem that ancient scribes didn't follow technical or style guidelines, so there will be hieroglyphic writings that it might seem desirable to encode digitally as text but that don't entirely fit into a plain text system, either in principle or as it is defined at a given time. The simple answer for those who encounter this limitation is to do one of three things: (1) represent the original writing in a more standard form; (2) use an existing digital encoding scheme that is not based on Unicode plain text; or (3) use some new system built on Unicode plain text principles but with higher-level features that allow for more elaborate writing or rendering of the writing.

Feedback about the three control character proposal since it was published over a year ago has shown that this basic point can be difficult to grasp for some who are familiar with the flexibility of traditional MdC systems. I hope this post helps explain the simple reason why limitations are unavoidable whatever plain text system is adopted.

As a point of interest, I’ll note that experiments prior to L2/16-018R suggested a minimum of 3000 entries in the initial ‘valid’ list would address a very large proportion of requirements, although the actual number listed to begin with is likely to be somewhat higher.

Bob Richmond
