Monday 23 May 2016

Unicode plain text proposal status (May 2016)

The latest Unicode Technical Committee (UTC) discussion about the Egyptian Hieroglyphic Writing system as Unicode plain text is available from the Unicode web site in Recommendations to UTC #147 May 2016 on Script Proposals [pdf]. This document also contains an update on the status of work being done to extend the repertoire of Egyptian Hieroglyphs in Unicode.

I hope to produce a first draft of a list of clusters of hieroglyphs required for plain text using all three L2/16-018R [pdf] control characters sometime in the next few weeks (this draft data initially to be published on www.egpz.org).

Bob Richmond

Hieroglyphs on the web: Egyptian hieroglyph character picker

A recent web application for working with Unicode hieroglyphs is the Egyptian hieroglyph character picker (EHCP) by Richard Ishida, the latest in his collection of Unicode character pickers at http://r12a.github.io/.

Technical note. EHCP uses Unicode hieroglyphs for most purposes, but hieroglyph group rendering follows the WikiHiero method of image arrangement (see my earlier post Hieroglyphs on the web: WikiHiero). One difference is that EHCP replaces the WikiHiero PHP code (which runs on the remote server) with JavaScript code that runs in the local web browser. Two benefits of JavaScript are 1. better performance in many circumstances and 2. no need to restrict the software to PHP server pages.

The EHCP application is at http://r12a.github.io/pickers/egyptian/ with documentation at

EHCP is useful for experimenting with Unicode. It is not intended to be a fully-fledged hieroglyphic editor. Once Unicode plain text hieroglyphic is available, it should be very straightforward to modify EHCP to work with plain text hieroglyphic fonts and eliminate the need for WikiHiero hieroglyphs as images.


Bob Richmond

Tuesday 17 May 2016

Foundations of a Universal Egyptian Hieroglyphic Writing System in Unicode plain text

A basic collection of 1071 Egyptian hieroglyph characters was added to Unicode in 2009 (Unicode 5.2), the conclusion of a process that began in 2005 and during which it was decided not to release a hieroglyphic plain text writing system at this first stage.
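For readers who want to inspect this basic collection, here is a minimal Python sketch. It assumes the block occupies code points U+13000 to U+1342E (a range not stated in this post) and that the Python build ships Unicode data at version 5.2 or later. The function name is illustrative only.

```python
import unicodedata

# Assumption: the hieroglyphs encoded in Unicode 5.2 occupy U+13000..U+1342E.
FIRST, LAST = 0x13000, 0x1342E


def hieroglyph_names():
    """Yield (code point, character name) for each hieroglyph in the range."""
    for cp in range(FIRST, LAST + 1):
        yield cp, unicodedata.name(chr(cp))


# The size of the range matches the 1071 characters mentioned above.
count = LAST - FIRST + 1
first_cp, first_name = next(hieroglyph_names())
print(count, f"U+{first_cp:04X}", first_name)
```

The names follow the pattern EGYPTIAN HIEROGLYPH plus a zero-padded Gardiner-style code, which is what makes simple tooling over the repertoire possible.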

To put it simply, it is impossible at present to use Unicode hieroglyphs as a writing system.

 cannot be written using Unicode alone.

Work is now underway to take Unicode hieroglyphs to the next level and enable a writing system.

One fundamental point to understand is that a Universal Egyptian Hieroglyphic Writing System in Unicode will be accessible to billions of people. This is a dramatic change from the current situation, where specialist tools available to Egyptologists, students and others are used by at most thousands of individuals, all of whom have a greater or lesser degree of knowledge about how hieroglyphic works as a writing system.

The ancient Egyptians did not write or arrange hieroglyphs randomly; the writing system uses a variety of informal and unwritten rules. Certainly there was much flexibility and styles of writing were not static but the overall shape of the writing system was consistent for over 3000 years of everyday use. A Universal writing system must attempt to somehow take these characteristics into account.

Consider the following arrangement of hieroglyphs produced in a traditional Manuel de Codage (MdC) hieroglyph editing application used by Egyptologists:
This sequence of three rectangular arrangements of hieroglyphs contains Egyptian ‘alphabet’ characters spelling out p-a-r-t-y hidden among some random hieroglyphs added for fun. An Egyptologist would recognize this as inauthentic hieroglyphic. Imagine the billions of other random arrangements of hieroglyphs that could be created by accident or with humorous, mischievous or malicious intent. It is unnecessary to know much about the hieroglyphic writing system to appreciate that if Unicode allowed for arbitrary un-Egyptian arrangements of hieroglyphs like p-a-r-t-y the situation would quickly become ludicrous.

To mitigate this situation, it was clear while designing a plain text hieroglyphic writing system for Unicode that it is essential to limit the ways hieroglyphs can be arranged. The simplest solution is to construct a list of known valid arrangements of groups of hieroglyphs which make sense in the writing system and are attested in ancient sources. This list would then be published alongside the Unicode standard so developers and font designers know exactly what is required from implementations. That way plain text writings can only use well-defined features of the writing system. This approach is included as part of Proposal to encode three control characters for Egyptian Hieroglyphs (latest version L2/16-018R [pdf], January 2016), which is currently being reviewed for possible inclusion in Unicode 10 (2017).
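The idea of checking a writing against a published list can be sketched in a few lines of Python. This is an illustration only: the sample entries are hypothetical, clusters are written in MdC-style notation rather than with the proposed control characters, and the function names are my own, not part of any proposal.

```python
# Hypothetical sample entries written in MdC-style notation.
# The real list would be the one published alongside the standard.
VALID_CLUSTERS = frozenset({"Q3:X1", "D21:Z1", "N35:X1"})


def is_single_sign(cluster):
    """A lone hieroglyph has no stacking (:) or juxtaposition (*) operator."""
    return ":" not in cluster and "*" not in cluster


def invalid_clusters(text):
    """Return the clusters in `text` that are neither single signs nor listed."""
    return [
        cluster
        for cluster in text.split("-")
        if not is_single_sign(cluster) and cluster not in VALID_CLUSTERS
    ]


# The first two clusters pass; the arbitrary arrangement at the end is rejected.
print(invalid_clusters("Q3:X1-G17-A1:B1*C1"))
```

A font or editor could run this kind of check before attempting layout, falling back to a simple linear sequence of signs for anything not on the list.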

It is inevitable that the initial release of such a list may be missing some perfectly valid arrangements so it can be expected to grow over time as experts make increasing use of the hieroglyphic plain text writing system and additional valid but less commonplace arrangements for plain text are identified.

Experts using hieroglyphic will encounter the obvious problem that ancient scribes didn't follow technical or style guidelines, so there will be hieroglyphic writings that it might seem desirable to encode digitally as text but that don't entirely fit into a plain text system, either in principle or as it is defined at a given time. The simple answer for those who encounter this limitation is to do one of three things: 1. represent the original writing in a more standard form, 2. use an existing digital encoding scheme that is not based on Unicode plain text, or 3. use some new system built on Unicode plain text principles but with higher level features that allow for more elaborate writing or rendering of the writing.

Feedback about the three control character proposal since it was published over a year ago has shown that this basic point can be difficult to grasp for some who are familiar with the flexibility of traditional MdC systems. I hope this post helps explain the simple reason why limitations are unavoidable whatever plain text system is adopted.

As a point of interest, I’ll note that experiments prior to L2/16-018R suggested a minimum of 3000 entries in the initial ‘valid’ list would address a very large proportion of requirements, although the actual number listed to begin with will likely be somewhat higher.

Bob Richmond

Monday 16 May 2016

Hieroglyphs on the web: WikiHiero


Wikipedia uses a simple technique to render basic hieroglyphic on a web page: graphics of individual hieroglyphs are arranged to simulate the look of the hieroglyphic writing system. The Wikipedia page https://en.wikipedia.org/wiki/Transliteration_of_Ancient_Egyptian contains examples such as:

This feature of Wikipedia uses software called WikiHiero, first developed by Guillaume Blanchard in 2004. This software is open source, licensed under GPL 2, and continues to be maintained by various contributors.

Technical summary. WikiHiero creates hieroglyph arrangements on a web page as bitmap graphics from a source encoding that follows much of the part of Manuel de Codage (MdC) that deals with hieroglyph encoding. The source of the web page contains elements such as <hiero>M23-X1:R4-X8-Q2:D4-W17-R14-G4-R8-O29:V30-U23-N26-D58-O49:Z1-F13:N31-V30:N16:N21*Z1-D45:N25</hiero> (for the illustration above). These elements are converted into the arrangement of graphics by the WikiHiero software running on the web server before the page is downloaded to a web browser. This process requires that the web page be implemented using PHP at the server and is therefore limited to web sites that use PHP, such as Wikipedia. See the WikiHiero home page at https://www.mediawiki.org/wiki/Extension:WikiHiero for details.
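To make the structure of that encoding concrete, here is a minimal Python sketch that splits the example into groups, rows and signs. It assumes only the three operators visible in the example, read in the usual MdC way: - separates groups, : stacks rows vertically, and * places signs side by side within a row. It is not the WikiHiero parser, and parse_mdc is an illustrative name.

```python
def parse_mdc(mdc):
    """Parse a simple MdC string into a list of groups.

    Each group is a list of rows (stacked top to bottom) and each row is a
    list of sign codes (placed left to right). Only the '-', ':' and '*'
    operators are handled; everything else in MdC is out of scope here.
    """
    return [
        [row.split("*") for row in group.split(":")]
        for group in mdc.split("-")
    ]


EXAMPLE = (
    "M23-X1:R4-X8-Q2:D4-W17-R14-G4-R8-O29:V30-U23-N26-D58-"
    "O49:Z1-F13:N31-V30:N16:N21*Z1-D45:N25"
)

for group in parse_mdc(EXAMPLE):
    print(group)
```

A renderer in the WikiHiero style would then walk this nested structure, placing one image per sign and scaling each row to fit its group.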

In my opinion, the greatest strength of the WikiHiero design is the fact that it generates web pages that work over a wide range of web-browsers including many obsolete browser versions. This is a major benefit for a web site like Wikipedia and other web sites built on sufficiently similar technology.

The main downside of WikiHiero for simple hieroglyphic is the fact that hieroglyphs are no more than graphics on the web page. WikiHiero pre-dates Unicode hieroglyphs and has not yet been adapted for use with the Unicode Standard. This means WikiHiero hieroglyphs are not detected by search engines such as Google or Bing, and Egyptian hieroglyphic cannot be used in the same way as other writing systems in Wikipedia.

Tip. If you encounter WikiHiero hieroglyphic on Wikipedia and you want to copy the text into a hieroglyph editor such as JSesh choose the [edit] option and you will see the <hiero>…</hiero> encoding. You can then copy the MdC content enclosed between the tags. Likewise, if you want to edit a Wikipedia page you can add your own hieroglyphs by wrapping your MdC in a <hiero>…</hiero> element.
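If you have the wikitext source of a page to hand, the same copy step can be automated. The following Python sketch assumes the tags appear literally as <hiero>…</hiero> with no attributes. The function name and the sample wikitext are made up for illustration.

```python
import re

# Non-greedy match so that several <hiero> elements on one page stay separate.
HIERO_RE = re.compile(r"<hiero>(.*?)</hiero>", re.DOTALL)


def extract_hiero(wikitext):
    """Return the MdC content of every <hiero>...</hiero> element."""
    return [content.strip() for content in HIERO_RE.findall(wikitext)]


# Made-up wikitext fragment for demonstration.
sample = "The word <hiero>Q3:X1-V28</hiero> and later <hiero> D21:Z1 </hiero>."
print(extract_hiero(sample))
```

Each string returned can be pasted into an MdC editor such as JSesh.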

I’d like to point out another potential benefit of WikiHiero. The implementation of <hiero>…</hiero> encoding ought to be fairly simple to update to modern technology when the time is ripe. This means you shouldn’t feel put off contributing to Wikipedia now if your Egyptian MdC content works with WikiHiero. A future implementation of Wikipedia could elect to turn <hiero>…</hiero> into Unicode hieroglyphic plain text rather than graphics, in which case your content would ‘magically’ become accessible as text to search engines and so forth. Text quality would also improve considerably through the use of a font and advanced typography rather than the WikiHiero simplified layout of bitmap images.
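As a rough illustration of the first half of such a conversion, the Python sketch below maps individual MdC sign codes to Unicode characters by building the character name and looking it up. It assumes the Unicode names follow the pattern EGYPTIAN HIEROGLYPH plus letters, a three-digit number and an optional variant letter. It deliberately ignores arrangement, which is exactly the part that needs the plain text mechanism discussed in my other posts. The function names are illustrative.

```python
import re
import unicodedata

# Gardiner-style code: category letters, a number, optional variant letter.
_CODE_RE = re.compile(r"([A-Za-z]+)(\d+)([A-Za-z]?)")


def gardiner_to_char(code):
    """Map an MdC sign code such as 'M23' or 'Aa1' to a Unicode hieroglyph.

    Raises ValueError for malformed codes and KeyError when no character
    with the constructed name exists in the Unicode data.
    """
    match = _CODE_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a Gardiner-style code: {code!r}")
    letters, digits, variant = match.groups()
    name = f"EGYPTIAN HIEROGLYPH {letters.upper()}{int(digits):03d}{variant.upper()}"
    return unicodedata.lookup(name)


def signs_only(mdc):
    """Convert the signs of a simple MdC string, discarding the arrangement."""
    return "".join(gardiner_to_char(code) for code in re.split(r"[-:*]", mdc))


print(f"U+{ord(gardiner_to_char('A1')):04X}")
```

Codes that are not in the encoded repertoire raise KeyError, so a real converter would need a fallback for them, such as keeping the original image.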

Aside from its use in Wikipedia, WikiHiero has also been used to implement a simple MdC editor. See http://aoineko.free.fr/index.php?lang=en. This editor is also interesting in that it shows the detailed HTML encoding generated by WikiHiero from MdC.


Bob Richmond

Sunday 8 May 2016

Extending the Hieroglyph repertoire in Unicode


At time of writing, the latest draft proposal about additional Egyptian hieroglyphs is L2/16-079 “Preliminary draft for the encoding of an extended Egyptian Hieroglyphs repertoire” http://www.unicode.org/L2/L2016/16079-hieroglyphs.pdf by Michel Suignard, dated 2016-04-11. This is the latest of several iterations by Suignard, the first of which was a preliminary draft L2/15-240 dated 2015-10-09.

An important part of this proposal is a database containing basic information about each of the encoded hieroglyphs. This database is to be maintained on the Unicode website. Over 6000 hieroglyphs are proposed in addition to the 1071 hieroglyphs already encoded in Unicode, and this basic data should make it reasonably straightforward for software tools, fonts and so forth to work with the expanded repertoire.

There are a number of open issues for discussion in the draft proposal. I hope to write on some of these topics in future blog posts.

One point I’d like to make now: many of the additional hieroglyphs first appear in the Greco-Roman period so it is likely that fonts and tools aimed at classical Egyptian from Old Kingdom to New Kingdom omit or downplay these additions. In fact, for much use of digital hieroglyphic writing systems I suspect popular fonts will contain at most hundreds of additions to the current Unicode standard set rather than thousands. Time will tell.


I’m not personally involved in developing this proposal but agree with the overall aim to enrich Unicode support for hieroglyphs. I don’t know what the thinking is on timescales for completing the proposal but don't see any reason it can't be finished this year. So it seems to me that if Egyptologists or others have ideas that might help improve what is being proposed, the time to communicate them is during Summer 2016.

Bob Richmond