Monday, 15 August 2016

Informatique et Égyptologie - Cambridge - 2016

A meeting of the working group “Informatique et Égyptologie” of the International Association of Egyptologists took place at the Fitzwilliam Museum, Cambridge on 11-12 July, 2016.

Presentations and discussions were almost entirely concerned with the hieroglyphic writing system in future additions to the Unicode Standard. The main areas of focus were:


A summary of the meeting by Debbie Anderson of the Script Encoding Initiative, Berkeley is available as Brief Report from Cambridge meeting of Egyptologists and Update [pdf].

Ongoing discussions following the meeting are taking place on the Egyptian Hieroglyphs in the UCS mailing list (see archives at http://evertype.com/pipermail/egyptian_evertype.com/). I'll try to deal with some of these follow-up activities in future blog posts.

If documents from presentations made at the meeting become available online, I'd like to link to them here. Let me know if or when anything becomes available. Thanks.

Bob Richmond

Monday, 18 July 2016

Simple higher level protocols and Unicode Plain Text hieroglyphic writing

Simple (plain) text has limitations in what can be expressed without additional information. Various 'Higher Level Protocols' (HLP) are used to enrich all kinds of writing systems. HTML/CSS and document formats such as those used in 'office' applications are kinds of HLP managed as international standards. To use Unicode Egyptian hieroglyphic with these complex protocols, it is desirable for Egyptologists and others to develop conventions for working with them consistently so that rich data can be shared. Something for the future.

However, not all protocols need be complex like HTML/CSS.

Consider hieroglyph combinations already encoded as characters in Unicode. 𓃁 (ab) is 𓂝 (a) on top of 𓃀 (b). Gardiner calls these combinations Monograms and the current practice in Unicode is to encode monograms as separate characters.

As things stand in Unicode there is no character for 𓂧 (d) on top of 𓃀 (b), that is (db), which is rare as a monogram. For many applications and users this is not a problem, but it may crop up in some situations such as a corpus database.

The answer is to define or adopt a simple higher level protocol. For instance, use the character '+' to indicate 'on top of' so (db) is encoded 𓂧+𓃀. Similarly for any other monograms not yet encoded. This is still plain text, just not pure hieroglyphic, so it can be used in applications and databases. To render this combination visually, software will need adaptation, but this needs to be done in general for hieroglyphs not yet encoded in Unicode so there is little impact.
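As a minimal sketch of how software might read this convention, the Python below splits a string into stacks, treating '+' between two characters as 'on top of'. The function name parse_stacks and the list-of-lists result are illustrative assumptions, not part of Unicode or any agreed protocol.

```python
def parse_stacks(text: str) -> list[list[str]]:
    """Split text into stacks of characters, each ordered top to bottom.

    A '+' joins the character after it onto the stack before it.
    A leading '+' is kept as an ordinary character and a trailing '+'
    is dropped; a real protocol would need to define these cases.
    """
    stacks: list[list[str]] = []
    join_next = False  # True immediately after a '+' has been seen
    for ch in text:  # iterating a str yields whole code points
        if ch == "+" and stacks:
            join_next = True
        elif join_next:
            stacks[-1].append(ch)
            join_next = False
        else:
            stacks.append([ch])
    return stacks


if __name__ == "__main__":
    # The (db) example from the text, preceded by the encoded monogram (ab).
    print(parse_stacks("𓃁𓂧+𓃀"))
```

A renderer could then draw each inner list as a vertical group, while search and database code can keep working with the plain string.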

This reasoning applies to other specialist writing features for which there is not sufficiently strong evidence to warrant direct support in plain text at this time. I expect various conventions will evolve as users gain more familiarity with Unicode.

There is an active discussion as to whether the policy of encoding monograms as separate characters should be continued or an additional control character should be introduced to take the role of + as given here. Either way, the method described here enables work on Unicode solutions to continue regardless.

Bob Richmond


Thursday, 7 July 2016

Transcription of Hieroglyphic into Unicode Hieroglyphic Plain Text: Part 1

Modern web pages and printed publications are dramatically more elaborate than documents from the 1950s, which in turn used advances in printing technologies unavailable to our ancestors. Complex examples surround us online and in print: newspapers and magazines, product labelling, instruction manuals, advertisements, reference works, textbooks and specialist publications. The list goes on and on. Graphics, emojis and icons often complement textual elements which themselves increasingly use a variety of writing systems. Even when written in a language such as English, with its simple alphabet and easy to understand notion of plain text, text in documents can be very elaborate and feature a range of scripts, fonts and typographic styles.

Turning attention to hieroglyphic, consider the Middle Kingdom Stela (BM EA851, from BM page):


There is a line of horizontal text and eight columns of vertical text beneath, along with some pictures/hieroglyphs. Note: traditionally, an Egyptologist might create a line drawing of a stela rather than use a photograph as here; this is especially useful when there is damage or the hieroglyphs are hard to read. To discuss the text content, the text elements may be extracted and transcribed, often transcribing vertical text into horizontal writing. Hieroglyphic fonts have been available for 150 years and can be useful for this purpose. Transcription to Unicode Plain Text Hieroglyphic (when standardised) may be used in a similar way.

Modern technology has changed the way we look at artefacts that feature Ancient Egyptian in hieroglyphic. Our eyes are now used to elaborate web pages and publications in everyday life. The idea of arranging text and graphics in the Egyptian way is not as unusual as it would have appeared to scholars or general readers in the 19th and 20th centuries. Older books on the topic can therefore emphasise the different and unfamiliar, but this view is becoming increasingly anachronistic to the modern reader.

Web technology and editing tools are also designed around increasing complexity of documents and this means there are now natural ways to think of transcription of hieroglyphic sources in terms of modern technology and standards. In some cases this may mean changes, sometimes subtle, to traditional thinking on digital representations of hieroglyphic.

It is instructive to look at any complex web page based around a simple basic writing system such as English and consider how it might be transcribed as plain text. This exercise can help understand what needs to be considered for Egyptian and avoid over-enthusiastic expectations as to what can be expected from hieroglyphic in Unicode plain text.

This post is about transcription of hieroglyphic sources to Unicode plain text (however it ends up being standardised). It is important to distinguish this practice from hieratic transcription to hieroglyphic (see Transcription of Hieratic into Unicode Hieroglyphic: Part 1 and later posts on the topic) which uses the same encoding system but has its own specific issues.

There are several points on transcription I'd like to make now in this introduction.
  • Limitations to plain text transcription implied by fonts. Obviously a general purpose hieroglyphic font cannot be expected to contain hieroglyphs that look exactly the same as those in an original document, inscription or painting. The same hieroglyph may be written differently in the same document.
  • Different fonts for plain text may render hieroglyph clusters in different ways. For example a font optimised to represent text in a traditional Manuel de Codage (MdC) layout may look different to a font designed to make a more pleasing generic representation. A font optimised for Late Egyptian hieratic transcription may make different choices for its subject matter. At some point we will have colour fonts in various styles. So far, most Egyptologists are only familiar with editing tools that are limited to a single font. Choice and variety enable a large step forward but can require some fresh thinking.
  • Higher level protocols are essential for many scholarly requirements from hieroglyphic. This will involve defining conventions for web pages (HTML/CSS) and new encoding formats using XML, near plain text, or adaptations of other formats. Work on this has scarcely begun.
  • Transcription is not meant to replace facsimile, photography or other techniques for representing hieroglyphic sources. In many cases the goal is to augment the source in order to promote search-ability, analysis and readability.
  • Typically an Egyptologist will transcribe into left to right horizontal text. If the source material uses columns, the vertical arrangement of hieroglyphs is often not copied verbatim but adapted into a more horizontal style of writing. Partly this is about convenience and uniformity but it is also the case that hieroglyphic fonts don't yet support right to left. General purpose software applications such as word processors are not yet geared up to work terribly well with column text. This is not an intrinsic limitation of the Unicode concept of plain text but likely a practical issue for the next few years at least.
I hope to return to various facets of this large topic in future posts.

Bob Richmond


Monday, 27 June 2016

Unicode Hieroglyphs in web browsers: generic web pages

In an ideal world, Egyptian hieroglyphs in Unicode will simply appear as such to the reader when they are used on a website page. The general reader should not have to do anything special such as install a font or configure the web browser in order to display Unicode text.

Egyptian-aware web pages may render text as images (as mentioned in various earlier posts, e.g. on WikiHiero) or use web-fonts to provide a full text experience. These techniques generally work well with reasonably up to date web browsers on all kinds of devices.

However, this post is concerned with generic web pages meaning web pages without any special coding for Egyptian hieroglyphs. These pages rely on the web browser and/or its underlying operating system to render correctly.

Probably the best known example of a generic web page is the Google search page www.google.com. This blog post itself is also such a web page, hosted on www.blogspot.com so if all is well with the browser and device on which you are reading you should see hieroglyphs at the end of this sentence - 𓇳𓄟𓋴𓋴.

If you then copy and paste these four hieroglyphs into Google search, you see something like this:


I've highlighted the hieroglyphs in red boxes. Search is just one example. If you want to write hieroglyphs in web forums or read hieroglyphs as text on an arbitrary web page chances are you are reliant on generic hieroglyph support from your web-browser.

Modern versions of macOS/iOS from Apple and Windows 10 from Microsoft are Egyptian hieroglyphic ready 'out of the box' and so are their supplied web browsers (Safari, Edge or Internet Explorer). So is a correctly configured Linux distro. Generic web pages just work for users of hundreds of millions of these modern devices.

If this works for you, great. The browser or system you are using is Unicode hieroglyphic ready and you are done with this post unless you are curious about technicalities.

However the global picture is not yet entirely rosy. Android, by far the most popular system for mobile phones and tablets, is not yet up to speed. Old versions of iOS/macOS and Windows are not hieroglyph ready. When you try the Google search on these kinds of system you likely see box characters where hieroglyphs are expected unless something has been added to these systems to avoid problems.

Why hieroglyphs work (or don't)

Technically for a web browser to render hieroglyphic text on a generic web page all it needs to do is:

1. Recognise the text characters are hieroglyphs.
2. Use a font with hieroglyphs to render the characters.

The Google search illustration above is from Windows 10 using Internet Explorer 11. Everything works as expected because Windows 10 comes with the Segoe UI Historic font (which contains Unicode hieroglyphs). Internet Explorer recognises hieroglyph characters and, in the absence of any more specific font specifications on the web page, uses the Segoe font as default. The Microsoft Edge and Mozilla Firefox browsers also work correctly.
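The two steps can be sketched in a few lines of Python. This is only an illustration of the idea, not how any browser is actually implemented: the function names are hypothetical, the default font name is a placeholder, and the range check covers only the basic Egyptian Hieroglyphs block (U+13000 to U+1342F).

```python
from itertools import groupby

# Basic Egyptian Hieroglyphs block in Unicode.
HIEROGLYPH_FIRST = 0x13000
HIEROGLYPH_LAST = 0x1342F


def is_hieroglyph(ch: str) -> bool:
    """Step 1: recognise that a character is a hieroglyph."""
    return HIEROGLYPH_FIRST <= ord(ch) <= HIEROGLYPH_LAST


def font_runs(text: str,
              default_font: str = "sans-serif",
              hieroglyph_font: str = "Segoe UI Historic") -> list[tuple[str, str]]:
    """Step 2: split text into runs, each paired with the font to use.

    Consecutive characters of the same kind are grouped into one run.
    """
    runs = []
    for is_hiero, chars in groupby(text, key=is_hieroglyph):
        font = hieroglyph_font if is_hiero else default_font
        runs.append((font, "".join(chars)))
    return runs


if __name__ == "__main__":
    for font, run in font_runs("Name: 𓇳𓄟𓋴𓋴 (Ramesses)"):
        print(font, repr(run))
```

If either step fails, for example because the browser never asks for a hieroglyphic font or none is installed, the reader sees box characters instead.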

However, even on Windows 10, the Chrome browser (version 59 and earlier) is not hieroglyph ready. It does not detect hieroglyphs on a web page and choose an available font. Chrome 59 serves as an illustration of how things can go wrong when a browser has bugs in font handling.

The latest release of Android (6.0.1) has an additional problem in that Android comes with a very limited number of fonts installed. A hieroglyphic font is not among them, so hieroglyphic is just one of a large number of writing systems that don't render in Chrome, the standard Android web browser, because no font is available or automatically downloaded when required. In theory an Android device could work when the device maker supplies an enhanced version of Android. In practice I've never seen this happen. Presumably we'll see a working version of Chrome from Google eventually. Meanwhile Chrome should be fine for most Egyptian-aware web-pages.

Getting hieroglyphs to work for generic web pages

With the huge range of devices and browsers available nowadays it is impossible to give detailed information about what may or may not be done to address problems if your setup is not hieroglyph ready.

In many cases the simplest solution is to update your system setup if possible and/or ensure your web browser version is kept up to date. The technical world has moved a long way since 2009 when hieroglyphs were introduced to the Unicode standard but initially unsupported in the then latest systems such as Ubuntu 10.04 (Lucid Lynx), Mac OS X 10.6 (Snow Leopard) and Windows 7.

Updating is easy for most Linux distros, Apple now provides macOS/iOS updates free of charge and likewise Microsoft provides updates from Windows 7/8 to Windows 10 (strangely, this is currently stated to be free of charge for only a limited time until late July 2016).

If updating is impossible, it seems the Firefox browser works correctly on Linux and Windows devices I've tried so long as it finds a Unicode hieroglyphic font installed. This may work for you. In particular if you are stuck with an old PC setup with Windows 7 it should be sufficient to install a Unicode hieroglyphic font then use Firefox for generic web-browsing instead of Internet Explorer or Chrome. Other less well known browsers may also work.


Bob Richmond

Tuesday, 21 June 2016

Hieroglyphs on the web: The Digital Topographical Bibliography

The Topographical Bibliography of Ancient Egyptian Hieroglyphic Texts, Statues, Reliefs and Paintings (Topographical Bibliography or TopBib for short) is a long running project (work began in the early 20th Century) based at the Griffith Institute, Oxford. The first part of Volume I of the Bibliography - THE THEBAN NECROPOLIS PART 1. PRIVATE TOMBS - was published in 1927. Since then new volumes and revised versions have been published.

The first seven printed volumes of TopBib followed the Topographical approach. The more recent Volume VIII - OBJECTS OF PROVENANCE NOT KNOWN (2000-) instead, for obvious reasons, organises objects by type and period.

A brief History of the Topographical Bibliography is available on the Griffith Institute website.

TopBib is the essential and definitive resource concerning Ancient Egyptian objects.

Behind the printed volumes, modern technology was introduced to TopBib during the editorship of Jaromir Malek with use of databases for digitised versions of the text, hieroglyph encodings and so forth. Some material was made available on the web and the notion of an online version of the Bibliography devised. Aside from the benefits of making digitised material available to scholars, technology helps work on TopBib to continue more efficiently (there is a large amount of material available yet to be analysed and published).

The Digital Topographical Bibliography currently provides access to the printed editions (mostly in PDF format). A small amount of material is also available in new data formats and a useful reference system has been introduced (see DIGITAL TOPOGRAPHICAL BIBLIOGRAPHY: The Digital Approach).

TopBib was studied as part of the process of adding Egyptian Hieroglyphs to the Unicode Standard. The printed editions originally employed the Gardiner 'Oxford' font used as the primary reference for the initial hieroglyph repertoire.

The idea of using Unicode plain text hieroglyphic for Digital TopBib goes back over a decade and it's a good example of a major publication for which plain text meets all hieroglyph requirements. Much work needs to be done to complete the first digital edition but this should benefit greatly from plain text availability.

Bob Richmond

Thursday, 16 June 2016

Transcription of Hieratic into Unicode Hieroglyphic: Part 2

This post follows on from my earlier Transcription of Hieratic into Unicode Hieroglyphic: Part 1 which stressed the importance of hieratic transcription as an application of Unicode hieroglyphic.

The initial Unicode collection of Egyptian hieroglyphs released in 2009 was based on the Gardiner font and sign list. There is fairly good coverage of signs required for transcription but it is useful to consider how the situation can be improved in the context of Extending the Hieroglyph repertoire in Unicode.

One influential work on hieratic was Hieratische Paläographie by Georg Möller (1876-1921), published in four volumes: Volume I-III 1909-12 and Volume IV 1936 (with introduction by Hermann Grapow). [PDF versions are available for download here]. 

Möller employs numeric codes for hieroglyphs corresponding to hieratic elements as seen in this illustration from Volume I:



Volume II pp 71-74 links these hieratic codes to the alphanumeric hieroglyph codes used in the Theinhardt font produced for Lepsius.

Hieratic examples are given from a variety of sources, organized by different periods from Old to Late Egyptian through to the Greco-Roman period. Some examples from Volume I:


Here, Möller codes 200 and 200b match Gardiner codes G43 and Z7 and hence Unicode 𓅱 G043 and 𓏲 Z007. Gardiner was very familiar with Hieratische Paläographie and its coding system so it is unsurprising that hieroglyphs in the Gardiner font and coding system link to the Möller numeric system. See Identification of the signs from the Hieratische Paläographie [M-J Nederhof website] for a list of matches between encoded hieroglyphs and Möller codes.

Nevertheless not all Möller codes are present in the Gardiner font and sign list. An example is Möller 131 (mouse: encoded as E130 in Hieroglyphica but not yet in the Unicode repertoire).
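The chain from Möller code to Gardiner code to Unicode character can be sketched in Python using the character names in the standard unicodedata module. The table below contains only the two matches quoted above, and the function names are hypothetical. Results depend on the version of the Unicode database shipped with your Python.

```python
import re
import unicodedata

# Only the two matches quoted in the text; a real table would be far longer.
MOLLER_TO_GARDINER = {"200": "G43", "200b": "Z7"}


def gardiner_to_char(code: str):
    """Return the Unicode hieroglyph for a Gardiner-style code, or None.

    Unicode names pad the number to three digits, e.g. G43 -> G043.
    None means the code has no character of that name in this Python's
    Unicode database.
    """
    m = re.fullmatch(r"([A-Za-z]{1,2})(\d{1,3})([A-Za-z]?)", code)
    if m is None:
        return None
    letters, digits, suffix = m.groups()
    name = f"EGYPTIAN HIEROGLYPH {letters.upper()}{int(digits):03d}{suffix.upper()}"
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None


def moller_to_char(moller_code: str):
    """Look up a Möller code via the (tiny, illustrative) table above."""
    gardiner = MOLLER_TO_GARDINER.get(moller_code)
    return gardiner_to_char(gardiner) if gardiner else None


if __name__ == "__main__":
    for code in ("200", "200b"):
        print(code, moller_to_char(code))
    # E130 is a Hieroglyphica code with no matching Unicode name.
    print("E130", gardiner_to_char("E130"))
```

Codes that come back as None are candidates to check against the list of signs not yet encoded.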

Möller provides lists of groups/ligatures such as:


which need to be available in any Unicode plain text system.

Regarding extensions to the Unicode hieroglyph repertoire, it is desirable to add Möller codes to the Unicode hieroglyph database of candidates for encoding. His work is over a century old but has been influential and any errors known to modern Egyptologists can be identified in the database (when available for review).

I will be recommending that hieroglyphs found in the Möller list but not yet encoded in Unicode be included in the next set added to the standard, thereby improving the scope of Unicode for transcription of hieratic to hieroglyphic.

This is not to ignore more recent scholarly work involving hieratic. If well-documented material from modern databases such as the Ramses Project and Thesaurus Linguae Aegyptiae or other publications is available to further improve hieratic transcription this data should also be added to the Unicode hieroglyph database of candidates for encoding.

As a point of interest, I am documenting all Möller groups/ligatures (where applicable) in the cluster list referred to in Foundations of a Universal Egyptian Hieroglyphic Writing System in Unicode plain text.

Bob Richmond


Wednesday, 15 June 2016

Hieroglyphs on the web: Thesaurus Linguae Aegyptiae

The Thesaurus Linguae Aegyptiae (TLA) is an Ancient Egyptian virtual dictionary and thesaurus intended to provide a specialist tool for lexicographic research into the Egyptian language. The web application and content are developed at the Berlin-Brandenburg Academy of Sciences and Humanities with contributions from various sources. Much of the content is in German but help text and some content are also available in English.

The TLA corpus includes Ancient Egyptian writings from the entire historical period from Old Kingdom through to the Greco-Roman period. The majority of the content is concerned with the Egyptian Language as written in hieroglyphic and hieratic scripts but there is also a Demotic database included in the current version.

Features of the website include a browsable version of Wörterbuch der Ägyptischen Sprache [Erman, Adolf (editor), and Hermann Grapow (editor)], a digitised slip archive from the Wörterbuch and the Vormanuskript (preliminary manuscript) of the Wörterbuch.

The TLA dictionary itself apparently started with a list of entries from the Wörterbuch by Horst Beinlich (the Beinlich Wordlist). The word list / lemma list has been expanded and developed since then.

The TLA web site went online in 2004 and has undergone various additions and improvements since (the current version is dated October 2014). It remains a work in progress. To access the material you may log in as a guest or register as a user.

The TLA is an invaluable resource for Egyptologists, including a huge volume of material. The database and website designs are very much oriented towards specialists. To make the most of the search and research features it is necessary to study the help text and invest some time and effort in finding your way around the site and studying its search functionality.

Hieratic is transcribed as hieroglyphic. TLA uses a subset of the Hieroglyphica sign list for its hieroglyphs, with some additions. A large proportion of hieroglyphs used are already encoded in Unicode. It would be useful to identify those that are not yet encoded so these can be featured in the next update to the Unicode hieroglyph repertoire.

Hieroglyphic writings in TLA are currently implemented as graphics (as with Ramses Online; as outlined in my recent blog entry). As with Ramses, Unicode plain text would open up interesting possibilities for TLA. This should be fairly straightforward to implement in TLA once related work in the Unicode standard is completed.

Bob Richmond