Monday 18 July 2016

Simple higher level protocols and Unicode Plain Text hieroglyphic writing

Simple (plain) text has limitations in what can be expressed without additional information. There are various 'Higher Level Protocols' (HLP) used to enrich all kinds of writing systems. HTML/CSS and various document formats, such as those used in 'office' applications, are kinds of HLP managed as international standards. To use Unicode Egyptian hieroglyphic along with these complex protocols, it is desirable for Egyptologists and others to develop conventions for working with these protocols in consistent ways, so as to be able to share rich data. That is something for the future.

However, not all protocols need be complex like HTML/CSS.

Consider hieroglyph combinations already encoded as characters in Unicode. 𓃁 (ab) is 𓂝 (a) on top of 𓃀 (b). Gardiner calls these combinations Monograms, and the current practice in Unicode is to encode monograms as separate characters.

As things stand, Unicode has no character for 𓂧 (d) on top of 𓃀 (b), written here as (db), which is rare as a monogram. For many applications and users this is not a problem, but it may crop up in some situations, such as a corpus database.

The answer is to define or adopt a simple higher level protocol. For instance, use the character '+' to indicate 'on top of', so (db) is encoded 𓂧+𓃀. The same applies to any other monograms not yet encoded. This is still plain text, just not pure hieroglyphic, so it can be used in applications and databases. To render this combination visually, software will need adaptation, but this is something that needs to be done in general for hieroglyphs not yet encoded in Unicode, so there is little impact.
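As a rough illustration of how software might read the '+' convention, here is a minimal Python sketch. It is only one possible reading of the convention, not an established implementation: the function names (is_hieroglyph, parse_stacks) are hypothetical, and the test for "is a hieroglyph" assumes the Unicode Egyptian Hieroglyphs block U+13000 to U+1342F. A '+' is treated as 'on top of' only when it sits directly between two hieroglyphs; anything else passes through unchanged.

```python
# Assumed range of the Unicode Egyptian Hieroglyphs block.
HIEROGLYPH_FIRST = 0x13000
HIEROGLYPH_LAST = 0x1342F


def is_hieroglyph(ch):
    """Return True if ch lies in the assumed hieroglyph block."""
    return HIEROGLYPH_FIRST <= ord(ch) <= HIEROGLYPH_LAST


def parse_stacks(text):
    """Split text into groups; each group is a tuple of characters.

    A group with more than one member is a vertical stack, listed top
    to bottom, built from hieroglyphs joined by '+'. Every other
    character (including a stray '+') becomes a one-member group.
    """
    groups = []
    i = 0
    while i < len(text):
        ch = text[i]
        i += 1
        if not is_hieroglyph(ch):
            groups.append((ch,))
            continue
        stack = [ch]
        # Extend the stack while the next two characters are '+' and a hieroglyph.
        while i + 1 < len(text) and text[i] == "+" and is_hieroglyph(text[i + 1]):
            stack.append(text[i + 1])
            i += 2
        groups.append(tuple(stack))
    return groups


if __name__ == "__main__":
    for group in parse_stacks("𓃁𓂧+𓃀"):
        print(" over ".join(group))
```

A renderer could then draw each multi-member group as a single stacked cluster, while a database could store and search the original string as ordinary plain text.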

This reasoning applies to other specialist writing features for which there is not sufficiently strong evidence to warrant direct support in plain text at this time. I expect various conventions will evolve as users gain more familiarity with Unicode.

There is an active discussion as to whether the policy of encoding monograms as separate characters should be continued, or whether an additional control character should be introduced to take the role of + as given here. Either way, the method described here enables work on Unicode solutions to continue regardless.

Bob Richmond


Thursday 7 July 2016

Transcription of Hieroglyphic into Unicode Hieroglyphic Plain Text: Part 1

Modern web pages and printed publications are dramatically more elaborate than documents from the 1950s, which in turn used advances in printing technologies unavailable to our ancestors. Complex examples surround us online and in print: newspapers and magazines, product labelling, instruction manuals, advertisements, reference works, textbooks and specialist publications. The list goes on and on. Graphics, emojis and icons often complement textual elements, which themselves increasingly use a variety of writing systems. Even when written in a language such as English, with its simple alphabet and easy to understand notion of plain text, text in documents can be very elaborate and feature a range of scripts, fonts and typographic styles.

Turning attention to hieroglyphic, consider the Middle Kingdom Stela (BM EA851, from BM page):


There is a line of horizontal text and eight columns of vertical text beneath, along with some pictures/hieroglyphs. Note: traditionally, an Egyptologist might create a line drawing of a stela rather than use a photograph as here; this is especially useful when there is damage or there are hard to read hieroglyphs. To discuss the text content, the text elements may be extracted and transcribed, often transcribing vertical text into horizontal writing. Hieroglyphic fonts have been available for 150 years and can be useful for this purpose. Transcription to Unicode Plain Text Hieroglyphic (when standardised) may be used in a similar way.

Modern technology has changed the way we look at artefacts that feature Ancient Egyptian in hieroglyphic. Our eyes are now used to elaborate web pages and publications in everyday life. The idea of arranging text and graphics in the Egyptian way is not as unusual as it would have appeared to scholars or general readers in the 19th and 20th centuries. Older books on the topic can therefore emphasise the different and unfamiliar, but this view is becoming increasingly anachronistic to the modern reader.

Web technology and editing tools are also designed around the increasing complexity of documents, and this means there are now natural ways to think of transcription of hieroglyphic sources in terms of modern technology and standards. In some cases this may mean changes, sometimes subtle, to traditional thinking on digital representations of hieroglyphic.

It is instructive to look at any complex web page based around a simple basic writing system such as English and consider how it might be transcribed as plain text. This exercise can help understand what needs to be considered for Egyptian and avoid over-enthusiastic expectations as to what can be expected from hieroglyphic in Unicode plain text.

This post is about transcription of hieroglyphic sources to Unicode plain text (however it ends up being standardised). It is important to distinguish this practice from hieratic transcription to hieroglyphic (see Transcription of Hieratic into Unicode Hieroglyphic: Part 1 and later posts on the topic), which uses the same encoding system but has its own specific issues.

There are several points on transcription I'd like to make now in this introduction.
  • Limitations to plain text transcription implied by fonts. Obviously a general purpose hieroglyphic font cannot be expected to contain hieroglyphs that look exactly the same as those in an original document, inscription or painting. The same hieroglyph may be written differently in the same document.
  • Different fonts for plain text may render hieroglyph clusters in different ways. For example a font optimised to represent text in a traditional Manuel de Codage (MdC) layout may look different to a font designed to make a more pleasing generic representation. A font optimised for Late Egyptian hieratic transcription may make different choices for its subject matter. At some point we will have colour fonts in various styles. So far, most Egyptologists are only familiar with editing tools that are limited to a single font. Choice and variety enable a large step forward but can require some fresh thinking.
  • Higher level protocols are essential for many scholarly requirements from hieroglyphic. This will involve defining conventions for web pages (HTML/CSS) and new encoding formats using XML, near plain text, or adaptations of other formats. Work on this has scarcely begun.
  • Transcription is not meant to replace facsimile, photography or other techniques for representing hieroglyphic sources. In many cases the goal is to augment the source in order to promote search-ability, analysis and readability.
  • Typically an Egyptologist will transcribe into left to right horizontal text. If the source material uses columns, the vertical arrangement of hieroglyphs is often not copied verbatim but adapted into a more horizontal style of writing. Partly this is about convenience and uniformity, but it is also the case that hieroglyphic fonts don't yet support right to left. General purpose software applications such as word processors are not yet geared up to work well with column text. This is not an intrinsic limitation of the Unicode concept of plain text but is likely to remain a practical issue for the next few years at least.
I hope to return to various facets of this large topic in future posts.

Bob Richmond