Everyday Unicode

Håkon Robbestad Gylterud

This document is a short guide to writing Unicode on systems with Xorg X11, such as Linux or *BSD.

Unicode is a systematisation of symbols, and contains just about any symbol you can imagine. The most common encoding of Unicode on computers is UTF-8, which coincides with ASCII for the first 128 characters (the 7-bit range), so plain ASCII text is also valid UTF-8.
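The overlap with ASCII is easy to see by dumping the bytes of a string with the standard od tool. In the minimal sketch below, the helper name utf8_bytes is purely illustrative. An ASCII letter stays a single byte, while “→” takes three.

```shell
# Print the bytes of a string as hex digits, without spaces (illustrative helper).
utf8_bytes() {
  # printf '%s' writes the bytes exactly as given, with no trailing newline.
  # od -An -v -tx1 dumps them as hex (no addresses, no line suppression),
  # and tr removes od's spacing and line breaks.
  printf '%s' "$1" | od -An -v -tx1 | tr -d ' \n'
  echo
}

utf8_bytes 'A'   # 41: the same byte as in ASCII
utf8_bytes '→'   # e28692: U+2192 needs three bytes in UTF-8
```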

XCompose

Xorg allows you to configure a compose key, which maps sequences of keystrokes to characters. For example, on my current setup I can type the Greek letter π (pi) with the keystroke sequence <Compose> * p. You first press the compose key once (the menu key on my setup), then type *, followed by p. Notice that these keys are pressed in sequence, not simultaneously.

The XCompose file

You can replace the default sequence mapping with a custom one by writing it to the file ~/.XCompose. I use an XCompose file derived from the Plan 9 keyboard file. It contains the complete Cyrillic and Greek alphabets, along with a lot of mathematical notation and useful everyday symbols. The following command downloads it to ~/.XCompose, overwriting any existing file there. You can then inspect it.

curl -o ~/.XCompose https://hakon.gylterud.net/tutorials/XCompose
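Once the file is in place, you can look up how a given symbol is typed. Every entry puts its symbol in double quotes, so a fixed-string grep is enough. The helper name compose_find is only for illustration, and the file argument defaults to ~/.XCompose.

```shell
# Show the XCompose entries that produce a given symbol (illustrative helper).
# Usage: compose_find SYMBOL [FILE]   (FILE defaults to ~/.XCompose)
compose_find() {
  # Entries look like  <Multi_key> ... : "→" U2192  so search for "SYMBOL"
  # in quotes, as a fixed string (-F) rather than a regular expression.
  grep -F -- "\"$1\"" "${2:-$HOME/.XCompose}"
}
```

For example, compose_find π should print the entry for π, showing that it is typed with * and p.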

Each line of the XCompose file gives one way to input a symbol, and looks like this:

<Multi_key> <minus> <greater> : "→" U2192

The <Multi_key> <minus> <greater> part tells you which sequence of keys to enter to produce the symbol: the compose key, then -, then >. The symbol in question is “→”, which is the Unicode character with code point U+2192.
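To make the three parts of such a line explicit, here is a small sketch that splits an entry apart with sed and awk. The function names are hypothetical. The helpers assume the simple form shown above, with no escaped quotes inside the result string and no trailing comment.

```shell
# Key sequence without angle brackets, e.g. "Multi_key minus greater".
compose_keys() {
  printf '%s\n' "$1" | sed -e 's/[[:space:]]*:.*$//' -e 's/[<>]//g'
}

# The quoted result string, e.g. "→".
compose_result() {
  printf '%s\n' "$1" | sed -n 's/^[^"]*"\([^"]*\)".*$/\1/p'
}

# The trailing keysym name, e.g. "U2192" for code point U+2192.
compose_keysym() {
  printf '%s\n' "$1" | awk '{ print $NF }'
}

line='<Multi_key> <minus> <greater> : "→" U2192'
compose_keys "$line"     # Multi_key minus greater
compose_result "$line"   # →
compose_keysym "$line"   # U2192
```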

Setting the compose key

You can choose which key acts as the compose key with the setxkbmap command. For example, the following command makes the «menu» key the compose key:

setxkbmap -option "compose:menu"

I have this command in my .xinitrc so that it runs every time I start Xorg.

Another good choice may be the right «windows» key, selected with the option compose:rwin.
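To check which compose option is active, you can run setxkbmap -query. It prints the current settings, including an options: line with a comma-separated list. The filter below picks out the compose entry. It is a sketch, and the name compose_option is hypothetical.

```shell
# Print any compose:* entries from the "options:" line of `setxkbmap -query`.
compose_option() {
  awk '/^options:/ {
    n = split($2, opts, ",")          # options are comma-separated
    for (i = 1; i <= n; i++)
      if (opts[i] ~ /^compose:/) print opts[i]
  }'
}

# Usage (needs a running X session):
#   setxkbmap -query | compose_option
```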

Note: if you want the right Shift key to be your compose key (Multi_key), you need to use xmodmap to remap it:

xmodmap -e "keysym Shift_R = Multi_key"
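To confirm that the remapping took effect, you can run xmodmap -pke. It prints the keymap as lines of the form keycode N = keysym .... The filter below lists the keycodes that now produce Multi_key. It is a sketch, and the name multi_key_codes is hypothetical.

```shell
# List keycodes whose keysyms include Multi_key, reading `xmodmap -pke` output.
multi_key_codes() {
  awk '$1 == "keycode" {
    # Fields: "keycode" N "=" keysym1 keysym2 ...
    for (i = 4; i <= NF; i++)
      if ($i == "Multi_key") { print $2; break }
  }'
}

# Usage (needs a running X session):
#   xmodmap -pke | multi_key_codes
```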

GTK and Qt applications

The following environment variable assignments make GTK and Qt applications use the X input method (xim), and thereby the same compose settings as the rest of Xorg. Put them somewhere they will be exported to the relevant programs; I have mine in .xinitrc.

export GTK_IM_MODULE=xim
export QT_IM_MODULE=xim 
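Putting the pieces together, a ~/.xinitrc could look roughly like the sketch below. The commented-out window manager line is a placeholder assumption. Keep whatever your own .xinitrc already starts at the end.

```shell
# ~/.xinitrc (sketch)

# Use the menu key as the compose key.
setxkbmap -option "compose:menu"

# Let GTK and Qt applications use the X input method, and with it ~/.XCompose.
export GTK_IM_MODULE=xim
export QT_IM_MODULE=xim

# Start the window manager last; "exec" replaces this script with it.
# The name below is a placeholder, not a real program:
# exec your-window-manager
```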

What Unicode characters are useful?

Accents

Accents are often used in names, and being able to write someone's name correctly shows respect and care for that person.

Here is a small selection from the XCompose file, to give you an idea of how the various accents are constructed.

<Multi_key> <apostrophe> <e> : "é" U00E9
<Multi_key> <a> <e> : "æ" U00E6
<Multi_key> <o> <a> : "å" U00E5
<Multi_key> <slash> <o> : "ø" U00F8
<Multi_key> <quotedbl> <a> : "ä" U00E4
<Multi_key> <v> <g> : "ǧ" U01E7
<Multi_key> <asciicircum> <e> : "ê" U00EA
<Multi_key> <underscore> <o> : "ō" U014D
<Multi_key> <comma> <c> : "ç" U00E7
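If a letter you need is missing, you can add your own entry in the same format. The sketch below appends an entry only if its key sequence is not already defined, comparing sequences with whitespace ignored. The helper name is hypothetical. The sequence n g for ŋ (U+014B) is just an example and may clash with entries in the file you use. Applications typically need to be restarted before they pick up changes to ~/.XCompose.

```shell
# Append an entry to an XCompose file unless its key sequence already exists
# (illustrative helper). Usage: add_compose_entry FILE 'ENTRY LINE'
add_compose_entry() {
  file=$1 entry=$2
  # Key sequence = everything before the first colon, with spaces removed.
  keys=$(printf '%s\n' "$entry" | sed -e 's/:.*$//' -e 's/[[:space:]]//g')
  # Normalise every line of the file the same way and look for an exact match.
  if [ -f "$file" ] &&
     sed -e 's/:.*$//' -e 's/[[:space:]]//g' "$file" | grep -qxF -- "$keys"; then
    echo "already defined: $keys" >&2
    return 1
  fi
  printf '%s\n' "$entry" >> "$file"
}

# Example (hypothetical sequence):
#   add_compose_entry ~/.XCompose '<Multi_key> <n> <g> : "ŋ" U014B'
```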

Smileys

Textual communication can be difficult to master, and perhaps the most difficult things to convey are humour and sarcasm. If you want to make sure the reader understands that you are not completely serious you can include a smiley.

<Multi_key> <colon> <parenleft> : "☹" U2639
<Multi_key> <colon> <parenright> : "☺" U263A

The «em» dash & the «en» dash

Correct usage of dashes can be confusing, but a first step to mastering this art is to recognise that there is more than one. The key usually found on the keyboard is not a dash at all, but a hyphen (strictly, the hyphen-minus, U+002D).

My personal favorite dash is the em dash, which is one «em» wide, that is, the width of the letter m. In my native Norwegian it is called “tankestrek” (literally “thinking dash”), and it is used to set off parenthetical statements. It is also used to attribute quotes, such as:

The ideally non-violent state will be an ordered anarchy. That State is the best governed which is governed the least. —Mahatma Gandhi

Another dash is the «en» dash, the width of an n, which is typically used for ranges, such as 1914–1918. Both are easily input using the XCompose entries below.

<Multi_key> <e> <m> : "—" U2014
<Multi_key> <e> <n> : "–" U2013
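You may have older text where dashes were typed with ASCII stand-ins, such as the TeX convention of -- for an en dash and --- for an em dash. A sed filter can convert them. The helper below is a hypothetical sketch and is not part of the XCompose setup. Check its output by hand, since it also rewrites things like --help in command examples.

```shell
# Replace TeX-style ASCII dashes with Unicode ones (illustrative helper):
# "---" becomes an em dash, "--" an en dash. The em dash rule runs first,
# otherwise "---" would become an en dash followed by a hyphen.
dashify() {
  sed -e 's/---/—/g' -e 's/--/–/g'
}

printf '%s\n' 'pages 10--20 --- roughly' | dashify   # pages 10–20 — roughly
```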

Read more about dashes at Butterick’s Practical Typography.

The ellipsis

The ellipsis “…”, often written on computers as three separate periods, is a single character in Unicode.

<Multi_key> <3> <period> : "…" U2026

Mathematical symbols

I love mathematics, but even if you don’t, you might need a couple of mathematical symbols from time to time. The XCompose file mentioned earlier contains a great deal of mathematical notation, and it is worth scrolling through the relevant sections of it. Below is a tiny selection.

<Multi_key> <d> <e> : "°" U00B0
<Multi_key> <i> <s> : "∫" U222B
<Multi_key> <m> <o> : "∈" U2208
<Multi_key> <m> <u> : "×" U00D7
<Multi_key> <s> <r> : "√" U221A
<Multi_key> <minus> <colon> : "÷" U00F7
<Multi_key> <period> <period> : "·" U00B7
<Multi_key> <f> <a> : "∀" U2200
<Multi_key> <t> <e> : "∃" U2203
<Multi_key> <t> <f> : "∴" U2234
<Multi_key> <t> <u> : "⊢" U22A2
<Multi_key> <asciitilde> <minus> : "≃" U2243
<Multi_key> <asciitilde> <equal> : "≅" U2245
<Multi_key> <N> <N> : "ℕ" U2115
<Multi_key> <O> <plus> : "⊕" U2295
<Multi_key> <P> <P> : "ℙ" U2119
<Multi_key> <Q> <Q> : "ℚ" U211A
<Multi_key> <R> <R> : "ℝ" U211D
<Multi_key> <s> <0> : "⁰" U2070
<Multi_key> <s> <1> : "¹" U00B9
<Multi_key> <s> <2> : "²" U00B2
<Multi_key> <s> <o> : "º" U00BA

Greek letters

The Greek alphabet is used frequently in science and mathematics, and is of course useful for writing Greek. Below are the XCompose entries for the lower-case Greek alphabet. They all start with *, so they are quite easy to guess.

<Multi_key> <asterisk> <a> : "α" U03B1
<Multi_key> <asterisk> <b> : "β" U03B2
<Multi_key> <asterisk> <c> : "ξ" U03BE
<Multi_key> <asterisk> <d> : "δ" U03B4
<Multi_key> <asterisk> <e> : "ε" U03B5
<Multi_key> <asterisk> <f> : "φ" U03C6
<Multi_key> <asterisk> <g> : "γ" U03B3
<Multi_key> <asterisk> <h> : "θ" U03B8
<Multi_key> <asterisk> <i> : "ι" U03B9
<Multi_key> <asterisk> <k> : "κ" U03BA
<Multi_key> <asterisk> <l> : "λ" U03BB
<Multi_key> <asterisk> <m> : "μ" U03BC
<Multi_key> <asterisk> <n> : "ν" U03BD
<Multi_key> <asterisk> <o> : "ο" U03BF
<Multi_key> <asterisk> <p> : "π" U03C0
<Multi_key> <asterisk> <q> : "ψ" U03C8
<Multi_key> <asterisk> <r> : "ρ" U03C1
<Multi_key> <asterisk> <s> : "σ" U03C3
<Multi_key> <asterisk> <t> : "τ" U03C4
<Multi_key> <asterisk> <u> : "υ" U03C5
<Multi_key> <asterisk> <w> : "ω" U03C9
<Multi_key> <asterisk> <x> : "χ" U03C7
<Multi_key> <asterisk> <y> : "η" U03B7
<Multi_key> <asterisk> <z> : "ζ" U03B6
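Because related symbols share the key that follows the compose key, you can list a whole group at once. The sketch below assumes keysym names contain only letters, digits and underscores, which holds for names like asterisk and at. The helper name is hypothetical. For example, compose_prefix asterisk lists the Greek entries, and compose_prefix at lists the Cyrillic ones.

```shell
# List entries whose sequence starts with the given keysym right after
# Multi_key (illustrative helper). Usage: compose_prefix KEYSYM [FILE]
compose_prefix() {
  # Anchored regex: "<Multi_key>", optional whitespace, then "<KEYSYM>".
  grep -- "^<Multi_key>[[:space:]]*<$1>" "${2:-$HOME/.XCompose}"
}
```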

Cyrillic letters

The sequences for Cyrillic letters all start with @.

<Multi_key> <at> <apostrophe> <apostrophe> : "ъ" U044A
<Multi_key> <at> <at> <apostrophe> : "ь" U044C
<Multi_key> <at> <at> <E> : "Е" U0415
<Multi_key> <at> <at> <K> : "К" U041A
<Multi_key> <at> <at> <S> : "С" U0421
<Multi_key> <at> <at> <T> : "Т" U0422
<Multi_key> <at> <at> <Y> : "Ы" U042B
<Multi_key> <at> <at> <Z> : "З" U0417
<Multi_key> <at> <at> <e> : "е" U0435
<Multi_key> <at> <at> <k> : "к" U043A
<Multi_key> <at> <at> <s> : "с" U0441
<Multi_key> <at> <at> <t> : "т" U0442
<Multi_key> <at> <at> <y> : "ы" U044B
<Multi_key> <at> <at> <z> : "з" U0437
<Multi_key> <at> <C> <H> : "Ч" U0427
<Multi_key> <at> <C> <h> : "Ч" U0427
<Multi_key> <at> <E> <H> : "Э" U042D
<Multi_key> <at> <E> <h> : "Э" U042D
<Multi_key> <at> <K> <H> : "Х" U0425
<Multi_key> <at> <K> <h> : "Х" U0425
<Multi_key> <at> <S> <C> : "Щ" U0429
<Multi_key> <at> <S> <H> : "Ш" U0428
<Multi_key> <at> <S> <c> : "Щ" U0429
<Multi_key> <at> <S> <h> : "Ш" U0428
<Multi_key> <at> <T> <S> : "Ц" U0426
<Multi_key> <at> <T> <s> : "Ц" U0426
<Multi_key> <at> <Y> <A> : "Я" U042F
<Multi_key> <at> <Y> <E> : "Е" U0415
<Multi_key> <at> <Y> <O> : "Ё" U0401
<Multi_key> <at> <Y> <U> : "Ю" U042E
<Multi_key> <at> <Y> <a> : "Я" U042F
<Multi_key> <at> <Y> <e> : "Е" U0415
<Multi_key> <at> <Y> <o> : "Ё" U0401
<Multi_key> <at> <Y> <u> : "Ю" U042E
<Multi_key> <at> <Z> <H> : "Ж" U0416
<Multi_key> <at> <Z> <h> : "Ж" U0416
<Multi_key> <at> <c> <h> : "ч" U0447
<Multi_key> <at> <e> <h> : "э" U044D
<Multi_key> <at> <k> <h> : "х" U0445
<Multi_key> <at> <s> <c> : "щ" U0449
<Multi_key> <at> <s> <h> : "ш" U0448
<Multi_key> <at> <t> <s> : "ц" U0446
<Multi_key> <at> <y> <a> : "я" U044F
<Multi_key> <at> <y> <e> : "е" U0435
<Multi_key> <at> <y> <o> : "ё" U0451
<Multi_key> <at> <y> <u> : "ю" U044E
<Multi_key> <at> <z> <h> : "ж" U0436
<Multi_key> <at> <A> : "А" U0410
<Multi_key> <at> <B> : "Б" U0411
<Multi_key> <at> <D> : "Д" U0414
<Multi_key> <at> <F> : "Ф" U0424
<Multi_key> <at> <G> : "Г" U0413
<Multi_key> <at> <I> : "И" U0418
<Multi_key> <at> <J> : "Й" U0419
<Multi_key> <at> <L> : "Л" U041B
<Multi_key> <at> <M> : "М" U041C
<Multi_key> <at> <N> : "Н" U041D
<Multi_key> <at> <O> : "О" U041E
<Multi_key> <at> <P> : "П" U041F
<Multi_key> <at> <R> : "Р" U0420
<Multi_key> <at> <U> : "У" U0423
<Multi_key> <at> <V> : "В" U0412
<Multi_key> <at> <X> : "Х" U0425
<Multi_key> <at> <a> : "а" U0430
<Multi_key> <at> <b> : "б" U0431
<Multi_key> <at> <d> : "д" U0434
<Multi_key> <at> <f> : "ф" U0444
<Multi_key> <at> <g> : "г" U0433
<Multi_key> <at> <i> : "и" U0438
<Multi_key> <at> <j> : "й" U0439
<Multi_key> <at> <l> : "л" U043B
<Multi_key> <at> <m> : "м" U043C
<Multi_key> <at> <n> : "н" U043D
<Multi_key> <at> <o> : "о" U043E
<Multi_key> <at> <p> : "п" U043F
<Multi_key> <at> <r> : "р" U0440
<Multi_key> <at> <u> : "у" U0443
<Multi_key> <at> <v> : "в" U0432
<Multi_key> <at> <x> : "х" U0445

Expecting a comment section? Feel free to e-mail me your comments, or otherwise contact me to discuss the content of this site. See my contact info. You can also write your opinion on your own website, and link back here! ☺