KOI8-R (RFC 1489) is an 8-bit character encoding designed to cover Russian, which is written in the Russian subset of the Cyrillic script. The programmer Andrei Chernov derived it from the KOI-8 encoding in 1993. KOI-8, in turn, is an 8-bit extension of the KOI-7 encoding, which inherited a phonetic correspondence between Russian and Latin letters from the MTK-2 teletype code. As a result, the Russian Cyrillic letters in KOI8-R are in pseudo-Latin alphabetical order rather than the usual Cyrillic order found in ISO 8859-5. Although this may seem unnatural, it has a useful effect: if the 8th bit is stripped, the text remains partially readable in any ASCII-based encoding (including KOI8-R itself) as a case-reversed transliteration. For example, "Код для обмена и обработки информации" (the Russian meaning of the "KOI" acronym) becomes kOD DLQ OBMENA I OBRABOTKI INFORMACII.
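The bit-stripping effect described above can be reproduced in a few lines. The following is a minimal sketch that assumes Python's built-in `koi8_r` codec; the helper name `strip_high_bit` is hypothetical and chosen only for illustration.

```python
def strip_high_bit(text: str) -> str:
    """Encode text as KOI8-R, clear bit 8 of every byte, and read it as ASCII."""
    raw = text.encode("koi8_r")
    # Masking with 0x7F clears the top bit, mapping 0xC0-0xFF onto 0x40-0x7F.
    stripped = bytes(b & 0x7F for b in raw)
    return stripped.decode("ascii")


phrase = "Код для обмена и обработки информации"
print(strip_high_bit(phrase))  # kOD DLQ OBMENA I OBRABOTKI INFORMACII
```

The case reversal follows from the layout: lowercase Cyrillic letters occupy 0xC0 to 0xDF, which fall onto uppercase Latin letters once the high bit is cleared, while uppercase Cyrillic letters fall onto lowercase Latin ones.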
KOI-8 stands for 8-bitnyy kod dlya obmena i obrabotki informatsii (Russian: 8-битный код для обмена и обработки информации), which means "8-Bit Code for Information Interchange". Microsoft Windows assigns KOI8-R the code page number 20866, and IBM assigns it code page 878. KOI8-R also happens to cover Bulgarian.
KOI8-R lacks the proper quotation marks for Russian and Bulgarian: both «...» and the Bulgarian „...“. Windows-1251 does support these, as well as more letters, and has thus become more popular. KOI8-R is used by less than 0.004% of websites, mostly Russian and Bulgarian. Unicode and UTF-8 are preferred to single-byte Cyrillic encodings in modern applications; Unicode contains 436 Cyrillic letters, including those for Old Cyrillic.
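The difference in quotation-mark coverage can be checked directly. This is a small sketch that assumes Python's built-in `koi8_r` and `cp1251` (Windows-1251) codecs; the helper `can_encode` is a hypothetical name used only for this illustration.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Guillemets and the Bulgarian-style low/high quotation marks.
for sample in ("«текст»", "„текст“"):
    print(sample, can_encode(sample, "koi8_r"), can_encode(sample, "cp1251"))
```

Under these assumptions, both samples fail to encode in KOI8-R and succeed in Windows-1251, while the Cyrillic letters themselves encode in either.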
Character set
The following table shows the KOI8-R encoding. Each character is shown with its equivalent Unicode code point.
KOI8-R [4][5][6][7] | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
0x | ||||||||||||||||
1x | ||||||||||||||||
2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
7x | p | q | r | s | t | u | v | w | x | y | z | { | | | } | ~ | |
8x | ─2500 | │2502 | ┌250C | ┐2510 | └2514 | ┘2518 | ├251C | ┤2524 | ┬252C | ┴2534 | ┼253C | ▀2580 | ▄2584 | █2588 | ▌258C | ▐2590 |
9x | ░2591 | ▒2592 | ▓2593 | ⌠2320 | ■25A0 | ∙2219 | √221A | ≈2248 | ≤2264 | ≥2265 | NBSP | ⌡2321 | °00B0 | ²00B2 | ·00B7 | ÷00F7 |
Ax | ═2550 | ║2551 | ╒2552 | ё0451 | ╓2553 | ╔2554 | ╕2555 | ╖2556 | ╗2557 | ╘2558 | ╙2559 | ╚255A | ╛255B | ╜255C | ╝255D | ╞255E |
Bx | ╟255F | ╠2560 | ╡2561 | Ё0401 | ╢2562 | ╣2563 | ╤2564 | ╥2565 | ╦2566 | ╧2567 | ╨2568 | ╩2569 | ╪256A | ╫256B | ╬256C | ©00A9 |
Cx | ю044E | а0430 | б0431 | ц0446 | д0434 | е0435 | ф0444 | г0433 | х0445 | и0438 | й0439 | к043A | л043B | м043C | н043D | о043E |
Dx | п043F | я044F | р0440 | с0441 | т0442 | у0443 | ж0436 | в0432 | ь044C | ы044B | з0437 | ш0448 | э044D | щ0449 | ч0447 | ъ044A |
Ex | Ю042E | А0410 | Б0411 | Ц0426 | Д0414 | Е0415 | Ф0424 | Г0413 | Х0425 | И0418 | Й0419 | К041A | Л041B | М041C | Н041D | О041E |
Fx | П041F | Я042F | Р0420 | С0421 | Т0422 | У0423 | Ж0416 | В0412 | Ь042C | Ы042B | З0417 | Ш0428 | Э042D | Щ0429 | Ч0427 | Ъ042A |
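The pseudo-Latin ordering of rows Cx to Fx can be inspected programmatically. The sketch below assumes Python's built-in `koi8_r` codec; the variable names are illustrative only.

```python
# Decode the two lowercase rows (Cx, Dx) and the two uppercase rows (Ex, Fx).
lower_row = bytes(range(0xC0, 0xE0)).decode("koi8_r")
upper_row = bytes(range(0xE0, 0x100)).decode("koi8_r")

# The same lowercase byte values with the high bit cleared, read as ASCII.
latin_row = bytes(b & 0x7F for b in range(0xC0, 0xE0)).decode("ascii")

print(lower_row)  # юабцдефгхийклмнопярстужвьызшэщчъ
print(latin_row)  # @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_

# Each uppercase letter sits exactly 0x20 above its lowercase counterpart.
print(upper_row == lower_row.upper())  # True

# Byte 0xC0 maps to U+044E, as listed in the table.
print(f"U+{ord(bytes([0xC0]).decode('koi8_r')):04X}")  # U+044E
```

Printing the two rows one above the other shows the phonetic pairing position by position, for example а/A, б/B, д/D, and я/Q. The letters ё and Ё (0xA3 and 0xB3) lie outside these rows and do not follow the 0x20 offset.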
See also
- KOI8-B, a derivation of KOI8-R with only the letter subset implemented
- KOI8-U, another derivative encoding which adds Ukrainian characters
- KOI character encodings
- RELCOM
- Windows-1251, another common Cyrillic character encoding
Further reading
- Flohr, Guido; Kiss, Gabor; Chernov, Andrey A. (2016) [2006]. "Locale::RecodeData::KOI8_R - Conversion routines for KOI8-R". CPAN libintl-perl. 1.0. Archived from the original on 2017-01-15. Retrieved 2017-01-15.
- Kostis, Kosta. "koi8-r (Russian U*IX encoding, also used by RELCOM)". 1.20. Archived from the original on 2017-01-16. Retrieved 2017-01-16.
- RFC 1489
- "KOI8-R (RFC 1489)". Kermit. Columbia University. Retrieved 2020-06-24.
- Kornai, Andras; Birnbaum, David J.; da Cruz, Frank; Davis, Bur; Fowler, George; Paine, Richard B.; Paperno, Slava; Simonsen, Keld J.; Thobe, Glenn E.; Vulis, Dimitri; van Wingen, Johan W. (1993-03-13). "CYRILLIC ENCODING FAQ Version 1.3". 1.3. Retrieved 2020-06-24.
External links
- Universal Cyrillic decoder, an online program that may help recover Cyrillic texts damaged by broken KOI8-R or other character encodings.
- "The Home of the KOI8-R since 1995". 1995. Retrieved 2016-12-05.
- Czyborra, Roman (1998-11-30) [1998-05-25]. "The Cyrillic Charset Soup". Archived from the original on 2016-12-03. Retrieved 2016-12-03.
- Hohlov, Yu. E. "Cyrillic Information Representation in Electronic Form - Character Set (Code Page) Tables". Archived from the original on 2016-12-05. Retrieved 2016-12-05.
- Nechayev, Valentin (2013) [2001]. "Review of 8-bit Cyrillic encodings universe". Archived from the original on 2016-12-05. Retrieved 2016-12-05.
References
(in Russian) ГОСТ 19768-74 (СТ СЭВ 358-76). Машины вычислительные и система обработки данных. Коды 8-битные для обмена и обработки информации.
"SBCS code page information - CPGID: 00878 / Name: Russian internet koi8-r". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. IBM. C-H 3-3220-050. Archived from the original on 2017-02-18. Retrieved 2017-02-18. https://www-01.ibm.com/software/globalization/cp/cp00878.html
"CCSID information document; CCSID 878; KOI8-R CYRILLIC". IBM. Retrieved 2017-02-18. https://www-01.ibm.com/software/globalization/ccsid/ccsid878.html
Richter, Helmut (2016-01-04) [1999-08-18]. "KOI8-R.TXT". 2.0. Retrieved 2016-12-09. http://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8-R.TXT
Code Page CPGID 00878 (PDF), IBM https://public.dhe.ibm.com/software/globalization/gcoc/attachments/CP00878.pdf
Code Page CPGID 00878 (txt), IBM https://public.dhe.ibm.com/software/globalization/gcoc/attachments/CP00878.txt
International Components for Unicode (ICU), ibm-878_P100-1996.ucm, 2002-12-03 https://github.com/unicode-org/icu/blob/master/icu4c/source/data/mappings/ibm-878_P100-1996.ucm