Unicode Resources

Many web pages are dedicated to various aspects of Unicode. This page collects links that may be useful for the project to implement Unicode support in Squeak.

Information about Squeak: http://www.squeak.org.
Here you will find information about Squeak, and you can download Squeak for various operating systems.

The most interesting attempt to create a multilingual Squeak comes from Yoshiki Ohshima. A description of his approach can be found at http://www.is.titech.ac.jp/~ohshima/squeak/m17npaper/index.html. A preliminary version of his code is available on SqueakMap. I want to support his project, and this collection of (hopefully) useful links is a modest contribution to it.

Information about Unicode: http://www.unicode.org.
This is the official website about Unicode. Here you can download technical reports, tables of all currently defined characters (in PDF) and information about code pages for eight-bit encodings.
Five classical Chinese books can be downloaded from ftp://ftp.unicode.org/Public/TEXT/FIVEBOOKS/

Information about codepages: http://czyborra.com/charsets.
This page gives an excellent overview of encodings and contains references to other sources.

Finally, a beautiful test page: The Lord's Prayer in several languages.

Unicode-enabled fonts

The ClearlyU project, a collection of BDF glyphs for Unicode:

http://crl.nmsu.edu/~mleisher/cu.html

The quality of these glyphs is very good, and the glyphs are free to use. The newest edition of this font contains glyphs for scripts that are currently rarely seen on the Internet: Georgian, Ogham, Unified Canadian Aboriginal Syllabics, Cherokee, Ethiopic and others. The availability of these glyphs gives some hope that Unicode will help to overcome the discrimination against languages that are sometimes called "languages of minor importance".

BDF fonts that contain glyphs for most Unicode code points:
http://www.cl.cam.ac.uk/~mgk25/unicode.html
http://www.cl.cam.ac.uk/~mgk25/ucs
These pages are maintained by Markus Kuhn.
http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz
http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts-asian.tar.gz

In summer 2001, the Asian fonts contained all glyphs for katakana, hiragana and hangul as well as most (certainly more than 60 %) of the Chinese ideographs in the range U+4E00 to U+9FA5. No glyphs are provided for the more complicated ideographs. The glyph size is 18x18, which is too small to draw the more complicated hanzi ideographs.
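A coverage figure like the one above can be checked for any BDF font by collecting the values of its ENCODING lines and counting how many fall into the range. The following is a minimal Python sketch, assuming the font uses Unicode code points as ENCODING values. The function names and the sample data are made up for illustration and are not taken from the fonts mentioned here.

```python
import re

# Range quoted in the text: U+4E00 to U+9FA5, both ends included.
CJK_FIRST, CJK_LAST = 0x4E00, 0x9FA5


def bdf_encodings(bdf_text):
    """Return the set of values named by ENCODING lines in BDF text."""
    pattern = re.compile(r"^ENCODING\s+(-?\d+)", re.MULTILINE)
    return {int(m.group(1)) for m in pattern.finditer(bdf_text)}


def coverage(encodings, first=CJK_FIRST, last=CJK_LAST):
    """Fraction of the inclusive range first..last that has a glyph."""
    total = last - first + 1
    present = sum(1 for cp in encodings if first <= cp <= last)
    return present / total


# Made-up fragment: only the lines relevant to this sketch are shown.
SAMPLE = (
    "STARTCHAR uni4E00\nENCODING 19968\nENDCHAR\n"
    "STARTCHAR uni3042\nENCODING 12354\nENDCHAR\n"
)

if __name__ == "__main__":
    found = bdf_encodings(SAMPLE)
    print(f"{coverage(found):.6%} of U+4E00..U+9FA5 covered")
```

For a real font, read the whole *.bdf file into a string and pass it to bdf_encodings.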

Also useful is this font:

http://www.inp.nsk.su/~bolkhov/files/fonts/univga/index.html

Getting all glyphs of the Unicode block 'CJK Unified Ideographs'

It is possible to download a Big5+ font and to rearrange its glyphs so that they conform to Unicode. (Big5+ has glyphs for all Unicode characters in the range 0x4E00 to 0x9FA5.) To do this, you need the BDF file cmex24m.bdf. This file can be downloaded from various sites:

The font also seems to be available from Debian; see:

http://mail.nl.linux.org/linux-utf8/2001-04/msg00025.html

The glyph size is 24x24, which is large enough to draw even the most complicated hanzi ideographs.

Apart from the file cmex24m.bdf, you need the file big52ucs.txt and some code to read both the *.bdf file and the *.txt file.

The file big52ucs.txt is in big5p-2.zip which can be downloaded from one of these sites:

The code that is needed to load the font and to convert it to Unicode can be downloaded from here: fntLoader.zip (6 KBytes) (latest update: Jun 02, 2003). This archive contains a change set. Read the change set comment to find out how to use it. Please bear in mind that it may take some time to file in a large font image. When a BDF file is loaded, its header is copied into the Transcript. Glyphs are displayed in the upper left corner of the screen to show you that something is happening.
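The rearrangement itself can be sketched in a few lines. The following Python sketch is not the code from fntLoader.zip; it only illustrates the idea of reading the glyphs of a BDF file, reading a mapping table, and re-keying the glyphs by Unicode code point. The layout assumed for the mapping file (two hexadecimal columns per line, Big5 code first, Unicode code point second, '#' starting a comment) is an assumption and may differ from the real big52ucs.txt. All function names and the sample data are hypothetical.

```python
def parse_bdf_glyphs(bdf_text):
    """Map each ENCODING value to its bitmap rows (hex strings)."""
    glyphs = {}
    code, rows, in_bitmap = None, [], False
    for raw in bdf_text.splitlines():
        line = raw.strip()
        if line.startswith("ENCODING"):
            code = int(line.split()[1])
        elif line == "BITMAP":
            rows, in_bitmap = [], True
        elif line == "ENDCHAR":
            # Negative ENCODING values mark glyphs without a code.
            if code is not None and code >= 0:
                glyphs[code] = rows
            code, in_bitmap = None, False
        elif in_bitmap:
            rows.append(line)
    return glyphs


def parse_mapping(text):
    """Assumed layout: '0xBIG5 0xUCS' per line, '#' starts a comment."""
    table = {}
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2:
            table[int(fields[0], 16)] = int(fields[1], 16)
    return table


def remap(glyphs, table):
    """Re-key glyphs by Unicode code point; unmapped glyphs are dropped."""
    return {table[c]: rows for c, rows in glyphs.items() if c in table}


# Made-up sample data: one tiny 8x2 glyph and one mapping line.
SAMPLE_BDF = """STARTCHAR sample
ENCODING 42048
BBX 8 2 0 0
BITMAP
00
FF
ENDCHAR
"""
SAMPLE_MAP = "0xA440 0x4E00  # illustrative entry\n"

if __name__ == "__main__":
    unicode_glyphs = remap(parse_bdf_glyphs(SAMPLE_BDF), parse_mapping(SAMPLE_MAP))
    for cp, rows in sorted(unicode_glyphs.items()):
        print(f"U+{cp:04X}: {rows}")
```

The sketch keeps only the bitmap rows. A real converter would also have to carry over the metrics of each glyph (the BBX and DWIDTH lines) and write a new font header.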

Other sites with Chinese fonts:

http://www.eleceng.adelaide.edu.au/Personal/jglim/cfonts. Here you find BDF fonts (mostly in size 24x24) for various encodings: Big5, GB2312, CNS-11643 (all 7 planes) and Unicode. Regrettably, the quality of the glyphs is not excellent.

The official site for CNS 11643 is: http://www.cns11643.gov.tw/web/index.jsp.

Announcement: Three blocks of Unicode are dedicated to hanzi ideographs: the block from 0x4E00 to 0x9FA5 (CJK Unified Ideographs), the block from 0x3400 to 0x4DB5 (CJK Unified Ideographs Extension A) and the block from 0x20000 to 0x2A6DF (CJK Unified Ideographs Extension B). I have begun to draw the glyphs for CJK Unified Ideographs Extension A. I will make these glyphs available to you when this work is either completed or almost completed. There is no hope that we will soon see glyphs for CJK Unified Ideographs Extension B.
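The three ranges can be written down as a small lookup table that tells to which block a given code point belongs. This is a minimal Python sketch that uses the range limits exactly as they are quoted above; the names CJK_RANGES and cjk_block are made up for this example.

```python
# (first, last, name), limits as quoted in the text, both ends included.
CJK_RANGES = [
    (0x4E00, 0x9FA5, "CJK Unified Ideographs"),
    (0x3400, 0x4DB5, "CJK Unified Ideographs Extension A"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
]


def cjk_block(code_point):
    """Return the name of the block containing code_point, or None."""
    for first, last, name in CJK_RANGES:
        if first <= code_point <= last:
            return name
    return None


if __name__ == "__main__":
    for cp in (0x4E2D, 0x3400, 0x20000, 0x0041):
        print(f"U+{cp:04X}: {cjk_block(cp)}")
```

With these limits, Extension A spans 0x4DB5 - 0x3400 + 1 = 6582 code points, which gives an idea of the number of glyphs that have to be drawn.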

Internet pages about languages and writing systems:

http://titus.uni-frankfurt.de/unicode

The Titus project is dedicated to multilingual text processing with a clear focus on Indo-European languages. The pages of this site are encoded in UTF-8. Among other things, the site offers test pages for various scripts. Example texts in many lesser-known languages are also available. For visitors with an interest in languages, these pages are a joy to read. (Necessary remark: I have never seen an internet browser that was able to show all these pages correctly.)

http://www.omniglot.com

This site is not related to Unicode. It contains information of very impressive quality about little-known languages and writing systems. It shows a lot of alphabets that are currently not included in Unicode. When you read these pages, you will soon understand that Unicode is certainly not as universal as its name suggests. It is quite possible that some of the scripts that you can admire on these pages will be added to Unicode some day.

Your comments:

Your comments are welcome. Please mail your comments to Boris.Gaertner@gmx.net