The Problem
There are many character encodings covering many character sets. However, most are not suitable for the World Wide Web.
Our recommended character encoding is UTF-8. Here are some of the reasons why:
- It is ASCII compatible.
- It is compact and efficient for most scripts.
- It is easily processed, unlike other multibyte character encodings.
- Read more reasons.
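The first two points can be checked with a small Python sketch. The sample characters and the helper name `utf8_length` are chosen here for illustration only; they are not part of any tool mentioned in this article.

```python
# Pure ASCII text produces identical bytes under ASCII and UTF-8.
ascii_text = "Hello, world"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")


def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 needs for the given string."""
    return len(ch.encode("utf-8"))


# Sample characters (an assumption for illustration): Latin, accented Latin, CJK.
sizes = {ch: utf8_length(ch) for ch in ("A", "é", "中")}

print(same_bytes)  # True
print(sizes)       # {'A': 1, 'é': 2, '中': 3}
```

ASCII characters stay at one byte each, while characters outside ASCII take two or more bytes.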
Briefly, a working definition of the key terms:
- Character Set:
A complete group of characters for one or more writing systems. It is more complete than an alphabet: it also includes punctuation, digits, and so on.
- Character Encoding:
Computers can only store ones and zeros. So each letter, digit, and so on in a character set has to be assigned a unique sequence of ones and zeros; that is, each character has to be uniquely encoded. Two popular encodings are UTF-8 and ISO-8859-1. Both encode Western European characters, but they do so in very different ways. For this reason, when computers or applications communicate, they need to know the character encoding of the data being transmitted.
- Charset:
This is a somewhat confusing term, used on the internet, that actually means character encoding. Note: It does not mean character set.
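To make the difference between the two encodings concrete, here is a minimal Python sketch. The character "é" is just a convenient sample; any non-ASCII Western European character would show the same effect.

```python
ch = "é"  # U+00E9, LATIN SMALL LETTER E WITH ACUTE

# The same character becomes different byte sequences in each encoding.
utf8_bytes = ch.encode("utf-8")
latin1_bytes = ch.encode("iso-8859-1")
print(utf8_bytes)    # b'\xc3\xa9'
print(latin1_bytes)  # b'\xe9'

# Reading UTF-8 bytes as if they were ISO-8859-1 garbles the text.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)       # Ã©
```

This is why the receiving side has to be told which encoding was used: the bytes alone do not say.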
When asked to create or save content in a particular encoding, many people simply don't know how to.
The Solution
There are many text editors and other tools available that allow you to save files as UTF-8 encoded.
Whichever option you choose, it is important to check that the character encoding of the page is properly declared in the <head> section of your HTML page. For example:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
It is also important to remember that all HTML pages should declare their content [or natural] language, for a variety of reasons. Some browsers can, and do, use this language information to determine the appropriate fonts. This is especially important for Asian pages. The content language can be declared in the <html> tag like this:
<html lang="zh-CN">
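If you generate pages from a script rather than an editor, the same two declarations apply. The following Python sketch writes a small page as UTF-8 and reads it back. The function names `build_page` and `save_as_utf8`, the file name, and the sample text are all hypothetical, and the sketch does not escape HTML special characters in its inputs.

```python
import tempfile
from pathlib import Path


def build_page(lang: str, title: str, body: str) -> str:
    """Return a minimal HTML page declaring its language and UTF-8 charset."""
    return (
        f'<html lang="{lang}">\n'
        "<head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>\n'
        f"<title>{title}</title>\n"
        "</head>\n"
        f"<body>{body}</body>\n"
        "</html>\n"
    )


def save_as_utf8(path: Path, html: str) -> None:
    """Write the page to disk, explicitly encoded as UTF-8."""
    path.write_text(html, encoding="utf-8")


# Demonstration in a temporary directory, so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "index.html"
    save_as_utf8(page, build_page("zh-CN", "示例", "你好，世界"))
    raw = page.read_bytes()
    round_trip = raw.decode("utf-8")

print("charset=UTF-8" in round_trip)  # True
```

Passing `encoding="utf-8"` explicitly matters here: without it, Python falls back to a platform-dependent default, which may not match the charset the page declares.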
Some options for creating UTF-8 encoded HTML pages are:
- Use Mozilla Composer
Composer is the WYSIWYG HTML editor bundled with the Mozilla browser. You can launch Composer from within Mozilla via Window / Composer.
In Mozilla Composer 1.7x, choose File / Save And Change Character Encoding, then select UTF-8 from the menu.
Mozilla Composer 1.4x is slightly different: choose File / Save As Charset, then select UTF-8 from the menu.
- Use jEdit, available from www.jedit.org
Note: You need to configure jEdit to save as UTF-8. Select UTF-8 from the Utilities / Global Options / General / Default Character Encoding menu.
- Log into any Solaris UTF-8 locale [see Options on the login screen]
Open dtpad [if using CDE], or gedit if using GNOME. Either one of these text editors running in a UTF-8 locale will, by default, save files as UTF-8 encoded.
- Use Macromedia Dreamweaver MX [2004]
Dreamweaver MX, available on Windows and Mac, has good support for UTF-8. You can set UTF-8 as the default character encoding in the Preferences dialog.
- TextEdit on Mac
TextEdit can save in UTF-8. You first have to use Format->Make Plain Text to tell it that you don't want RTF. Then you can choose the character encoding from the Save As dialog.
- A Caution About Using Notepad on Windows
Though Notepad does allow you to save as UTF-8, it prefixes the file contents with a Byte Order Mark [BOM]. This can and does cause problems, so I suggest you use one of the other alternatives mentioned above.
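If you already have files that may have been saved with a BOM, a short Python sketch like the one below can detect and remove it. The helper names `has_utf8_bom` and `strip_utf8_bom` are hypothetical; `codecs.BOM_UTF8` is the standard library constant for the three-byte UTF-8 BOM.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 Byte Order Mark."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


# Simulate a file saved with a BOM in front of the content.
sample = codecs.BOM_UTF8 + "<html>".encode("utf-8")

print(has_utf8_bom(sample))    # True
print(strip_utf8_bom(sample))  # b'<html>'
```

When reading such a file as text in Python, decoding with the `utf-8-sig` codec instead of `utf-8` skips a leading BOM if one is present.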