NOTE: This material does not necessarily refer to the most recent version of Java. For the most recent FAQs, see here.

1. General Character Encoding
2. Latin Language Charset
3. Code Pages (CP)

General Character Encoding

Q:1.1 Is the list of character encodings on the web site http://java.sun.com/products/jdk/1.1/intl/html/intlspec.doc7.html up to date? If not, where can I get the latest list of character encodings?
See http://java.sun.com/j2se/1.3/docs/guide/intl/encoding.doc.html.

Q:1.2 Is it possible to programmatically return the complete list of available code pages?
It is not possible in any current release, but it has been added to the release we are working on right now. Look for it in the beta of Java 1.4 that shows up early in the new year.

Q:1.3 How do I compile a Java file in the utf-8 encoding?
With Sun's Java compiler, you need to specify the encoding of the source file. Try using the compiler's -encoding option, for example: javac -encoding UTF8 MyFile.java (MyFile.java is a placeholder for your own source file).

Q:1.4 Are there any plans to create some kind of class that encapsulates character encodings?
There is currently work underway to define public character converter APIs. The work is being done as part of JSR 51, New I/O APIs. See New I/O APIs for the Java™ Platform for more information.

Q:1.5 Can you inform me of where the system value called "file.encoding" gets set? It is getting set to "646" in the JVM 1.2 on Solaris 7, and it is said to be an invalid charset by a third party server. Can you help me with this?
My guess is that you are running your app in the C/POSIX locale. On Solaris 7, the system call nl_langinfo(CODESET) returns "646" when the user locale is set to C/POSIX. We do have an alias mapping table that maps 646 to "ASCII", which is a valid charset name, but the table is in the sun.io package, which I don't recommend using directly. I think setting the locale to en_US should solve the problem.

Latin Language Charset

Q:2.1 I'd like to know if there's a way to make Java understand Latin language chars. I'm having problems reading the word "água" (water, in Portuguese) from a text file. Can you help me with this?
Current versions of Java should have no problem with these characters.
Check that you are using Java version 1.1 or newer. Java 1.2.2 or 1.3
would be best. You must also make sure that the font you are using to
display these characters contains the glyphs you need. The Lucida Sans
font, which comes with the J2SDK version 1.2 or later, contains these
glyphs. You can use this font by creating the following font object:
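A minimal sketch of such a font object follows. The family name "Lucida Sans" comes from the answer above; the plain style, the 12-point size, and the class name LucidaSansExample are illustrative assumptions.

```java
import java.awt.Font;

public class LucidaSansExample {
    // Builds a Font that requests the Lucida Sans family.
    // The style (PLAIN) and size (12) are arbitrary choices for illustration.
    public static Font createFont() {
        return new Font("Lucida Sans", Font.PLAIN, 12);
    }

    public static void main(String[] args) {
        Font font = createFont();
        // getName() returns the name the font was requested with.
        System.out.println(font.getName() + " " + font.getSize());
    }
}
```

The resulting object can be passed to a component's setFont method. If the requested family is not installed, Java substitutes a default font, so the glyphs may still be missing in that case.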

Code Pages (CP)

Q:3.1 Can someone tell if the Java character set encoding "cp285" is the one to use to support EBCDIC UK 00285?
Our documentation at http://java.sun.com/j2se/1.3/docs/guide/intl/encoding.doc.html says it's "IBM United Kingdom, Ireland", and looking at the mapping table I'd say it's EBCDIC (it's certainly not ISO 646 / ASCII based).

Q:3.2 Why do the debug classes of JDK 1.2.2 not include "Cp850"? Whenever I run my application in debug mode, it crashes because of this, but runs perfectly when not running in debug mode. Is there a way to add "Cp850" text en/decoding to the debug classes?
Are you using Sun's Java 2 SDK itself to debug your application, or are you using some third-party IDE? IDEs often support a different set of encodings than the Java 2 SDK. For the Java 2 SDK, as far as I know, we use the same class files whether you run in debug or no-debug mode.

Q:3.3 I am working on a Java application which requires translation from byte[] to Unicode characters. My application uses the number required by a Windows environment (e.g. '932') to specify the code page. I am only given the number, say '932'. Can you tell me how to cover all of the different code pages and obtain a corresponding mapping from the number to the code page used by Java, e.g. 'ms932'?
The reason that '932' isn't a good enough name is that both IBM and
Microsoft have code pages called '932' -- and they are not the same. In
Java, the convention is that Microsoft versions are called "MS932" and
IBM versions are called "CP932". Your application needs to tell us
which one to use so that we can get the conversion right.
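The naming convention above can be sketched as a small helper. The class CodePageNames, the Vendor enum, and the method javaEncodingName are hypothetical names introduced for illustration. The sketch assumes the MS/Cp prefix convention holds for the code page in question; that is not true for every number (Windows code page 1252, for instance, is known to Java as Cp1252), so a real application would likely need a lookup table instead.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class CodePageNames {
    // Hypothetical flag: the caller must say whose code page the number refers to.
    public enum Vendor { MICROSOFT, IBM }

    // Assumes the "MS" + number / "Cp" + number convention applies to this code page.
    public static String javaEncodingName(int codePage, Vendor vendor) {
        String prefix = (vendor == Vendor.MICROSOFT) ? "MS" : "Cp";
        return prefix + codePage;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String name = javaEncodingName(932, Vendor.MICROSOFT);
        byte[] input = { (byte) 0x82, (byte) 0xA0 }; // sample double-byte sequence

        // Not every runtime ships every encoding, so check before decoding.
        if (Charset.isSupported(name)) {
            String text = new String(input, name);
            System.out.println(name + " decoded " + text.length() + " character(s)");
        } else {
            System.out.println(name + " is not available in this runtime");
        }
    }
}
```

Charset.isSupported requires Java 1.4 or later; on older releases the only check is to attempt the conversion and catch UnsupportedEncodingException.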

Q:3.4 It is written in your documentation on JDK 1.1.7 that WIN cp1252 is the default code page for the JDK 1.1.7 Java compiler. Could you kindly tell me how to set WIN cp1251 as the default code page? I use WIN NT4.0 on my machine.
The documentation is actually not quite correct on that point. The
default code page is the one that Windows uses for the default region.
So, if you run on any localized Windows version whose default region
uses CP1251 (for example, Russian), the JRE and all the tools use CP1251
by default.
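To see which default a given machine actually uses, the runtime can be asked directly. This is a minimal sketch; the class name DefaultEncoding is hypothetical, and Charset.defaultCharset() assumes Java 5 or later (on older releases only the file.encoding system property is available).

```java
import java.nio.charset.Charset;

public class DefaultEncoding {
    // Returns the canonical name of the charset the runtime uses by default.
    public static String defaultEncodingName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The system property discussed in Q:1.5.
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + defaultEncodingName());
    }
}
```

To compile sources in a particular encoding regardless of the platform default, javac's -encoding option can be given explicitly, for example javac -encoding Cp1251 MyFile.java (MyFile.java is a placeholder).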