Unicode
NOTE: This material does not necessarily refer to the most recent version of Java. For the most recent FAQs, see here.
Java Unicode Questions
General Unicode
I am developing a software package using Java 2 Standard for the Greek market. I am using Unicode to handle the Greek character set. However, there is a problem displaying Unicode in the title bar, the words display with question marks rather than Greek letters. Can you suggest what is wrong?
I'm trying to convert some code that was written under JDK1.1 over to Java2. I was checking the encodings that this code uses, and it uses "UCS-2" (or ISO-10646-UCS-2). That encoding isn't listed in the Supported Encodings page. Is that a supported encoding or not? If not, what should I use instead?
When we perform a round-trip conversion from CP949 to UCS2 for reading and then from UCS2 back to CP949 for writing, the new data and the original data are not the same.
If we use MS949 instead, the round-trip conversion is ok. Is there a bug in code conversion for CP949?
UTF-8
Can you tell me which keys I should press to activate the Cyrillic or Greek input modes while in the UTF-8 locale?
Can you tell me how to convert SJIS strings into UTF8 format?
General Unicode
I am developing a software package using Java 2 Standard for the Greek market. I am using Unicode to handle the Greek character set. However, there is a problem displaying Unicode in the title bar, the words display with question marks rather than Greek letters. Can you suggest what is wrong?
The title bar of a window is controlled by the OS, not by Java. If you are using MS Windows, it will only display Greek characters in a window title bar if you are running under the Windows Greek locale.
I'm trying to convert some code that was written under JDK1.1 over to Java2. I was checking the encodings that this code uses, and it uses "UCS-2" (or ISO-10646-UCS-2). That encoding isn't listed in the Supported Encodings page. Is that a supported encoding or not? If not, what should I use instead?
In this case our software is better than our documentation. We have a good collection of Unicode converters whose names show their relationship to byte order and byte order mark: Unicode, UnicodeBig, UnicodeBigUnmarked, UnicodeLittle, and UnicodeLittleUnmarked.
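As a rough illustration of how these converter names differ, the sketch below encodes a single Greek letter with each of them and prints the resulting bytes. The class name UnicodeConverterNames and the helpers encode and hex are hypothetical, introduced only for this example; whether a given name is accepted depends on the Java runtime, so the loop checks Charset.isSupported first.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical demo class; not part of any Sun API.
class UnicodeConverterNames {
    // Encodes text with the named converter. The checked exception is
    // rethrown unchecked so callers need not declare it.
    static byte[] encode(String text, String encodingName) {
        try {
            return text.getBytes(encodingName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + encodingName, e);
        }
    }

    // Formats bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String alpha = "\u03b1"; // GREEK SMALL LETTER ALPHA
        String[] names = {
            "Unicode", "UnicodeBig", "UnicodeBigUnmarked",
            "UnicodeLittle", "UnicodeLittleUnmarked"
        };
        for (String name : names) {
            // Skip names this runtime does not recognize (an assumption-safe guard).
            if (Charset.isSupported(name)) {
                System.out.println(name + ": " + hex(encode(alpha, name)));
            } else {
                System.out.println(name + ": not supported by this runtime");
            }
        }
    }
}
```

The "Unmarked" variants write only the two bytes of the character in the stated byte order, while the other names may also emit a byte order mark.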
When we perform a round-trip conversion from CP949 to UCS2 for reading and then from UCS2 back to CP949 for writing, the new data and the original data are not the same.
If we use MS949 instead, the round-trip conversion is ok. Is there a bug in code conversion for CP949?
The encodings Cp949 and MS949 are not identical. They probably were at some point in the past, but IBM and Microsoft are developing them independently. According to the IBM mapping table for code page 949, the byte sequence 0x8D62 is not defined, so the Cp949 converter cannot map it and the round trip does not reproduce the original bytes.
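One way to see this kind of difference for yourself is to decode a byte sequence and re-encode it with the same converter, then compare the bytes. The following is a minimal sketch; the class RoundTripCheck and the method roundTrips are hypothetical names, and the Cp949 and MS949 converters are only tried if the runtime reports them as supported.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical helper for comparing converters; not part of the JDK.
class RoundTripCheck {
    // Decodes the bytes to a String and encodes them back with the same
    // charset. Returns true only if the result equals the input exactly.
    static boolean roundTrips(byte[] original, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        String decoded = new String(original, cs);
        byte[] reencoded = decoded.getBytes(cs);
        return Arrays.equals(original, reencoded);
    }

    public static void main(String[] args) {
        // The byte sequence mentioned in the answer above.
        byte[] sample = {(byte) 0x8D, (byte) 0x62};
        for (String name : new String[] {"Cp949", "MS949"}) {
            if (Charset.isSupported(name)) {
                System.out.println(name + " round-trips: " + roundTrips(sample, name));
            } else {
                System.out.println(name + " is not available in this runtime");
            }
        }
    }
}
```

Bytes that a converter cannot map are replaced during decoding, which is why the re-encoded data can differ from the original.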
UTF-8
Can you tell me which keys I should press to activate the Cyrillic or Greek input modes while in the UTF-8 locale?
Take a look at the Solaris 7 and 8 internationalization guides - they probably have all the information you need:
ftp://192.18.99.138/805-4123/805-4123.pdf
ftp://192.18.99.138/806-0169/806-0169.pdf
Can you tell me how to convert SJIS strings into UTF8 format?
There are two kinds of ResourceBundle instances: those based on property files, and those implemented as separate subclasses of ResourceBundle or ListResourceBundle. Property files have to be encoded in ISO 8859-1, with other characters represented in "\u" notation (for example, "\u65e5\u672c" for the Japanese "nihon"). You can use the native2ascii tool in the J2SDK to convert from SJIS to this representation. For the subclasses, the source file can use any encoding; you just have to tell the compiler which encoding you're using with the "-encoding" option.
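To illustrate the escaped form, the sketch below loads a one-line property file held in memory and reads the value back as ordinary Unicode characters. It uses java.util.Properties, which reads ISO 8859-1 input and expands the escapes; the class name EscapedProperties and the key "country" are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class; the key name "country" is an assumption.
class EscapedProperties {
    // Parses property-file text that contains only ISO 8859-1 characters,
    // with other characters written as backslash-u escapes.
    static Properties load(String asciiContent) {
        Properties props = new Properties();
        byte[] raw = asciiContent.getBytes(StandardCharsets.ISO_8859_1);
        try {
            props.load(new ByteArrayInputStream(raw));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // The doubled backslashes keep the escapes literal in this Java string,
        // so the loader sees them exactly as they would appear in a file.
        Properties props = load("country=\\u65e5\\u672c\n");
        String value = props.getProperty("country");
        System.out.println("Loaded " + value.length() + " characters");
    }
}
```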
All of this will result in regular Unicode (UTF-16) characters in your program. You then have to convert the text to UTF-8 when generating your HTML pages. You can do that using a java.io.OutputStreamWriter or using the java.lang.String.getBytes(String) method.
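Both conversion routes mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the class SjisToUtf8 and its method names are hypothetical, and the charset name "Shift_JIS" is only used if the runtime reports it as supported.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical demo class; not part of the JDK.
class SjisToUtf8 {
    // Decodes bytes in the named encoding into a regular Java String.
    static String decode(byte[] bytes, String encodingName) {
        try {
            return new String(bytes, encodingName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + encodingName, e);
        }
    }

    // Route 1: String.getBytes(String) with the UTF-8 encoding name.
    static byte[] utf8ViaGetBytes(String text) {
        try {
            return text.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 must be supported", e);
        }
    }

    // Route 2: an OutputStreamWriter that encodes as it writes.
    static byte[] utf8ViaWriter(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, "UTF-8")) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed (and flushed) before the bytes are read.
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // "nihon" encoded in Shift_JIS.
        byte[] sjis = {(byte) 0x93, (byte) 0xFA, (byte) 0x96, (byte) 0x7B};
        if (Charset.isSupported("Shift_JIS")) {
            String text = decode(sjis, "Shift_JIS");
            byte[] utf8 = utf8ViaGetBytes(text);
            System.out.println("UTF-8 length in bytes: " + utf8.length);
        } else {
            System.out.println("Shift_JIS is not available in this runtime");
        }
    }
}
```

In a servlet or similar setting, the OutputStreamWriter route is usually more convenient because it can wrap the response stream directly, whereas getBytes suits cases where the whole text is already in memory.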