Wednesday, September 15, 2010

How to handle Chinese text in JSP



Web applications often need to pass parameters to the server, usually by submitting a form as a POST request. Those parameters may contain Chinese text, such as user registration details or the delivery address on a shopping order. String parameters are generally encoded in the local character set: GB2312 or GBK for Chinese, ISO8859_1 for English and other Western European languages. Java programs, however, handle all strings as Unicode, so a conversion step is required. Unfortunately, most existing Java application servers were developed in English-speaking countries and were not built with large character sets (Chinese, Japanese, Korean and so on) in mind. As a result they mishandle HTTP request parameters that contain Chinese, which is one of the problems that troubles JSP and Servlet developers most.

The root cause of the problem is that the HTTP request lacks the information needed to identify the character set the client used. In a JSP page, we can use the following page directive to specify the character set of the page output:

<%@ page contentType="text/html; charset=GB2312" %>

The JSP engine converts this directive into an HTTP response header:

Content-Type: text/html; charset=GB2312

With this header the page is output in GB2312 and the browser displays the Chinese correctly. When the browser POSTs the form contents back to the server, however, the request carries no charset, and the Chinese content of the form is encoded as %xx sequences (xx being a hexadecimal number). For example, the GB2312 code of the Chinese character "中" (zhong) is 0xD6D0, so it becomes %D6%D0 in the HTTP request. According to RFC2616, when an HTTP request does not specify a character set, ISO8859_1 is assumed, so the character "中" is processed as two characters, '\u00D6' and '\u00D0'. When these are sent back to the client, the browser cannot display them properly and they typically appear as '??'.

The traditional way to solve this problem is to write extra code that performs the character set conversion:
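The garbling described above can be reproduced with the plain JDK, without any application server. This is only a minimal sketch: the class and method names are made up for illustration, it assumes Java 10 or later (for the Charset overloads of URLEncoder and URLDecoder), and it assumes the JDK ships the GB2312 charset, as the usual distributions do.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of any server API.
class FormEncodingDemo {
    // Percent-encode text roughly the way a browser does for a GB2312 page.
    static String browserEncode(String text) {
        return URLEncoder.encode(text, Charset.forName("GB2312"));
    }

    // Decode the way a server does when the request names no charset.
    static String serverDecode(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String zhong = "\u4E2D";               // the character "zhong"
        String wire = browserEncode(zhong);    // what travels in the request
        String seen = serverDecode(wire);      // what the servlet receives
        System.out.println(wire);
        System.out.println(seen.length() + " characters instead of 1");
    }
}
```

The single Chinese character leaves the browser as two percent-encoded bytes and arrives in the servlet as two unrelated Latin-1 characters.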

strOut = new String(strIn.getBytes("8859_1"), "GB2312");

Here strIn is the unconverted string, which was decoded as ISO8859_1, and strOut is the converted string, decoded as GB2312.
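The conversion can be wrapped in a small helper and checked in isolation. The sketch below is an illustration only: the class and method names are hypothetical, and it assumes the JDK in use provides the GB2312 charset. It relies on the fact that ISO8859_1 maps every byte to exactly one character, so getBytes recovers the original bytes without loss.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration.
class Gb2312Recovery {
    // Turn a string that was wrongly decoded as ISO8859_1 back into the
    // original bytes, then decode those bytes as GB2312.
    static String recover(String strIn) {
        byte[] raw = strIn.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, Charset.forName("GB2312"));
    }

    public static void main(String[] args) {
        String strIn = "\u00D6\u00D0";          // what the servlet received
        String strOut = recover(strIn);          // the intended character
        System.out.println(strOut.equals("\u4E2D"));
    }
}
```

Plain ASCII input passes through unchanged, because GB2312 and ISO8859_1 agree on that range.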

Apusic 0.9.5 implements the draft of the Java Servlet 2.3 specification, which adds a method to the ServletRequest interface, setCharacterEncoding(String enc), that supplies the charset information missing from the HTTP request. The tedious conversion above is then done automatically inside the Servlet engine, which also optimizes the conversion to improve runtime efficiency. Here is a simple example for comparison.

// Traditional way

// New way
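In servlet code the new way comes down to calling request.setCharacterEncoding("GB2312") before the first getParameter call. To compare the two ways without a servlet container, the sketch below uses a plain-JDK stand-in for the request. FakeRequest, the "city" field and the method names traditional and modern are all hypothetical and only model the behaviour described above; this is not the Apusic or Servlet implementation. It assumes Java 10 or later and a JDK that provides GB2312.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingComparison {
    // Hypothetical stand-in for a servlet request; not the real ServletRequest.
    static class FakeRequest {
        private final String body;  // raw POST body, e.g. "city=%D6%D0"
        private Charset encoding = StandardCharsets.ISO_8859_1; // default when no charset is given

        FakeRequest(String body) { this.body = body; }

        // Models ServletRequest.setCharacterEncoding(String enc).
        void setCharacterEncoding(String enc) { encoding = Charset.forName(enc); }

        // Models getParameter: find the pair, then percent-decode its value
        // with whatever charset is currently set.
        String getParameter(String name) {
            for (String pair : body.split("&")) {
                int eq = pair.indexOf('=');
                if (eq > 0 && pair.substring(0, eq).equals(name)) {
                    return URLDecoder.decode(pair.substring(eq + 1), encoding);
                }
            }
            return null;
        }
    }

    // Traditional way: read the mis-decoded value, then convert it by hand.
    static String traditional(FakeRequest request, String field) {
        String strIn = request.getParameter(field);
        return new String(strIn.getBytes(StandardCharsets.ISO_8859_1), Charset.forName("GB2312"));
    }

    // New way: declare the charset before reading any parameter.
    static String modern(FakeRequest request, String field) {
        request.setCharacterEncoding("GB2312");
        return request.getParameter(field);
    }

    public static void main(String[] args) {
        String body = "city=%D6%D0";
        System.out.println(traditional(new FakeRequest(body), "city")
                .equals(modern(new FakeRequest(body), "city")));
    }
}
```

Both ways yield the same string. The new way avoids building an intermediate mis-decoded string for every parameter, and the charset is stated once per request.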






