Java and UTF-8 encoding

December 31st, 2006 by Samuel Santos

The J2SE platform has come a long way in internationalization, but handling non-ASCII text in the J2EE world isn’t nearly as easy.

To get UTF-8 working from the browser through to your application, you have to make some changes in your code and in your web server settings.

First, make sure the response declares the right charset in its Content-Type header, so the browser does not have to guess the encoding of the text/html content. The pageEncoding attribute additionally tells the container how the JSP file itself is encoded. Place the following directive at the beginning of the JSP:

<%@ page contentType="text/html; charset=utf-8" pageEncoding="UTF-8" %>

Next you have to create a filter that implements the ‘javax.servlet.Filter’ interface so that the request parameters are decoded as UTF-8:

package com.samaxes.filters;

import javax.servlet.*;
import java.io.IOException;

/**
 * Sets the request character encoding to UTF-8 before the request
 * reaches the rest of the application.
 *
 * @author : samaxes
 */
public class UTF8Filter implements Filter {

    public void init(FilterConfig filterConfig) {
    }

    public void destroy() {
    }

    public void doFilter(ServletRequest servletRequest,
                         ServletResponse servletResponse,
                         FilterChain filterChain)
            throws IOException, ServletException {
        // Must be set before any request parameter is read, or it has no effect.
        servletRequest.setCharacterEncoding("UTF-8");
        filterChain.doFilter(servletRequest, servletResponse);
    }
}
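The filter does nothing until the container knows about it, so it has to be declared in your web application's deployment descriptor (web.xml). A minimal sketch follows; the filter name and the /* URL pattern are my assumptions for illustration, so adjust them to your application:

```xml
<!-- Hypothetical registration: the filter-name and url-pattern are illustrative choices -->
<filter>
    <filter-name>UTF8Filter</filter-name>
    <filter-class>com.samaxes.filters.UTF8Filter</filter-class>
</filter>
<filter-mapping>
    <filter-name>UTF8Filter</filter-name>
    <!-- Apply the filter to every request -->
    <url-pattern>/*</url-pattern>
</filter-mapping>
```

If several filters are mapped, this one should probably come first, since the encoding must be set before anything reads a request parameter.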

Now your server reads the POST parameters correctly…

But there is still an issue with GET requests.

The trouble is that the browser sends no charset information along with the URL of a GET request, and by default setCharacterEncoding() only applies to the request body, not to the query string. The server has no way of knowing how to interpret the URL-encoded GET parameters, so it assumes ISO-8859-1.
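To see why the wrong assumption garbles text, here is a small standalone sketch that uses only java.net.URLEncoder and URLDecoder. The class and method names are made up for this illustration, and it only mimics the decoding step; it is not how Tomcat implements it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not part of the filter above.
class QueryDecodingDemo {

    // Decodes a percent-encoded value using the given charset name.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) throws Exception {
        // A UTF-8 page submits U+00E9 (e with acute accent) as the bytes C3 A9.
        String encoded = URLEncoder.encode("\u00e9", "UTF-8");
        System.out.println(encoded); // %C3%A9

        // Same charset on both sides: the original character comes back.
        String good = decode(encoded, "UTF-8");
        System.out.println(good.equals("\u00e9")); // true

        // ISO-8859-1 maps each byte to its own character: U+00C3 then U+00A9.
        String bad = decode(encoded, "ISO-8859-1");
        System.out.println(bad.equals("\u00c3\u00a9")); // true
    }
}
```

The same two bytes become one character under UTF-8 and two unrelated characters under ISO-8859-1, which is the garbled text you see in GET parameters.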

Fortunately the solution is pretty simple: just specify URIEncoding="UTF-8" on the Connector element in Tomcat’s server.xml file.
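As a rough sketch, the connector entry could end up looking like the fragment below. Only the URIEncoding attribute is what matters here; the port, protocol, timeout and redirect port are illustrative values modelled on a stock HTTP connector, so keep whatever your own server.xml already has:

```xml
<!-- Illustrative values; add URIEncoding to the connector you already use -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8" />
```

Tomcat has to be restarted for the change to take effect.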

Your application should now handle UTF-8 just fine.


8 comments

  1. Pandian says:

    Hi,
    Good one.
    But is there any way I can get UTF-8 characters in the Catalina log (taking Tomcat as the example)? If so, what modifications do we need to make?

  2. It may be related to the encoding of the machine where you are running Tomcat.
    Are you opening the file as UTF-8?

  3. Pandian says:

    Dear Sam,
    Thanks for the reply. I have changed the encoding to UTF-8 in server.xml, but my System.out.println calls still don't give me Unicode characters; they are printed in ASCII only. Is there any other setting we need to change to get Unicode characters in the System.out stream?

  4. Try adding the JVM option -Dfile.encoding=UTF-8 to your server startup script, then restart your server.

    In a DOS console you won’t see any Unicode character; you should use an editor to open your server log in UTF-8 encoding.

  5. baba says:

    as for the POST solution using your filter, you still need to edit web.xml from tomcat to make it handle the filter, right?

    • Correct, you must declare it in your web application deployment descriptor (web.xml).
      Alternatively you can use the @WebFilter annotation (only if your container supports the Servlet 3.0 spec).
