There are countless posts out there, screams in the darkness from developers claiming that this gem doesn't work with that web framework on Ruby 1.9.

In my case the problematic layers in the technology stack were Padrino 0.9.14 with Sinatra-1.0 and Typhoeus 0.1.31 plus some other String to byte array conversions in our codebase. 

Testing had shown that the app failed when it received input containing umlauts and other accented letters (i.e. multibyte characters). There are some lengthy and detailed explanations and discussions about how Ruby 1.8 handles character encoding differently from Ruby 1.9. It's the sort of thing I'd hoped I'd never have to read, but sometimes you just have to get informed. Hopefully this article will help shortcut or reiterate some of this info.
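To make "multibyte" concrete, here is a minimal sketch for Ruby 1.9 or later. The helper name `char_and_byte_counts` and the sample word are made up for illustration. The point is only that an umlaut occupies two bytes in UTF-8, so code that treats a string as a plain byte array sees a different length than code that counts characters.

```ruby
# Hypothetical helper for illustration: compares character and byte counts.
def char_and_byte_counts(str)
  # Same bytes, but tagged as binary: Ruby then counts every byte as a character.
  binary = str.dup.force_encoding(Encoding::BINARY)
  {
    chars: str.length,
    bytes: str.bytesize,
    chars_as_binary: binary.length
  }
end

# "\u00fc" is the u-umlaut, written as an escape so the snippet does not
# depend on the encoding of the source file.
word = "M\u00fcller"
counts = char_and_byte_counts(word)
puts "#{counts[:chars]} characters, #{counts[:bytes]} bytes"
puts "#{counts[:chars_as_binary]} 'characters' when the same bytes are read as binary"
```

Any layer of the stack that hands strings around with the wrong encoding tag can produce this kind of mismatch.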

So let's begin with the rendered view of your web app. A couple of things will help ease the pain here.

First, make sure all your pages contain the correct content-type meta tag:

 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

Secondly, make sure that all your forms use the accept-charset attribute in the HTML form tag, or in whatever you use to generate the tag:

<form accept-charset="utf-8"...

Now we know that the pages can display and post multibyte UTF-8 characters. The web application might be a different story...

One of the quick fixes that worked well for me was a filter on all requests that forces the encoding of all incoming String params.

A branch of RKH's fork of Sinatra helped with this. There's a utility method in lib/sinatra/base.rb which does the trick...

 

if defined? Encoding
  if Encoding.default_external.to_s =~ /^ASCII/
    Encoding.default_external = "UTF-8"
  end
  Encoding.default_internal ||= Encoding.default_external

  def force_encoding(data)
    return if data == self
    if data.respond_to? :force_encoding
      data.force_encoding(Encoding.default_external)
    elsif data.respond_to? :each_value
      data.each_value { |v| force_encoding(v) }
    elsif data.respond_to? :each
      data.each { |v| force_encoding(v) }
    end
  end
else
  def force_encoding(*) end
end
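As a rough illustration of what that recursive walk does to a params hash, here is a standalone sketch for Ruby 1.9 or later. The helper `retag_as_utf8` and the sample params are hypothetical and not part of Sinatra or Padrino. It assumes the incoming bytes really are UTF-8 and only carry the wrong tag.

```ruby
# Hypothetical helper mirroring the recursive idea above; not Sinatra's own code.
def retag_as_utf8(data)
  if data.respond_to?(:force_encoding)
    # Relabels the string in place; the bytes themselves are not changed.
    data.force_encoding(Encoding::UTF_8)
  elsif data.respond_to?(:each_value)
    data.each_value { |v| retag_as_utf8(v) }   # hashes
  elsif data.respond_to?(:each)
    data.each { |v| retag_as_utf8(v) }         # arrays
  end
  data
end

# Simulated request params: UTF-8 bytes that arrived tagged as binary.
raw_name = "J\u00fcrgen".dup.force_encoding(Encoding::BINARY)
raw_tag  = "caf\u00e9".dup.force_encoding(Encoding::BINARY)
params   = { "user" => { "name" => raw_name, "tags" => [raw_tag] } }

retag_as_utf8(params)

name = params["user"]["name"]
puts name.encoding
puts name.valid_encoding?
```

Note that `force_encoding` only changes the label and never transcodes. If a client actually sent, say, ISO-8859-1 bytes, the string would be tagged UTF-8 but `valid_encoding?` would return false, so it is worth checking that before trusting the data.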

To work with this we need to set some default encodings for our web app.

In Padrino this can be done in the app.rb file:

 

  if RUBY_VERSION < '1.9'
    $KCODE = 'u'
  else
    Encoding.default_external = Encoding::UTF_8
    Encoding.default_internal = Encoding::UTF_8
  end
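One way to see the effect of these defaults is `String#encode` called without arguments, which transcodes to `Encoding.default_internal` when it is set. The sketch below assumes Ruby 2.0 or later (for `String#b`). The helper name `to_internal` and the sample byte are invented for illustration.

```ruby
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8

# Hypothetical helper: tag raw bytes with their real encoding, then let Ruby
# transcode them to the default internal encoding.
def to_internal(bytes, source_encoding)
  bytes.dup.force_encoding(source_encoding).encode
end

# 0xFC is the u-umlaut in ISO-8859-1 (a single byte there, two bytes in UTF-8).
latin1_bytes = "\xFC".b
converted = to_internal(latin1_bytes, Encoding::ISO_8859_1)

puts converted.encoding
puts converted.bytesize
```

Unlike `force_encoding`, `encode` rewrites the bytes, which is what you want when the data genuinely arrives in a different encoding.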

 

If you are a Padrino user who doesn't want to touch the Sinatra gem, you can create a general before filter in app.rb like this:

before do
  force_encoding(params)
end

There are likely other parts of your application stack that will not be UTF-8 friendly. Notably, your tests might complain if you attempt to add multibyte characters to your test data.

Adding the code hint (Ruby's magic encoding comment)

# encoding: utf-8

as the first line of the test class files will alleviate this. It's a grubby fix, but we live in a world of compromises; what can you do.
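If you want to confirm what the hint does, a small check along these lines may help. The method name `source_encoding_info` and the sample word are made up for illustration. `__ENCODING__` reports the encoding Ruby is using for the current source file.

```ruby
# encoding: utf-8

# Hypothetical check: reports how Ruby reads this file and its string literals.
def source_encoding_info
  literal = "Grüße"
  {
    source: __ENCODING__,        # encoding of this source file
    literal: literal.encoding,   # encoding given to string literals in it
    chars: literal.length        # character count, not byte count
  }
end

p source_encoding_info
```

On Ruby 1.9 the default source encoding is US-ASCII, so without the comment a file containing such a literal is rejected with an "invalid multibyte char" error. From Ruby 2.0 onwards UTF-8 is the default and the comment becomes redundant, though harmless.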

 
