Migrate all tables to utf-8

If you’re like me, you maintain a mix of both old and new Rails apps. If you’re even more like me, you have no Rails apps that store Latin1 data any more (having converted them all some time around 2007). Yet your tables are still defined with DEFAULT CHARACTER SET latin1. Time to fix that.

Character sets can be a huge pain in the ass. They say that Latin1 is a subset of utf8, but sadly it’s not so simple: only the ASCII range shares the same bytes, and every Latin1 character above 0x7F is encoded as two bytes in utf8. Depending on how and where your characters were encoded, you can expect strange results during the conversion from Latin1 to utf8. I have found the best solution to be a vanilla MySQL dump piped through iconv to convert the characters.
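To make the byte-level difference concrete, here is a minimal Ruby sketch of the re-encoding that iconv performs on a dump. The helper name latin1_to_utf8 is made up for illustration, and the sketch assumes Ruby 2.0 or later and input that really is ISO-8859-1.

```ruby
# Hypothetical helper: reinterpret raw bytes as Latin-1 and re-encode them
# as UTF-8, roughly what `iconv -f latin1 -t utf8` does to a dump file.
def latin1_to_utf8(bytes)
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# "café" as Latin-1 stores it: four bytes, with é as the single byte 0xE9.
latin1 = "caf\xE9".b

# Simply relabelling those bytes as UTF-8 does not work: 0xE9 on its own
# is not a valid UTF-8 sequence.
mislabeled_ok = latin1.dup.force_encoding(Encoding::UTF_8).valid_encoding?

# Re-encoding turns é into the two bytes 0xC3 0xA9.
utf8 = latin1_to_utf8(latin1)
```

The mislabelled string is exactly the kind of data that produces strange results: the bytes are unchanged, but whatever reads them as utf8 sees garbage.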

That being said, all my apps are UTF8 and have been for years. Still, many of the tables in the database list Latin1 as their default character set. Specifying charset=utf8 in my response headers makes browsers treat the output as utf8 regardless, but it’s better to specify this at the database level too. Turns out the migration is easy.

Things I did:

alter database character set utf8;

on the production database. About half of my databases listed Latin1 as default. Just to be clear, you should always create your database using rake db:create, which sets the correct character set.

Altering the database only changes the default for tables created afterwards, so next it was time to migrate each existing table:

ruby script/generate migration default_character_set

wherein this pretty little one-liner performs everything we need:

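class DefaultCharacterSet < ActiveRecord::Migration
  def self.up
    # `tables` is delegated to the connection and lists every table.
    tables.each do |table|
      # Sets the table's default character set. Existing columns keep
      # the character set they were created with.
      execute "alter table #{table} character set utf8"
    end
  end

  def self.down
    # Intentionally a no-op: I don't want to go back to latin1.
  end
end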
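
MySQL has two flavours of this statement, and the difference matters. The sketch below uses a made-up helper, charset_statement, which is not part of Rails, to build either one. The default form is what the migration above runs and only changes the table default. The CONVERT TO form also re-encodes existing text columns, which is only safe if those columns really contain Latin1 data; running it on columns that already hold utf8 bytes would, as far as I know, double-encode them.

```ruby
# Hypothetical helper (not a Rails API): builds the ALTER statement for one
# table, quoting the name with backticks.
def charset_statement(table, convert_columns: false)
  quoted = "`#{table.to_s.gsub('`', '``')}`"
  if convert_columns
    # Re-encodes existing text columns as well as changing the default.
    "ALTER TABLE #{quoted} CONVERT TO CHARACTER SET utf8"
  else
    # Only changes the default used for columns added later.
    "ALTER TABLE #{quoted} DEFAULT CHARACTER SET utf8"
  end
end

default_sql = charset_statement("users")
convert_sql = charset_statement(:posts, convert_columns: true)
```

Inside a migration, either string could be passed to execute in place of the interpolated one above.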