More utf-8 woes
The second installment of the utf-8 saga I started last week.
Of course, despite all the advice that’s out there, I needed to run some nasty-ass updates directly in MySQL. Buh. Even in my utf-8 Terminal.app I could not display some of the weird characters properly. So it came down to hex() and unhex()’ing in MySQL. Joy!
Even if it’s only for myself, here is the code I used:
update articles set body = replace(body, unhex('C3A23F3F'), "'") where body regexp unhex('C3A2');
and let’s not forget this beauty:
update articles set body = replace(body, unhex('C3A23FC29D'), "'") where body regexp unhex('C3A2');
Both of these turned various combinations of ???, รข?? and Japanese characters back into a simple apostrophe.
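For the curious, here is a rough sketch of what those two hex strings actually contain. It is written in Python rather than MySQL, purely as an illustration; the `body` sample string is made up, and only the two hex patterns come from the queries above.

```python
# The two byte patterns from the UPDATE statements, as hex strings.
patterns = ["C3A23F3F", "C3A23FC29D"]

# Decode each pattern as UTF-8 to see which characters it stands for.
decoded = {h: bytes.fromhex(h).decode("utf-8") for h in patterns}

for h, text in decoded.items():
    # Print the code points, since some of them are not displayable.
    print(h, [f"U+{ord(c):04X}" for c in text])

# Hypothetical sample row, for illustration only.
body = "it" + decoded["C3A23F3F"] + "s broken"

# Same idea as MySQL's replace(): a literal substring swap, no regexp involved.
fixed = body.replace(decoded["C3A23F3F"], "'")
print(fixed)
```

The first pattern is an â followed by two plain question marks (U+003F). The second ends in U+009D, an invisible control character, which may be part of why the terminal had trouble showing it.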
And no, the character_set_server still isn’t utf-8. Most data is fine now though.
The best part was when the regexp engine of MySQL encountered the actual question marks as a result of the unhexing. This threw all parsing down the drain, since question marks need to be escaped in a regexp. But … as you’d have guessed … those weren’t actual question marks in the database. Just the same bit combinations. Double joy!
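To show why unescaped question marks are a problem in a pattern, here is a small sketch using Python's `re` module. This is an assumption-laden stand-in: Python's regexp engine is not MySQL's, so the exact failure will differ, but the underlying issue (`?` is a quantifier, not a literal) is the same. The sample strings are made up.

```python
import re

# The garbled sequence from the first UPDATE: an â followed by two question marks.
garbled = bytes.fromhex("C3A23F3F").decode("utf-8")

# Used as-is, the question marks act as quantifiers. In Python this pattern
# means "an optional â, matched lazily", so it matches the empty string
# even in text that contains no garbage at all.
unescaped = re.search(garbled, "no mojibake here")

# Escaping makes the question marks literal characters.
escaped = re.compile(re.escape(garbled))
clean_hit = escaped.search("no mojibake here")
dirty_hit = escaped.search("it" + garbled + "s broken")

print(repr(unescaped.group()))  # empty match
print(clean_hit)                # None
print(dirty_hit.group() == garbled)
```

A plain substring match (like MySQL's `replace()` or a `like` with the literal bytes) sidesteps the escaping question entirely.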