Regexp in ruby 1.8.7 that will detect a 4-byte Unicode character


Can anyone tell me how to write a regexp in Ruby 1.8.7 that detects the presence of a 4-byte Unicode character (specifically emoji)? I am trying to handle the fact that MySQL does not, by default, allow you to store 4-byte emoji Unicode characters, which are now in use by iOS 5.

Thanks!


There are 2 answers

esilver On BEST ANSWER

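This appears to match the first two bytes of the four bytes that represent emoji. The octal escapes \360\237 are the bytes 0xF0 0x9F, which begin the UTF-8 encoding of every code point from U+1F000 to U+1FFFF, so the pattern will not match 4-byte characters outside that range. This is being run in Ruby 1.8.7.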

str.match(/\360\237/)
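
If you need to catch any 4-byte UTF-8 character rather than only those starting with 0xF0 0x9F, a rough sketch is to look for a 4-byte lead byte, which in valid UTF-8 is always in the range 0xF0 to 0xF4. The helper name `contains_four_byte_utf8?` is made up for illustration, and the sketch assumes the input is valid UTF-8.

```ruby
# Hypothetical helper, not part of the original answer.
# Returns true if the string contains the lead byte of a 4-byte UTF-8
# sequence (0xF0..0xF4). Assumes the input is valid UTF-8, where those
# byte values never occur in any other position.
def contains_four_byte_utf8?(str)
  str.each_byte do |byte|
    return true if byte >= 0xF0 && byte <= 0xF4
  end
  false
end

contains_four_byte_utf8?("plain text")           # => false
contains_four_byte_utf8?("hi \xF0\x9F\x98\x80")  # => true (U+1F600)
```

A byte loop is used here instead of a regexp so that the result does not depend on $KCODE or on how the regexp engine interprets multibyte input.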
crishoj On

Altering the table so that it can store 4-byte characters might be feasible using a non-blocking online approach, e.g. pt-online-schema-change from Percona Toolkit (the successor to Maatkit): http://www.percona.com/doc/percona-toolkit/pt-online-schema-change.html

From the docs:

In brief, this tool works by creating a temporary table which is a copy of the original table (the one being altered). (The temporary table is not created like CREATE TEMPORARY TABLE; we call it temporary because it ultimately replaces the original table.) The temporary table is altered, then triggers are defined on the original table to capture changes made on it and apply them to the temporary table. This keeps the two tables in sync. Then all rows are copied from the original table to the temporary table; this part can take awhile. When done copying rows, the two tables are swapped by using RENAME TABLE. At this point there are two copies of the table: the old table which used to be the original table, and the new table which used to be the temporary table but now has the same name as the original table. If --drop-old-table is specified, then the old table is dropped.