Storing Emojis in SQL Tables

6.9k views Asked by At

I am working with a SQL Server 2008 database on a Windows 2008 Server. Anytime I try to store an emoji in my table it converts it to a weird looking box. When I try to store the same emoji in SQL Server 2012 it stores the emoji fine. Is it not possible to store emojis correctly in SQL Server 2008? I really cannot update at this point so that would not be an option.

1

There are 1 answers

0
Solomon Rutzky On

What we know based on details from the question and comments on the question:

  1. Column is NVARCHAR
  2. Value is inserted from VB.NET app via stored procedure
  3. App hitting SQL Server 2008 (running on Windows 2008 Server) stores emoji character but "converts it to a weird looking box"
  4. Same app code hitting SQL Server 2012 stores the same emoji character just fine

What we do not know:

How is the character being retrieved in order to determine whether or not it was stored correctly?

  • Are you viewing it in the app or in SSMS?
  • If in SSMS, are you connecting to SQL Server 2008 and 2012 using the same SSMS running on the same machine? Or are you using the version of SSMS that came with each version of SQL Server (hence they are not the same program, even if on the same machine)?

Based on the above:

Most likely this is a font issue. I say this due to:

  1. If it were an issue of not supporting Unicode, then you would be seeing two question marks ?? (one for each surrogate character) instead of a single square box.
  2. Emojis are nothing special. They are merely supplementary characters. And there are currently (as of Unicode v 12.0) 72,457 supplementary characters defined (and slots for another 976,119).
  3. Supplementary Characters (emojis or otherwise) can be stored in NCHAR, NVARCHAR, and NTEXT columns without a problem, and without regard to the collation of the column or the current database.

To test this, I executed the following in a database having a default collation of SQL_Latin1_General_CP1_CI_AS, so there is definitely no "supplementary character support" there.

SELECT NCHAR(0xD83D) + NCHAR(0xDE31) AS [ScreamingFace],
       NCHAR(0xD83D) + NCHAR(0xDDFA) AS [WorldMap],
       NCHAR(0xD83D) + NCHAR(0xDF08) AS [Alchemical Symbol for Aqua Vitae];

It returns:

ScreamingFace    WorldMap    Alchemical Symbol for Aqua Vitae
                        

I see different things in different areas, all due to font differences. The chart below indicates what I am seeing:

LOCATION        FONT            Screaming     World     Alchemical Symbol
                                Face          Map       for Aqua Vitae
------------    ------------    ----------    ------    ----------------------------
Text Editor     Consolas        Yes           Yes       Square box w/ question mark
Grid Results    Code2003        Yes           Yes       Yes
Text Results    Courier New     Yes           Yes       Empty square box

Most likely you were using two different versions of SSMS, or at least SSMS on two different computers. In either case, you probably had different fonts mapped to the Grid Results, or were even using Grid Results on one and Text Results on the other.

In the end, if you want to know if data was stored correctly, you need to check the bytes that were stored. To do this, simply convert the string column to VARBINARY(MAX):

SELECT CONVERT(VARBINARY(MAX), string_column)
FROM schema.table;

And compare those results between the 2008 and 2012 systems. More than likely they are (or "were" given that this was almost 2.5 years ago) the same.


For more info on what characters can actually be stored in the various string datatypes in SQL Server (from SQL Server 7.0 through at least SQL Server 2019), please read the following post of mine:

How Many Bytes Per Character in SQL Server: a Completely Complete Guide