Special character handling when fetching data from MS SQL Server using Perl DBD

1.2k views Asked by At

I have an MS SQL Server 2008 Database, from which I am fetching data using perl DBD::Sybase module. But there are some special characters in the DB, like the Copyright symbol, Trademark symbol etc., which are not getting imported properly. Perl seems to change all of these special characters to a Question mark character. Is there a way to fix this?

I have tried specifying charset=utf8 in the connection string. The doc mentions a syb_enable_utf8 (bool) setting, but whenever I try that, I get an error:

Can't locate object method "syb_enable_utf8" via package "DBI::db"
2

There are 2 answers

0
Hari Menon On BEST ANSWER

One solution I found was this:

use Encode qw(encode_utf8);

Then, wherever you are writing data to a file or anywhere else, use Encode::encode_utf8($data);

where $data is the column/value which you have fetched from MSSQL.

0
bohica On

I don't use DBD::Sybase but a) I use a lot of other DBDs and b) I am currently collecting information about unicode support in DBDs. According to the pod you need at least OpenClient 15.x when using syb_enable_utf8. Are you using 15.x or later? Perhaps syb_enable_utf8 is not defined if your client is less than 15.x or perhaps you have too old a version of DBD::Sybase. Unfortunately I cannot see from the Changes file when syb_enable_utf8 was added.

However, when you say "can't locate method" I think that is a clue as syb_enable_utf8 is not a method, it is an attribute (it is under Sybase Specific Attributes) in the pod. So you need to add it to your connect call or set it via a connection handle like this:

my $h = DBI->connect("dbi:Sybase:something","user","password", {syb_enable_utf8 => 1});

or

$h->{syb_enable_utf8} = 1;

You should also read the bits in the pod describing what happens when syb_enable_utf8 is set as it appears from the documents it only applies to UNIVARCHAR, UNICHAR, and UNITEXT columns.

Lastly, you need to ensure you insert the data correctly in the first place. I'd guess if it is not inserted from Perl with syb_enable_utf8 and charset=utf8 and your data is not proper unicode characters in Perl before you insert you'll get garbage back.

The comment Raze2dust made had nothing to do with your issue but is worth heeding if you are going to write the data retrieved from your database elsewhere. Just remember to decode any data input to your script and encode any data output.