I've inherited a MySQL database which contains a field named Description of type text and collation of latin1_swedish_ci
.
The problem with this field is it contains utf-8 data with some Unicode characters, e.g. character 733, etc. Sometimes this character also exists in the field represented as HTML encoded "˝" as well.
I'm trying to read the table and export the data to a CSV file and I need to represent this character as a double quote.
Reading the HTML encoded character is easy enough. However, it appears that the actual Unicode character is converted to utf-8 before I can do anything with it resulting in a "?".
How do I read in the Unicode character 733 (U+02DD), recognize it and convert it?
Here's a simplified (not tested) version of the code.
<?
$testconn=odbc_connect ("TESTLIB", "......", "......");
$query="SELECT Description FROM TestTable";
$rsWeb=mysql_query($query));
$WebRow=mysql_fetch_row($rsWeb));
$Desc = $WebRow[0];
$Desc = str_replace('"','""',$Desc);
fwrite($output,"\"".$Desc."\",\r\n");
%>
I think your connection charset is not utf8, that's why chars are being converted to '?'.
ReplyDeleteRead this: http://dev.mysql.com/doc/refman/5.1/en/charset-connection.html
Post result for query:
show variables like 'char%';
You really should put only non-entity (Unicode) version in the database, and entity-decode the rest. However, when you want to use UTF-8 with MySQL, there are a few things to remember:
ReplyDeleteYour table column's collation should be utf8_bin or similar.
Your table's collation and database collation should also be utf8_bin just in case.
Your connection charset should be UTF8. Do this by executing the "SET NAMES utf8" query.
Also, if you're outputting a HTML page, that should have the UTF8 charset as well. If everything is correct, the UTF8 characters should come out fine.
Good luck!