How do I remove accents from characters in a PHP string?



I'm attempting to remove accents from characters in a PHP string as the first step to making the string usable in a URL.





I'm using the following code:







$input = "Fóø Bår";



setlocale(LC_ALL, "en_US.utf8");

$output = iconv("utf-8", "ascii//TRANSLIT", $input);



print($output);







The output I would expect would be something like this:







F'oo Bar







However, instead of the accented characters being transliterated they are replaced with question marks:







F?? B?r







Everything I can find online indicates that setting the locale will fix this problem; however, I'm already doing this. I've already checked the following details:





  1. The locale I am setting is supported by the server (included in the list produced by locale -a )



  2. The source and target encodings (UTF-8 and ASCII) are supported by the server's version of iconv (included in the list produced by iconv -l )



  3. The input string is UTF-8 encoded (verified using PHP's mb_check_encoding function, as suggested in the answer by mercator )



  4. The call to setlocale is successful (it returns 'en_US.utf8' rather than FALSE )










The cause of the problem:





The server is using the wrong implementation of iconv. It has the glibc version instead of the required libiconv version.







Note that the iconv function on some systems may not work as you expect. In such case, it'd be a good idea to install the GNU libiconv library. It will most likely end up with more consistent results.


PHP manual's introduction to iconv







Details about the iconv implementation that is used by PHP are included in the output of the phpinfo function.
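
As an alternative to reading the phpinfo output, the same details can be printed from a script. This is a minimal sketch: `describeIconv` is a hypothetical helper name, while `ICONV_IMPL` and `ICONV_VERSION` are constants defined by PHP's iconv extension.

```php
<?php
// Report which iconv implementation PHP was built against.
// describeIconv() is a hypothetical helper; ICONV_IMPL and ICONV_VERSION
// are constants provided by the iconv extension itself.
function describeIconv(): string
{
    if (!extension_loaded('iconv')) {
        return 'iconv extension not loaded';
    }

    return sprintf('iconv implementation: %s, version: %s', ICONV_IMPL, ICONV_VERSION);
}

echo describeIconv(), PHP_EOL;
```

On a server affected by the problem described above, the implementation reported would be expected to be glibc rather than libiconv.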





(I'm not able to re-compile PHP with the correct iconv library on the server I'm working with for this project so the answer I've accepted below is the one that was most useful for removing accents without iconv support.)



Source: Tips4all

Comments

  1. I think the problem here is that your encodings consider ä and å different symbols to 'a'. In fact, the PHP documentation for strtr offers a sample for removing accents the ugly way :(

    http://ie2.php.net/strtr

  2. You could use urlencode. It does not quite do what you want (remove accents), but it will give you a URL-usable string:

    $output = urlencode ($input);


    In Perl I could use a translate regex, but I cannot think of the PHP equivalent:

    $input =~ tr/áâàå/aaaa/;


    etc...

    You could do this using preg_replace (the /u modifier makes the character classes match whole UTF-8 characters rather than single bytes):

    $patterns[0] = '/[áâàåä]/u';
    $patterns[1] = '/[ðéêèë]/u';
    $patterns[2] = '/[íîìï]/u';
    $patterns[3] = '/[óôòøõö]/u';
    $patterns[4] = '/[úûùü]/u';
    $patterns[5] = '/æ/';
    $patterns[6] = '/ç/';
    $patterns[7] = '/ß/';
    $replacements[0] = 'a';
    $replacements[1] = 'e';
    $replacements[2] = 'i';
    $replacements[3] = 'o';
    $replacements[4] = 'u';
    $replacements[5] = 'ae';
    $replacements[6] = 'c';
    $replacements[7] = 'ss';

    $output = preg_replace($patterns, $replacements, $input);


    (Please note this was typed from a foggy, beer-ridden Friday afternoon memory, so it may not be 100% correct.)

    Or you could make a hash table and do a replacement based on that.

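  3. This is code I found and use often. Note that the three-argument form of strtr works byte by byte, so it only behaves correctly on single-byte encoded strings (e.g. ISO-8859-1), not on UTF-8: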

    function stripAccents($stripAccents){
    return strtr($stripAccents,'àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ','aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY');
    }
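
For UTF-8 input, the same idea can be sketched with the array form of strtr, which replaces whole substrings rather than single bytes. The function name `stripAccentsUtf8` and the mapping are illustrative assumptions, and the map is deliberately incomplete; the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical UTF-8 safe variant: the array form of strtr() replaces
// complete (multi-byte) keys, so accented characters are matched as a whole.
function stripAccentsUtf8(string $text): string
{
    // Illustrative, incomplete map; extend it for the characters you need.
    static $map = [
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a', 'å' => 'a',
        'ç' => 'c', 'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
        'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ñ' => 'n',
        'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'o', 'ø' => 'o',
        'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u', 'ý' => 'y', 'ÿ' => 'y',
        'æ' => 'ae', 'ß' => 'ss',
    ];

    return strtr($text, $map);
}

echo stripAccentsUtf8('Fóø Bår'), PHP_EOL;
```

Because keys and values are whole strings, one-to-many replacements such as æ => ae also work, which the three-argument form cannot express.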

  4. I agree with georgebrock's comment.

    If you find a way to get //TRANSLIT to work, you can build friendly URLs:


    1. Use iconv with //TRANSLIT: ñ => n~
    2. Remove non-alphanumeric, non-whitespace chars inside words: $url = preg_replace( '/(\w)[^\w\s](\w)/', '$1$2', $url );
    3. Replace remaining separations: $url = preg_replace( '/[^a-z0-9]+/', '-', $url );
    4. Remove double/leading/trailing '-', e.g. $url = preg_replace( '/(?:(^|\-)\-+|\-$)/', '$1', $url );


    If you can't get it to work, replace step 1 with strtr/character-based replacement, like Xetius' solution.
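
The steps after transliteration can be sketched as one function. This is a rough illustration, not a definitive implementation: `slugify` is a hypothetical name, the input is assumed to be already transliterated to ASCII (by iconv or a character map), the string is lowercased first because the separator pattern only keeps a-z and 0-9, and trim() is swapped in for the final regex as a simplification.

```php
<?php
// Hypothetical helper covering the steps that follow transliteration.
// Assumes $ascii has already been transliterated to plain ASCII.
function slugify(string $ascii): string
{
    $url = strtolower($ascii);

    // Drop a single non-alphanumeric, non-whitespace char between two word
    // chars, e.g. the apostrophe that //TRANSLIT may leave behind.
    $url = preg_replace('/(\w)[^\w\s](\w)/', '$1$2', $url);

    // Collapse every remaining run of separators into one hyphen.
    $url = preg_replace('/[^a-z0-9]+/', '-', $url);

    // Remove leading and trailing hyphens (trim() used instead of a regex).
    return trim($url, '-');
}

echo slugify("F'oo Bar"), PHP_EOL; // foo-bar
```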

  5. I can't reproduce your problem. I get the expected result.

    How exactly are you using mb_detect_encoding() to verify your string is in fact UTF-8?

    If I simply call mb_detect_encoding($input) on both a UTF-8 and ISO-8859-1 encoded version of your string, both of them return "UTF-8", so that function isn't particularly reliable.

    iconv() gives me a PHP "notice" when it gets the wrongly encoded string and only echoes "F", but that might just be because of different PHP/iconv settings/versions (?).

    I suggest you try calling mb_check_encoding($input, "utf-8") first to verify that your string really is UTF-8. I think it probably isn't.
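
A small sketch of that check, assuming the mbstring extension is available. `isUtf8` is a hypothetical wrapper name, and the ISO-8859-1 copy is produced with mb_convert_encoding purely for the demonstration.

```php
<?php
// Hypothetical wrapper: true only if $text is a valid UTF-8 byte sequence.
function isUtf8(string $text): bool
{
    return mb_check_encoding($text, 'UTF-8');
}

// "Fóø Bår" written with Unicode escapes so the result does not depend
// on the encoding this source file is saved in.
$utf8   = "F\u{F3}\u{F8} B\u{E5}r";
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

var_dump(isUtf8($utf8));   // bool(true)
var_dump(isUtf8($latin1)); // bool(false)
```

Unlike mb_detect_encoding, which guesses among candidate encodings, mb_check_encoding only answers whether the bytes are valid for the encoding you name.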

  6. When using iconv, the locale must be set:

    function test_enc($text = 'ěščřžýáíé ĚŠČŘŽÝÁÍÉ fóø bår FÓØ BÅR æ')
    {
    echo '<tt>';
    echo iconv('utf8', 'ascii//TRANSLIT', $text);
    echo '</tt><br/>';
    }

    test_enc();
    setlocale(LC_ALL, 'cs_CZ.utf8');
    test_enc();
    setlocale(LC_ALL, 'en_US.utf8');
    test_enc();


    This yields:

    ????????? ????????? f?? b?r F?? B?R ae
    escrzyaie ESCRZYAIE fo? bar FO? BAR ae
    escrzyaie ESCRZYAIE fo? bar FO? BAR ae


    I don't have any locales other than cs_CZ and en_US installed, so I can't test others.

    In C# I have seen a solution that converts the string to a decomposed Unicode normalization form: the accents are split off into separate combining characters, which are then filtered out by their non-spacing mark Unicode category.
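
A PHP counterpart of that approach can be sketched with the Normalizer class, assuming the intl extension is installed. `stripCombiningMarks` is a hypothetical name; when intl is missing, this sketch simply returns the input unchanged.

```php
<?php
// Hypothetical helper: decompose to NFD, then drop non-spacing marks (\p{Mn}).
// Requires the intl extension; without it the input is returned as-is.
function stripCombiningMarks(string $text): string
{
    if (!class_exists('Normalizer')) {
        return $text;
    }

    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $text; // input was not valid UTF-8
    }

    // Remove the combining accents that decomposition split off.
    return preg_replace('/\p{Mn}+/u', '', $decomposed) ?? $text;
}

echo stripCombiningMarks("B\u{E5}r"), PHP_EOL;
```

Characters without a canonical decomposition, such as ø or æ, pass through unchanged, so this approach usually needs to be combined with a small replacement map.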

  7. One of the tricks I stumbled upon on the web was using htmlentities and then stripping the encoded characters:

    $stripped = preg_replace('`&[^;]+;`','',htmlentities($string));


    Not perfect, but it works well in some cases.

    But since you're creating a URL string, urlencode and its counterpart urldecode may be better. Or, if you are building a query string, use http_build_query.
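
For illustration, here is how those functions treat the string from the question. The variable name and the `q` parameter are arbitrary choices for this sketch; the input is written with Unicode escapes so it is UTF-8 regardless of how the file is saved.

```php
<?php
// "Fóø Bår" as UTF-8, written with escapes (PHP 7+).
$input = "F\u{F3}\u{F8} B\u{E5}r";

// Percent-encode for use in a path segment (spaces become %20).
echo rawurlencode($input), PHP_EOL;              // F%C3%B3%C3%B8%20B%C3%A5r

// Encode for a query string value (spaces become +).
echo urlencode($input), PHP_EOL;                 // F%C3%B3%C3%B8+B%C3%A5r

// Build a complete query string from an array of parameters.
echo http_build_query(['q' => $input]), PHP_EOL; // q=F%C3%B3%C3%B8+B%C3%A5r
```

The accents are preserved in encoded form rather than removed, so the result is valid in a URL but not especially readable.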

  8. You can use this class for removing unwanted characters, but it still does not solve your problem.

