How do I remove accents from characters in a PHP string?



I'm attempting to remove accents from characters in a PHP string as the first step to making the string usable in a URL.





I'm using the following code:







$input = "Fóø Bår";



setlocale(LC_ALL, "en_US.utf8");

$output = iconv("utf-8", "ascii//TRANSLIT", $input);



print($output);







The output I would expect would be something like this:







F'oo Bar







However, instead of the accented characters being transliterated they are replaced with question marks:







F?? B?r







Everything I can find online indicates that setting the locale will fix this problem, but I'm already doing that. I've checked the following details:





  1. The locale I am setting is supported by the server (included in the list produced by locale -a )



  2. The source and target encodings (UTF-8 and ASCII) are supported by the server's version of iconv (included in the list produced by iconv -l )



  3. The input string is UTF-8 encoded (verified using PHP's mb_check_encoding function, as suggested in the answer by mercator )



  4. The call to setlocale is successful (it returns 'en_US.utf8' rather than FALSE )










The cause of the problem:





The server is using the wrong implementation of iconv. It has the glibc version instead of the required libiconv version.







"Note that the iconv function on some systems may not work as you expect. In such case, it'd be a good idea to install the GNU libiconv library. It will most likely end up with more consistent results."

(from the PHP manual's introduction to iconv)







Details about the iconv implementation that is used by PHP are included in the output of the phpinfo function.
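
The same details can also be read from code rather than from the phpinfo page, because the iconv extension defines the constants ICONV_IMPL and ICONV_VERSION. A minimal sketch (describeIconv is a hypothetical helper name, not part of PHP):

```php
<?php
// Report which iconv implementation this PHP build is linked against.
// describeIconv() is a hypothetical helper name used only for illustration.
function describeIconv(): string
{
    if (!extension_loaded('iconv')) {
        return 'iconv extension not loaded';
    }
    // ICONV_IMPL and ICONV_VERSION are constants defined by the iconv extension.
    return ICONV_IMPL . ' ' . ICONV_VERSION;
}

echo describeIconv(), "\n";
```

On a server like the one described here, this would be expected to name glibc rather than libiconv.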





(I'm not able to re-compile PHP with the correct iconv library on the server I'm working with for this project so the answer I've accepted below is the one that was most useful for removing accents without iconv support.)



Source: Tips4all

Comments

  1. I think the problem here is that your encodings consider ä and å different symbols to 'a'. In fact, the PHP documentation for strtr offers a sample for removing accents the ugly way :(

    http://ie2.php.net/strtr

  2. You could use urlencode. It does not quite do what you want (remove accents), but it will give you a URL-usable string:

    $output = urlencode ($input);


    In Perl I could use the tr/// transliteration operator, but I cannot think of the PHP equivalent:

    $input =~ tr/áâàå/aaaa/;


    etc...

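    You could do this using preg_replace:

    // No "|" inside a character class (it would match a literal pipe).
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $patterns[0] = '/[áâàåä]/u';
    $patterns[1] = '/[ðéêèë]/u';
    $patterns[2] = '/[íîìï]/u';
    $patterns[3] = '/[óôòøõö]/u';
    $patterns[4] = '/[úûùü]/u';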
    $patterns[5] = '/æ/';
    $patterns[6] = '/ç/';
    $patterns[7] = '/ß/';
    $replacements[0] = 'a';
    $replacements[1] = 'e';
    $replacements[2] = 'i';
    $replacements[3] = 'o';
    $replacements[4] = 'u';
    $replacements[5] = 'ae';
    $replacements[6] = 'c';
    $replacements[7] = 'ss';

    $output = preg_replace($patterns, $replacements, $input);


    (Please note this was typed from a foggy, beer-ridden Friday afternoon memory, so it may not be 100% correct.)

    Or you could make a hash table and do a replacement based on that.

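  3. This is code I found and use often. It assumes a single-byte encoding such as ISO-8859-1, because three-argument strtr replaces byte by byte: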

    function stripAccents($stripAccents){
    return strtr($stripAccents,'àáâãäçèéêëìíîïñòóôõöùúûüýÿÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝ','aaaaaceeeeiiiinooooouuuuyyAAAAACEEEEIIIINOOOOOUUUUY');
    }
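
For UTF-8 input, the array form of strtr is an alternative, since it replaces whole substrings rather than single bytes. The following is only a sketch: stripAccentsUtf8 is a hypothetical name, the map is partial, and treating ø as o, æ as ae and ß as ss follows the replacements suggested in the earlier comment.

```php
<?php
// Array-form strtr() replaces whole substrings, so multi-byte UTF-8
// characters are matched as units. This file must be saved as UTF-8.
function stripAccentsUtf8(string $text): string
{
    // Partial, illustrative map; extend it for the characters you expect.
    static $map = [
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a', 'å' => 'a',
        'ç' => 'c', 'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
        'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ñ' => 'n',
        'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o', 'ö' => 'o', 'ø' => 'o',
        'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u', 'ý' => 'y', 'ÿ' => 'y',
        'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'A', 'Å' => 'A',
        'Ç' => 'C', 'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E',
        'Ì' => 'I', 'Í' => 'I', 'Î' => 'I', 'Ï' => 'I', 'Ñ' => 'N',
        'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O', 'Ö' => 'O', 'Ø' => 'O',
        'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'U', 'Ý' => 'Y',
        'æ' => 'ae', 'ß' => 'ss',
    ];

    return strtr($text, $map);
}

echo stripAccentsUtf8("Fóø Bår"), "\n"; // prints "Foo Bar"
```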

  4. I agree with georgebrock's comment.

    If you find a way to get //TRANSLIT to work, you can build friendly URLs:

      1. use iconv with //TRANSLIT (e.g. ñ => n~)
      2. remove non-alphanumeric, non-whitespace chars inside words: $url = preg_replace( '/(\w)[^\w\s](\w)/', '$1$2', $url );
      3. replace remaining separators: $url = preg_replace( '/[^a-z0-9]+/', '-', $url );
      4. remove double/leading/trailing '-', e.g.: $url = preg_replace( '/(?:(^|\-)\-+|\-$)/', '', $url );

    If you can't get it to work, replace step 1 with strtr/character-based replacement, like Xetius' solution.
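
The later steps of that recipe can be sketched as one function. This assumes the accents have already been transliterated to ASCII by some earlier step. slugFromAscii is a hypothetical name, the strtolower call is an added assumption (the separator pattern only keeps a-z and 0-9), and the final cleanup uses trim instead of the regex from the comment.

```php
<?php
// Turn already-transliterated ASCII text into a URL slug.
// slugFromAscii() is a hypothetical helper name used only for illustration.
function slugFromAscii(string $ascii): string
{
    // Assumption: lowercase first, because the separator pattern below
    // would otherwise turn uppercase letters into dashes.
    $url = strtolower($ascii);

    // Drop punctuation left inside words by transliteration (e.g. "F'oo").
    $url = preg_replace('/(\w)[^\w\s](\w)/', '$1$2', $url);

    // Collapse every run of remaining non-alphanumerics into one dash.
    $url = preg_replace('/[^a-z0-9]+/', '-', $url);

    // Remove leading and trailing dashes.
    return trim($url, '-');
}

echo slugFromAscii("F'oo Bar!"), "\n"; // prints "foo-bar"
```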

  5. I can't reproduce your problem. I get the expected result.

    How exactly are you using mb_detect_encoding() to verify your string is in fact UTF-8?

    If I simply call mb_detect_encoding($input) on both a UTF-8 and ISO-8859-1 encoded version of your string, both of them return "UTF-8", so that function isn't particularly reliable.

    iconv() gives me a PHP "notice" when it gets the wrongly encoded string and only echoes "F", but that might just be because of different PHP/iconv settings/versions (?).

    I suggest you try calling mb_check_encoding($input, "utf-8") first to verify that your string really is UTF-8. I think it probably isn't.
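
A small sketch of that check, assuming the mbstring extension is available. The two test strings are written as explicit bytes so the result does not depend on how the script file itself is encoded:

```php
<?php
// The same text in two encodings, written as explicit bytes so the
// example does not depend on how this file is saved.
$utf8   = "F\xC3\xB3\xC3\xB8 B\xC3\xA5r"; // "Fóø Bår" encoded as UTF-8
$latin1 = "F\xF3\xF8 B\xE5r";             // "Fóø Bår" encoded as ISO-8859-1

// mb_check_encoding() validates the byte sequence against the named encoding.
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
```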

  6. When using iconv, the locale must be set:

    function test_enc($text = 'ěščřžýáíé ĚŠČŘŽÝÁÍÉ fóø bår FÓØ BÅR æ')
    {
    echo '<tt>';
    echo iconv('utf8', 'ascii//TRANSLIT', $text);
    echo '</tt><br/>';
    }

    test_enc();
    setlocale(LC_ALL, 'cs_CZ.utf8');
    test_enc();
    setlocale(LC_ALL, 'en_US.utf8');
    test_enc();


    This yields:

    ????????? ????????? f?? b?r F?? B?R ae
    escrzyaie ESCRZYAIE fo? bar FO? BAR ae
    escrzyaie ESCRZYAIE fo? bar FO? BAR ae


    I don't have any locales other than cs_CZ and en_US installed, so I can't test others.

    In C# I have seen a solution that converts the string to a Unicode normalized (decomposed) form, so that the accents are split off as separate characters, and then filters out everything in the non-spacing mark category.

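
The same idea can be sketched in PHP, assuming the intl extension (which provides the Normalizer class) is installed. stripCombiningMarks is a hypothetical name. Characters without a canonical decomposition, such as ø and æ, are left unchanged by this approach.

```php
<?php
// Decompose to NFD so accents become separate combining marks,
// then delete the marks. Requires the intl extension.
// stripCombiningMarks() is a hypothetical helper name.
function stripCombiningMarks(string $text): string
{
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        // normalize() fails on invalid UTF-8; return the input untouched.
        return $text;
    }

    // \p{Mn} is the Unicode "non-spacing mark" category.
    return preg_replace('/\p{Mn}+/u', '', $decomposed);
}

if (class_exists('Normalizer')) {
    echo stripCombiningMarks("ěščřžýáíé bår"), "\n"; // prints "escrzyaie bar"
}
```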
  7. One of the tricks I stumbled upon on the web was using htmlentities and then stripping the encoded characters:

    $stripped = preg_replace('`&[^;]+;`','',htmlentities($string));


    It's not perfect, but it works well in some cases.

    However, since you're writing about creating a URL string, urlencode and its counterpart urldecode may be better. Or, if you are creating a query string, use http_build_query.

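
The htmlentities trick above removes each accented character entirely. A variant that keeps the base letter is sketched below. It relies on the HTML entity names having the form letter plus accent name (for example &oacute;). stripAccentsViaEntities is a hypothetical name, and the list of accent suffixes is an assumption that may need extending.

```php
<?php
// Encode to HTML entities, keep only the base letter of accent entities,
// then decode whatever entities remain (such as &amp;).
// stripAccentsViaEntities() is a hypothetical helper name.
function stripAccentsViaEntities(string $text): string
{
    $encoded = htmlentities($text, ENT_QUOTES, 'UTF-8');

    // "&oacute;" becomes "o", "&Aring;" becomes "A", and so on.
    $stripped = preg_replace(
        '/&([A-Za-z])(?:acute|grave|circ|uml|tilde|ring|cedil|slash|caron);/',
        '$1',
        $encoded
    );

    return html_entity_decode($stripped, ENT_QUOTES, 'UTF-8');
}

echo stripAccentsViaEntities("Fóø Bår"), "\n"; // prints "Foo Bar"
```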
  8. You can use this class for removing unwanted characters, but it still does not solve your problem.
