How do I split Tamil characters in a string?
When I use preg_match_all('/./u', $str, $results), I get the characters "த", "ம", "ி", "ழ" and "்".
How do I get the combined characters "த", "மி" and "ழ்"?
Source: Tips4all
I think you should be able to use the grapheme_extract function to iterate over the combined characters (which are technically called "grapheme clusters").
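A minimal sketch of the grapheme_extract idea, assuming the intl extension is loaded. The helper name split_graphemes and the sample word (written with \u{...} escapes, inferred from the characters in the question) are my own for illustration. grapheme_extract works with byte offsets and reports where the next cluster starts through its last argument, so the loop just keeps advancing to that position.

```php
<?php
// Requires the intl extension. split_graphemes() is a hypothetical helper name.
function split_graphemes(string $str): array
{
    $clusters = [];
    $offset = 0;
    $length = strlen($str); // byte length; grapheme_extract offsets are in bytes

    while ($offset < $length) {
        // Extract one grapheme cluster starting at byte $offset.
        // $next receives the byte offset where the following cluster begins.
        $cluster = grapheme_extract($str, 1, GRAPHEME_EXTR_COUNT, $offset, $next);
        if ($cluster === false || $cluster === '') {
            break; // stop on failure rather than loop forever
        }
        $clusters[] = $cluster;
        $offset = $next;
    }

    return $clusters;
}

// Assumed sample input: the Tamil word spelled by the characters in the question.
$str = "\u{0BA4}\u{0BAE}\u{0BBF}\u{0BB4}\u{0BCD}";
$clusters = split_graphemes($str);
print_r($clusters);
```

For the sample word this should give three elements, with the vowel sign and the pulli each attached to the consonant before them.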
Alternatively, if you prefer the regex approach, I think you can use this:
preg_match_all('/\pL\pM*|./u', $str, $results);
where \pL means a Unicode "letter", and \pM means a Unicode "mark".
(Disclaimer: I have not tested either of these approaches.)
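Here is a small sketch of the regex approach run against a sample string. The sample word is an assumption inferred from the characters in the question, written with \u{...} escapes to avoid encoding trouble. The full matches end up in $results[0].

```php
<?php
// Assumed sample input: the Tamil word spelled by the characters in the question.
$str = "\u{0BA4}\u{0BAE}\u{0BBF}\u{0BB4}\u{0BCD}";

// \pL\pM* matches a letter followed by any number of combining marks;
// the "|." alternative picks up any other single character.
preg_match_all('/\pL\pM*|./u', $str, $results);

$clusters = $results[0]; // index 0 holds the full matches
print_r($clusters);
```

Note that this pattern only approximates grapheme clusters: it covers the letter-plus-marks case in the question, but not every kind of combined sequence. Also, without the s modifier the "." alternative does not match newlines.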