
If you use str_split() on a string containing characters that occupy more than one byte, the results will look odd. For example, in UTF-8 the accented character in the word Café takes up two bytes. If you run str_split() on it you will get the following array: [0] => C [1] => a [2] => f [3] => � [4] => �. This is because str_split() is not multibyte-safe: it splits on bytes rather than characters, so the two bytes of é end up in separate elements.
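
To see where the two stray entries come from, it can help to look at the raw bytes. The following is a minimal sketch, assuming the string is UTF-8 encoded and the mbstring extension is loaded. The é is written as the escape sequence \xC3\xA9 so the example does not depend on how the file itself is saved.

```php
<?php
// 'Café' in UTF-8; the é is the two-byte sequence C3 A9.
$word = "Caf\xC3\xA9";

echo strlen($word), "\n";             // 5 (bytes)
echo mb_strlen($word, 'UTF-8'), "\n"; // 4 (characters)
echo bin2hex($word), "\n";            // 436166c3a9
```

The byte count is one higher than the character count, which is why str_split() returns five elements instead of four.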

If you are not sure why you get these results, read my blog post “What you need to know about PHP’s internal character encoding”.

Here is a UTF-8 multibyte-safe version of str_split():

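// Note: PHP versions before 7.0 already define a built-in split(), so rename this function there.
function split($str, $len = 1) {

    // A chunk length below 1 would make the loop below never terminate.
    if ($len < 1) {
        throw new InvalidArgumentException('$len must be at least 1');
    }

    $arr    = [];
    $length = mb_strlen($str, 'UTF-8'); // Length in characters, not bytes

    for ($i = 0; $i < $length; $i += $len) {

        // Take up to $len characters starting at character offset $i
        $arr[] = mb_substr($str, $i, $len, 'UTF-8');

    }

    return $arr;

}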

print_r(str_split('Café')); // Wrong! Array ( [0] => C [1] => a [2] => f [3] => � [4] => � )

print_r(split('Café')); // Right! Array ( [0] => C [1] => a [2] => f [3] => é )

print_r(split('Café', 3)); // Array ( [0] => Caf [1] => é )
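
On PHP 7.4 and later, the mbstring extension includes a built-in mb_str_split() that does the same job, so a hand-written function is mainly useful on older versions. The following is a minimal sketch, assuming mbstring is loaded. The wrapper name utf8_split() is hypothetical and only used for illustration. It prefers the built-in when it exists and otherwise falls back to the same mb_substr() loop as above.

```php
<?php
// Hypothetical wrapper (the name utf8_split is an assumption, not a PHP built-in).
function utf8_split($str, $len = 1) {
    if ($len < 1) {
        throw new InvalidArgumentException('$len must be at least 1');
    }

    // Built-in available since PHP 7.4.
    if (function_exists('mb_str_split')) {
        return mb_str_split($str, $len, 'UTF-8');
    }

    // Fallback for older versions: step through the string by characters.
    $arr    = [];
    $length = mb_strlen($str, 'UTF-8');
    for ($i = 0; $i < $length; $i += $len) {
        $arr[] = mb_substr($str, $i, $len, 'UTF-8');
    }
    return $arr;
}

// "Caf\xC3\xA9" is 'Café' written with explicit UTF-8 bytes.
print_r(utf8_split("Caf\xC3\xA9", 3)); // Array ( [0] => Caf [1] => é )
```
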
Tim Bennett is a freelance web designer from Leeds. He has a First Class Honours degree in Computing from Leeds Metropolitan University and currently runs his own one-man web design company, Texelate.