php – ('%EF%BB%BF', '%C2%A0') What are these characters and how can they be removed from the URL?

Question:

Please help me figure this out.

1) var_dump(urldecode('%EF%BB%BF')); // string(3) ""

2) var_dump(urldecode('%C2%A0')); // string(2) " "

The first looks like an empty string, yet var_dump reports string(3) "".

The only explanation I can think of is Unicode: these must be invisible UTF-8 characters.

The simplest option:

var_dump(explode('%', 'aaa%EF%BB%BF')[0]);

But I'm not sure how correct this approach is.

If that is the case, we also have to take into account that this space may look different under another encoding. How can the string be cleaned of it?

Answer:

%EF%BB%BF – BOM, the Unicode byte order mark (U+FEFF) encoded in UTF-8.
%C2%A0 – the non-breaking space (U+00A0) encoded in UTF-8.

This is how you see them in the URL. In PHP, once decoded, they arrive as:

  • %EF%BB%BF => pack("CCC", 0xef, 0xbb, 0xbf)
  • %C2%A0 => pack("CC", 0xc2, 0xa0)

Or, more simply:
urldecode('%C2%A0')

These decoded byte sequences are what you should check for.
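As a small sketch of that check, the snippet below decodes both sequences and compares the resulting bytes with the pack() form and with string escapes. The variable names are illustrative only, and the "\u{...}" escape assumes PHP 7 or later.

```php
<?php
// Decode the percent-encoded sequences and look at the raw bytes.
$bom  = urldecode('%EF%BB%BF');
$nbsp = urldecode('%C2%A0');

var_dump(bin2hex($bom));   // string(6) "efbbbf"
var_dump(bin2hex($nbsp));  // string(4) "c2a0"

// The same bytes written in different ways.
var_dump($bom === pack('CCC', 0xef, 0xbb, 0xbf)); // bool(true)
var_dump($bom === "\xEF\xBB\xBF");                // bool(true)
var_dump($nbsp === pack('CC', 0xc2, 0xa0));       // bool(true)
var_dump($nbsp === "\u{00A0}");                   // bool(true), PHP 7+ escape
```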

In your example, every character in the original string is printable, because the literal is still percent-encoded:
var_dump(explode('%', 'aaa%EF%BB%BF')[0]);
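To illustrate why cutting at the first '%' is fragile, here is a sketch with a made-up input that contains a legitimate encoded space before the encoded BOM. The input string and variable names are assumptions for the example, not something from the question.

```php
<?php
// Hypothetical input: an encoded space (%20) followed by an encoded BOM.
$raw = 'my%20page%EF%BB%BF';

// Splitting on '%' throws away the legitimate part as well.
$cut = explode('%', $raw)[0];
var_dump($cut); // string(2) "my"

// Decoding first and then removing the BOM bytes keeps the rest intact.
$clean = str_replace("\xEF\xBB\xBF", '', urldecode($raw));
var_dump($clean); // string(7) "my page"
```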

You can remove a leading BOM like this:

function removeBOM($str = "") {
    // Strip a leading UTF-8 BOM (bytes EF BB BF) if present.
    if (substr($str, 0, 3) === pack("CCC", 0xef, 0xbb, 0xbf)) {
        $str = substr($str, 3);
    }
    return $str;
}

And the non-breaking space like this:

$str = preg_replace('/\xA0/u', '', 'A'.pack("CC",0xc2,0xa0).'B');

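Remove all non-printable characters (without the /u modifier the match is byte-wise, so this typically strips every non-ASCII byte as well, not only the invisible ones):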

$str = preg_replace('/[^[:print:]]/', '', $str);

Demo

var_dump(preg_replace('/\xA0/u', '', urldecode("Word%C2%A0Word"))); // WordWord
var_dump(preg_replace('/[^[:print:]]/', '', urldecode("%EF%BB%BFWord%C2%A0Word"))); // WordWord
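If the text may contain other non-ASCII characters that should survive, a narrower approach is to remove only the two byte sequences in question. The following is a minimal sketch under that assumption. stripInvisible is a hypothetical helper name, the sample input is made up, and the type declarations assume PHP 7 or later.

```php
<?php
/**
 * Hypothetical helper: removes UTF-8 BOMs and non-breaking spaces only,
 * leaving every other character (including non-ASCII text) untouched.
 */
function stripInvisible(string $value): string
{
    // In valid UTF-8 these byte sequences can only be U+FEFF and U+00A0.
    return str_replace(["\xEF\xBB\xBF", "\xC2\xA0"], '', $value);
}

// Made-up input: BOM + "Café" + non-breaking space + "Word".
$input = urldecode('%EF%BB%BFCaf%C3%A9%C2%A0Word');

var_dump(stripInvisible($input)); // string(9) "CaféWord"

// For comparison: the byte-wise [:print:] pattern usually drops the
// two bytes of "é" as well, since it treats non-ASCII bytes as non-printable.
var_dump(preg_replace('/[^[:print:]]/', '', $input));
```

Plain str_replace is used here instead of a regular expression because both targets are fixed byte sequences, so no pattern matching is needed.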