Question:
Please help me figure this out.
1) var_dump('%EF%BB%BF'); //string(3) ""
2) var_dump('%C2%A0'); // string(2) " "
The first one looks like an empty string, yet var_dump reports string(3) "".
The only explanation I can think of is Unicode: I am dealing with invisible UTF-8 characters.
The easiest option:
var_dump(explode('%', 'aaa%EF%BB%BF')[0]);
But I'm not sure how correct this approach is.
If that is the case, we also have to take into account that this space may look different in another encoding. How can I strip it from the string?
Answer:
%EF%BB%BF is the BOM (byte order mark, U+FEFF) encoded in UTF-8.
%C2%A0 is the UTF-8 non-breaking space (U+00A0).
That is how you see them in a URL, but in PHP they arrive as:
%EF%BB%BF => pack("CCC", 0xef, 0xbb, 0xbf)
%C2%A0 => pack("CC", 0xc2, 0xa0)
Or, more simply:
urldecode('%C2%A0')
These decoded bytes are what you should check for.
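To make the equivalence concrete, here is a minimal sketch comparing the different spellings of the same bytes. The variable names are illustrative only, and bin2hex() is used just to make the invisible bytes visible.

```php
<?php
// The URL-encoded text and the raw bytes are two different strings.
$bomEncoded = '%EF%BB%BF';            // 9 printable ASCII characters
$bomBytes   = urldecode($bomEncoded); // 3 raw bytes: EF BB BF

var_dump(strlen($bomEncoded));        // int(9)
var_dump(bin2hex($bomBytes));         // string(6) "efbbbf"

// urldecode(), pack() and "\x" escapes in double quotes yield the same bytes.
var_dump($bomBytes === pack("CCC", 0xef, 0xbb, 0xbf));    // bool(true)
var_dump($bomBytes === "\xEF\xBB\xBF");                   // bool(true)
var_dump(urldecode('%C2%A0') === pack("CC", 0xc2, 0xa0)); // bool(true)
```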
In your example every character of the original string is printable, because it contains the literal text %EF%BB%BF rather than the decoded bytes:
var_dump(explode('%', 'aaa%EF%BB%BF')[0]);
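A short sketch of why splitting on '%' only works for the literal, still-encoded text. It assumes the real input has already been URL-decoded, which is what PHP normally does for request parameters; the variable names are illustrative.

```php
<?php
// Literal percent signs: explode() finds '%' and splits the string.
$literal = 'aaa%EF%BB%BF';
$partsLiteral = explode('%', $literal);
var_dump($partsLiteral[0]);          // string(3) "aaa"

// Decoded bytes: there is no '%' left, so nothing is split off.
$decoded = urldecode($literal);
$partsDecoded = explode('%', $decoded);
var_dump(count($partsDecoded));      // int(1)
var_dump(strlen($partsDecoded[0]));  // int(6): "aaa" plus the 3 BOM bytes
```

So the explode() trick leaves the BOM in place once the string holds the actual bytes.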
You can remove the BOM like this:
function removeBOM($str = "") {
    // Strip a leading UTF-8 BOM (bytes EF BB BF) if present.
    if (substr($str, 0, 3) === pack("CCC", 0xef, 0xbb, 0xbf)) {
        $str = substr($str, 3);
    }
    return $str;
}
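As a usage illustration, here is a self-contained variant of the same idea. The helper name stripLeadingBom is hypothetical, and the sketch assumes PHP 7 or later for the type declarations. It only touches a BOM at the very start of the string.

```php
<?php
// Hypothetical helper (name is illustrative): strips one leading UTF-8 BOM.
function stripLeadingBom(string $str): string
{
    $bom = "\xEF\xBB\xBF";
    // strncmp() compares at most the first 3 bytes of both strings.
    return strncmp($str, $bom, 3) === 0 ? substr($str, 3) : $str;
}

$withBom = urldecode('%EF%BB%BFWord');
var_dump(strlen($withBom));           // int(7)
var_dump(stripLeadingBom($withBom));  // string(4) "Word"
var_dump(stripLeadingBom('Word'));    // string(4) "Word" (unchanged)

// A BOM in the middle of the string is left alone.
var_dump(strlen(stripLeadingBom("A\xEF\xBB\xBFB"))); // int(5)
```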
And the non-breaking space like this:
$str = preg_replace('/\xA0/u', '', 'A'.pack("CC",0xc2,0xa0).'B');
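Remove all non-printable characters (without the u modifier the pattern works byte by byte, so in the default C locale it also strips every other non-ASCII character):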
$str = preg_replace('/[^[:print:]]/', '', $str);
var_dump(preg_replace('/\xA0/u', '', urldecode("Word%C2%A0Word"))); // WordWord
var_dump(preg_replace('/[^[:print:]]/', '', urldecode("%EF%BB%BFWord%C2%A0Word"))); // WordWord
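If the text may contain other non-ASCII characters that should survive, a narrower pattern is one option. The sketch below assumes the input is valid UTF-8. The helper name removeBomAndNbsp is hypothetical, and it deletes the non-breaking space just as the examples above do.

```php
<?php
// Hypothetical helper: removes only U+FEFF (BOM) and U+00A0 (NBSP).
function removeBomAndNbsp(string $str): string
{
    // The u modifier makes PCRE treat pattern and subject as UTF-8,
    // so \x{...} refers to code points rather than single bytes.
    $result = preg_replace('/[\x{FEFF}\x{00A0}]/u', '', $str);
    // preg_replace() returns null on error, e.g. for invalid UTF-8 input.
    return $result === null ? $str : $result;
}

// BOM + "Café" + NBSP + "Word"; the é (bytes C3 A9) must be preserved.
$input = urldecode('%EF%BB%BFCaf%C3%A9%C2%A0Word');
var_dump(removeBomAndNbsp($input) === "Caf\xC3\xA9Word"); // bool(true)
```

To keep the words separated, replace the NBSP with an ordinary space in a separate call instead of deleting it.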