c# – Error in Encoding.Unicode


I have the code:

byte[] one = new byte[] {215, 170, 8, 223, 101, 20, 107, 124};
byte[] two = Encoding.Unicode.GetBytes(Encoding.Unicode.GetString(one));
foreach (var el in two)
    Console.Write(el + " ");

Output:

215 170 253 255 101 20 107 124

As you can see, the third and fourth bytes do not match. Bug or feature?


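This is a feature. With default settings, UnicodeEncoding does not throw on invalid input. That is, you still get a string, but wherever the input bytes have no valid mapping to a character, the string contains the replacement character U+FFFD, which UTF-16 little-endian encodes as the bytes 0xFD 0xFF (253, 255). Here the bytes 8, 223 form the code unit 0xDF08, a low surrogate with no preceding high surrogate, so it is invalid on its own.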

More precisely, what happens when invalid bytes are encountered is determined by the encoding's decoder fallback, and you can replace it with a custom one. For Encoding.Unicode the default is the replacement fallback. For more details, see MSDN: Character Encoding in .NET, Replacement Fallback.
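As an illustration of swapping the fallback, here is a minimal sketch that decodes the same bytes with a different replacement string. The class and method names (FallbackDemo, DecodeWithReplacement) are made up for this example; Encoding.GetEncoding, EncoderFallback.ReplacementFallback and DecoderReplacementFallback are the standard System.Text APIs.

```csharp
using System;
using System.Text;

public static class FallbackDemo
{
    // Hypothetical helper: decodes UTF-16 (little-endian) bytes and substitutes
    // `replacement` for every invalid sequence instead of the default U+FFFD.
    public static string DecodeWithReplacement(byte[] bytes, string replacement)
    {
        Encoding encoding = Encoding.GetEncoding(
            "utf-16",
            EncoderFallback.ReplacementFallback,
            new DecoderReplacementFallback(replacement));
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        byte[] one = { 215, 170, 8, 223, 101, 20, 107, 124 };
        string text = DecodeWithReplacement(one, "?");

        // Bytes 8, 223 form 0xDF08, a lone low surrogate, so the second
        // character of the result should be the replacement string.
        Console.WriteLine(text[1] == '?');
    }
}
```

The same GetEncoding overload also accepts DecoderFallback.ExceptionFallback if you would rather fail than substitute.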

If you want an exception instead of a replacement character, pass throwOnInvalidBytes: true when creating the encoding:

var encoding = new UnicodeEncoding(false, false, throwOnInvalidBytes: true);
encoding.GetString(new byte[] { 215, 170, 8, 223, 101, 20, 107, 124}); // throws an exception
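If you would rather test for invalid input than let the exception propagate, a strict encoding can be wrapped in a small helper. This is only a sketch: StrictUtf16 and TryDecode are hypothetical names, and it catches ArgumentException, the base class of the DecoderFallbackException raised by the exception fallback.

```csharp
using System;
using System.Text;

public static class StrictUtf16
{
    // Little-endian, no byte order mark, throws on invalid byte sequences.
    private static readonly UnicodeEncoding Strict =
        new UnicodeEncoding(bigEndian: false, byteOrderMark: false, throwOnInvalidBytes: true);

    // Hypothetical helper: returns false instead of throwing when the bytes
    // are not valid UTF-16.
    public static bool TryDecode(byte[] bytes, out string text)
    {
        try
        {
            text = Strict.GetString(bytes);
            return true;
        }
        catch (ArgumentException)
        {
            text = string.Empty;
            return false;
        }
    }

    public static void Main()
    {
        byte[] invalid = { 215, 170, 8, 223, 101, 20, 107, 124 };
        byte[] valid = { 72, 0, 105, 0 }; // "Hi" in UTF-16 little-endian

        Console.WriteLine(TryDecode(invalid, out _));
        Console.WriteLine(TryDecode(valid, out string hi) ? hi : "<invalid>");
    }
}
```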