python – Encoding problems when sending an html template with Russian text: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0

Question:

Given: Windows 10, Python 3.4, PyScripter, Flask. There is a file (say, main.py) that renders a template:

def index():
    return render_template('index.html')

index.html contains an input field:

<div class="form-group">
    <label for="exampleInputName">Name</label>

If I open localhost:5000 with the template like this, everything works. But if I change the label to Russian:

<div class="form-group">
    <label for="exampleInputName">Имя</label>

then I get an error:

builtins.UnicodeDecodeError
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 437: invalid continuation byte

Python 3 uses utf-8 by default, and the html file also declares meta charset="utf-8".
The traceback points to the line return render_template('index.html') in main.py.

Answer:

It is not enough to write <meta charset="utf-8" />; you also need to save the file itself in utf-8 encoding.
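
If re-saving from the editor is awkward, the file can also be converted with a few lines of Python. This is a minimal sketch that assumes the template was saved in cp1251 (the usual Windows code page for Russian text, which would fit the byte 0xe0 in the error); the helper name resave_as_utf8 is hypothetical.

```python
from pathlib import Path


def resave_as_utf8(path, source_encoding="cp1251"):
    """Rewrite a text file in place as utf-8.

    source_encoding is an assumption: it must be the encoding the file
    was really saved in, otherwise the text will be corrupted.
    """
    file_path = Path(path)
    # Decode the raw bytes using the encoding the editor actually used.
    text = file_path.read_bytes().decode(source_encoding)
    # Write the same characters back as utf-8 bytes.
    file_path.write_bytes(text.encode("utf-8"))
    return text
```

It would be called once per affected template, for example resave_as_utf8("templates/index.html"), where the path is only an illustration. Keep a backup copy first, since a wrong guess at the source encoding damages the text.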


Why?

The meta tag does not affect the contents of the file in any way. It only gives the browser a hint about which encoding to use to decode the file. You can write cp1251 in the meta but still save the file in utf-8: Flask will serve it without errors, only the browser will show garbled text (mojibake), because it will follow the hint and try to decode utf-8 bytes as cp1251 characters.
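
The garbling described above can be reproduced in plain Python. This is only a sketch of what decoding with the wrong encoding does; the sample word is chosen for illustration and is not from the question.

```python
original = "Привет"  # sample text, chosen for illustration

# What is actually stored and sent: the text encoded as utf-8 bytes.
utf8_bytes = original.encode("utf-8")

# What a reader sees after trusting a wrong "cp1251" hint.
garbled = utf8_bytes.decode("cp1251")
print(garbled)

# The bytes themselves were never damaged: undoing the wrong
# decoding restores the original text.
restored = garbled.encode("cp1251").decode("utf-8")
print(restored == original)
```

Each Russian letter takes two bytes in utf-8, and cp1251 maps every byte to its own character, so the garbled string is twice as long as the original.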

Flask doesn't use the meta. It (more precisely, Jinja2) simply reads a text file and interprets it as a Jinja2 template. Whether the file contains HTML, CSS, JS, or JSON, Flask does not care.

The Jinja2 templating engine tries to read files as utf-8 by default and doesn't care about your meta.
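
The error from the question can be reproduced without Flask. The sketch below assumes the editor saved the template in cp1251; the in-memory bytes stand in for the file on disk.

```python
# Bytes as an editor would write them when saving in cp1251 (assumed).
cp1251_bytes = "<label>Имя</label>".encode("cp1251")

error = None
try:
    # Reading the template with the default encoding amounts to this.
    cp1251_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(type(exc).__name__, "at byte offset", exc.start)

# The same bytes decode fine once the right encoding is named.
print(cp1251_bytes.decode("cp1251"))
```

As long as the template contains only ASCII characters, cp1251 and utf-8 produce identical bytes, which is why the English version of the label worked.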


But that's not all: all template processing works on unicode strings (for simplicity, we can treat them as having no specific encoding, since their internal representation is a Python implementation detail that does not matter here), and render_template returns a unicode string (not bytes!) regardless of the encoding the file was saved in and of what the meta says.

Before sending the response to the browser, Flask must encode the unicode string back into bytes, and by default it uses utf-8.

Thus, whatever encoding the file is saved in and whatever the meta says, utf-8 bytes are always sent to the browser! The browser will try to decode these bytes according to the meta hint, unless the Content-Type header of the response already names a charset, which takes precedence over the meta. That is, if you replace the default encoding in Jinja2 (and do not replace it in the Flask Response), the encoding of the source template file and the encoding written in the meta do not have to match at all 🙂
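
The two steps can be sketched with the standard library alone. The functions load_template and build_response_body are hypothetical stand-ins for what the template loader and the response object do; in a real application the source encoding would be set on the Jinja2 loader, for instance through the encoding argument of jinja2.FileSystemLoader.

```python
def load_template(raw_bytes, source_encoding):
    """Step 1: decode the template file into a str, as the loader does."""
    return raw_bytes.decode(source_encoding)


def build_response_body(text):
    """Step 2: encode the rendered str as utf-8, the response default."""
    return text.encode("utf-8")


html = '<meta charset="utf-8"><label>Имя</label>'

# The same template saved on disk in two different encodings.
from_cp1251 = build_response_body(load_template(html.encode("cp1251"), "cp1251"))
from_utf8 = build_response_body(load_template(html.encode("utf-8"), "utf-8"))

# The bytes sent to the browser are identical either way.
print(from_cp1251 == from_utf8)
```

The encoding of the file on disk only matters in the first step; once the text is a str, the output bytes depend solely on the encoding used for the response.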


Python file encodings and default encodings in Python itself do not play any role in this particular case.
