How to ignore the case of all characters in a Python string?

Question:

There is a string such as "Привет". It is checked in an if statement: if it matches, the value is 1, otherwise 0.

But if you enter "привет", "привеТ", "пРивет", etc., the value will be 0.

How can the comparison be made to ignore the case of the characters in the string?

Answer:

It would seem that you can simply convert both strings to a single case (upper or lower), but it is not that simple. There is text for which text.lower() != text.upper().lower(), such as "ß":

>>> "ß".lower()
'ß'

>>> "ß".upper().lower()
'ss'

Say you need to compare "BUSSE" and "Buße", or even "BUSSE" and "BUẞE": these are all considered the same word in German. The recommended way is to use the str.casefold method, which converts a string into a form suitable for case-insensitive comparison.

>>> "BUSSE".casefold() == "Buße".casefold()
True
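
Applied to the string from the question, the check could look roughly like the sketch below. The function name matches_expected and the constant EXPECTED are made up for illustration, and this version relies on casefold alone, so it does not yet handle differently composed characters:

```python
EXPECTED = "Привет"  # the reference string from the question


def matches_expected(user_input):
    """Return 1 if user_input equals EXPECTED ignoring case, else 0."""
    return 1 if user_input.casefold() == EXPECTED.casefold() else 0


print(matches_expected("привеТ"))  # 1
print(matches_expected("пРивет"))  # 1
print(matches_expected("пока"))    # 0
```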

But that's not all. If your text renders correctly, the two strings in the following example look identical, yet they are not equal:

>>> 'Й' == 'Й'
False

The fact is that the first Й is one character (U+0419), and the second Й is a combination of two (U+0418 and U+0306):

>>> import unicodedata

>>> [unicodedata.name(char) for char in 'Й']
['CYRILLIC CAPITAL LETTER SHORT I']

>>> [unicodedata.name(char) for char in 'Й']
['CYRILLIC CAPITAL LETTER I', 'COMBINING BREVE']

If you need to treat such strings as equal, the easiest way is to use unicodedata.normalize. You should probably use NFKD normalization, but the documentation describes other options; you can choose whichever suits your task. Then:

>>> unicodedata.normalize('NFKD', 'Й') == unicodedata.normalize('NFKD', 'Й')
True
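
To make the difference between the normalization forms more concrete, here is a small sketch. It writes the characters as escape sequences so the example does not depend on how the text renders; the variable names are illustrative only:

```python
import unicodedata

composed = "\u0419"          # CYRILLIC CAPITAL LETTER SHORT I
decomposed = "\u0418\u0306"  # CYRILLIC CAPITAL LETTER I + COMBINING BREVE

# NFC composes characters where possible; NFD decomposes them canonically.
print(len(unicodedata.normalize("NFC", decomposed)))  # 1
print(len(unicodedata.normalize("NFD", composed)))    # 2

# The "K" forms additionally replace compatibility characters,
# such as the ligature U+FB01, with their plain equivalents.
ligature = "\ufb01"
print(unicodedata.normalize("NFD", ligature) == ligature)  # True
print(unicodedata.normalize("NFKD", ligature))             # fi
```

If ligatures and similar compatibility characters should stay distinct from their plain spellings, NFD or NFC is probably the better fit; otherwise NFKD is the more forgiving choice.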

Putting it all together, you can use functions like this:

import unicodedata

def normalize_caseless(text):
    return unicodedata.normalize("NFKD", text.casefold())

def caseless_equal(left, right):
    return normalize_caseless(left) == normalize_caseless(right)


>>> caseless_equal('BUSSE', 'Buße')
True

>>> caseless_equal('Й', 'Й')
True
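
One caveat about the order of operations: case folding does not always preserve normalization, so in rare edge cases folding first and normalizing once may not be enough. A more conservative sketch, which as far as I know follows the Unicode Standard's definition of canonical caseless matching, normalizes both before and after folding. It uses NFD rather than NFKD, and the function names are illustrative, not taken from the original answer:

```python
import unicodedata


def nfd(text):
    """Canonically decompose text."""
    return unicodedata.normalize("NFD", text)


def canonical_caseless(text):
    # Decompose, fold case, then decompose again,
    # because folding can produce non-normalized output.
    return nfd(nfd(text).casefold())


def canonical_caseless_equal(left, right):
    return canonical_caseless(left) == canonical_caseless(right)


print(canonical_caseless_equal("BUSSE", "Bu\u00dfe"))      # True
print(canonical_caseless_equal("\u0419", "\u0438\u0306"))  # True
```

For most everyday text the simpler functions above should give the same answers; the extra normalization pass only matters for unusual inputs.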

This is a free translation of Veedrac's answer on the English Stack Overflow (enSO). The comments there are helpful and worth reading too.
