There is a string like "Привет". It is checked with an if: if the check passes, the value is 1, otherwise it is not.
But if you enter "пРивет" or a similar mix of cases, the check no longer passes.
How can I make the check ignore case?
It might seem that you can simply convert both strings to the same case (upper or lower), but it is not that simple. There is text for which text.lower() != text.upper().lower(), for example:
>>> "ß".lower()
'ß'
>>> "ß".upper().lower()
'ss'
Suppose you need to compare "BUSSE" with "Buße", or even with "BUẞE"; in German these are all the same word. The recommended way is the str.casefold method, which converts a string into a form suitable for case-insensitive comparison:
>>> "BUSSE".casefold() == "Buße".casefold()
True
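To see the difference between lower() and casefold() side by side, here is a minimal sketch (assuming Python 3) that runs both over the three spellings above and then over the strings from the question. The "BUẞE" spelling is written with an explicit escape so the capital sharp s (U+1E9E) survives copy-paste.

```python
# Three spellings of the same German word; U+1E9E is LATIN CAPITAL LETTER SHARP S.
words = ["BUSSE", "Buße", "BU\u1E9EE"]

# lower() keeps ß as ß (and maps ẞ to ß), so the spellings do not all match.
lowered = [w.lower() for w in words]
print(lowered)  # ['busse', 'buße', 'buße']

# casefold() maps both ß and ẞ to "ss", so all three match.
folded = [w.casefold() for w in words]
print(folded)  # ['busse', 'busse', 'busse']

# The same approach covers the Cyrillic strings from the question.
print("Привет".casefold() == "пРивет".casefold())  # True
```

For plain Cyrillic text like "Привет", casefold() gives the same result as lower(); the difference only shows up for characters such as ß.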
But that's not all. If your text renders correctly, in the following example you might think that 'Й' == 'Й', but it is not:
>>> 'Й' == 'Й'
False
The first Й is a single character (U+0419), while the second Й is a combination of two (U+0418 and U+0306):
>>> import unicodedata
>>> [unicodedata.name(char) for char in 'Й']
['CYRILLIC CAPITAL LETTER SHORT I']
>>> [unicodedata.name(char) for char in 'Й']
['CYRILLIC CAPITAL LETTER I', 'COMBINING BREVE']
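Because an editor, a web page or a copy-paste step may silently turn one form into the other, it can help to build both strings from explicit escape sequences when experimenting. A minimal sketch; the variable names are just for illustration:

```python
# Build both forms explicitly so the difference survives copy-paste and rendering.
precomposed = "\u0419"       # CYRILLIC CAPITAL LETTER SHORT I
decomposed = "\u0418\u0306"  # CYRILLIC CAPITAL LETTER I + COMBINING BREVE

print(len(precomposed), len(decomposed))          # 1 2
print([f"U+{ord(c):04X}" for c in decomposed])    # ['U+0418', 'U+0306']
print(precomposed == decomposed)                  # False
```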
If you need to treat such strings as equal, the easiest way is to use unicodedata.normalize. NFKD normalization is probably a reasonable default, but the documentation describes other forms as well; choose the one that suits your task. Then:
>>> unicodedata.normalize('NFKD', 'Й') == unicodedata.normalize('NFKD', 'Й')
True
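The forms listed in the documentation all make the two Й spellings equal, but they produce different results: the C forms compose, the D forms decompose, and the K forms additionally fold "compatibility" characters. A small sketch using only the standard unicodedata module; the ﬁ ligature (U+FB01) is an illustrative choice, not something from the original answer:

```python
import unicodedata

precomposed = "\u0419"       # Й as one code point
decomposed = "\u0418\u0306"  # И + combining breve

# Every form makes the two spellings equal; they differ in the length of the result.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    a = unicodedata.normalize(form, precomposed)
    b = unicodedata.normalize(form, decomposed)
    print(form, a == b, len(a))  # NFC/NFKC give length 1, NFD/NFKD give length 2

# Only the K forms replace the "ﬁ" ligature with the two letters "fi".
print(unicodedata.normalize("NFD", "\ufb01") == "\ufb01")  # True (unchanged)
print(unicodedata.normalize("NFKD", "\ufb01") == "fi")     # True
```

Whether folding compatibility characters is desirable depends on the task, which is why the choice of form is left to the reader.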
Putting it all together, you can use functions like these:
import unicodedata

def normalize_caseless(text):
    # Fold case first, then decompose, so equivalent spellings compare equal.
    return unicodedata.normalize("NFKD", text.casefold())

def caseless_equal(left, right):
    # Two strings are "caselessly equal" if their normalized forms match.
    return normalize_caseless(left) == normalize_caseless(right)

>>> caseless_equal('BUSSE', 'Buße')
True
>>> caseless_equal('Й', 'Й')
True
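The exact check from the question is not shown. Assuming it is a substring test rather than an equality test, the same normalization can be applied to both sides. caseless_contains below is a hypothetical helper, not part of the original answer:

```python
import unicodedata


def normalize_caseless(text):
    # Same helper as above: fold case, then decompose with NFKD.
    return unicodedata.normalize("NFKD", text.casefold())


def caseless_contains(haystack, needle):
    # Hypothetical helper: substring check on case-folded, NFKD-normalized text.
    return normalize_caseless(needle) in normalize_caseless(haystack)


print(caseless_contains("Привет, мир", "пРивет"))  # True
print(caseless_contains("Die Buße", "BUSSE"))       # True
```

One caveat of searching in decomposed text: a base letter can match inside a composed one, so caseless_contains("Й", "И") is also True, because "й" decomposes to "и" plus a combining breve. If that matters for your task, compare whole words or use a composed form for the search.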
This is a free translation of Veedrac's answer on the English-language Stack Overflow (enSO). The comments under it are helpful and worth reading too.