I usually use utf8_general_ci by default in my projects, but I recently noticed that other developers often use utf8_unicode_ci instead.
utf8_general_ci: Unicode (multi-language), Case insensitive
utf8_unicode_ci: Unicode (multi-language), Case insensitive
Which of these is more suitable for the web, or is there another UTF-8 collation that would be a better fit?
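The main difference is how the two collations compare characters. utf8_unicode_ci follows the Unicode sorting rules and supports expansions, where a single character compares equal to a sequence of other characters.
For example, in German the character “ß” is treated as equivalent to “ss”. Since utf8_unicode_ci has to handle this kind of comparison, matching more than one character at a time, it is slower than utf8_general_ci.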
That is, if your application doesn't need to compare characters across multiple languages, go with utf8_general_ci.
But for systems that operate globally and must handle multiple languages, such as WordPress or Wikimedia, utf8_unicode_ci is a good choice.
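The “ß” case can be checked directly from a MySQL client. The following is a minimal sketch, assuming a server where the legacy utf8 character set and its collations are still available (newer versions report them under utf8mb3 names) and a connection that sends UTF-8 text. The CONVERT ... USING calls are only there so the literals end up in the utf8 character set regardless of the connection default.

```sql
-- utf8_unicode_ci supports expansions, so 'ß' is compared against 'ss'.
SELECT CONVERT('ß' USING utf8) COLLATE utf8_unicode_ci
     = CONVERT('ss' USING utf8) AS unicode_ci_eszett_vs_ss;

-- utf8_general_ci compares one character at a time, so it cannot match
-- a single character against a two-character sequence.
SELECT CONVERT('ß' USING utf8) COLLATE utf8_general_ci
     = CONVERT('ss' USING utf8) AS general_ci_eszett_vs_ss;

-- Under utf8_general_ci, 'ß' is compared against a single 's' instead.
SELECT CONVERT('ß' USING utf8) COLLATE utf8_general_ci
     = CONVERT('s' USING utf8) AS general_ci_eszett_vs_s;
```

According to the MySQL documentation, the first and third queries should return 1 and the second 0, which is exactly the trade-off described: utf8_general_ci is faster because it skips multi-character matching, at the cost of less accurate comparisons.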
Another interesting collation to mention is utf8_bin. It compares strings by the binary value of each character, which makes the comparison case-sensitive, unlike the case-insensitive (_ci) collations.
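As a quick illustration of the case sensitivity, the same pair of strings can be compared under both collations. This is a sketch under the same assumption as any utf8 example here: the server still accepts the legacy utf8 collation names.

```sql
-- Case-insensitive collation: 'a' and 'A' are considered equal.
SELECT CONVERT('a' USING utf8) COLLATE utf8_general_ci
     = CONVERT('A' USING utf8) AS general_ci_match;   -- returns 1

-- Binary collation: the character values differ, so the strings differ.
SELECT CONVERT('a' USING utf8) COLLATE utf8_bin
     = CONVERT('A' USING utf8) AS bin_match;          -- returns 0
```

This makes utf8_bin a reasonable candidate for values that must match exactly, such as tokens or identifiers, rather than for human-readable text.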
The choice of collation depends a lot on the nature of the application. Besides utf8, there are other charsets that meet the needs of a specific region (latin1, for example), and since each case varies a lot, I don't think it's possible to point to one that is the most appropriate for all cases.
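Because the right choice varies, it helps that MySQL lets you set the collation per table, per column, and even per query. A minimal sketch follows; the table name articles and its columns are hypothetical, chosen only for illustration, and the legacy utf8 names are assumed to be available on the server.

```sql
-- List the utf8 collations the server actually offers.
SHOW COLLATION LIKE 'utf8%';

-- Table default: multi-language comparison rules for ordinary text.
-- The slug column overrides that default with an exact, case-sensitive match.
CREATE TABLE articles (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    slug  VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
) DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;

-- A single query can also request a different collation for sorting.
SELECT title
FROM articles
ORDER BY title COLLATE utf8_general_ci;
```

Mixing collations this way keeps the stricter or slower rules limited to the columns and queries that actually need them.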
In most cases, utf8_general_ci will do: as its name suggests, it is meant for general use and is the most commonly found. However, it is worth knowing that other collations exist to meet more specific needs, such as those described above.
Source: MySQL Documentation (in English)