mysql – Which UTF-8 collation is most suitable for the Web (multi-language)?

Question:

I usually use utf8_general_ci by default in my projects, but I recently noticed that other developers often use utf8_unicode_ci:

  • utf8_general_ci : Unicode (multi-language), Case insensitive
  • utf8_unicode_ci : Unicode (multi-language), Case insensitive

Which of these is more suitable for the web, or is there another UTF-8 collation that is better suited to it?

Answer:

The main difference lies in how utf8_general_ci and utf8_unicode_ci compare certain characters, especially letters that a language treats as equivalent to a sequence of other letters.

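For example, in German the character “ß” is equivalent to “ss”. utf8_unicode_ci handles this by letting a single character match a sequence of two, while utf8_general_ci compares one character at a time and treats “ß” as equal to a single “s”. Because of this extra work, utf8_unicode_ci is slower than utf8_general_ci.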
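
To make the “ß” example concrete, here is a rough Python model of the two behaviours. It is a sketch under stated assumptions, not MySQL's actual implementation (MySQL uses per-character weight tables): the function names general_ci_key and unicode_ci_key are hypothetical, accents are dropped via Unicode decomposition, and only the “ß” rule is modelled explicitly. The "general" key maps every character to exactly one character, while the "unicode" key uses str.casefold(), which lets “ß” expand to “ss”.

```python
import unicodedata


def _strip_accents(text: str) -> str:
    # Decompose (NFD) and drop combining marks, so "é" -> "e".
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def general_ci_key(text: str) -> str:
    # Hypothetical rough model of utf8_general_ci: each character maps to
    # exactly one character, so no expansions are possible.
    out = []
    for ch in text:
        if ch == "ß":
            # Cannot expand to two letters: collapses to a single "s".
            out.append("s")
        else:
            # Lowercase, drop accents, and keep exactly one character.
            folded = _strip_accents(ch.lower())
            out.append(folded[:1] if folded else ch)
    return "".join(out)


def unicode_ci_key(text: str) -> str:
    # Hypothetical rough model of utf8_unicode_ci: casefold() may expand
    # one character into several, so "ß" becomes "ss".
    return _strip_accents(text.casefold())


def same(a: str, b: str, key) -> bool:
    # Two strings compare equal when their keys are equal.
    return key(a) == key(b)


if __name__ == "__main__":
    print(same("Straße", "STRASSE", general_ci_key))  # False
    print(same("Straße", "STRASSE", unicode_ci_key))  # True
```

The extra expansion step is the kind of work that makes the Unicode-aware comparison more expensive than a strict one-character-to-one-weight mapping.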

In other words, if your application doesn't need language-accurate comparisons across multiple languages, go with utf8_general_ci.

However, for systems that operate globally and must handle multiple languages, such as WordPress or Wikimedia, utf8_unicode_ci is a good choice.

Another collation worth mentioning is utf8_bin. It compares characters by their binary values rather than by linguistic rules, so, unlike the other collations above, its comparisons are case-sensitive (and accent-sensitive).
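
A binary collation can be approximated by comparing the encoded bytes directly. The sketch below assumes that comparing UTF-8 bytes is a fair stand-in for utf8_bin (for UTF-8, byte order matches code point order); the helper names bin_equal and bin_sort are hypothetical, and the sketch ignores details such as how MySQL treats trailing spaces.

```python
def bin_equal(a: str, b: str) -> bool:
    # Compare raw UTF-8 bytes: no case folding and no accent folding.
    return a.encode("utf-8") == b.encode("utf-8")


def bin_sort(words):
    # Order by UTF-8 bytes; this matches code point order, so every
    # uppercase ASCII letter sorts before any lowercase one.
    return sorted(words, key=lambda w: w.encode("utf-8"))


if __name__ == "__main__":
    print(bin_equal("abc", "ABC"))  # False
    print(bin_sort(["banana", "Apple", "apple", "Banana"]))
```

With this ordering, "Apple" and "Banana" come before "apple" and "banana", which is usually not what users expect from alphabetical sorting, but it is exactly what you want for case-sensitive identifiers or tokens.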

Conclusion

The choice of collation depends heavily on the nature of the application. Besides utf8, there are other character sets that meet the needs of a specific region (latin1, for example), and since requirements vary so much from one scope to another, I don't think it's possible to name a single collation that is best in every case.
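
To illustrate the trade-off with a regional character set, the sketch below measures how many bytes a string takes in a single-byte Western European encoding versus UTF-8, and whether it can be stored at all. It assumes Python's cp1252 codec as a stand-in for MySQL's latin1 (MySQL documents its latin1 as essentially Windows cp1252); the helper name encoded_length is hypothetical.

```python
def encoded_length(text: str, encoding: str):
    # Byte length of text in the given encoding, or None if the
    # encoding cannot represent some character in it.
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    for sample in ["cafe", "café", "€", "日本語"]:
        # cp1252 stands in for MySQL's latin1 (an assumption).
        print(sample, encoded_length(sample, "cp1252"), encoded_length(sample, "utf-8"))
```

A regional set like this stores accented Western European letters in one byte each, but it simply cannot hold Japanese text, which is why a UTF-8 character set is the safer default for a multi-language site.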

In most cases utf8_general_ci will do: as its name suggests, it is meant for general use, and it is the most commonly found. Still, it is worth knowing that other collations, such as utf8_unicode_ci and utf8_bin, can meet more specific needs.

Source: MySQL Documentation
