validation – How to validate a Brazilian civil name?

Question:

How to validate people's names in Brazilian Portuguese?

Answer:

The Portuguese alphabet is based on the Latin alphabet, which consists of 26 letters:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

On top of these letters, Brazilian Portuguese uses the following diacritical marks:

  • ~ (Til): nasalizes the vowels "a" and "o", including in the diphthongs "ae", "oe" and "ao" – ã / õ / ãe / õe / ão.
  • ¸ (Cedilha): gives the letter "c" the sound of the letter "s" before "a", "o" and "u" – ç.
  • ^ (Circumflex Accent): indicates the stressed syllable and closes the timbre of the vowels "a", "e" and "o", in cases where graphic accentuation is required – â / ê / ô.
  • ´ (Acute Accent): indicates the stressed syllable and opens the timbre of vowels in cases where graphic accentuation is required – á / é / í / ó / ú.
  • ` (Grave Accent): marks crasis, the contraction of the preposition "a" with the feminine article "a" (as opposed to the masculine "ao") or with the demonstratives "aquele", "aquela" and "aquilo" (àquele, àquela, àquilo) – à.
  • ¨ (Trema): currently used only in Brazilian Portuguese to indicate that the vowel "u" is pronounced in the sequences "qüe", "qüi", "güe" and "güi" – ü.

So, in addition to the traditional ranges a-z and A-Z, we also have to include the characters ãõ, ç, âêô, à, áéíóú and ü. And of course, let's not forget the space character.

The regex, which matches any character outside that set (that is, any invalid character), would look like:

[^a-zA-ZáéíóúàâêôãõüçÁÉÍÓÚÀÂÊÔÃÕÜÇ ]

Also remember:

  • Remove any blank spaces at the beginning and end of the name (trim).
  • Check for consecutive blank spaces.

Example (C#)

using System;
using System.Text.RegularExpressions;

// Place this method inside a class of your choice.
public static string TratarNome(string nome)
{
    if (string.IsNullOrWhiteSpace(nome))
        throw new ArgumentException("A blank name was passed.");

    // Remove whitespace at the beginning and end of the name:
    nome = nome.Trim();

    // Replace two or more consecutive spaces with a single one:
    nome = Regex.Replace(nome, "[ ]{2,}", " ", RegexOptions.Compiled);

    // Reject any character outside the (Brazilian) Portuguese alphabet:
    if (Regex.IsMatch(nome, "[^a-zA-ZáéíóúàâêôãõüçÁÉÍÓÚÀÂÊÔÃÕÜÇ ]", RegexOptions.Compiled))
        throw new ArgumentException("Invalid name: \"" + nome + "\".");

    return nome;
}

In practice

I ran the code above against a database of roughly 100,000 Brazilian names.

Of these, the following were false positives (legitimate names rejected by the check):

  • ñ: PEÑA, CAMIÑA, YÁÑEZ, MUÑOZ and MUÑIZ.
  • ': SAINT'CLAIR.
  • -: SAINT-CLAIR.

In addition to the name of our colleague @jpkrohling:

  • ö: KRÖHLING.

Another curiosity: a few records contained the non-breaking space NBSP (code 160) instead of the ordinary space SP (code 32). The validation caught this as well, and in our case we decided to replace the NBSP with an ordinary space.
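As a rough sketch of how these findings could be folded back into the check (this is an assumption on my part, not the code that was actually run): the class NomeExtras and its methods NormalizarEspacos and EhValido are hypothetical names, and the character class simply extends the original one with ñ, ö, the apostrophe and the hyphen.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NomeExtras
{
    // Hypothetical extension of the original character class: adds ñ/Ñ, ö/Ö,
    // the apostrophe and the hyphen seen in the sample data.
    // The hyphen is placed last so it is read as a literal, not a range.
    private static readonly Regex Invalidos = new Regex(
        "[^a-zA-ZáéíóúàâêôãõüçñöÁÉÍÓÚÀÂÊÔÃÕÜÇÑÖ' -]", RegexOptions.Compiled);

    public static string NormalizarEspacos(string nome)
    {
        // Replace NBSP (U+00A0, code 160) with the ordinary space (U+0020, code 32).
        nome = nome.Replace('\u00A0', ' ');

        // Then trim and collapse runs of spaces, as TratarNome does.
        return Regex.Replace(nome.Trim(), "[ ]{2,}", " ");
    }

    public static bool EhValido(string nome)
    {
        // Note: the blank-name check from TratarNome is not repeated here.
        return !Invalidos.IsMatch(NormalizarEspacos(nome));
    }
}
```

Whether ñ, ö, the apostrophe and the hyphen should be accepted at all depends on your data; the list above only covers what showed up in this particular database.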


Handling international names

Handling names, especially international ones, is not a simple task. The check above would fail on relatively common names such as Björk and Marić, or on less common ones such as Graham-Cumming.
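One possible way to be more permissive, sketched here as an assumption rather than a recommendation, is to rely on Unicode categories instead of listing letters one by one. In .NET regular expressions, \p{L} matches any letter and \p{M} any combining mark. The class NomeInternacional and the method EhValido are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NomeInternacional
{
    // Matches anything that is NOT a Unicode letter (\p{L}), a combining
    // mark (\p{M}), an apostrophe, a space or a hyphen.
    private static readonly Regex Invalidos = new Regex(
        @"[^\p{L}\p{M}' -]", RegexOptions.Compiled);

    public static bool EhValido(string nome)
    {
        if (string.IsNullOrWhiteSpace(nome)) return false;
        return !Invalidos.IsMatch(nome);
    }
}
```

With this pattern Björk, Marić and Graham-Cumming all pass, but so does any string made of letters from any script, which may or may not be what you want.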

Also, when being more permissive, beware of opening the door to XSS attacks. One example is the apostrophe: some names contain one, and it is often (wrongly?) represented by the single quote character ( ' ) instead of the typographic apostrophe ( ’ ).

Consider yourself warned.
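To illustrate the XSS concern, here is a minimal sketch that encodes a name before it is written into an HTML page. It uses WebUtility.HtmlEncode from System.Net; the class NomeHtml and the method ParaHtml are hypothetical names, and encoding on output is only one of several possible mitigations.

```csharp
using System;
using System.Net;

public static class NomeHtml
{
    // Encodes characters that are significant in HTML (such as <, >, &,
    // double quotes and the single quote) so the name is rendered as text.
    public static string ParaHtml(string nome)
    {
        return WebUtility.HtmlEncode(nome);
    }
}
```

The point is that a permissive validation rule should not be the only line of defense: even a name that passes validation should be encoded for the context in which it is displayed.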
