c# – Remove accents from a string in C#

Question:

I've searched but can't find what I need. I find solutions for replacing characters, but none of them helps in my case.

I have an input string with accents (tildes) and I need to remove them. My code:

string palabra = "pálábrá cón tíldés";

string palabaSinTilde = Regex.Replace(palabra, @"[^0-9A-Za-z]", "",
RegexOptions.None);

The output I have is: "plbr cn tlds"

What I need: "palabra con tildes"

Thank you, have a good afternoon.

Answer:

Try the following extension method:

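using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class StringExtensions
{
    // Removes diacritics (accents, tildes, diaereses...) from a string.
    public static string SinTildes(this string texto) =>
        new String(
            // 1. Decompose: "á" becomes "a" followed by U+0301 (combining acute accent).
            texto.Normalize(NormalizationForm.FormD)
            // 2. Drop the combining marks, keeping the base letters.
            .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            .ToArray()
        )
        // 3. Recompose whatever is left into its canonical composed form.
        .Normalize(NormalizationForm.FormC);
}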
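To compare this approach with the regex from the question, here is a minimal self-contained sketch. It reproduces the answer's logic as a static local function (also called SinTildes here, only so the snippet runs as a single top-level program instead of needing a separate static class). Note that the character class [^0-9A-Za-z] also matches the space character, so the question's regex removes the spaces as well as the accented letters.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// "pálábrá cón tíldés" from the question, written with \u escapes so the result
// does not depend on how this source file itself is normalized.
string palabra = "p\u00E1l\u00E1br\u00E1 c\u00F3n t\u00EDld\u00E9s";

// The question's regex deletes every character outside 0-9, A-Z and a-z,
// so accented letters (and spaces) are removed instead of being replaced.
string conRegex = Regex.Replace(palabra, @"[^0-9A-Za-z]", "");

// Same logic as the SinTildes extension method, written as a local function.
static string SinTildes(string texto) =>
    new string(
        texto.Normalize(NormalizationForm.FormD)
            .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            .ToArray()
    ).Normalize(NormalizationForm.FormC);

Console.WriteLine(conRegex);            // plbrcntlds
Console.WriteLine(SinTildes(palabra));  // palabra con tildes
```

With the extension method from the answer in scope, the equivalent call is simply palabra.SinTildes().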

Explanation:

Characters like á, ö, etc. can be represented in Unicode in two ways: as a single precomposed character that already includes the accent (for example á, U+00E1), or as two consecutive characters, the base letter followed by a combining accent mark (a followed by U+0301 COMBINING ACUTE ACCENT). Text editors display both forms the same way: á
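A small sketch of the two representations: the precomposed and the decomposed forms are different sequences of chars, but they are canonically equivalent, so they compare equal once both are normalized to the same form. The variable names are illustrative only.

```csharp
using System;
using System.Text;

// Precomposed form: one code point, U+00E1 LATIN SMALL LETTER A WITH ACUTE.
string precomposed = "\u00E1";
// Decomposed form: base letter 'a' followed by U+0301 COMBINING ACUTE ACCENT.
string decomposed = "a\u0301";

Console.WriteLine(precomposed.Length); // 1
Console.WriteLine(decomposed.Length);  // 2

// An ordinal comparison sees two different char sequences...
Console.WriteLine(string.Equals(precomposed, decomposed, StringComparison.Ordinal)); // False

// ...but after normalizing both to the same form they are identical.
Console.WriteLine(precomposed.Normalize(NormalizationForm.FormD) == decomposed.Normalize(NormalizationForm.FormD)); // True
```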

This line:

texto.Normalize(NormalizationForm.FormD)

Decomposes the string (Unicode Normalization Form D), so that every accented character is split into its base letter followed by the combining marks (accents, tildes and other modifiers) that apply to it.
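To see the decomposition, the following sketch prints the code points of "tíldés" after FormD; each accented vowel turns into the plain vowel followed by U+0301. The variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text;

string original = "t\u00EDld\u00E9s"; // "tíldés" with precomposed í and é
string decomposed = original.Normalize(NormalizationForm.FormD);

Console.WriteLine(original.Length);   // 6
Console.WriteLine(decomposed.Length); // 8: two extra combining marks

// Print every char as a Unicode code point.
Console.WriteLine(string.Join(" ", decomposed.Select(c => $"U+{(int)c:X4}")));
// U+0074 U+0069 U+0301 U+006C U+0064 U+0065 U+0301 U+0073
```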

Next, this filter:

.Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)

It keeps only the characters whose Unicode category is not NonSpacingMark; in other words, it drops the combining diacritics and keeps the base letters.
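The sketch below shows the categories the filter relies on, and one consequence worth knowing for Spanish text: ñ (U+00F1) decomposes into n followed by U+0303 COMBINING TILDE, which is also a non-spacing mark, so the filter turns ñ into n. StripMarks is a hypothetical helper that performs only the decompose-and-filter steps.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// The combining accent produced by FormD is a non-spacing mark (Mn)...
Console.WriteLine(CharUnicodeInfo.GetUnicodeCategory('\u0301')); // NonSpacingMark
// ...while ordinary letters are not.
Console.WriteLine(CharUnicodeInfo.GetUnicodeCategory('a'));      // LowercaseLetter

// Hypothetical helper: decompose, then drop every non-spacing mark.
static string StripMarks(string s) =>
    new string(s.Normalize(NormalizationForm.FormD)
        .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
        .ToArray());

// The tilde on ñ is removed too: "año" becomes "ano".
Console.WriteLine(StripMarks("a\u00F1o")); // ano
```

If ñ must be preserved, it has to be excluded from the filtering explicitly; the method in the answer removes it like any other accent.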

Then a new string is built from the remaining characters:

 new String(...)

Finally, the string is recomposed into its canonical composed form (Normalization Form C) with this line:

.Normalize(NormalizationForm.FormC)
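For the Spanish input in the question this last step changes nothing visible, because only plain ASCII letters remain after filtering. It matters for scripts where FormD splits a character into pieces that are not non-spacing marks. A sketch using the Hangul syllable 한 (U+D55C), which decomposes into three conjoining jamo of category OtherLetter: none of them is filtered out, and without FormC the result stays split into three chars. StripMarksNoRecompose is a hypothetical helper that omits the final step.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Same decompose-and-filter steps as the answer, but without the final FormC.
static string StripMarksNoRecompose(string s) =>
    new string(s.Normalize(NormalizationForm.FormD)
        .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
        .ToArray());

string hangul = "\uD55C"; // one precomposed Hangul syllable
string withoutFormC = StripMarksNoRecompose(hangul);
string withFormC = withoutFormC.Normalize(NormalizationForm.FormC);

Console.WriteLine(withoutFormC.Length); // 3: the jamo are letters, so none was removed
Console.WriteLine(withFormC.Length);    // 1: recomposed into the original syllable
Console.WriteLine(withFormC == hangul); // True
```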