How to remove accents (tildes) from a String in Java?

Question:

I have tried several approaches and none of them worked for me. What I need is a method that strips accents and other diacritics from words, including characters such as ñ or ü. That is:

If it receives the word corrió, it returns corrio; if it receives ñandú, it returns nandu.

I tried the following methods found on this site and others and none of them work!

private String remove1(String texto) {
    // Accented and special characters to replace.
    String original = "ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿ";
    // ASCII characters that replace the originals, position by position.
    String ascii = "AAAAAAACEEEEIIIIDNOOOOOOUUUUYBaaaaaaaceeeeiiiionoooooouuuuyy";
    String output = texto;
    for (int i = 0; i < original.length(); i++) {
        // Replace each special character with its ASCII counterpart.
        output = output.replace(original.charAt(i), ascii.charAt(i));
    }
    return output;
}

I tried this other one too:

public String deAccent(String str) {
      String nfdNormalizedString = Normalizer.normalize(str,  Normalizer.Form.NFD); 
      Pattern pattern = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
      return pattern.matcher(nfdNormalizedString).replaceAll("");
}

Both return the string unmodified. What could be the error?

I would also need to remove punctuation marks, so that only alphanumeric characters remain when the text is read from a .txt file.

EDIT, complete method: it takes a list of text files (books), cleans their words, and adds them to a vocabulary implemented as a binary tree.

public boolean procesar()
{

        if (libros.getFirst()==null)
        {
            System.out.println("no hay libro");
            return false;

        }
        else 
        {


        File f;

        for (int i = 0; i < libros.size(); i++) 
        {   
            try {
            Libro l;
            l = (Libro) libros.pollFirst();
            f = l.getFile();
            FileReader fr = new FileReader(f);
            BufferedReader br = new BufferedReader(fr);
            String ln = br.readLine();

                while(ln!=null)
                {

                    StringTokenizer st = new StringTokenizer(ln);
                    while (st.hasMoreTokens()) 
                    {

                        String word = st.nextToken();
                        String clean;
                        clean = remove1(word);


                        Palabra p = new Palabra(clean, l);
                        if (h1.contains(p)==false)
                        {
                        h1.add(p);
                       contador++;    

                        }
                        else
                        {
                            Palabra aux = (Palabra) h1.search(p);
                            aux.agregaUno();


                        }

                        //System.out.println(clean);

                    }
                ln = br.readLine();

                }


                }
            catch (IOException ioe)
            {
                System.out.println("ERROR AL ABRIR ARCHIVO");

            }
        }
        return true;
        }

}

Answer:

To remove accents, you can use the StringUtils class from Apache Commons Lang 3.

This class provides the stripAccents method, whose documentation says the following:

Removes diacritics (~ = accents) from a string. The case will not be altered.

For instance, 'à' will be replaced by 'a'.
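
With Commons Lang on the classpath, the call would simply be StringUtils.stripAccents(word). If adding the dependency is not an option, the following is a minimal JDK-only sketch of the same idea, extended to drop punctuation as the question asks. The class name AccentCleaner and the helper names stripAccents and toAlphanumeric are hypothetical, chosen only for illustration, and the accented test words are written as Unicode escapes so the result does not depend on the encoding of the source file.

```java
import java.text.Normalizer;

// Hypothetical helper class, not part of the question's code or of Commons Lang.
class AccentCleaner {

    // Decomposes each accented character into base letter + combining mark (NFD),
    // then deletes the combining marks.
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // Strips accents, then removes everything that is not an ASCII letter or digit
    // (punctuation, symbols, whitespace).
    static String toAlphanumeric(String input) {
        return stripAccents(input).replaceAll("[^A-Za-z0-9]", "");
    }

    public static void main(String[] args) {
        // "corri\u00f3" is corrio with an acute accent on the final o.
        System.out.println(stripAccents("corri\u00f3"));                  // prints corrio
        // Inverted exclamation mark, n with tilde, u with acute, exclamation mark.
        System.out.println(toAlphanumeric("\u00a1\u00f1and\u00fa!"));     // prints nandu
    }
}
```

Characters with no canonical decomposition, such as ß or ø, are not turned into ASCII letters by this sketch; toAlphanumeric simply drops them. If the words come from a file, keep in mind that FileReader decodes with the platform default charset on older Java versions, so a file saved in a different encoding can produce characters that none of these methods recognize. That is only one possible cause of the behavior described in the question, not something the question confirms.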
