How to do a spell check in C#?

Question:

I need to analyze the words stored in a database. The analysis is only a spell check, with a report on screen (a GridView) listing the misspelled words.

I have never built anything like this and would appreciate some pointers.

I can start the example with this:

string[] palavrasParaCorrigir = {"batata", "conoira", "cebola", "pimentao", "beterraba"};

Answer:

Existing libraries such as Hunspell (already mentioned in the accepted answer) or Aspell will probably solve your problem quickly: these libraries are available for several languages and are used in many programs.

But if you want to dig a little deeper: there's an excellent article by Peter Norvig (Google Research Director) on the subject: http://norvig.com/spell-correct.html

Of course, it's in English, but it explains in simple terms how a spelling corrector like Google's works when we use the search engine and it suggests a correction.

In summary: the system is based on a dictionary and on edit distance, considering candidates up to 2 edits away from the typed word. In the article and its examples, the dictionary is a large text file whose words are assumed to be correctly spelled. For this, Peter Norvig used a collection of public-domain texts plus word-frequency lists.

When the user enters a word, the program takes that word, and sees if it exists in the dictionary. If so, the word is correct.
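The lookup step can be sketched as follows. This is a minimal illustration, not the article's code: the class name `WordList` and its methods are made up for this example, and the dictionary is built from whatever correctly spelled text you pass in.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordList
{
    // Counts how often each word occurs in a body of correctly spelled text.
    public static Dictionary<string, int> BuildCounts(string corpus)
    {
        var counts = new Dictionary<string, int>();
        // \p{L}+ matches runs of letters, including accented ones.
        foreach (Match m in Regex.Matches(corpus.ToLowerInvariant(), @"\p{L}+"))
        {
            counts.TryGetValue(m.Value, out int n); // n is 0 when the word is new
            counts[m.Value] = n + 1;
        }
        return counts;
    }

    // Returns the words that do not appear in the dictionary, in input order.
    public static List<string> FindUnknown(IEnumerable<string> words,
                                           Dictionary<string, int> counts)
    {
        var unknown = new List<string>();
        foreach (string w in words)
        {
            if (!counts.ContainsKey(w.ToLowerInvariant()))
                unknown.Add(w);
        }
        return unknown;
    }
}
```

Calling `WordList.FindUnknown(palavrasParaCorrigir, counts)` would give the list of words to show in the report, assuming `counts` was built from a Portuguese text.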

If it doesn't exist, it generates several mutants (error variations) of that word using the following techniques:

  • Swaps each pair of adjacent letters;
  • Replaces the letter at each position with every other letter;
  • Inserts a letter at each position;
  • Deletes the letter at each position.
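The four techniques in the list can be sketched like this. It is a rough C# rendering of the idea, assuming a plain a-z alphabet; the class name `Mutants` is illustrative, and `Edits1` simply mirrors the name used in the article.

```csharp
using System.Collections.Generic;

public static class Mutants
{
    // Assumption: plain a-z only; accented letters are not covered here.
    const string Alphabet = "abcdefghijklmnopqrstuvwxyz";

    // Returns every string that is one edit away from the given word.
    // The set also contains the word itself (a letter replaced by itself).
    public static HashSet<string> Edits1(string word)
    {
        var result = new HashSet<string>();
        for (int i = 0; i <= word.Length; i++)
        {
            string left = word.Substring(0, i);
            string right = word.Substring(i);

            if (right.Length > 0)
                result.Add(left + right.Substring(1)); // delete one letter
            if (right.Length > 1)
                result.Add(left + right[1] + right[0] + right.Substring(2)); // swap adjacent letters

            foreach (char c in Alphabet)
            {
                if (right.Length > 0)
                    result.Add(left + c + right.Substring(1)); // replace one letter
                result.Add(left + c + right); // insert one letter
            }
        }
        return result;
    }
}
```

For "conoira", the result contains "cenoira" (one replacement) but not "cenoura", which is two replacements away.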

It then checks whether any of these mutants exist in the dictionary. Among those that do, the one that occurs most often in the dictionary text is taken as the correct word.

In the example program, if no mutant is found in the dictionary, it takes each word in the mutant list and generates new mutants from it, then checks again whether any of them exist in the dictionary.
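Put together, the selection and the second round might look like the sketch below. The names `Corrector`, `Correct` and `MostFrequent` are hypothetical, and the `edits` parameter stands in for any function that returns the single-edit variants of a word; returning the original word when nothing matches is a choice made for this example.

```csharp
using System;
using System.Collections.Generic;

public static class Corrector
{
    // Picks the candidate with the highest count in the dictionary,
    // or null when no candidate is a known word.
    static string MostFrequent(IEnumerable<string> candidates,
                               IReadOnlyDictionary<string, int> counts)
    {
        string best = null;
        int bestCount = 0;
        foreach (string c in candidates)
        {
            if (counts.TryGetValue(c, out int n) && n > bestCount)
            {
                best = c;
                bestCount = n;
            }
        }
        return best;
    }

    public static string Correct(string word,
                                 IReadOnlyDictionary<string, int> counts,
                                 Func<string, IEnumerable<string>> edits)
    {
        if (counts.ContainsKey(word))
            return word; // already a known word

        // First round: variants one edit away.
        var first = new HashSet<string>(edits(word));
        string best = MostFrequent(first, counts);
        if (best != null)
            return best;

        // Second round: mutate every first-round mutant again.
        var second = new HashSet<string>();
        foreach (string mutant in first)
            second.UnionWith(edits(mutant));

        // Fall back to the original word when nothing is found.
        return MostFrequent(second, counts) ?? word;
    }
}
```

The second round grows quickly with word length, so for a large database it may be worth caching results per distinct word.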

At the end of the article, there are implementations of the program in several languages (at the time, I wrote a Java version and a Groovy version). You'll find versions for practically every language, including two in C#.

The only additional detail is that you may have to tweak the source code so that the range of letters doesn't just go from a to z, but also includes the accented letters we use in Portuguese.
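One way to do that is to extend the alphabet used when generating variants. In the sketch below, the class `PortugueseAlphabet` is made up for illustration, and its list of accented letters is an assumption that may be incomplete for your data.

```csharp
using System.Collections.Generic;

public static class PortugueseAlphabet
{
    // Assumption: these are the accented letters that matter for your data;
    // extend the list if your words use others.
    public const string Letters = "abcdefghijklmnopqrstuvwxyz" + "áàâãéêíóôõúç";

    // Generates every variant where one letter is replaced by a letter
    // from the extended alphabet.
    public static IEnumerable<string> Replacements(string word)
    {
        for (int i = 0; i < word.Length; i++)
        {
            foreach (char c in Letters)
                yield return word.Substring(0, i) + c + word.Substring(i + 1);
        }
    }
}
```

With this alphabet, "pimentao" is a single replacement away from "pimentão", so a missing accent is treated like any other typo.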

Of course, you will need a dictionary in Portuguese. Or, optionally, if your list is just made up of products, for example, you can use your product list instead of the dictionary.
