Hello. I have five languages: English, Portuguese, Spanish, Russian, and Bulgarian. When the user enters a word, I need to determine, as a percentage, how closely that word matches each of these languages. For example, the user enters the word "encyclopedia" and gets the result: Russian – 99%, Bulgarian – 87%, Spanish – 3%, English – 3%, Portuguese – 2%. (I made these numbers up.) I've already looked at a bunch of APIs, but they all just detect the language of the text, and only some of them report a percentage match for that one language. I need a score for all five languages. Please help 🙂
For a single word, you can compute the Levenshtein distance from it to every word in each language's dictionary, convert each distance into a posterior probability of error (assuming a normal distribution law for the error, evaluated via Student's t-distribution), and then combine those probabilities per language using the formula for the probability of a union of events.
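As a rough illustration of that idea, here is a minimal Python sketch. Everything in it beyond the Levenshtein distance itself is an assumption made for the example: the tiny `DICTIONARIES` word lists, the Gaussian-shaped weight `exp(-d² / 2σ²)` with `sigma = 1.0`, and the treatment of dictionary words as independent events. It does not implement a Student's t correction; it only shows the shape of the calculation, where the score for a language is `1 - Π(1 - p_i)`.

```python
import math
from typing import Dict, Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def match_probability(distance: int, sigma: float = 1.0) -> float:
    """Assumed Gaussian-shaped weight: 1.0 at distance 0, decaying with distance."""
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))


def language_score(word: str, dictionary: Iterable[str], sigma: float = 1.0) -> float:
    """P(at least one dictionary word matches) = 1 - prod(1 - p_i)."""
    miss = 1.0
    for entry in dictionary:
        d = levenshtein(word.lower(), entry.lower())
        miss *= 1.0 - match_probability(d, sigma)
    return 1.0 - miss


# Toy dictionaries, purely illustrative; a real system needs full word lists.
DICTIONARIES = {
    "English": ["encyclopedia", "dictionary", "language"],
    "Spanish": ["enciclopedia", "diccionario", "idioma"],
    "Portuguese": ["enciclopédia", "dicionário", "língua"],
    "Russian": ["энциклопедия", "словарь", "язык"],
    "Bulgarian": ["енциклопедия", "речник", "език"],
}


def score_all(word: str, sigma: float = 1.0) -> Dict[str, float]:
    """Score the word against every language independently (values in [0, 1])."""
    return {lang: language_score(word, words, sigma)
            for lang, words in DICTIONARIES.items()}


if __name__ == "__main__":
    for lang, score in sorted(score_all("encyclopedia").items(),
                              key=lambda item: -item[1]):
        print(f"{lang}: {score:.0%}")
```

Because each language is scored on its own, the percentages need not sum to 100, which matches the kind of output asked for in the question. Note that plain edit distance will not relate Latin and Cyrillic spellings of the same word; if that matters, a transliteration step before the comparison would be one option.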
For a phrase, you can apply the corresponding probability formula to the errors in its individual words.
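One possible reading of this, sketched below in Python, is to treat each word's match as an independent event and take the probability that no word in the phrase is a mismatch, that is, the product of the per-word scores. The function names and the sample numbers are hypothetical, and the independence assumption is a simplification rather than something the answer specifies.

```python
import math
from typing import Dict, List


def phrase_score(per_word_scores: List[float]) -> float:
    """P(no word is a mismatch) = prod(p_i), treating words as independent."""
    return math.prod(per_word_scores)


def phrase_scores_by_language(word_scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Combine per-word scores into one phrase score for each language."""
    return {lang: phrase_score(scores) for lang, scores in word_scores.items()}


if __name__ == "__main__":
    # Made-up per-word scores for a two-word phrase.
    example = {"English": [0.5, 0.5], "Spanish": [1.0, 0.25]}
    print(phrase_scores_by_language(example))
```

A product penalises long phrases, since every extra word can only lower the score; taking a geometric mean of the per-word scores instead would be one way to keep phrases of different lengths comparable.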