I have a collection of texts.
I trained doc2vec on them through the gensim library. The result is good: it finds similar texts very well. How can the texts be clustered?
I tried this: I got a vector for each text and fed them all into k-means. The result is not very good.
What other approaches can be used with a trained doc2vec model?
k-means only works with Euclidean distance, so I suggest looking at the similar k-medoids algorithm. Unlike k-means, k-medoids can use any distance measure (in this case, cosine distance is suitable). The only drawback is that k-medoids is more time-consuming than k-means.
A full comparison of algorithms is offered here: https://stackoverflow.com/questions/21619794/what-makes-the-distance-measure-in-k-medoid-better-than-k-means
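
To make the suggestion concrete, here is a minimal sketch of k-medoids over cosine distances, written with plain NumPy. It is an illustration under assumptions, not a reference implementation: the function names `cosine_distance_matrix` and `k_medoids` are made up for this example, the farthest-first initialization is my own choice to keep the run deterministic, and the small `doc_vectors` array is a toy stand-in for real doc2vec vectors (in recent gensim versions those should be available as `model.dv.vectors`, but check this against your installed version).

```python
import numpy as np


def cosine_distance_matrix(vectors):
    """Pairwise cosine distances (1 - cosine similarity) between rows."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return 1.0 - unit @ unit.T


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix.

    Hypothetical helper for illustration. Assumes k is not larger than
    the number of distinct points. Returns (medoid_indices, labels).
    """
    # Farthest-first initialization: start from the most central point,
    # then repeatedly add the point farthest from the medoids chosen so far.
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        nearest = dist[:, medoids].min(axis=1)
        medoids.append(int(np.argmax(nearest)))
    medoids = np.array(medoids)

    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if a cluster is empty
            # Update step: the new medoid is the member with the smallest
            # total distance to the other members of its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break  # converged
        medoids = new_medoids

    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels


# Toy stand-in for doc2vec vectors: two groups pointing in different directions.
doc_vectors = np.array([
    [1.0, 0.1, 0.0],
    [0.9, 0.0, 0.1],
    [1.0, 0.05, 0.05],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
    [0.05, 1.0, 0.05],
])

dist = cosine_distance_matrix(doc_vectors)
medoids, labels = k_medoids(dist, k=2)
print("medoid indices:", medoids)
print("cluster labels:", labels)
```

Note that the full distance matrix takes O(n²) memory and time, which is one reason k-medoids is slower than k-means on large collections.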