python – Clustering texts after doc2vec

Question:

I have a collection of texts. I trained a doc2vec model on them using the gensim library. The result is good: it finds similar texts very reliably. How can the texts be clustered?

Here is what I tried: I obtained a vector for each text and fed them all into k-means. The result is not very good.

What other approaches can be used with a trained doc2vec model?

Answer:

Since k-means works only with Euclidean distance, I suggest looking at the similar k-medoids algorithm. Unlike k-means, it can use any distance measure (cosine distance is suitable in this case). The only drawback is that k-medoids is more computationally expensive than k-means.
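As a rough illustration of the idea, here is a minimal pure-Python sketch of k-medoids with cosine distance. The names `cosine_distance` and `k_medoids` are hypothetical helpers written for this example, not part of gensim, and the toy vectors stand in for the per-document vectors you would take from your trained doc2vec model. It uses a simple alternating assign/update scheme rather than the classic PAM swap procedure, and it assumes no vector is all zeros.

```python
import math
import random


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def k_medoids(vectors, k, max_iter=100, seed=0):
    """Cluster vectors with a simple alternating k-medoids scheme.

    Returns (medoids, labels): medoids holds indices into `vectors`,
    and labels[i] is the position in `medoids` of the medoid nearest
    to vectors[i].
    """
    n = len(vectors)
    # Precompute all pairwise distances (O(n^2) time and memory).
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]

    def assign(meds):
        # Attach every point to its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]])
                for i in range(n)]

    # Start from k distinct, randomly chosen documents.
    medoids = random.Random(seed).sample(range(n), k)
    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if its cluster ended up empty.
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Toy 2-D vectors standing in for doc2vec document vectors.
vectors = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
           [0.1, 1.0], [0.0, 0.9], [0.2, 1.0]]
medoids, labels = k_medoids(vectors, k=2)
print(medoids, labels)
```

The precomputed distance matrix is what makes this costlier than k-means: it grows quadratically with the number of texts, so for a large collection you may need to cluster a sample or use an optimized library implementation.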

A full comparison of algorithms is offered here: https://stackoverflow.com/questions/21619794/what-makes-the-distance-measure-in-k-medoid-better-than-k-means
