python – Clustering texts after doc2vec


I have a collection of texts and trained a doc2vec model on it using the gensim library. The result is good: it finds similar texts very well. How can the texts be clustered?

I tried this: I got a vector for each text and fed them all into k-means. The result is not very good.

What other approaches can you use with a trained doc2vec model?


Since k-means only works with Euclidean distance, I suggest looking at the similar k-medoids algorithm. It differs from k-means in that any distance measure can be used (in this case, cosine distance is suitable). The only drawback is that k-medoids is more time-consuming than k-means.
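To make the idea concrete, here is a minimal sketch of k-medoids over a precomputed cosine distance matrix, written with NumPy only. The function names `cosine_distance_matrix` and `k_medoids` are made up for this illustration, and `doc_vectors` is toy data standing in for the vectors you would take from your trained doc2vec model. It uses the simple alternating (assign, then re-pick medoids) scheme, which is one of several k-medoids variants and not necessarily the best one for large collections.

```python
import numpy as np


def cosine_distance_matrix(vectors):
    """Pairwise cosine distances (1 - cosine similarity) between rows."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return np.clip(1.0 - unit @ unit.T, 0.0, 2.0)


def k_medoids(dist, k, max_iter=100, seed=0):
    """Alternating k-medoids on a precomputed distance matrix.

    Returns (medoid_indices, labels), where labels[i] is the position
    in medoid_indices of the medoid closest to point i.
    """
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    medoids = rng.choice(n, size=k, replace=False)  # random initial medoids
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if a cluster ends up empty
            # Update step: pick the member with the smallest total
            # distance to all other members of the same cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break  # medoids stopped changing
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels


# Toy stand-in for doc2vec vectors: two groups pointing in different directions.
doc_vectors = np.array([
    [1.0, 0.1], [2.0, 0.1], [3.0, 0.5],
    [0.1, 1.0], [0.1, 2.0], [0.5, 3.0],
])
dist = cosine_distance_matrix(doc_vectors)
medoids, labels = k_medoids(dist, k=2)
print(medoids, labels)
```

The distance matrix takes O(n²) memory and each update step scans all pairs within a cluster, which is where the extra cost compared with k-means comes from. The vectors in the toy data differ a lot in length but little in direction within each group, which is exactly the case where cosine distance groups them together.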

A full comparison of algorithms is offered here:
