About word2vec, which energized the field of natural language processing.
There is a program called word2vec that converts words into vector data.
This program can convert a word into a vector of any dimension you choose.
Surprisingly, it is known that the word vectors this program generates can be added and subtracted in ways that make intuitive sense.
For example, 'king' - 'man' + 'woman' = 'queen'.
It is a remarkably accurate and impressive program.
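The analogy above can be illustrated with a toy sketch. The 3-D vectors below are hand-made for illustration (real word2vec vectors are learned, not assigned by hand), but they show how "king - man + woman" can land nearest to "queen" under cosine similarity:

```python
import numpy as np

# Hypothetical hand-made vectors: rough axes of (royalty, maleness, femaleness)
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([0.0, 0.5, 0.5]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# 'king' - 'man' + 'woman', then find the nearest remaining word
target = vectors["king"] - vectors["man"] + vectors["woman"]
best = max((w for w in vectors if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(vectors[w], target))
print(best)  # queen
```

With real word2vec vectors the same arithmetic is done in, say, 100 dimensions, but the principle is identical.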
Various sites explain the Skip-gram part of this algorithm, but none of them explain how word information gets packed into a vector of arbitrary dimension, so I don't understand it.
How does word2vec ultimately vectorize words?
And how does it learn so that words acquire meaning in the vector space?
Please explain in Japanese.
Addendum: I understand that the learning is performed by a recurrent neural network.
I want to know what each dimension of the learned vector corresponds to in the neural network.
What word2vec learns are vectors of words, not vectors of sentences (documents).
A document vector assigns one dimension per word, so it is based on word frequency:
it represents the characteristics of a document by how often each word appears. That much makes sense to me.
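The document vector described here is the bag-of-words representation. A minimal sketch (the vocabulary and sentence below are made up for illustration):

```python
from collections import Counter

# One dimension per vocabulary word; the value is that word's count in the document
vocab = ["king", "queen", "man", "woman", "rules"]
doc = "the king rules and the queen rules".split()

counts = Counter(doc)
doc_vector = [counts[w] for w in vocab]
print(doc_vector)  # [1, 1, 0, 0, 2]
```

Note that this vector describes a whole document, and its dimensionality is tied to the vocabulary size; word2vec vectors, by contrast, describe individual words in a freely chosen number of dimensions.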
But I don't know what the word vectors are based on.
Why, for example, in a 100-dimensional vector space, do 'king', 'queen', 'man', and 'woman' end up positioned in the relationship above?
It seems mysterious. In word2vec, learning is apparently done with a neural network based on the probability of word co-occurrence. What does a node or synapse in the neural network correspond to in a word vector?
I don't really understand that.
word2vec has the following properties:
- Represents each word as a 1-of-K (one-hot) vector
- Learns with a neural network
- The input and output of the neural network are words
Between input and output, the signal passes through a hidden layer of n nodes (n is an arbitrary number chosen by the user).
The network is trained little by little, with the word of interest and the words that appear around it (co-occurring words) used as input and output (training data).
When training is finished, the n hidden-layer node values for each word are that word's vector in word2vec.
I hadn't realized that the n nodes of this hidden layer become the dimensions of the vector.
It is remarkable that vectors created by this method have the properties described above.
I understood it thanks to the following reference. Please read it if you are interested.
References: Natural language processing with word2vec