What is the working principle of word vectors?
To hand natural language to a machine learning algorithm, the language usually has to be mathematized first. A word vector is one way to mathematize the words of a language.

One of the simplest word vector schemes is the one-hot representation: each word is represented by a long vector whose length is the size of the vocabulary, with exactly one component set to 1 and all the others set to 0 (a minimal sketch is given after this section). This representation has two shortcomings: it easily runs into the curse of dimensionality, especially when used in deep learning algorithms; and it cannot capture the similarity between words (a problem often called the "lexical gap").

The other scheme is the distributed representation, first proposed by Hinton in 1986, which overcomes both shortcomings of the one-hot representation. The basic idea is to map every word of a language, through training, to a short vector of fixed length ("short" here is relative to the "long" one-hot vectors). All these vectors together form a word vector space, and each vector can be regarded as a point in that space. By introducing a notion of "distance" on this space, we can judge the similarity between words by the distance between their vectors.

To make this idea concrete, here is an everyday example: suppose there are n distinct points on a two-dimensional plane; given one of them, we want to find the point on the plane closest to it. How do we proceed? First, set up a rectangular coordinate system, so that every point corresponds uniquely to a coordinate (x, y); then introduce the Euclidean distance; finally, compute the distance between the given point and each of the other n-1 points, and the point with the smallest distance is the one we want (a code sketch of this search follows below). In this example, the coordinate (x, y) plays the role of the word vector: it mathematizes the position of a point on the plane.

Once the coordinate system is set up, obtaining a point's coordinates is easy. In an NLP task, however, obtaining word vectors is much more complicated. Moreover, word vectors are not unique: their quality depends on the training corpus, the training algorithm, and the length of the vectors. One way to generate word vectors is to use a neural network algorithm; in that case the word vectors are usually tied to a language model, and the two are obtained together after training. The idea of using neural networks to train language models was first put forward by Xu Wei of Baidu IDL (Institute of Deep Learning). The most classic paper in this area is "A Neural Probabilistic Language Model", published by Bengio in JMLR in 2003, followed by a series of related works, including word2vec from Tomas Mikolov's team at Google.
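Here is a minimal sketch in Python of the one-hot representation described above; the toy vocabulary is made up for illustration.

# A minimal sketch of the one-hot representation. The vocabulary is
# a made-up toy example, not from any real corpus.
vocabulary = ["king", "queen", "man", "woman", "apple"]

def one_hot(word, vocab):
    # The vector is as long as the vocabulary; exactly one component is 1.
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

print(one_hot("queen", vocabulary))  # [0, 1, 0, 0, 0]

Note that any two distinct one-hot vectors are equally far apart, which is exactly the "lexical gap": the encoding says nothing about how similar two words are.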
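The nearest-point search from the plane example can be written down directly. The points and the query below are hypothetical; with word vectors in place of 2-D coordinates, the same loop finds the most similar word.

# A sketch of the plane example: among n points, find the one closest
# to a given query point under the Euclidean distance. The coordinates
# here are invented for illustration.
import math

points = {"a": (0.0, 1.0), "b": (2.0, 3.0), "c": (0.5, 0.5)}
query = (0.0, 0.0)

def euclidean(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Compute the distance to each of the other points; the smallest wins.
nearest = min(points, key=lambda name: euclidean(points[name], query))
print(nearest)  # "c"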
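Finally, a sketch of training word vectors in the word2vec style. The gensim library and the tiny corpus are my own choices for illustration (the article names only word2vec itself), and real training requires a large corpus.

# A sketch of training word vectors with word2vec via the gensim
# library (gensim is an assumption here, not named in the article).
# The corpus is made up, so the learned vectors are meaningless.
from gensim.models import Word2Vec

corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "man", "eats", "an", "apple"],
]

# vector_size is the fixed length of each word vector, i.e. the "short"
# vector of the distributed representation.
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, epochs=50)

print(model.wv["queen"])              # the learned 50-dimensional vector
print(model.wv.most_similar("king"))  # neighbours ranked by cosine similarity

With vector_size set to 50, each word becomes a point in a 50-dimensional word vector space, and most_similar ranks the other words by distance in that space, exactly the idea described above.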