Generates skipgram word pairs.
skipgrams(sequence, vocabulary_size, window_size = 4, negative_samples = 1,
shuffle = TRUE, categorical = FALSE, sampling_table = NULL,
seed = NULL)
Arguments
| sequence | A word sequence (sentence), encoded as a list of word indices
(integers). If using a |
| vocabulary_size | Int, maximum possible word index + 1 |
| window_size | Int, size of sampling windows (technically half-window).
The window of a word |
| negative_samples | float >= 0. 0 for no negative (i.e. random) samples. 1 for same number as positive samples. |
| shuffle | whether to shuffle the word couples before returning them. |
| categorical | bool. If FALSE, labels are integers (0 or 1). If TRUE, labels are categorical: 1 becomes [0, 1] and 0 becomes [1, 0]. |
| sampling_table | 1D array of size |
| seed | Random seed |
Value
List of couples, labels where:
couples is a list of 2-element integer vectors: [word_index, other_word_index].
labels is an integer vector of 0 and 1, where 1 indicates that other_word_index was found in the same window as word_index, and 0 indicates that other_word_index was random.
If categorical is set to TRUE, the labels are categorical, i.e. 1 becomes [0, 1], and 0 becomes [1, 0].
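The categorical encoding described above can be illustrated with a small base-R sketch. The helper name to_categorical_labels is hypothetical and not part of the package. The sketch only shows the mapping of 1 to [0, 1] and 0 to [1, 0]; the exact R structure the function returns for categorical labels is not stated here, so a list of 2-element vectors is an assumption.

```r
# Hypothetical helper (not part of keras): maps integer labels to the
# categorical form, where 1 becomes c(0, 1) and 0 becomes c(1, 0).
to_categorical_labels <- function(labels) {
  lapply(labels, function(l) if (l == 1L) c(0L, 1L) else c(1L, 0L))
}

# One positive label, one negative label, one positive label.
cat_labels <- to_categorical_labels(c(1L, 0L, 1L))
```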
Details
This function transforms a sequence of word indices (a list of integers) into couples of words of the form:
(word, word in the same window), with label 1 (positive samples).
(word, random word from the vocabulary), with label 0 (negative samples).
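The pairing scheme above can be sketched in base R as follows. This is a minimal illustration, not the keras implementation: sketch_skipgrams is a hypothetical helper, it ignores shuffle, categorical and sampling_table, and it assumes the window of a word covers window_size positions on each side.

```r
# Hypothetical base-R sketch of skipgram pair generation (not the keras code).
sketch_skipgrams <- function(sequence, vocabulary_size, window_size = 4,
                             negative_samples = 1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  sequence <- as.integer(unlist(sequence))
  n <- length(sequence)
  couples <- list()

  # Positive samples: pair each word with every other word that lies
  # within window_size positions of it (label 1).
  for (i in seq_len(n)) {
    lo <- max(1L, i - window_size)
    hi <- min(n, i + window_size)
    for (j in setdiff(lo:hi, i)) {
      couples[[length(couples) + 1L]] <- c(sequence[i], sequence[j])
    }
  }
  n_pos <- length(couples)
  labels <- rep(1L, n_pos)

  # Negative samples: pair words taken from the positive couples with
  # random indices in 1..(vocabulary_size - 1) (label 0).
  n_neg <- as.integer(n_pos * negative_samples)
  if (n_neg > 0L) {
    firsts <- vapply(couples, function(p) p[1], integer(1))
    picked <- firsts[sample.int(n_pos, n_neg, replace = TRUE)]
    random <- sample.int(vocabulary_size - 1L, n_neg, replace = TRUE)
    for (k in seq_len(n_neg)) {
      couples[[n_pos + k]] <- c(picked[k], random[k])
    }
    labels <- c(labels, rep(0L, n_neg))
  }

  list(couples = couples, labels = labels)
}

# Four-word sentence, window of one word on each side.
res <- sketch_skipgrams(c(1, 2, 3, 4), vocabulary_size = 5,
                        window_size = 1, seed = 42)
```

With window_size = 1, the four-word sequence yields six positive couples, and negative_samples = 1 adds the same number of random couples. A random word can happen to coincide with a true context word, so a label of 0 only means the pair was drawn at random.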
Read more about Skipgram in this gnomic paper by Mikolov et al.: Efficient Estimation of Word Representations in Vector Space
See also
Other text preprocessing: make_sampling_table,
pad_sequences,
text_hashing_trick,
text_one_hot,
text_to_word_sequence