Converts a text to a sequence of indexes in a fixed-size hashing space.
text_hashing_trick(text, n, hash_function = NULL,
filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n", lower = TRUE,
split = " ")
Arguments
| text | Input text (string). |
| n | Dimension of the hashing space. |
| hash_function | if |
| filters | Sequence of characters to filter out such as punctuation. Default includes basic punctuation, tabs, and newlines. |
| lower | Whether to convert the input to lowercase. |
| split | Separator on which the text is split into words (string). |
Value
A list of integer word indices (uniqueness not guaranteed).
Details
Two or more words may be assigned to the same index because the hashing function can produce collisions.
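To make the collision behavior concrete, here is a minimal base R sketch of the hashing trick. It is not the package's implementation: `simple_hash` and `hashing_trick_sketch` are hypothetical names introduced for illustration, the polynomial hash merely stands in for a real hash function, and mapping words to indices in the range 1 to n - 1 is an assumption of this sketch.

```r
# Hypothetical helper: a simple, deterministic polynomial string hash.
# It only stands in for a real hash function such as md5.
simple_hash <- function(word) {
  h <- 0
  for (code in utf8ToInt(word)) {
    h <- (h * 31 + code) %% 2147483647
  }
  h
}

# Sketch of the hashing trick: lowercase, filter, split, then hash each word.
# Assumption: indices fall in 1..(n - 1), leaving 0 unused.
hashing_trick_sketch <- function(text, n, hash_function = simple_hash,
                                 filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n",
                                 lower = TRUE, split = " ") {
  if (lower) text <- tolower(text)
  # Replace every filtered character with the split marker.
  chars <- strsplit(text, "", fixed = TRUE)[[1]]
  filter_chars <- strsplit(filters, "", fixed = TRUE)[[1]]
  chars[chars %in% filter_chars] <- split
  # Split into words and drop the empty strings left by repeated separators.
  words <- strsplit(paste(chars, collapse = ""), split, fixed = TRUE)[[1]]
  words <- words[nzchar(words)]
  # Map each word to an index in the fixed-size hashing space.
  vapply(words, function(w) hash_function(w) %% (n - 1) + 1, numeric(1),
         USE.NAMES = FALSE)
}

idx <- hashing_trick_sketch("The quick brown fox, the lazy dog!", n = 10)
```

In this sketch both occurrences of "the" receive the same index, since the same word always hashes to the same value. With a very small `n`, distinct words are forced to share indices: `hashing_trick_sketch("alpha beta gamma", n = 3)` has only two possible indices for three words, so at least two of them collide.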
See also
Other text preprocessing: make_sampling_table, pad_sequences, skipgrams, text_one_hot, text_to_word_sequence