Update tokenizer internal vocabulary based on a list of texts or list of sequences.
fit_text_tokenizer(object, x)
Arguments
| object | Tokenizer returned by text_tokenizer() |
| x | Vector/list of strings, or a generator of strings (for memory efficiency); alternatively, a list of "sequences" (a sequence is a list of integer word indices). |
Note
Required before using texts_to_sequences(), texts_to_matrix(), or
sequences_to_matrix().
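Examples
The ordering described in the note can be sketched as follows. This is a minimal illustration, not an official example: it assumes the keras R package with a working TensorFlow backend, and the `texts` corpus and the `num_words = 100` setting are made up for the purpose.

```r
library(keras)

# Hypothetical toy corpus, for illustration only.
texts <- c("the cat sat on the mat", "the dog ate my homework")

# Create a tokenizer; num_words = 100 is an arbitrary choice for this sketch.
tokenizer <- text_tokenizer(num_words = 100)

# Fit first, so the internal vocabulary (word index) is built from the texts.
tokenizer <- fit_text_tokenizer(tokenizer, texts)

# Only after fitting can the texts be converted.
sequences <- texts_to_sequences(tokenizer, texts)          # list of integer vectors
mat <- texts_to_matrix(tokenizer, texts, mode = "binary")  # one row per text
```

Here `sequences` holds one vector of word indices per input text, and `mat` has one row per text and `num_words` columns.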
See also
Other text tokenization: save_text_tokenizer,
sequences_to_matrix,
text_tokenizer,
texts_to_matrix,
texts_to_sequences_generator,
texts_to_sequences