SenticGCNTokenizer

class SenticGCNTokenizer(vocab_file: Optional[str] = None, train_files: Optional[List[str]] = None, train_vocab: bool = False, do_lower_case: bool = True, unk_token: str = '<unk>', pad_token: str = '<pad>', **kwargs)[source]

The SenticGCN tokenizer class, used to generate tokens for the embedding model.
- Parameters
  text (str) – input text string to tokenize
- Example
  tokenizer = SenticGCNTokenizer.from_pretrained("senticgcn")
  inputs = tokenizer("Hello World!")
  inputs["input_ids"]
get_vocab()[source]

Returns the vocabulary as a dictionary mapping tokens to indices.
tokenizer.get_vocab()[token] is equivalent to tokenizer.convert_tokens_to_ids(token) when token is in the vocab.
- Returns
The vocabulary.
- Return type
Dict[str, int]
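The equivalence noted above can be illustrated with a minimal stand-in vocabulary. This is a hedged sketch of the contract only; the stand-in `vocab` dict and `convert_tokens_to_ids` helper below are illustrative assumptions, not the library's internals:

```python
from typing import Dict

# Stand-in vocabulary dict of the kind get_vocab() returns (Dict[str, int]).
vocab: Dict[str, int] = {"<pad>": 0, "<unk>": 1, "hello": 2, "world": 3}

def convert_tokens_to_ids(token: str) -> int:
    # In-vocab tokens map to their index; unknown tokens fall back to <unk>.
    return vocab.get(token, vocab["<unk>"])

# For every token in the vocab, get_vocab()[token] and
# convert_tokens_to_ids(token) agree.
assert all(vocab[t] == convert_tokens_to_ids(t) for t in vocab)
```

Out-of-vocabulary tokens are where the two differ: `vocab["missing"]` raises `KeyError`, while `convert_tokens_to_ids("missing")` returns the `<unk>` index.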
save_vocabulary(save_directory: str, filename_prefix: Optional[str] = None) → Tuple[str][source]

Save only the vocabulary of the tokenizer (vocabulary + added tokens).

This method will not save the tokenizer's configuration and special token mappings. Use `~PreTrainedTokenizerFast._save_pretrained` to save the whole state of the tokenizer.
- Parameters
  save_directory (str) – The directory in which to save the vocabulary.
  filename_prefix (str, optional) – An optional prefix to add to the names of the saved files.
- Returns
Paths to the files saved.
- Return type
Tuple[str]
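A vocabulary-only save of this shape can be sketched as follows. This is a hypothetical stand-in (the `save_vocabulary_sketch` name, the one-token-per-line format, and the `vocab.txt` filename are all assumptions; the actual SenticGCN file format may differ):

```python
import os
import tempfile
from typing import Dict, Optional, Tuple

def save_vocabulary_sketch(vocab: Dict[str, int], save_directory: str,
                           filename_prefix: Optional[str] = None) -> Tuple[str]:
    # Optional prefix is prepended to the file name, mirroring the
    # filename_prefix parameter documented above.
    prefix = filename_prefix + "-" if filename_prefix else ""
    path = os.path.join(save_directory, prefix + "vocab.txt")
    with open(path, "w", encoding="utf-8") as f:
        # Write tokens sorted by index so the line number recovers each
        # token's id on reload.
        for token, _ in sorted(vocab.items(), key=lambda kv: kv[1]):
            f.write(token + "\n")
    # Return the paths of the files written, as a tuple.
    return (path,)
```

Note that only the vocabulary file is written; configuration and special-token mappings would need a separate full save.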
property vocab_size

Size of the base vocabulary (without the added tokens).
- Type
int
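The distinction between the base vocabulary and added tokens can be shown with stand-in dicts (an assumption for illustration, not the library's internals; the claim that `get_vocab()` also covers added tokens follows the usual Hugging Face tokenizer convention):

```python
# Base vocabulary as loaded from the vocab file.
base_vocab = {"<pad>": 0, "<unk>": 1, "good": 2, "service": 3}
# Tokens added after loading, e.g. via add_tokens().
added_tokens = {"<mask>": 4}

# vocab_size counts only the base vocabulary.
vocab_size = len(base_vocab)
# The full mapping, base plus added tokens, is what get_vocab() would return,
# so len(get_vocab()) can exceed vocab_size.
full_vocab = {**base_vocab, **added_tokens}
```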