SenticGCNTokenizer

class SenticGCNTokenizer(vocab_file: Optional[str] = None, train_files: Optional[List[str]] = None, train_vocab: bool = False, do_lower_case: bool = True, unk_token: str = '<unk>', pad_token: str = '<pad>', **kwargs)[source]

The SenticGCN tokenizer class, used to generate tokens for the embedding model.

Parameters

text (str) – input text string to tokenize

Example::

    tokenizer = SenticGCNTokenizer.from_pretrained("senticgcn")
    inputs = tokenizer("Hello World!")
    inputs["input_ids"]

get_vocab()[source]

Returns the vocabulary as a dictionary of token to index.

tokenizer.get_vocab()[token] is equivalent to tokenizer.convert_tokens_to_ids(token) when token is in the vocab.

Returns

The vocabulary.

Return type

Dict[str, int]
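The equivalence stated above can be illustrated with a plain dictionary standing in for the tokenizer's vocabulary. This is a minimal sketch only: the real vocabulary comes from the trained tokenizer, and `convert_tokens_to_ids` here is a hypothetical stand-in for the library method of the same name.

```python
# Illustrative stand-in for tokenizer.get_vocab(): token -> index.
vocab = {"<pad>": 0, "<unk>": 1, "hello": 2, "world": 3}

def convert_tokens_to_ids(token: str) -> int:
    # Hypothetical sketch: unknown tokens fall back to the <unk> id,
    # mirroring the usual tokenizer contract.
    return vocab.get(token, vocab["<unk>"])

# For every token in the vocab, the dict lookup and the conversion agree.
assert all(vocab[t] == convert_tokens_to_ids(t) for t in vocab)
```

For out-of-vocabulary tokens the two differ: the dict lookup raises `KeyError`, while the conversion returns the `<unk>` id.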

save_vocabulary(save_directory: str, filename_prefix: Optional[str] = None) → Tuple[str][source]

Save only the vocabulary of the tokenizer (vocabulary + added tokens).

This method won't save the configuration and special token mappings of the tokenizer. Use `PreTrainedTokenizerFast._save_pretrained` to save the whole state of the tokenizer.

Parameters
  • save_directory (str) – The directory in which to save the vocabulary.

  • filename_prefix (str, optional) – An optional prefix to add to the names of the saved files.

Returns

Paths to the files saved.

Return type

Tuple[str]
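A sketch of the typical shape of such a method may help: write one token per line in index order and return the written path(s) as a tuple. This is an illustrative implementation under assumed conventions, not the library's actual internals; the function name, file name `vocab.txt`, and prefix handling are all assumptions.

```python
import os
from typing import Dict, Optional, Tuple

def save_vocabulary(vocab: Dict[str, int], save_directory: str,
                    filename_prefix: Optional[str] = None) -> Tuple[str]:
    """Hypothetical sketch: persist a token->index vocab, one token per line."""
    prefix = filename_prefix + "-" if filename_prefix else ""
    path = os.path.join(save_directory, prefix + "vocab.txt")
    with open(path, "w", encoding="utf-8") as f:
        # Line order encodes the index, so sort tokens by their ids.
        for token, _ in sorted(vocab.items(), key=lambda kv: kv[1]):
            f.write(token + "\n")
    # Returned as a tuple of paths, matching the Tuple[str] return type above.
    return (path,)
```

Note that only the vocabulary file is produced; tokenizer configuration and special-token mappings are saved separately, as described above.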

property vocab_size

Size of the base vocabulary (without the added tokens).

Type

int
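The distinction between the base vocabulary and added tokens can be shown with a small sketch. The dictionaries below are illustrative stand-ins, assuming the common tokenizer convention that `vocab_size` excludes tokens added after training.

```python
# Illustrative only: base vocabulary vs. tokens added after training.
base_vocab = {"<pad>": 0, "<unk>": 1, "hello": 2}
added_tokens = {"<new_special>": 3}

vocab_size = len(base_vocab)                      # base vocabulary only
full_size = len(base_vocab) + len(added_tokens)   # base plus added tokens
```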