SenticGCNTokenizer

class SenticGCNTokenizer(vocab_file: Optional[str] = None, train_files: Optional[List[str]] = None, train_vocab: bool = False, do_lower_case: bool = True, unk_token: str = '<unk>', pad_token: str = '<pad>', **kwargs)[source]

The SenticGCN tokenizer class, used to generate tokens for the embedding model.

Parameters

text (str) – input text string to tokenize

Example::

    tokenizer = SenticGCNTokenizer.from_pretrained("senticgcn")
    inputs = tokenizer("Hello World!")
    inputs["input_ids"]

get_vocab()[source]

Returns the vocabulary as a dictionary of token to index.

tokenizer.get_vocab()[token] is equivalent to tokenizer.convert_tokens_to_ids(token) when token is in the vocab.

Returns

The vocabulary.

Return type

Dict[str, int]
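The equivalence stated above can be illustrated with a plain dictionary standing in for the tokenizer's vocabulary. This is a minimal sketch only: the real vocabulary comes from the trained tokenizer, and `convert_tokens_to_ids` here is a hypothetical stand-in for the library method of the same name.

```python
# Illustrative stand-in for tokenizer.get_vocab(): token -> index.
vocab = {"<pad>": 0, "<unk>": 1, "hello": 2, "world": 3}

def convert_tokens_to_ids(token: str) -> int:
    # Hypothetical sketch: unknown tokens fall back to the <unk> id,
    # mirroring the usual tokenizer contract.
    return vocab.get(token, vocab["<unk>"])

# For every token in the vocab, the dict lookup and the conversion agree.
assert all(vocab[t] == convert_tokens_to_ids(t) for t in vocab)
```

For out-of-vocabulary tokens the two differ: the dict lookup raises `KeyError`, while the conversion returns the `<unk>` id.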

save_vocabulary(save_directory: str, filename_prefix: Optional[str] = None) → Tuple[str][source]

Save only the vocabulary of the tokenizer (vocabulary + added tokens).

This method won't save the configuration and special token mappings of the tokenizer. Use `PreTrainedTokenizerFast._save_pretrained` to save the whole state of the tokenizer.

Parameters
  • save_directory (str) – The directory in which to save the vocabulary.

  • filename_prefix (str, optional) – An optional prefix to add to the names of the saved files.

Returns

Paths to the files saved.

Return type

Tuple[str]
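A sketch of the typical shape of such a method may help: write one token per line in index order and return the written path(s) as a tuple. This is an illustrative implementation under assumed conventions, not the library's actual internals; the function name, file name `vocab.txt`, and prefix handling are all assumptions.

```python
import os
from typing import Dict, Optional, Tuple

def save_vocabulary(vocab: Dict[str, int], save_directory: str,
                    filename_prefix: Optional[str] = None) -> Tuple[str]:
    """Hypothetical sketch: persist a token->index vocab, one token per line."""
    prefix = filename_prefix + "-" if filename_prefix else ""
    path = os.path.join(save_directory, prefix + "vocab.txt")
    with open(path, "w", encoding="utf-8") as f:
        # Line order encodes the index, so sort tokens by their ids.
        for token, _ in sorted(vocab.items(), key=lambda kv: kv[1]):
            f.write(token + "\n")
    # Returned as a tuple of paths, matching the Tuple[str] return type above.
    return (path,)
```

Note that only the vocabulary file is produced; tokenizer configuration and special-token mappings are saved separately, as described above.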

property vocab_size

Size of the base vocabulary (without the added tokens).

Type

int
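The distinction between the base vocabulary and added tokens can be shown with a small sketch. The dictionaries below are illustrative stand-ins, assuming the common tokenizer convention that `vocab_size` excludes tokens added after training.

```python
# Illustrative only: base vocabulary vs. tokens added after training.
base_vocab = {"<pad>": 0, "<unk>": 1, "hello": 2}
added_tokens = {"<new_special>": 3}

vocab_size = len(base_vocab)                      # base vocabulary only
full_size = len(base_vocab) + len(added_tokens)   # base plus added tokens
```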