RecconEmotionEntailmentPreprocessor

class RecconEmotionEntailmentPreprocessor(tokenizer: Optional[transformers.tokenization_utils.PreTrainedTokenizer] = None, max_length: int = 512)

Preprocessor for the RecconEmotionEntailment model. It preprocesses the inputs and tokenizes them so that they can be passed to RecconEmotionEntailmentModel.

Parameters

tokenizer (Optional[PreTrainedTokenizer], optional) – Tokenizer used by the preprocessor. Defaults to None.
max_length (int, optional) – Maximum number of tokens per input; longer sequences are truncated to this length. Defaults to 512.

__call__(data_batch: Dict[str, List[str]]) → transformers.tokenization_utils_base.BatchEncoding

Preprocesses the data and then tokenizes it so that it can be used by RecconEmotionEntailmentModel.

Parameters: data_batch (Dict[str, List[str]]) – Dictionary that must contain the keys 'emotion', 'target_utterance', 'evidence_utterance', and 'conversation_history'. Each value is a list of strings, and all lists must have the same length.
Returns: The BatchEncoding instance returned by self.tokenizer.
Return type: BatchEncoding
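
Example

The shape of data_batch described above can be sketched as follows. This is an illustrative sketch only: validate_data_batch and REQUIRED_KEYS are hypothetical names that are not part of the library, and the utterances are made-up sample data. The helper only checks the two documented constraints (the four required keys are present and all lists have the same length) before the batch would be handed to the preprocessor.

```python
from typing import Dict, List

# The four keys that the __call__ documentation says data_batch must contain.
REQUIRED_KEYS = (
    "emotion",
    "target_utterance",
    "evidence_utterance",
    "conversation_history",
)


def validate_data_batch(data_batch: Dict[str, List[str]]) -> int:
    """Hypothetical helper (not part of the library).

    Checks the documented constraints on data_batch and returns the batch size.
    """
    missing = [key for key in REQUIRED_KEYS if key not in data_batch]
    if missing:
        raise KeyError(f"data_batch is missing keys: {missing}")

    lengths = {key: len(data_batch[key]) for key in REQUIRED_KEYS}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"all lists must have the same length, got {lengths}")

    return lengths[REQUIRED_KEYS[0]]


# Made-up sample batch with two examples; index i of every list
# belongs to the same example.
data_batch = {
    "emotion": ["happiness", "sadness"],
    "target_utterance": [
        "Thank you very much.",
        "I failed my driving test again.",
    ],
    "evidence_utterance": [
        "How can I forget my old friend?",
        "I failed my driving test again.",
    ],
    "conversation_history": [
        "It's your birthday soon. How can I forget my old friend? Thank you very much.",
        "You look upset. I failed my driving test again.",
    ],
}

batch_size = validate_data_batch(data_batch)
print(batch_size)

# Assuming the class has been imported from its installed package,
# the validated batch would then be passed on like this:
# preprocessor = RecconEmotionEntailmentPreprocessor()
# encoding = preprocessor(data_batch)
```

Checking the batch up front is a design choice of this sketch rather than something the preprocessor is documented to require; it simply turns a mismatched batch into an early, readable error.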