RecconSpanExtractionPreprocessor
class RecconSpanExtractionPreprocessor(tokenizer: Optional[transformers.tokenization_utils.PreTrainedTokenizer] = None)
Preprocessor for the RecconSpanExtraction model. It preprocesses and tokenises inputs so they can be passed to RecconSpanExtractionModel.
- Parameters
tokenizer (Optional[PreTrainedTokenizer], optional) – Tokenizer used by the preprocessor. Defaults to None.
max_length (int, optional) – Maximum length, in tokens, to which inputs are truncated. Defaults to 512.
__call__(data_batch: Dict[str, List[str]]) → Tuple[transformers.tokenization_utils_base.BatchEncoding, List[Dict[str, Union[int, str]]], List[transformers.data.processors.squad.SquadExample], List[transformers.data.processors.squad.SquadFeatures]]
Preprocesses the data and then tokenises it so it can be used with RecconSpanExtractionModel.
- Parameters
data_batch (Dict[str, List[str]]) – Dictionary that must contain the keys ‘emotion’, ‘target_utterance’, ‘evidence_utterance’, and ‘conversation_history’. Each value is a list of strings, and all lists must have the same length.
- Returns
BatchEncoding output from tokenizer
List of evidence utterances
List of SquadExample output from load_examples() function
List of SquadFeatures output from load_examples() function
- Return type
Tuple[ BatchEncoding, List[Dict[str, Union[int, str]]], List[SquadExample], List[SquadFeatures] ]
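
The shape of data_batch described above can be checked before the preprocessor is called. The sketch below is illustrative only: `check_data_batch` and `REQUIRED_KEYS` are hypothetical names that are not part of the library, and the sample strings are made up. It only verifies the documented constraints (the four keys, list-of-string values, equal lengths).

```python
from typing import Dict, List

# Keys listed in the data_batch parameter description.
REQUIRED_KEYS = (
    "emotion",
    "target_utterance",
    "evidence_utterance",
    "conversation_history",
)


def check_data_batch(data_batch: Dict[str, List[str]]) -> int:
    """Hypothetical helper (not part of the library).

    Checks that data_batch has the four documented keys, that each value
    is a list of strings, and that all lists have the same length.
    Returns the number of examples in the batch.
    """
    missing = [key for key in REQUIRED_KEYS if key not in data_batch]
    if missing:
        raise KeyError(f"data_batch is missing keys: {missing}")

    for key in REQUIRED_KEYS:
        values = data_batch[key]
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            raise TypeError(f"data_batch[{key!r}] must be a list of strings")

    lengths = {len(data_batch[key]) for key in REQUIRED_KEYS}
    if len(lengths) != 1:
        raise ValueError("all lists in data_batch must have the same length")
    return lengths.pop()


# Illustrative batch with a single example (made-up text).
data_batch = {
    "emotion": ["happiness"],
    "target_utterance": ["Thank you very much."],
    "evidence_utterance": ["It is very thoughtful of you to invite me."],
    "conversation_history": [
        "It is very thoughtful of you to invite me. Thank you very much."
    ],
}

num_examples = check_data_batch(data_batch)

# With the library installed, the validated batch would then be passed on,
# and the four documented return values unpacked in order, for example:
# encoding, evidences, examples, features = preprocessor(data_batch)
```

Validating up front keeps a length mismatch from surfacing later as a harder-to-read error inside tokenisation. The commented-out call assumes a `preprocessor` instance constructed as documented above.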