RstPreprocessor
class RstPreprocessor(tokenizer: Optional[transformers.tokenization_utils.PreTrainedTokenizer] = None)
Preprocesses a list of raw texts into a batch of tensors.
__call__(sentences: List[str])
Main method to start preprocessing for RST.
- Parameters
sentences (List[str]) – list of input texts
- Returns
A BatchEncoding instance whose ‘data_batch’ key holds the embedded values of the batch, together with a list containing the length of each text in the batch.
- Return type
Tuple[BatchEncoding, List[int]]
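The return contract described above (a mapping with a ‘data_batch’ entry plus a list of per-text lengths) can be illustrated with a small stand-in. This is only a sketch under stated assumptions: ToyPreprocessor is a hypothetical class and not the library's RstPreprocessor. It also assumes whitespace tokenization, an on-the-fly vocabulary, zero padding, lengths counted in tokens, and a plain dict in place of BatchEncoding. None of these details are confirmed by this page.

```python
from typing import Dict, List, Tuple

PAD_ID = 0  # assumption: 0 is reserved for padding in this toy example


class ToyPreprocessor:
    """Hypothetical stand-in that mimics the documented return shape only."""

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}

    def _token_id(self, token: str) -> int:
        # Assign ids on first sight, starting at 1 so PAD_ID stays unused.
        if token not in self.vocab:
            self.vocab[token] = len(self.vocab) + 1
        return self.vocab[token]

    def __call__(
        self, sentences: List[str]
    ) -> Tuple[Dict[str, List[List[int]]], List[int]]:
        # Assumption: whitespace splitting stands in for the real tokenizer.
        tokenized = [s.split() for s in sentences]
        lengths = [len(tokens) for tokens in tokenized]
        max_len = max(lengths, default=0)
        # Right-pad every row so the batch is rectangular.
        data_batch = [
            [self._token_id(t) for t in tokens] + [PAD_ID] * (max_len - len(tokens))
            for tokens in tokenized
        ]
        return {"data_batch": data_batch}, lengths


preprocess = ToyPreprocessor()
batch, lengths = preprocess(["the cat sat", "the dog"])
print(batch["data_batch"])  # [[1, 2, 3], [1, 4, 0]]
print(lengths)              # [3, 2]
```

Returning the lengths alongside the padded batch lets downstream code tell real tokens from padding without inspecting the batch itself.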