SenticGCNBertPreprocessor

class SenticGCNBertPreprocessor(tokenizer: Union[str, transformers.tokenization_utils.PreTrainedTokenizer] = 'bert-base-uncased', embedding_model: Union[str, transformers.modeling_utils.PreTrainedModel] = 'bert-base-uncased', config_filename: str = 'config.json', model_filename: str = 'pytorch_model.bin', spacy_pipeline: str = 'en_core_web_sm', senticnet: Union[str, Dict[str, float]] = 'https://storage.googleapis.com/sgnlp/models/sentic_gcn/senticnet.pickle', max_len: int = 85, device: str = 'cpu')

Class for preprocessing sentences and their aspects into a batch of tensors for the SenticGCNBertModel to predict on.

__call__(data_batch: List[Dict[str, Union[str, List[str]]]]) → Tuple[List[sgnlp.models.sentic_gcn.preprocess.SenticGCNBertData], List[torch.Tensor]]

Generates a list of input tensors from a list of sentences and their accompanying lists of aspects.

Parameters

data_batch (List[Dict[str, Union[str, List[str]]]]) – list of dictionaries, each with two keys, ‘sentence’ and ‘aspect’. The ‘sentence’ value is a string and the ‘aspect’ value is a list of the aspects that accompany that sentence.

Returns

A tuple containing the list of processed SenticGCNBertData objects and a list of ordered tensors for ‘text_indices’, ‘aspect_indices’, ‘left_indices’, ‘text_embeddings’ and ‘sdat_graph’.

Return type

Tuple[List[SenticGCNBertData], List[torch.Tensor]]
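
Example

The following is a minimal sketch of the data_batch shape described under Parameters, not an excerpt from the library. The helper is_valid_data_batch is hypothetical and exists only to illustrate the expected structure. The key names are taken from the parameter description above and should be checked against the installed sgnlp version. The commented-out lines at the end assume that SenticGCNBertPreprocessor can be imported from sgnlp.models.sentic_gcn and that the default model and SenticNet files can be downloaded.

```python
from typing import Any


def is_valid_data_batch(data_batch: Any) -> bool:
    """Hypothetical helper (not part of sgnlp): check the documented input shape."""
    if not isinstance(data_batch, list):
        return False
    for item in data_batch:
        if not isinstance(item, dict):
            return False
        # Each entry needs a 'sentence' string and an 'aspect' list of strings.
        if not isinstance(item.get("sentence"), str):
            return False
        aspects = item.get("aspect")
        if not isinstance(aspects, list):
            return False
        if not all(isinstance(aspect, str) for aspect in aspects):
            return False
    return True


# Illustrative inputs: one sentence with two aspects, one with a multi-word aspect.
data_batch = [
    {
        "sentence": "The food was great but the service was slow.",
        "aspect": ["food", "service"],
    },
    {
        "sentence": "The battery life is excellent.",
        "aspect": ["battery life"],
    },
]

print(is_valid_data_batch(data_batch))

# With sgnlp installed, the batch could then be passed to the preprocessor.
# The import path below is an assumption; adjust it to your installation.
# from sgnlp.models.sentic_gcn import SenticGCNBertPreprocessor
# preprocessor = SenticGCNBertPreprocessor(max_len=85, device="cpu")
# processed_data, processed_tensors = preprocessor(data_batch)
```

The call returns two values, matching the Tuple return type: the processed data objects and the list of tensors intended for the model.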