RecconSpanExtractionModel

class RecconSpanExtractionModel(config)[source]

The Reccon Span Extraction Model with a span classification head on top for extractive question-answering tasks like SQuAD.
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
- Parameters
config (RecconSpanExtractionConfig) – Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Use the from_pretrained method to load the model weights.
Example:
```python
from sgnlp.models.span_extraction import (
    RecconSpanExtractionConfig,
    RecconSpanExtractionTokenizer,
    RecconSpanExtractionModel,
    utils,
)

# 1. load from default
config = RecconSpanExtractionConfig()
model = RecconSpanExtractionModel(config)

# 2. load from pretrained
config = RecconSpanExtractionConfig.from_pretrained("https://storage.googleapis.com/sgnlp/models/reccon_span_extraction/config.json")
model = RecconSpanExtractionModel.from_pretrained(
    "https://storage.googleapis.com/sgnlp/models/reccon_span_extraction/pytorch_model.bin",
    config=config,
)

# Using model
tokenizer = RecconSpanExtractionTokenizer.from_pretrained("mrm8488/spanbert-finetuned-squadv2")
text = {
    'context': "Our company's wei-ya is tomorrow night ! It's your first Chinese New Year in Taiwan--you must be excited !",
    'qas': [{
        'id': 'dailydialog_tr_1097_utt_1_true_cause_utt_1_span_0',
        'is_impossible': False,
        'question': "The target utterance is Our company's wei-ya is tomorrow night ! It's your first Chinese New Year in Taiwan--you must be excited ! The evidence utterance is Our company's wei-ya is tomorrow night ! It's your first Chinese New Year in Taiwan--you must be excited ! What is the causal span from context that is relevant to the target utterance's emotion happiness ?",
        'answers': [{'text': "Our company's wei-ya is tomorrow night ! It's your first Chinese New Year in Taiwan", 'answer_start': 0}],
    }],
}
dataset, _, _ = utils.load_examples(text, tokenizer)
inputs = {"input_ids": dataset[0], "attention_mask": dataset[1], "token_type_ids": dataset[2]}
outputs = model(**inputs)
```
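The example above stops at the raw forward pass. As a follow-up, here is a minimal decoding sketch (an assumption for illustration, not part of the documented API: it presumes the outputs behave like a standard QuestionAnsweringModelOutput and that dataset[0] carries a batch dimension):

```python
import torch

# Hedged decoding sketch: pick the highest-scoring start/end positions
# and decode the token ids between them back into text.
with torch.no_grad():
    outputs = model(**inputs)

start_idx = outputs.start_logits[0].argmax().item()
end_idx = outputs.end_logits[0].argmax().item()
answer = tokenizer.decode(inputs["input_ids"][0, start_idx : end_idx + 1])
print(answer)
```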
forward(**kwargs)[source]

The BertForQuestionAnswering forward method overrides the __call__ special method.
Tip: Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
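Concretely, a minimal illustration of the tip above (not taken from the original docs):

```python
# Preferred: calling the module instance runs registered pre/post-processing hooks.
outputs = model(**inputs)

# Works, but bypasses those hooks and is discouraged.
outputs = model.forward(**inputs)
```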
- Parameters
input_ids (torch.LongTensor of shape (batch_size, sequence_length)) –
Indices of input sequence tokens in the vocabulary.
Indices can be obtained using BertTokenizer. See PreTrainedTokenizer.encode and PreTrainedTokenizer.__call__ for details; a hedged tokenizer sketch follows this parameter list.
attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) –
Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
1 for tokens that are not masked,
0 for tokens that are masked.
token_type_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) –
Segment token indices to indicate first and second portions of the inputs. Indices are selected in [0, 1]:
0 corresponds to a sentence A token,
1 corresponds to a sentence B token.
position_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) –
Indices of positions of each input sequence token in the position embeddings. Selected in the range [0, config.max_position_embeddings - 1].
head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional) –
Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]:
1 indicates the head is not masked,
0 indicates the head is masked.
inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) – Optionally, instead of passing input_ids, you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix provides.
output_attentions (bool, optional) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
output_hidden_states (bool, optional) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
return_dict (bool, optional) – Whether or not to return a ModelOutput instead of a plain tuple.
start_positions (torch.LongTensor of shape (batch_size,), optional) – Labels for the position (index) of the start of the labelled span, used for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length); positions outside of the sequence are not taken into account for computing the loss.
end_positions (torch.LongTensor of shape (batch_size,), optional) – Labels for the position (index) of the end of the labelled span, used for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length); positions outside of the sequence are not taken into account for computing the loss.
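To make the input tensors above concrete, here is a hedged sketch, assuming RecconSpanExtractionTokenizer behaves like a standard Hugging Face BERT tokenizer (a single __call__ on a question/context pair produces all three tensors; the question and context strings are hypothetical):

```python
from sgnlp.models.span_extraction import RecconSpanExtractionTokenizer

# Assumption: the tokenizer inherits the standard Hugging Face __call__ API.
tokenizer = RecconSpanExtractionTokenizer.from_pretrained("mrm8488/spanbert-finetuned-squadv2")
encoded = tokenizer(
    "What is the causal span ?",                 # question -> token_type_id 0
    "Our company's wei-ya is tomorrow night !",  # context  -> token_type_id 1
    return_tensors="pt",
)
print(encoded["input_ids"].shape)   # torch.Size([1, sequence_length])
print(encoded["attention_mask"])    # 1 = real token, 0 = padding
print(encoded["token_type_ids"])    # 0 = sentence A, 1 = sentence B
```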
- Returns
A transformers.modeling_outputs.QuestionAnsweringModelOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (BertConfig) and inputs.
loss (torch.FloatTensor of shape (1,), optional, returned when start_positions and end_positions are provided) – Total span extraction loss is the sum of a Cross-Entropy for the start and end positions.
start_logits (torch.FloatTensor of shape (batch_size, sequence_length)) – Span-start scores (before SoftMax).
end_logits (torch.FloatTensor of shape (batch_size, sequence_length)) – Span-end scores (before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).
Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
- Return type
transformers.modeling_outputs.QuestionAnsweringModelOutput or tuple(torch.FloatTensor)
Example:
```python
>>> from transformers import BertTokenizer, BertForQuestionAnswering
>>> import torch

>>> tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
>>> model = BertForQuestionAnswering.from_pretrained("bert-base-uncased")

>>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"

>>> inputs = tokenizer(question, text, return_tensors="pt")
>>> with torch.no_grad():
...     outputs = model(**inputs)

>>> answer_start_index = outputs.start_logits.argmax()
>>> answer_end_index = outputs.end_logits.argmax()

>>> predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
>>> tokenizer.decode(predict_answer_tokens)
```

```python
>>> # target is "nice puppet"
>>> target_start_index, target_end_index = torch.tensor([14]), torch.tensor([15])

>>> outputs = model(**inputs, start_positions=target_start_index, end_positions=target_end_index)
>>> loss = outputs.loss
>>> round(loss.item(), 2)
```
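Because the forward pass returns the summed start/end cross-entropy as outputs.loss, a training step follows the usual PyTorch pattern. A hedged sketch (AdamW and the learning rate are illustrative assumptions, not defaults of this API):

```python
import torch

# Illustrative single training step; the optimizer choice and lr are assumptions.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

model.train()
outputs = model(**inputs, start_positions=target_start_index, end_positions=target_end_index)
outputs.loss.backward()  # total loss = CE(start_logits) + CE(end_logits)
optimizer.step()
optimizer.zero_grad()
```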