RecconSpanExtractionTokenizer

class RecconSpanExtractionTokenizer(vocab_file: str, do_lower_case: bool = False, **kwargs)

Constructs a Reccon Span Extraction tokenizer, derived from the BERT tokenizer.

Parameters
  • vocab_file (str) – Path to the vocabulary file.

  • do_lower_case (bool, defaults to False) – Whether or not to lowercase the input when tokenizing.
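To make the effect of do_lower_case concrete, here is a minimal, self-contained sketch of greedy WordPiece-style splitting over a toy vocabulary. It is an illustration only, not the library's implementation: the names wordpiece, toy_tokenize, and TOY_VOCAB are hypothetical, and a real BERT-derived tokenizer also handles punctuation, special tokens, and (when lowercasing) typically accent stripping.

```python
# Toy illustration of how lowercasing changes subword lookup.
# All names here are hypothetical; this is not the sg_nlp code.

def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into subword pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry a "##" prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matched: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def toy_tokenize(text, vocab, do_lower_case=False):
    """Split on whitespace, optionally lowercase, then apply wordpiece()."""
    if do_lower_case:
        text = text.lower()
    tokens = []
    for word in text.split():
        tokens.extend(wordpiece(word, vocab))
    return tokens


# Assumed toy vocabulary mixing cased and lowercased entries.
TOY_VOCAB = {"Chinese", "New", "Year", "in", "Taiwan",
             "chinese", "new", "year", "tai", "##wan"}

cased = toy_tokenize("Chinese New Year in Taiwan", TOY_VOCAB, do_lower_case=False)
uncased = toy_tokenize("Chinese New Year in Taiwan", TOY_VOCAB, do_lower_case=True)
print(cased)
print(uncased)
```

With a cased vocabulary, leaving do_lower_case at its default of False preserves entries such as "Taiwan"; lowercasing the input forces the lookup onto different, possibly more fragmented pieces. The setting should therefore match how the vocabulary file was built.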

Example:

from sg_nlp import RecconSpanExtractionTokenizer

# Load the tokenizer from a pretrained SpanBERT checkpoint.
tokenizer = RecconSpanExtractionTokenizer.from_pretrained("mrm8488/spanbert-finetuned-squadv2")
text = "Our company's wei-ya is tomorrow night ! It's your first Chinese New Year in Taiwan--you must be excited !"
# Tokenize the text and return the encoded inputs as PyTorch tensors.
inputs = tokenizer(text, return_tensors="pt")