evaluate_squad

Official evaluation script for SQuAD version 2.0.

Beyond the basic scoring, the script computes additional statistics and plots precision-recall curves when an na_prob.json file is provided. This file is expected to map each question ID to the model’s predicted probability that the question is unanswerable.
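As a rough illustration of the shape such a file might take, the sketch below writes a small na_prob.json and reads it back. The question IDs and probabilities are invented for the example, and `load_na_probs` is a hypothetical helper, not a function of this module.

```python
import json
import os
import tempfile

# Hypothetical question IDs mapped to the predicted probability that
# each question is unanswerable (values are illustrative only).
example_na_probs = {
    "q-0001": 0.02,
    "q-0002": 0.97,
}


def load_na_probs(path):
    """Load a question-ID -> no-answer-probability mapping and sanity-check it.

    Hypothetical helper for illustration; not part of evaluate_squad.
    """
    with open(path, encoding="utf-8") as f:
        na_probs = json.load(f)
    for qid, prob in na_probs.items():
        # A probability outside [0, 1] almost certainly signals a malformed file.
        if not 0.0 <= float(prob) <= 1.0:
            raise ValueError(f"probability out of range for {qid}: {prob}")
    return na_probs


# Round-trip the example through a temporary file.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "na_prob.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(example_na_probs, f)
    loaded = load_na_probs(path)
```

The range check is an added safeguard in this sketch; whether the script itself validates the values is not stated here.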

Functions

apply_no_ans_threshold

compute_exact

compute_f1

find_all_best_thresh

find_best_thresh

get_raw_scores

get_tokens

histogram_na_prob

main

make_eval_dict

make_precision_recall_eval

make_qid_to_has_ans

merge_eval

normalize_answer

Lowercase text and remove punctuation, articles, and extra whitespace.

parse_args

plot_pr_curve

run_precision_recall_analysis
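
Of the functions listed, only normalize_answer carries a description. The following is a minimal sketch of that described behavior, not the module's actual code: the name `normalize_answer_sketch`, the order of the steps, and the choice of "a", "an", and "the" as the articles to remove are all assumptions made for illustration.

```python
import re
import string


def normalize_answer_sketch(text):
    """Lowercase text and strip punctuation, articles, and extra whitespace.

    Illustrative re-implementation; step order and article list are assumed.
    """
    lowered = text.lower()
    # Drop ASCII punctuation characters.
    no_punct = "".join(ch for ch in lowered if ch not in string.punctuation)
    # Replace standalone English articles with a space (assumed set: a, an, the).
    no_articles = re.sub(r"\b(a|an|the)\b", " ", no_punct)
    # Collapse runs of whitespace and trim both ends.
    return " ".join(no_articles.split())


print(normalize_answer_sketch("The  Eiffel Tower!"))  # eiffel tower
print(normalize_answer_sketch("An apple, a day."))    # apple day
```

Normalizing both the prediction and the reference answer this way lets surface differences such as casing or a leading article be ignored when two strings are compared.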