Model Interpretability Evaluation Metrics

PointGame

class interpretdl.PointGame[source]

Pointing Game Evaluation Method.

This evaluator assumes that the explanation result should align with the visual objects. Based on this idea, the evaluation computes the alignment between the explanation and the annotated bounding box or semantic segmentation.

PointGame computes the alignment with the bounding box; PointGameSegmentation computes the alignment with the semantic segmentation.

More details can be found in the original paper, Top-down Neural Attention by Excitation Backprop: https://arxiv.org/abs/1608.00507.

Note that bounding-box annotations are required for this evaluation. This method does not need a model; for API compatibility, it is implemented with the same interface as the other evaluators.

evaluate(bbox: tuple, exp_array: numpy.ndarray, threshold=0.25) → dict[source]

Since the explanation is effectively a ranking over pixels, PointGame computes two categories of measures. The first is threshold-based: the explanation is binarized at threshold * max(exp_array), and precision, recall, and F1 score are computed w.r.t. bbox. The second is threshold-free: the ROC AUC score and the Average Precision (both imported from sklearn.metrics) are computed from the raw explanation scores. A sketch of this computation follows.
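The snippet below is an illustrative re-implementation of these measures, not the library's internal code; it assumes exp_array is a 2-D saliency map at the same resolution as the annotated image.

import numpy as np
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             roc_auc_score, average_precision_score)

def point_game_sketch(bbox, exp_array, threshold=0.25):
    # Build a binary ground-truth mask from the bounding box
    # (rows index the height/y axis, columns the width/x axis).
    x1, y1, x2, y2 = bbox
    gt_mask = np.zeros(exp_array.shape, dtype=np.uint8)
    gt_mask[y1:y2, x1:x2] = 1

    y_true = gt_mask.ravel()
    y_score = exp_array.ravel()

    # Threshold-dependent measures: binarize at threshold * max(exp_array).
    y_pred = (y_score > threshold * y_score.max()).astype(np.uint8)

    return {
        'precision': precision_score(y_true, y_pred),
        'recall': recall_score(y_true, y_pred),
        'f1_score': f1_score(y_true, y_pred),
        # Threshold-free measures use the raw ranking scores.
        'auc_score': roc_auc_score(y_true, y_score),
        'ap_score': average_precision_score(y_true, y_score),
    }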

Parameters:
  • bbox (tuple) – a tuple of four integers (x1, y1, x2, y2), where (x1, y1) are the coordinates of the top-left corner and (x2, y2) those of the bottom-right corner; x is measured along the width and y along the height.
  • exp_array (np.ndarray) – the explanation result from an interpretation algorithm.
  • threshold (float, optional) – threshold for computing precision, recall and F1 score. Defaults to 0.25.
Returns:

A dict containing precision, recall, f1_score, auc_score and ap_score, where the first three depend on the threshold and the last two do not.

Return type:

dict
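A minimal usage sketch, assuming interpretdl is imported as it; the saliency map and bounding box below are random placeholders standing in for a real interpreter's output and a real annotation.

import numpy as np
import interpretdl as it

evaluator = it.PointGame()
exp_array = np.random.rand(224, 224)  # placeholder saliency map from an interpreter
bbox = (60, 40, 180, 200)             # placeholder (x1, y1, x2, y2) annotation
result = evaluator.evaluate(bbox, exp_array, threshold=0.25)
# result: {'precision': ..., 'recall': ..., 'f1_score': ..., 'auc_score': ..., 'ap_score': ...}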

PointGameSegmentation

class interpretdl.PointGameSegmentation[source]

Pointing Game Evaluation Method using Segmentation.

This evaluator assumes that the explanation result should align with the visual objects. Based on this idea, the evaluation computes the alignment between the explanation and the annotated bounding box or semantic segmentation.

PointGame computes the alignment with the bounding box; PointGameSegmentation computes the alignment with the semantic segmentation.

More details can be found in the original paper, Top-down Neural Attention by Excitation Backprop: https://arxiv.org/abs/1608.00507.

Note that ground-truth semantic segmentation annotations are required for this evaluation. This method does not need a model; for API compatibility, it is implemented with the same interface as the other evaluators.

evaluate(seg_gt: numpy.ndarray, exp_array: numpy.ndarray, threshold=0.25) → dict[source]

Since the explanation is effectively a ranking over pixels, PointGameSegmentation computes two categories of measures. The first is threshold-based: the explanation is binarized at threshold * max(exp_array), and precision, recall, and F1 score are computed w.r.t. seg_gt. The second is threshold-free: the ROC AUC score and the Average Precision (both imported from sklearn.metrics) are computed from the raw explanation scores. The computation mirrors the PointGame sketch above, with seg_gt used directly as the ground-truth mask.

Parameters:
  • seg_gt (np.ndarray) – the ground-truth segmentation mask; currently only binary values are supported.
  • exp_array (np.ndarray) – the explanation result from an interpretation algorithm.
  • threshold (float, optional) – threshold for computing precision, recall and F1 score. Defaults to 0.25.
Returns:

A dict containing precision, recall, f1_score, auc_score and ap_score, where the first three depend on the threshold and the last two do not.

Return type:

dict
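A minimal usage sketch, analogous to the PointGame example above; the binary mask and saliency map below are placeholders.

import numpy as np
import interpretdl as it

evaluator = it.PointGameSegmentation()
exp_array = np.random.rand(224, 224)           # placeholder saliency map
seg_gt = np.zeros((224, 224), dtype=np.uint8)  # placeholder binary segmentation mask
seg_gt[40:200, 60:180] = 1
result = evaluator.evaluate(seg_gt, exp_array, threshold=0.25)
# result: {'precision': ..., 'recall': ..., 'f1_score': ..., 'auc_score': ..., 'ap_score': ...}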