Interpreter Trustworthiness Evaluation Metrics

Abstract Evaluator

class interpretdl.InterpreterEvaluator(model: callable, device: str = 'gpu:0', **kwargs)[source]

InterpreterEvaluator is the base abstract class for all interpreter evaluators. The core function evaluate should be implemented.

All evaluators aim to assess the trustworthiness of interpretation algorithms. Beyond theoretical verification of an algorithm, these evaluators validate trustworthiness empirically, by examining the explanations the algorithm produces. Several evaluators are provided.

Parameters:
  • model (callable) – A model with forward() and possibly backward() functions. This is not always required if the model is not involved.
  • device (str) – The device used for running model, options: "cpu", "gpu:0", "gpu:1" etc. Again, this is not always required if the model is not involved.
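As a rough illustration of the intended extension point, a new evaluator would subclass InterpreterEvaluator and implement evaluate(); the body below is a placeholder sketch, not part of the library:

    import interpretdl as it

    class MyEvaluator(it.InterpreterEvaluator):
        """Placeholder subclass showing where custom evaluation logic would go."""

        def __init__(self, model: callable, device: str = 'gpu:0', **kwargs):
            super().__init__(model, device, **kwargs)

        def evaluate(self, img_path: str, explanation, **kwargs) -> dict:
            # A real evaluator would perturb the input according to `explanation`,
            # query self.model, and summarize the responses into a score.
            raise NotImplementedError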

DeletionInsertion

class interpretdl.DeletionInsertion(model: callable, device: str, compute_deletion: bool = True, compute_insertion: bool = True, **kwargs)[source]

Deletion & Insertion Interpreter Evaluation method.

The evaluation of interpretation algorithms follows the intuition that flipping the most salient pixels first should lead to a rapid decay of the model's performance. Perturbation-based examples can therefore be used to evaluate the trustworthiness of interpretation algorithms.

The Deletion metric is computed as follows. The perturbation starts from the original image, progressively perturbs (zeros out) the most important pixels in the input, and computes the responses of the trained model. This yields a curve with the ratio of perturbed pixels on the x-axis and the predicted probability on the y-axis; the area under this curve is the deletion score.

The Insertion metric is similar, but the perturbation starts from a zero image, progressively inserts the most important pixels into the input, and computes the responses of the trained model. A similar curve is obtained, and the area under this curve is the insertion score.

More details regarding the Deletion & Insertion method can be found in the original paper: https://arxiv.org/abs/1806.07421
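To make the curve construction concrete, the following sketch (illustrative names, not the library's internal implementation; predict_proba stands for any function returning a class-probability vector) computes a deletion-style score for a single image:

    import numpy as np

    def deletion_curve_auc(image, saliency, predict_proba, target_class, steps=20):
        """Illustrative deletion score: remove the most salient pixels first,
        track the target-class probability, and return the area under the curve."""
        h, w = saliency.shape
        order = np.argsort(saliency.ravel())[::-1]        # most salient pixels first
        probas = [predict_proba(image)[target_class]]     # probability before any deletion
        for i in range(1, steps + 1):
            k = int(len(order) * i / steps)               # number of pixels removed so far
            ys, xs = np.unravel_index(order[:k], (h, w))
            perturbed = image.copy()
            perturbed[ys, xs] = 0                         # zero out the top-k salient pixels
            probas.append(predict_proba(perturbed)[target_class])
        ratios = np.linspace(0.0, 1.0, steps + 1)         # ratio of perturbed pixels (x-axis)
        return np.trapz(probas, ratios)                   # deletion score (lower is better)

An insertion-style score follows the same recipe but starts from a zero image and restores the most salient pixels instead.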

Parameters:
  • model (callable) – A model with forward() and possibly backward() functions. This is not always required if the model is not involved.
  • device (str) – The device used for running model, options: "cpu", "gpu:0", "gpu:1" etc. Again, this is not always required if the model is not involved.
  • compute_deletion (bool, optional) – Whether to compute the deletion score. Defaults to True.
  • compute_insertion (bool, optional) – Whether to compute the insertion score. Defaults to True.
Raises:

ValueError – At least one of compute_deletion and compute_insertion must be True.

evaluate(img_path: str, explanation: dict, batch_size: int = None, resize_to: int = 224, crop_to: int = None, limit_number_generated_samples: int = None) → dict[source]

Given img_path, DeletionInsertion first generates perturbed samples for deletion and insertion, respectively, according to the order provided by explanation. The number of samples is defined by limit_number_generated_samples (a subsampling is applied if the numbers differ). DeletionInsertion then computes the probabilities of these perturbed samples, and the mean of the probabilities of the class of interest gives the final score.

Note that LIME produces explanations based on superpixels, so the number of perturbed samples naturally equals the number of superpixels; if limit_number_generated_samples is None, the number of superpixels is used. For other explanations, which have the same spatial dimensions as the input image, limit_number_generated_samples is set to 20 if not given.

Parameters:
  • img_path (str) – a string for image path.
  • explanation (dict or np.ndarray) – the explanation result from an interpretation algorithm.
  • batch_size (int or None, optional) – batch size for each pass. Defaults to None.
  • resize_to (int, optional) – Images will be rescaled with the shorter edge being resize_to. Defaults to 224.
  • crop_to (int, optional) – After resize, images will be center cropped to a square image with the size crop_to. If None, no crop will be performed. Defaults to None.
  • limit_number_generated_samples (int or None, optional) – the maximum number of perturbed samples. If None, it is chosen automatically: the number of superpixels for LIME explanations, 20 otherwise. Defaults to None.
Returns:

A dict containing 'deletion_score', 'del_probas', 'deletion_images', 'insertion_score', 'ins_probas' and 'insertion_images', if compute_deletion and compute_insertion are both True.

Return type:

dict
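A typical call might look like the sketch below, where the explanation is produced by one of the library's interpreters; the model, image path, and target layer name are placeholders (the layer name assumes the Paddle ResNet-50 architecture and would need to match the chosen model):

    import interpretdl as it
    from paddle.vision.models import resnet50

    paddle_model = resnet50(pretrained=True)

    # Produce an explanation to be evaluated, e.g. with Grad-CAM.
    gradcam = it.GradCAMInterpreter(paddle_model, device='gpu:0')
    explanation = gradcam.interpret('assets/catdog.png', 'layer4.2.relu', visual=False)

    # Evaluate it with the deletion & insertion metrics.
    evaluator = it.DeletionInsertion(paddle_model, device='gpu:0')
    results = evaluator.evaluate('assets/catdog.png', explanation, batch_size=50)
    print(results['deletion_score'], results['insertion_score'])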

Perturbation

class interpretdl.Perturbation(model: callable, device: str = 'gpu:0', compute_MoRF: bool = True, compute_LeRF: bool = True, **kwargs)[source]

Perturbation based Evaluations.

The evaluation of interpretation algorithms follows the intuition that flipping the most salient pixels first should lead to a rapid decay of the model's performance. Perturbation-based examples can therefore be used to evaluate the trustworthiness of interpretation algorithms.

Two metrics are provided: most relevant first (MoRF) and least relevant first (LeRF).

The MoRF metric is computed as follows. The perturbation starts from the original image, progressively perturbs (zeros out) the most important pixels in the input, and computes the responses of the trained model. This yields a curve with the ratio of perturbed pixels on the x-axis and the predicted probability on the y-axis; the area under this curve is the MoRF score.

The LeRF metric is similar, but the perturbation progressively perturbs (zeros out) the least important pixels in the input and then computes the responses of the trained model. A similar curve is obtained, and the area under this curve is the LeRF score.

Note that MoRF is equivalent to Deletion, but LeRF is NOT equivalent to Insertion.

More details of MoRF and LeRF can be found in the original paper: https://arxiv.org/abs/1509.06321.
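The only difference between the two metrics is the order in which pixels are removed; a short sketch (illustrative, not the library's internals) makes the two orderings concrete:

    import numpy as np

    def perturbation_order(saliency: np.ndarray, mode: str = 'MoRF') -> np.ndarray:
        """Return flat pixel indices in the order they are zeroed out.

        MoRF removes the most relevant pixels first; LeRF removes the least relevant
        pixels first. Both start from the original image, which is why MoRF matches
        Deletion while LeRF differs from Insertion (Insertion starts from a zero image)."""
        order = np.argsort(saliency.ravel())      # ascending relevance
        return order[::-1] if mode == 'MoRF' else order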


Parameters:
  • model (callable) – A model with forward() and possibly backward() functions. This is not always required if the model is not involved.
  • device (str) – The device used for running model, options: "cpu", "gpu:0", "gpu:1" etc. Again, this is not always required if the model is not involved.
  • compute_MoRF (bool, optional) – Whether to compute the MoRF score. Defaults to True.
  • compute_LeRF (bool, optional) – Whether to compute the LeRF score. Defaults to True.
Raises:

ValueError – At least one of compute_MoRF and compute_LeRF must be True.

evaluate(img_path: str, explanation: list, batch_size=None, resize_to=224, crop_to=None, limit_number_generated_samples=None) → dict[source]

Given img_path, Perturbation first generates perturbed samples for MoRF and LeRF, respectively, according to the order provided by explanation. The number of samples is defined by limit_number_generated_samples (a subsampling is applied if the numbers differ). Perturbation then computes the probabilities of these perturbed samples, and the mean of the probabilities of the class of interest gives the final score.

Note that LIME produces explanations based on superpixels, so the number of perturbed samples naturally equals the number of superpixels; if limit_number_generated_samples is None, the number of superpixels is used. For other explanations, which have the same spatial dimensions as the input image, limit_number_generated_samples is set to 20 if not given.

Parameters:
  • img_path (str) – a string for image path.
  • explanation (list or np.ndarray) – the explanation result from an interpretation algorithm.
  • batch_size (int or None, optional) – batch size for each pass. Defaults to None.
  • resize_to (int, optional) – Images will be rescaled with the shorter edge being resize_to. Defaults to 224.
  • crop_to (int, optional) – After resize, images will be center cropped to a square image with the size crop_to. If None, no crop will be performed. Defaults to None.
  • limit_number_generated_samples (int or None, optional) – the maximum number of perturbed samples. If None, it is chosen automatically: the number of superpixels for LIME explanations, 20 otherwise. Defaults to None.
Returns:

A dict containing 'MoRF_score', 'MoRF_probas', 'MoRF_images', 'LeRF_score', 'LeRF_probas' and 'LeRF_images', if compute_MoRF and compute_LeRF are both True.

Return type:

dict
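A usage sketch, with a placeholder image path and a random map standing in for a real interpreter's saliency output (in practice the explanation would come from one of the library's interpreters):

    import numpy as np
    import interpretdl as it
    from paddle.vision.models import resnet50

    paddle_model = resnet50(pretrained=True)

    # Placeholder saliency map; replace with the output of an actual interpreter
    # whose spatial size matches the preprocessed image.
    saliency = np.random.rand(224, 224)

    evaluator = it.Perturbation(paddle_model, device='gpu:0', compute_MoRF=True, compute_LeRF=True)
    results = evaluator.evaluate('assets/catdog.png', saliency, resize_to=224, crop_to=224)
    print(results['MoRF_score'], results['LeRF_score'])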

Infidelity

class interpretdl.Infidelity(model: callable, device: str = 'gpu:0', **kwargs)[source]

Infidelity Interpreter Evaluation method.

The idea of fidelity is similar to the faithfulness evaluation: it measures how faithful (reliable, loyal) the explanations are to the model. (In)fidelity measures the normalized squared Euclidean distance between two terms: the product of a perturbation and the explanation, and the difference between the model’s response to the original input and its response to the perturbed input, i.e.

\[INFD(\Phi, f, x) = \mathbb{E}_{I \sim \mu_I} [ (I^T \Phi(f, x) - (f(x) - f(x - I)) )^2 ],\]

where the meaning of the symbols can be found in the original paper.

A normalization is added, which is not in the paper but in the official implementation:

\[\beta = \frac{ \mathbb{E}_{I \sim \mu_I} [ I^T \Phi(f, x) (f(x) - f(x - I)) ] }{ \mathbb{E}_{I \sim \mu_I} [ (I^T \Phi(f, x))^2 ] }\]

Intuitively, given a perturbation, e.g., one on important pixels, the product (the former term) should be relatively large if the explanation also indicates these important pixels, compared to a perturbation on irrelevant pixels; the difference (the latter term) should also be large because the model depends on the important pixels to make decisions. In this way, large values in one term are offset by large values in the other when the explanation is faithful to the model. By contrast, for a uniform explanation (all attributions being constant), the former term is constant and the infidelity becomes large.

More details about the measure can be found in the original paper: https://arxiv.org/abs/1901.09392.
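A hedged NumPy sketch of the (normalized) computation, assuming a flattened input x, a flattened explanation Phi of the same size, a callable f returning the model's logit for the class of interest, and a set of pre-drawn perturbations I (all names are illustrative, not the library's internals):

    import numpy as np

    def infidelity(explanation, x, f, perturbations):
        """Illustrative computation of the normalized infidelity score.

        explanation:   flattened attribution Phi(f, x), shape (d,)
        x:             flattened input, shape (d,)
        f:             callable returning the model's logit for a flattened input
        perturbations: array of perturbations I, shape (n, d)
        """
        phi = explanation.ravel()
        dot = perturbations @ phi                                    # I^T Phi(f, x) per sample
        diff = np.array([f(x) - f(x - I) for I in perturbations])    # f(x) - f(x - I)

        # Normalization used in the official implementation (beta in the docs above).
        beta = np.mean(dot * diff) / np.mean(dot ** 2)

        return np.mean((beta * dot - diff) ** 2)                     # expected squared gap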

Parameters:
  • model (callable) – A model with forward() and possibly backward() functions. This is not always required if the model is not involved.
  • device (str, optional) – The device used for running model, options: "cpu", "gpu:0", "gpu:1" etc. Defaults to 'gpu:0'.
_build_predict_fn(rebuild: bool = False)[source]

Build the prediction function. Different from InterpreterEvaluator._build_predict_fn(): the model's logits are used instead of probabilities.

Parameters: rebuild (bool, optional) – whether to rebuild the prediction function even if it has already been built. Defaults to False.
evaluate(img_path: str, explanation: numpy.ndarray, recompute: bool = False, batch_size: int = 50, resize_to: int = 224, crop_to: int = None)[source]

Given img_path, Infidelity first generates perturbed samples using a square-removal strategy on the original image. Since the difference (the second term in the infidelity formula) is independent of the explanation, these results are temporarily cached in case the same image is evaluated with other explanations.

Then, given explanation, the infidelity is computed following the formula above. A normalization is applied, which is not in the paper but is used in the official implementation.

Parameters:
  • img_path (str or np.ndarray) – a string for the image path.
  • explanation (np.ndarray) – the explanation result from an interpretation algorithm.
  • recompute (bool, optional) – whether forcing to recompute. Defaults to False.
  • batch_size (int, optional) – batch size for each pass. Defaults to 50.
  • resize_to (int, optional) – Images will be rescaled with the shorter edge being resize_to. Defaults to 224.
  • crop_to (int, optional) – After resize, images will be center cropped to a square image with the size crop_to. If None, no crop will be performed. Defaults to None.
Returns:

the infidelity score.

Return type:

float
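
A minimal usage sketch, again with a placeholder image path and a random attribution map standing in for a real interpreter output:

    import numpy as np
    import interpretdl as it
    from paddle.vision.models import resnet50

    paddle_model = resnet50(pretrained=True)

    # Placeholder attribution map; replace with a pixel-wise explanation for this image.
    explanation = np.random.rand(224, 224)

    evaluator = it.Infidelity(paddle_model, device='gpu:0')
    score = evaluator.evaluate('assets/catdog.png', explanation, batch_size=50, resize_to=224, crop_to=224)
    print(score)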