Base Interpreter

Abstract Interpreter

class interpretdl.Interpreter(model: callable, device: str, **kwargs)[source]

Interpreter is the base abstract class for all Interpreters. The implementation of any Interpreter should at least

(1) prepare predict_fn that outputs probability predictions, gradients or other desired intermediate results of the model, and

(2) implement the core function interpret() of the interpretation algorithm.

In general, we find this design practical: it makes the code more readable and highlights the core function of each interpretation algorithm.

This kind of implementation works for all post-hoc interpretation algorithms. Some algorithms have features for which other styles of implementation may be more suitable, but our style still works for most of them, so we follow this design for all Interpreters in this library.

Three sub-abstract Interpreters that implement _build_predict_fn() are currently provided in this file: InputGradientInterpreter, InputOutputInterpreter, IntermediateLayerInterpreter. For each of them, the implemented predict_fn can be used by several different algorithms, so further implementations can focus on the core algorithm. More sub-abstract Interpreters will be provided if necessary.
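The two-step pattern above (prepare predict_fn, then implement interpret()) can be sketched in plain Python. This is an illustrative sketch only, not the InterpretDL implementation; the class names MeanAbsInterpreter and the lambda "model" are invented for the example.

```python
import numpy as np

# A minimal sketch of the two-step Interpreter pattern described above.
# "MeanAbsInterpreter" and the predict_fn signature are illustrative
# assumptions, not part of the InterpretDL API.

class BaseInterpreter:
    """Abstract base: subclasses prepare predict_fn and implement interpret()."""

    def __init__(self, model, device="cpu"):
        self.model = model
        self.device = device
        self.predict_fn = None

    def _build_predict_fn(self, **kwargs):
        raise NotImplementedError

    def interpret(self, inputs, **kwargs):
        raise NotImplementedError


class MeanAbsInterpreter(BaseInterpreter):
    """Toy algorithm: explanation = |model output| averaged over classes."""

    def _build_predict_fn(self, rebuild=False):
        if self.predict_fn is None or rebuild:
            # Step (1): prepare predict_fn; here it simply forwards the model.
            self.predict_fn = lambda x: self.model(x)

    def interpret(self, inputs):
        # Step (2): the core algorithm builds predict_fn, then uses it.
        self._build_predict_fn()
        outputs = self.predict_fn(inputs)
        return np.abs(outputs).mean(axis=-1)


# A "model" here is anything callable with a forward pass.
model = lambda x: x @ np.array([[1.0, -2.0], [0.5, 0.5]])
explanation = MeanAbsInterpreter(model).interpret(np.array([[1.0, 2.0]]))
```

Real subclasses replace the lambda with a framework model and produce gradients or intermediate features instead of this toy score.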

Warning

use_cuda will be deprecated soon. Use device instead.

Parameters:
  • model (callable) – A model with forward() and possibly backward() functions.
  • device (str) – The device used for running model, options: "cpu", "gpu:0", "gpu:1" etc.
_build_predict_fn(**kwargs)[source]

Build predict_fn for interpreters. This will be called by interpret().

_env_setup()[source]

Prepare the environment setup. This is not always necessary because the setup can be done within _build_predict_fn().

interpret(**kwargs)[source]

Main function of the interpreter.

Sub-abstract: Input Gradient Interpreter

class interpretdl.InputGradientInterpreter(model: callable, device: str, **kwargs)[source]

This is one of the sub-abstract Interpreters.

InputGradientInterpreter is used by input-gradient based Interpreters. Interpreters derived from InputGradientInterpreter include GradShapCVInterpreter, IntGradCVInterpreter, and SmoothGradInterpreter.

This Interpreter implements _build_predict_fn() that returns input gradient given an input.

Parameters:
  • model (callable) – A model with forward() and possibly backward() functions.
  • device (str) – The device used for running model, options: "cpu", "gpu:0", "gpu:1" etc.
_build_predict_fn(rebuild: bool = False, gradient_of: str = 'probability')[source]

Build predict_fn for input gradients based algorithms. The model is supposed to be a classification model.

Parameters:
  • rebuild (bool, optional) – whether to force rebuilding predict_fn. Defaults to False.
  • gradient_of (str, optional) – which quantity to compute the gradient of w.r.t. the input data: "loss", "logit", or "probability". Defaults to "probability". The options produce similar results, though the absolute scale may differ.
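The remark about similar results at a different absolute scale can be checked by hand for a one-layer logistic model: the gradient of the logit w.r.t. the input is w, while the gradient of the probability is p(1-p)·w by the chain rule, i.e. the same direction at a rescaled magnitude. This is a standalone numpy sketch; InterpretDL itself obtains these gradients via the framework's autodiff.

```python
import numpy as np

# For p = sigmoid(w.x): d(logit)/dx = w, and d(probability)/dx = p*(1-p)*w.
w = np.array([0.8, -1.5, 0.3])
x = np.array([1.0, 2.0, -1.0])

z = w @ x                      # logit
p = 1.0 / (1.0 + np.exp(-z))   # probability

grad_logit = w                 # gradient of the logit w.r.t. x
grad_proba = p * (1 - p) * w   # gradient of the probability w.r.t. x

# Both gradients point the same way; only the magnitude changes.
cos = grad_logit @ grad_proba / (
    np.linalg.norm(grad_logit) * np.linalg.norm(grad_proba))
```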

Sub-abstract: Input Output Interpreter

class interpretdl.InputOutputInterpreter(model: callable, device: str, **kwargs)[source]

This is one of the sub-abstract Interpreters.

InputOutputInterpreter is used by input-output correlation based Interpreters. Interpreters derived from InputOutputInterpreter include OcclusionInterpreter, LIMECVInterpreter, and SmoothGradInterpreter.

This Interpreter implements _build_predict_fn() that returns the model’s prediction given an input.

Parameters:
  • model (callable) – A model with forward() and possibly backward() functions.
  • device (str) – The device used for running model, options: "cpu", "gpu:0", "gpu:1" etc.
_build_predict_fn(rebuild: bool = False, output: str = 'probability')[source]

Build predict_fn for Input-Output based algorithms. The model is supposed to be a classification model.

Parameters:
  • rebuild (bool, optional) – whether to force rebuilding predict_fn. Defaults to False.
  • output (str, optional) – whether to output the "logit" or the "probability". Defaults to "probability". The options produce similar results, though the absolute scale may differ.
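What an input-output based interpreter does with such a predict_fn can be sketched with an occlusion-style loop: perturb the input, read the probability, and attribute the drop to the perturbed region. The linear "model", the zero-occlusion choice, and the shapes are toy assumptions, not the OcclusionInterpreter implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
predict_fn = lambda x: softmax(x @ W)   # probability output, as above

x = np.array([1.0, -0.5, 2.0, 0.0])
label = int(np.argmax(predict_fn(x)))
base = predict_fn(x)[label]

# Occlude one feature at a time and record the probability drop.
scores = np.empty_like(x)
for i in range(len(x)):
    occluded = x.copy()
    occluded[i] = 0.0                   # zero-baseline occlusion
    scores[i] = base - predict_fn(occluded)[label]
```

Only forward calls to predict_fn are used, which is exactly why this family of interpreters needs the model's prediction but not its gradients.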

Sub-abstract: Intermediate-Layer Interpreter

class interpretdl.IntermediateLayerInterpreter(model: callable, device: str, **kwargs)[source]

This is one of the sub-abstract Interpreters.

IntermediateLayerInterpreter exhibits features from intermediate layers to produce explanations. This interpreter extracts intermediate layers’ features, with no gradients involved. Interpreters derived from IntermediateLayerInterpreter include RolloutInterpreter and ScoreCAMInterpreter.

This Interpreter implements _build_predict_fn() that returns the model’s intermediate outputs given an input.

Parameters:
  • model (callable) – A model with forward() and possibly backward() functions.
  • device (str) – The device used for running model, options: "cpu", "gpu:0", "gpu:1" etc.
_build_predict_fn(rebuild: bool = False, target_layer: str = None, target_layer_pattern: str = None)[source]

Build predict_fn for IntermediateLayer based algorithms. The model is supposed to be a classification model. target_layer and target_layer_pattern cannot be set at the same time. See the arguments below.

Parameters:
  • rebuild (bool, optional) – whether to force rebuilding predict_fn. Defaults to False.
  • target_layer (str, optional) – the name of the layer whose features will be output. Use this when only one layer’s features are needed. Conflicts with target_layer_pattern. Defaults to None.
  • target_layer_pattern (str, optional) – the pattern matching the names of the layers whose features will be output. Use this when the features of several layers sharing a common name pattern are needed. Conflicts with target_layer. Defaults to None.
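The exclusive target_layer / target_layer_pattern selection can be sketched as follows. The layer names, the dict-of-functions "model", and the capture mechanism are simplified assumptions; in practice frameworks expose forward hooks (e.g. register_forward_post_hook in Paddle) to capture a named layer's output.

```python
import re

layers = {                      # name -> toy "layer" (a plain function)
    "blocks.0.mlp": lambda x: x + 1,
    "blocks.1.mlp": lambda x: x * 2,
    "head": lambda x: x - 3,
}

def run_and_capture(x, target_layer=None, target_layer_pattern=None):
    # Exactly one of the two selectors may be set, mirroring the docs above.
    assert (target_layer is None) != (target_layer_pattern is None), \
        "set exactly one of target_layer / target_layer_pattern"
    captured = {}
    for name, layer in layers.items():      # sequential forward pass
        x = layer(x)
        if target_layer is not None and name == target_layer:
            captured[name] = x              # single named layer
        elif target_layer_pattern is not None and re.match(target_layer_pattern, name):
            captured[name] = x              # every layer matching the pattern
    return x, captured

out, feats = run_and_capture(10, target_layer_pattern=r"blocks\..*\.mlp")
```

No backward pass appears anywhere, matching the "no gradients involved" note above.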

Sub-abstract: Transformer Interpreter

class interpretdl.TransformerInterpreter(model: callable, device: str, **kwargs)[source]

This is one of the sub-abstract Interpreters.

TransformerInterpreter is used by Interpreters for Transformer-based models. Interpreters derived from TransformerInterpreter include BTNLPInterpreter and GANLPInterpreter.

This Interpreter implements _build_predict_fn() that returns several variables and gradients from each layer.

Parameters:
  • model (callable) – A model with forward() and possibly backward() functions.
  • device (str) – The device used for running model, options: "cpu", "gpu:0", "gpu:1" etc.
_build_predict_fn(rebuild: bool = False, embedding_name: str = None, attn_map_name: str = None, attn_v_name: str = None, attn_proj_name: str = None, gradient_of: str = None)[source]

Build predict_fn for transformer based algorithms. The model is supposed to be a classification model.

Parameters:
  • rebuild (bool, optional) – whether to force rebuilding predict_fn. Defaults to False.
  • embedding_name (str, optional) – the layer name for the embedding, if needed.
  • attn_map_name (str, optional) – the layer name for the attention weights, if needed.
  • attn_v_name (str, optional) – the layer name for the attention value.
  • attn_proj_name (str, optional) – the layer name for the attention projection, if needed.
  • nlp (bool, optional) – whether the input data is for a language task. Defaults to False.
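The tensors these layer-name arguments refer to can be located in a minimal single-head self-attention, written here as a standalone numpy sketch with toy dimensions (not the library's capture mechanism): the embedding corresponds to embedding_name, the softmax weights to attn_map_name, the value tensor to attn_v_name, and the output projection to attn_proj_name.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d, n = 4, 3                        # embedding dim, sequence length
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) for _ in range(4))

x = rng.normal(size=(n, d))        # token embeddings   (embedding_name)
q, k, v = x @ Wq, x @ Wk, x @ Wv   # v is the value     (attn_v_name)

attn_map = softmax(q @ k.T / np.sqrt(d))  # attention weights (attn_map_name)
out = (attn_map @ v) @ Wo                 # output projection (attn_proj_name)
```

A TransformerInterpreter subclass would capture these tensors (and their gradients, per gradient_of) at the named layers of a real model rather than recomputing them.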