Intermediate-Layer Feature Interpreters

Grad-CAM

class interpretdl.GradCAMInterpreter(model: callable, device: str = 'gpu:0')[source]

Gradient CAM Interpreter.

Given a convolutional network and an image classification task, a class activation map (CAM) can be derived from the global average pooling layer and the last fully-connected layer; it highlights the regions of the image that most affect the model's decision.

GradCAM further looks at the gradients flowing into one of the convolutional layers to weight the activation maps. Note that if the network contains a global average pooling layer, GradCAM targeting the last convolutional layer is equivalent to CAM.

More details regarding the CAM method can be found in the original paper: https://arxiv.org/abs/1512.04150.

More details regarding the GradCAM method can be found in the original paper: https://arxiv.org/abs/1610.02391.

Parameters:
  • model (callable) – A model with forward() and possibly backward() functions.
  • device (str) – The device used for running model, options: "cpu", "gpu:0", "gpu:1" etc.
interpret(inputs: str, target_layer_name: str, labels: list = None, resize_to: int = 224, crop_to: int = None, visual: bool = True, save_path: str = None) → numpy.ndarray[source]

The technical details of the GradCAM method are as follows: GradCAM computes the feature map at the layer named target_layer_name and the gradient of the objective function with respect to that layer. The gradients are averaged along the spatial dimensions to produce one weight per channel; the weighted sum of the feature maps, followed by a ReLU activation, yields the final explanation.
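The weighting step described above can be sketched in NumPy. The toy arrays below stand in for a real feature map and its gradients, which interpretdl extracts from the model internally; the function name grad_cam is illustrative and not part of the interpretdl API.

```python
import numpy as np

def grad_cam(feature_map: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM weighting for a single image.

    feature_map, gradients: arrays of shape (C, H, W) taken at the
    target layer.
    """
    # Channel weights: spatial average of the gradients.
    weights = gradients.mean(axis=(1, 2))                      # (C,)
    # Weighted sum of the feature maps over channels.
    cam = (weights[:, None, None] * feature_map).sum(axis=0)   # (H, W)
    # ReLU keeps only regions with a positive influence on the target.
    return np.maximum(cam, 0.0)

# Toy data standing in for real activations and gradients.
rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 7, 7))
grads = rng.standard_normal((8, 7, 7))
heatmap = grad_cam(fmap, grads)
```

In practice the resulting (H, W) map is upsampled to the input size and overlaid on the image, which is what the visual option renders.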

Parameters:
  • inputs (str or list of strs or numpy.ndarray) – The input image filepath, a list of filepaths, or a numpy array of read images.
  • target_layer_name (str) – The target layer to calculate gradients.
  • labels (list or tuple or numpy.ndarray, optional) – The target labels to analyze. The number of labels should be equal to the number of images. If None, the most likely label for each image will be used. Default: None.
  • resize_to (int, optional) – Images will be rescaled with the shorter edge being resize_to. Defaults to 224.
  • crop_to (int, optional) – After resize, images will be center cropped to a square image with the size crop_to. If None, no crop will be performed. Defaults to None.
  • visual (bool, optional) – Whether or not to visualize the processed image. Default: True.
  • save_path (str, optional) – The filepath(s) to save the processed image(s). If None, the image will not be saved. Default: None.
Returns:

interpretations/heatmaps for images

Return type:

[numpy.ndarray]

Score CAM

class interpretdl.ScoreCAMInterpreter(model: callable, device: str = 'gpu:0')[source]

Score-CAM Interpreter.

ScoreCAMInterpreter bridges the gap between perturbation-based and CAM-based methods, deriving the weights of the activation maps in an intuitively understandable way.

More details regarding the Score CAM method can be found in the original paper: https://arxiv.org/abs/1910.01279.

Parameters:
  • model (callable) – A model with forward() and possibly backward() functions.
  • device (str) – The device used for running model, options: "cpu", "gpu:0", "gpu:1" etc.
interpret(inputs: str, target_layer_name: str, labels: list = None, resize_to: int = 224, crop_to: int = None, visual: bool = True, save_path: str = None) → numpy.ndarray[source]

Main function of the interpreter.

The technical details of the ScoreCAM method are as follows: ScoreCAM obtains the feature maps at target_layer_name, upsamples each activation map to the input size, and normalizes it to form a mask over the input image. The target class score produced by each masked input serves as the weight of the corresponding activation map; the weighted combination of the maps, followed by a ReLU activation, yields the final explanation.
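The Score-CAM weighting scheme from the cited paper can be sketched as follows. The score_fn below is a toy stand-in for the model's forward pass, and all names are illustrative rather than interpretdl internals.

```python
import numpy as np

def score_cam(feature_map, input_image, score_fn):
    """Score-CAM weighting for a single (grayscale, for brevity) image.

    feature_map: (C, H, W) activations at the target layer.
    input_image: (H_in, W_in) image.
    score_fn: maps a masked image to the target-class score; a real
              implementation would run the model's forward pass.
    """
    c, h, w = feature_map.shape
    h_in, w_in = input_image.shape
    weights = np.empty(c)
    for k in range(c):
        # Upsample the activation map to the input size (nearest-neighbour
        # here for brevity; bilinear interpolation in practice).
        mask = feature_map[k].repeat(h_in // h, axis=0).repeat(w_in // w, axis=1)
        # Normalize the mask to [0, 1].
        span = mask.max() - mask.min()
        mask = (mask - mask.min()) / span if span > 0 else np.zeros_like(mask)
        # The target-class score of the masked input is the channel weight.
        weights[k] = score_fn(input_image * mask)
    cam = (weights[:, None, None] * feature_map).sum(axis=0)
    return np.maximum(cam, 0.0)

rng = np.random.default_rng(0)
fmap = rng.standard_normal((4, 7, 7))
image = rng.random((28, 28))
# Toy scoring function in place of the model's class logit.
heatmap = score_cam(fmap, image, score_fn=lambda x: float(x.mean()))
```

Because the weights come from forward passes on masked inputs, no gradients are needed, which is what makes the method perturbation-like.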

Parameters:
  • inputs (str or list of strs or numpy.ndarray) – The input image filepath, a list of filepaths, or a numpy array of read images.
  • target_layer_name (str) – The target layer from which to extract the feature maps.
  • labels (list or tuple or numpy.ndarray, optional) – The target labels to analyze. The number of labels should be equal to the number of images. If None, the most likely label for each image will be used. Default: None.
  • resize_to (int, optional) – Images will be rescaled with the shorter edge being resize_to. Defaults to 224.
  • crop_to (int, optional) – After resize, images will be center cropped to a square image with the size crop_to. If None, no crop will be performed. Defaults to None.
  • visual (bool, optional) – Whether or not to visualize the processed image. Default: True.
  • save_path (str, optional) – The filepath(s) to save the processed image(s). If None, the image will not be saved. Default: None.
Returns:

interpretations/heatmaps for images

Return type:

[numpy.ndarray]

Rollout

class interpretdl.RolloutInterpreter(model: callable, device: str = 'gpu:0')[source]

Rollout Interpreter.

This is a specific interpreter for Transformer models. RolloutInterpreter assumes that attention maps can be linearly combined across layers and that the resulting scores indicate token importance, which gives an explanation of which tokens matter.

More details regarding the Rollout method can be found in the original paper: https://arxiv.org/abs/2005.00928.

Parameters:
  • model (callable) – A model with forward() and possibly backward() functions.
  • device (str) – The device used for running model, options: "cpu", "gpu:0", "gpu:1" etc.
interpret(inputs: str, start_layer: int = 0, attention_layer_pattern: str = '^blocks.*.attn.attn_drop$', resize_to: int = 224, crop_to: int = None, visual: bool = True, save_path: str = None)[source]

Given inputs, RolloutInterpreter obtains all attention maps (from layers whose names match attention_layer_pattern) and computes their successive matrix multiplication. The start_layer argument sets the first layer included in the computation.
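The rollout computation can be sketched in NumPy. Following the cited paper, each attention matrix is averaged with the identity to model residual connections before the layer-by-layer matrix product; the random matrices below stand in for real per-layer attention maps (already averaged over heads), and the function name is illustrative rather than an interpretdl API.

```python
import numpy as np

def attention_rollout(attentions, start_layer=0):
    """Attention rollout over a list of per-layer attention maps.

    attentions: list of (T, T) row-stochastic attention matrices,
    already averaged over heads.
    """
    t = attentions[0].shape[0]
    eye = np.eye(t)
    rollout = eye
    for attn in attentions[start_layer:]:
        # Add the identity to account for residual connections, then
        # re-normalize so that rows still sum to one.
        a = 0.5 * attn + 0.5 * eye
        a = a / a.sum(axis=-1, keepdims=True)
        rollout = a @ rollout
    return rollout

rng = np.random.default_rng(0)
# Three toy attention layers over 5 tokens, rows normalized to sum to 1.
layers = [rng.random((5, 5)) for _ in range(3)]
layers = [a / a.sum(axis=-1, keepdims=True) for a in layers]
rolled = attention_rollout(layers, start_layer=0)
```

For a ViT-style classifier, the row of the rolled-out matrix corresponding to the class token is reshaped into a patch grid to form the heatmap.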

Parameters:
  • inputs (str or list of strs or numpy.ndarray) – The input image filepath, a list of filepaths, or a numpy array of read images.
  • start_layer (int, optional) – The index of the start layer involving the computation of attentions. Defaults to 0.
  • attention_layer_pattern (str, optional) – The regular expression used to select the attention layers by name. Defaults to ^blocks.*.attn.attn_drop$.
  • resize_to (int, optional) – Images will be rescaled with the shorter edge being resize_to. Defaults to 224.
  • crop_to (int, optional) – After resize, images will be center cropped to a square image with the size crop_to. If None, no crop will be performed. Defaults to None.
  • visual (bool, optional) – Whether or not to visualize the processed image. Default: True.
  • save_path (str, optional) – The filepath(s) to save the processed image(s). If None, the image will not be saved. Default: None.
Returns:

interpretations/heatmap for images.

Return type:

[numpy.ndarray]