Metrics


Evaluating feature based local explanations #

Let \(f\) be a black box predictor that maps an input \(\mathbf{x} \in \mathbb{R}^d\) to an output \(f(\mathbf{x}) \in \mathbb{R}\) .

An explanation function \(g\) takes in a predictor \(f\) and an instance \(\mathbf{x}\) and returns feature importance scores \(g(f,\mathbf{x}) \in \mathbb{R}^d\) .

Let \(\rho: \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}^{+}\) be a distance metric over input instances.

Let \(D: \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}^{+}\) be a distance metric over explanations.

An evaluation metric \(\mu\) takes as input a predictor \(f\) , an explanation function \(g\) , and an input \(\mathbf{x}\) , and outputs a scalar \(\mu(f,g;\mathbf{x})\) .

We will mainly focus on three evaluation metrics that can be computed without access to ground-truth explanations1.

Faithfulness #

(high) faithfulness, relevance, fidelity.

The feature importance scores from \(g\) should reflect which features of \(\mathbf{x}\) are important to \(f\) : when we set a particular set of features \(\mathbf{x}_s\) to a baseline value \(\overline{\mathbf{x}}_s\) , the change in the predictor’s output should be proportional (measured via correlation) to the sum of the attribution scores of the features in \(\mathbf{x}_s\) .

For a subset of indices \(S \subset \{1,2,\ldots,d\}\) , let \(\mathbf{x}_s = ( \mathbf{x}_i, i \in S )\) be the corresponding sub-vector of input features. For a given subset size \(|S|\) , we define faithfulness as

\( \mu_{F}(f,g,|S|;\mathbf{x}) = \text{corr}_{S \in \binom {d}{|S|}}\left( \sum_{i \in S}g(f,\mathbf{x})_{i},f(\mathbf{x})-f(\mathbf{x}|\mathbf{x}_s=\overline{\mathbf{x}}_s)\right) \)

The baseline can be the mean of the training data.
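The correlation above can be estimated by Monte-Carlo sampling of feature subsets. Below is a minimal sketch in NumPy; the function names, the random subset sampling, and the callable signatures for `f` and `g` are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def faithfulness(f, g, x, subset_size, baseline, n_subsets=100, rng=None):
    """Monte-Carlo estimate of the faithfulness correlation mu_F.

    f: callable mapping a 1-D array to a scalar prediction.
    g: callable mapping (f, x) to a vector of attribution scores.
    baseline: per-feature baseline values (e.g. the training mean).
    """
    rng = np.random.default_rng(rng)
    d = x.shape[0]
    attr = g(f, x)
    attr_sums, output_drops = [], []
    for _ in range(n_subsets):
        S = rng.choice(d, size=subset_size, replace=False)
        x_masked = x.copy()
        x_masked[S] = baseline[S]           # set x_S to the baseline value
        attr_sums.append(attr[S].sum())     # sum of attributions over S
        output_drops.append(f(x) - f(x_masked))
    return np.corrcoef(attr_sums, output_drops)[0, 1]
```

As a sanity check, a linear model with gradient-times-input attributions and a zero baseline attains perfect faithfulness, since the output drop equals the attribution sum exactly for every subset.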

Sensitivity #

(low) sensitivity, stability, reliability, explanation continuity.

If inputs are near each other and their model outputs are similar, then their explanations should be close to each other.

Let \(\mathcal{N}_r(\mathbf{x})\) be a neighborhood of datapoints within a radius \(r\) of \(\mathbf{x}\) .

\( \mathcal{N}_r(\mathbf{x}) = \left\{ \mathbf{z} \in \mathcal{D}_x | \rho(\mathbf{x},\mathbf{z}) \leq r, f(\mathbf{x}) = f(\mathbf{z}) \right\} \)

Max Sensitivity

\( \mu_{M}(f,g,r;\mathbf{x}) = \max_{z\in\mathcal{N}_r(\mathbf{x})} D(g(f,\mathbf{x}),g(f,\mathbf{z})) \)

Average Sensitivity

\( \mu_{A}(f,g,r;\mathbf{x}) = \int_{\mathcal{N}_r(\mathbf{x})} D(g(f,\mathbf{x}),g(f,\mathbf{z})) \mathbb{P}_{\mathbf{x}}(\mathbf{z}) d\mathbf{z} \)
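Both quantities can be estimated by sampling points from the neighborhood. A minimal sketch, assuming a uniform perturbation within an \(L_\infty\) ball of radius \(r\) and taking \(D\) to be the \(L_2\) distance between attribution vectors (both are illustrative choices, not prescribed by the definitions above):

```python
import numpy as np

def sensitivity(f, g, x, r, n_samples=50, rng=None):
    """Monte-Carlo estimates of max sensitivity mu_M and
    average sensitivity mu_A over sampled neighbors of x.

    D is taken as the L2 norm between attribution vectors.
    """
    rng = np.random.default_rng(rng)
    base = g(f, x)
    dists = []
    for _ in range(n_samples):
        z = x + rng.uniform(-r, r, size=x.shape)   # neighbor within radius r
        dists.append(np.linalg.norm(base - g(f, z)))
    return max(dists), float(np.mean(dists))       # (mu_M, mu_A)
```

For a linear model explained by its (constant) gradient, every neighbor receives the identical explanation, so both estimates are zero.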

Complexity #

(low) complexity, information gain, sparsity.

A complex explanation spreads importance across all \(d\) features; the simplest explanation concentrates all importance on a single feature.

We define complexity as the entropy of the fractional contribution distribution.

\( \mu_{C}(f,g;\mathbf{x}) = \mathbb{E}_{i \sim \mathbb{P}_{g}}\left[ -\ln \mathbb{P}_{g}(i)\right] = - \sum_{i=1}^{d} \mathbb{P}_{g}(i) \ln \mathbb{P}_{g}(i) \)

where \(\mathbb{P}_{g}\) is the fractional contribution distribution

\( \mathbb{P}_{g}(i) = \frac{|g(f,\mathbf{x})_i|}{\sum_{j=1}^{d}|g(f,\mathbf{x})_j|}. \)
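This entropy is straightforward to compute from an attribution vector. A short sketch (the \(0 \ln 0 = 0\) convention is an assumption made explicit in the code):

```python
import numpy as np

def complexity(attr, eps=1e-12):
    """Entropy of the fractional-contribution distribution P_g."""
    p = np.abs(attr) / np.abs(attr).sum()
    p = p[p > eps]                  # drop zeros: 0 * ln 0 is taken as 0
    return float(-(p * np.log(p)).sum())
```

Uniform attributions over \(d\) features give the maximum value \(\ln d\); a one-hot attribution gives the minimum value \(0\).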

References #


  1. Evaluating and Aggregating Feature-based Model Explanations, Bhatt, Umang and Weller, Adrian and Moura, José M. F., Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20), 2020. ↩︎