Neural Networks as Decision Trees

More or less obvious transpositions


This post is inspired by a recent article (which we will not cover) stating that neural networks are decision trees. Clearly it is not the only article to address the topic. Paying too much attention to certain articles — stating that neural networks are decision trees, compositions of splines, kernel machines — one may end up believing that neural networks are equivalent to any ML construct one chooses to name…

ReLU activations naturally determine tree structures

The following simple argument is from the paper Towards Interpretable ANNs: An Exact Transformation to Multi-Class Multivariate Decision Trees by Nguyen, Kasmarik and Abbass. Consider a feed-forward neural network whose hidden layers use the ReLU activation. Fix a hidden layer, say the k-th; we regard it as fixed so that the index k can be omitted. The index j refers to nodes in this layer, the index i to nodes in the preceding ((k-1)-th) layer. Denote by z_j the value of hidden node j in layer k before the activation:

\displaystyle z_j = \sum_{i=1}^{I} w_{ij}\, H_i + b_j \,.

The H_i values are the activations from the preceding layer (the inputs to the k-th layer) and b_j is a bias term. The post-activation value h_j either coincides with z_j (when z_j > 0) or is 0 (when the ReLU activation clips it).

Due to the nature of the ReLU activation, the output of a node is either 0 or equal to its pre-activation value (that is, h_j = z_j). It is thus easy to see that each hidden layer of the neural network can be transformed into a binary decision tree: the decision at each stage of the tree is made by the activation of the corresponding node in the hidden layer, according to whether the value before the activation function is greater than 0.
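To make the correspondence concrete, here is a minimal sketch in Python/NumPy (the weights, bias and inputs are made-up illustrative values, not taken from the paper): for a single hidden layer, the sign pattern of the pre-activations z acts as a set of binary split decisions, and each post-activation is either 0 or z_j itself.

import numpy as np

# Made-up weights, bias and input for one hidden layer with 3 ReLU units
W = np.array([[ 0.5, -1.0],
              [ 1.5,  0.3],
              [-0.7,  0.4]])      # shape: (3 units, 2 inputs)
b = np.array([0.1, -0.2, 0.0])
H_prev = np.array([1.0, 0.5])     # activations H_i from the preceding layer

z = W @ H_prev + b                # pre-activations: z_j = sum_i w_ij H_i + b_j
h = np.maximum(z, 0.0)            # ReLU: h_j is either 0 or z_j

decisions = z > 0                 # one binary test per node, e.g. [True, True, False]
print(z, h, decisions)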

As for explainability, it is clear that the size of the tree grows exponentially as the network size grows; it’s like going from one black box to another.
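A back-of-the-envelope bound (our own illustration, not a count taken from the papers above): a network with N ReLU units in total contributes one binary split per unit, so the naive tree has up to 2^N root-to-leaf paths, many of which correspond to empty regions of the input space. For three hidden layers of 64 units each,

\displaystyle 2^{N} = 2^{3 \times 64} = 2^{192} \approx 6.3 \times 10^{57} \,.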

C-Net

There is a method for generating multivariate decision trees (MDTs) from neural networks. We describe the original C-Net architecture (a newer version exists, which we will not cover). The procedure is the following. After the neural network is trained, new data is fed through it and the outputs of the last hidden layer are computed. In other words, from a set of training and test data, denoted <X_t, Y_t> and <X_T, Y_T> respectively, we can compute the mapping between the last hidden layer and the output, denoted <H_t, Y_t> and <H_T, Y_T>. We retain these two sets, representing the relationship between the last hidden layer and the output layer, for the next stage, in which they are used to train a Quinlan C5 univariate decision tree (UDT), whose algorithm uses an entropy-based information gain ratio as the branch-splitting criterion. A decision tree, in turn, can be represented by a set of polyhedra expressed as linear constraints. These constraints have the form H_j(X_t) op C_j, where op is one of the binary operators {≤, <, =, >, ≥} and C_j is the numeric threshold of the constraint on input H_j. To obtain a multivariate form of the expression, a back-projection from the output of the neural network to its input is needed.

The full algorithm, including the back-projection step, is given in the C-Net paper listed in the links below.
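As a minimal sketch of the first stages of this pipeline in Python (assumptions: scikit-learn's MLPClassifier stands in for the trained network and its CART-based DecisionTreeClassifier stands in for Quinlan's C5; the data set and hyperparameters are illustrative, and the back-projection step is omitted):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for <X_t, Y_t> and <X_T, Y_T>
X, y = load_iris(return_X_y=True)
X_t, X_T, y_t, y_T = train_test_split(X, y, test_size=0.3, random_state=0)

# 1) Train the neural network (ReLU hidden layers)
net = MLPClassifier(hidden_layer_sizes=(16, 8), activation="relu",
                    max_iter=2000, random_state=0).fit(X_t, y_t)

def last_hidden(net, X):
    """Forward pass up to (and including) the last hidden layer."""
    h = X
    for W, b in zip(net.coefs_[:-1], net.intercepts_[:-1]):
        h = np.maximum(h @ W + b, 0.0)   # ReLU
    return h

# 2) Map the data to the last hidden layer: <H_t, Y_t> and <H_T, Y_T>
H_t, H_T = last_hidden(net, X_t), last_hidden(net, X_T)

# 3) Induce a univariate decision tree on the hidden representation
#    (CART here; the original C-Net uses Quinlan's C5 with the gain ratio)
udt = DecisionTreeClassifier(random_state=0).fit(H_t, y_t)
print("UDT accuracy on <H_T, Y_T>:", udt.score(H_T, y_T))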

Useful links

Neural Networks are Decision Trees
C. Aytekin
arXiv:2210.05189 [cs.LG], 2022.

Towards Interpretable ANNs: An Exact Transformation to Multi-Class Multivariate Decision Trees
D. T. Nguyen, K. E. Kasmarik, H. A. Abbass
arXiv:2003.04675 [cs.LG], 2020.

C-Net: A Method for Generating Non-deterministic and Dynamic Multivariate Decision Trees
H. A. Abbass, M. Towsey, G. D. Finn
Knowledge and Information Systems, Volume 3, pp. 184–197, 2001 (link).

Rectifier (ReLU activation) – Wikipedia entry.

The illusion of learning (link).

Explainable AI – Wikipedia entry.
