Table of Contents

Introduction
Types of Differentiation
Forward Mode AD
Reverse Mode AD
Basic Implementation in Python
Matrix Implementation
Comparison with PyTorch

Introduction

These notes largely follow the survey presented by (Baydin et al. 2018). I have added a few examples to clarify the matrix algebra, as well as a lead-in to a practical implementation.
Automatic differentiation is a method for computing the derivatives of functions in a modular way using the chain rule of calculus. It is used in many deep learning frameworks such as PyTorch and TensorFlow. Consider a complex composition of functions that together yield some useful output, such as the predictions of a deep learning model. Traditionally, the parameters of such a model would be optimized through gradient descent, which requires that the derivatives with respect to the parameters be available for every function used in the model.
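To make this chain-rule bookkeeping concrete before the implementation sections, here is a minimal sketch (my own illustration, not code from the survey): each elementary operation carries an intermediate value together with its derivative with respect to the input, and updates both in lockstep. This is the pattern that forward-mode AD formalizes.

```python
import math

# Pair each intermediate value with its derivative with respect to the
# input x, and update both at every elementary operation using the chain
# rule: d f(g(x)) / dx = f'(g(x)) * g'(x).

def square(v, dv):
    return v * v, 2 * v * dv

def sin(v, dv):
    return math.sin(v), math.cos(v) * dv

def add_const(v, dv, c):
    return v + c, dv

# Evaluate y = sin(x^2) + 1 and dy/dx at x = 3, one operation at a time.
v, dv = 3.0, 1.0              # seed the derivative: dx/dx = 1
v, dv = square(v, dv)         # x^2
v, dv = sin(v, dv)            # sin(x^2)
v, dv = add_const(v, dv, 1.0) # sin(x^2) + 1

print(v, dv)                                      # value and derivative via the chain rule
print(math.sin(9.0) + 1.0, math.cos(9.0) * 6.0)   # analytic check
```

The point is that no single function ever needs to know the global expression being differentiated; each operation only supplies its local derivative.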
Let’s take a simple constrained problem (from Nocedal and Wright):
\begin{align*} \min \quad & x_1 + x_2\\ \textrm{s.t.} \quad & x_1^2 + x_2^2 - 2 = 0 \end{align*}
The feasible points for this problem lie on the circle of radius \(\sqrt{2}\) defined by the constraint:
Figure 1: Source: Nocedal and Wright

If we let \(g(\mathbf{x}) = x_1^2 + x_2^2 - 2\), then the gradient vector is \((2x_1, 2x_2)\).
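As a quick sanity check (my own addition, not part of the source example), the analytic gradient \((2x_1, 2x_2)\) can be compared against a central finite-difference approximation; numerical differentiation of this kind is one of the alternatives that the Baydin et al. survey contrasts with automatic differentiation.

```python
def g(x1, x2):
    # Constraint function g(x) = x1^2 + x2^2 - 2
    return x1**2 + x2**2 - 2

def grad_g(x1, x2):
    # Analytic gradient (2*x1, 2*x2)
    return (2 * x1, 2 * x2)

def finite_diff_grad(f, x1, x2, h=1e-6):
    # Central finite differences in each coordinate (numerical differentiation).
    d1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    d2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)
    return (d1, d2)

print(grad_g(1.0, 1.0))               # (2.0, 2.0)
print(finite_diff_grad(g, 1.0, 1.0))  # approximately (2.0, 2.0)
```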