In lab meeting this week we discussed "Automatic Differentiation in Machine Learning: a Survey" (Baydin et al.), which addresses the general technique of autodifferentiation for functions expressed as computer programs.
The paper initially outlines three classical approaches to computing derivatives in computer programs: 1) manually working out derivatives by hand and coding the result; 2) numerically approximating derivatives by evaluating the function at small discrete steps; and 3) computer manipulation of symbolic mathematical expressions, automatically generating derivative expressions via the standard rules of calculus. The limitation of approach 1 is that derivatives can involve a large number of terms (expression swell) that are both cumbersome to handle manually and prone to algebraic mistakes. The disadvantage of approach 2 is that a finite-difference approximation inevitably involves round-off error and is sensitive to the choice of step size. The disadvantage of approach 3 is that symbolic manipulation is computationally cumbersome, and mechanically applying the calculus rules often produces derivative expressions that still need to be simplified by hand.

In contrast to these methods, the paper introduces autodifferentiation as a set of techniques that operates at the elementary computation level via step-wise application of the chain rule. Evaluating a function in a computer program amounts to executing a sequence of elementary operations, which can be represented as a computational graph in which each node is an intermediate temporary variable. Since the computer already executes this trace of elementary operations through intermediate variables, the same trace can be exploited to efficiently calculate a derivative alongside the function values.

The paper outlines two general modes of autodifferentiation: forward mode and reverse (backward) mode. In forward mode, a derivative trace is evaluated alongside the computer's execution of the function's evaluation trace, with intermediate values from the evaluation feeding directly into the derivative calculation. This execution can be conveniently realized by extending the representation of the real numbers to include a dual component. Analogous to imaginary numbers, this dual component is denoted with an ε, defined so that ε² = 0; evaluating f(a + ε) then yields f(a) + f′(a)ε, so the dual coefficient automatically carries the derivative. In executing forward mode autodifferentiation, the derivative trace is initially ‘seeded’ with an input derivative vector (e.g., ẋ = 1 for the input of interest) as the function is evaluated at an initial input; each forward pass then yields one directional derivative, or equivalently one column of the Jacobian. Whereas symbolic differentiation can be thought of as a mapping from expressions to expressions, autodifferentiation maps a particular numerical input to numerical derivative values computed alongside the function evaluation.

Reverse mode, by contrast, computes derivatives by propagating them from the output back toward the inputs: a forward pass records the intermediate values of the evaluation trace, and a reverse pass then accumulates adjoints (the sensitivities of the output with respect to each intermediate variable) via the chain rule. A single reverse pass yields the gradient of a scalar output with respect to all inputs at once, which is why reverse mode, the generalization of backpropagation, dominates in machine learning applications.

The code sketches below illustrate each of these pieces in turn.
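To make the step-size sensitivity of approach 2 concrete, here is a minimal Python sketch (using sin as an arbitrary test function, not an example from the paper). The central-difference error first shrinks as h decreases, then grows again once floating-point round-off dominates:

```python
import math

def central_difference(f, x, h):
    # Central finite difference: (f(x + h) - f(x - h)) / (2h).
    return (f(x + h) - f(x - h)) / (2.0 * h)

x = 1.0
exact = math.cos(x)  # d/dx sin(x) = cos(x)
for h in [1e-1, 1e-4, 1e-8, 1e-12]:
    approx = central_difference(math.sin, x, h)
    print(f"h={h:.0e}  approx={approx:.12f}  error={abs(approx - exact):.2e}")
```

At h = 1e-12 nearly all significant digits are lost to cancellation in f(x + h) - f(x - h), so the approximation degrades sharply even though the truncation error is negligible.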
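For a concrete picture of the evaluation trace, consider the (hypothetical, not from the paper) function f(x1, x2) = x1·x2 + sin(x1). A computer evaluates it as a sequence of intermediate variables, each corresponding to one node in the computational graph:

```python
import math

# Evaluation trace for f(x1, x2) = x1 * x2 + sin(x1):
# each intermediate variable is one node in the computational graph.
x1, x2 = 2.0, 3.0   # inputs
v1 = x1 * x2        # elementary multiplication
v2 = math.sin(x1)   # elementary sin
v3 = v1 + v2        # elementary addition
y = v3              # output: f(2, 3)
print(y)
```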
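The dual-number mechanism of forward mode can be sketched in a few lines of Python; this is a minimal illustration of the idea, not the paper's implementation. The hypothetical Dual class holds a value together with its derivative coefficient, each primitive applies the chain rule locally, and seeding the input with a dual coefficient of 1 makes the derivative emerge from ordinary evaluation:

```python
import math

class Dual:
    """Dual number a + b*eps with eps**2 == 0; b carries the derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (a + a'eps)(b + b'eps) = ab + (a'b + ab')eps.
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)

    __rmul__ = __mul__

def sin(x):
    # sin primitive with its local chain rule: (sin u)' = cos(u) * u'.
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

# f(x) = x * sin(x); seed the derivative trace with dx/dx = 1.
x = Dual(2.0, 1.0)
y = x * sin(x)
print(y.val, y.dot)   # f(2) and f'(2) = sin(2) + 2*cos(2)
```

The seed of 1 chooses the direction of differentiation; for a function of n inputs, n forward passes (one per unit seed) recover the full Jacobian column by column, which is why forward mode suits functions with few inputs and many outputs.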
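Reverse mode can likewise be sketched with a small tape, again as a toy illustration rather than the survey's code. The forward pass builds a graph of hypothetical Var nodes, each recording its parents and the local partial derivatives; backward then sweeps the trace in reverse topological order, accumulating adjoints:

```python
import math

class Var:
    """Node in the evaluation trace; grad holds the adjoint dy/dv."""
    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents   # (parent_node, local_partial) pairs
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.val + other.val, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # d(uv)/du = v and d(uv)/dv = u.
        return Var(self.val * other.val,
                   [(self, other.val), (other, self.val)])

def sin(x):
    return Var(math.sin(x.val), [(x, math.cos(x.val))])

def backward(output):
    # Topologically order the trace, then sweep it in reverse,
    # accumulating adjoints via the chain rule.
    order, seen = [], set()
    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)
    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local

x = Var(2.0)
y = sin(x) * x        # forward pass records the trace
backward(y)           # reverse pass accumulates adjoints
print(y.val, x.grad)  # f(2) and f'(2) = sin(2) + 2*cos(2)
```

Note that one call to backward populates the gradient with respect to every input simultaneously, independent of how many inputs there are, which is the property that makes reverse mode the method of choice for training models with millions of parameters and a single scalar loss.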