ad: automatic differentiation
(c) Edward Kmett 2010-2015, BSD3, ekmett@gmail.com, experimental, GHC only; this header applies to every module documented below.

Numeric.AD.Jet

  A Jet is a tower of all (higher order) partial derivatives of a function.
  At each step, a Jet f is wrapped in another layer's worth of f:

    a :- f a :- f (f a) :- f (f (f a)) :- ...

  tailJet takes the tail of a Jet.

  headJet takes the head of a Jet.

  jet constructs a Jet by unzipping the layers of a Cofree Comonad.

  Showable is used to sidestep the need for UndecidableInstances.

Numeric.AD.Mode

  isKnownConstant is allowed to return False for items with a zero
  derivative, but then we give more NaNs than strictly necessary.

  isKnownZero is allowed to return False for zero, but then we give more
  NaNs than strictly necessary.

  auto embeds a constant.

  (*^) is scalar-vector multiplication. (^*) is vector-scalar
  multiplication. (^/) is scalar division. zero is the embedding of the
  constant 0.

Numeric.AD.Jacobian

  The Jacobian class is useful for defining new AD primitives in a fairly
  generic way.

Numeric.AD.Internal.Combinators

  zipWithT zips a Traversable f with a Foldable g, assuming f has at least
  as many entries as g.

  zipWithDefaultT zips a Traversable f with a Foldable g, using a default
  value after f is exhausted.

  withPrimal and fromBy are used internally to define various Enum
  combinators.

Numeric.AD.Internal.Tower

  Tower is an AD mode that calculates a tangent tower by forward AD, and
  provides fast diffsUU and diffsUF.

Numeric.AD.Internal.Or

  The choice between two AD modes is an AD mode in its own right.

Numeric.AD.Rank1.Forward.Double and Numeric.AD.Mode.Forward.Double

  The two modules document the same operations; the examples below are
  shared.

  du computes the directional derivative of a function given a zipped-up
  Functor of the input values and their derivatives.

  du' computes the answer and directional derivative of a function given a
  zipped-up Functor of the input values and their derivatives.

  duF computes a vector of directional derivatives for a function given a
  zipped-up Functor of the input values and their derivatives.

  duF' computes a vector of answers and directional derivatives for a
  function given a zipped-up Functor of the input values and their
  derivatives.

  diff calculates the first derivative of a scalar-to-scalar function by
  forward-mode AD.

    diff sin 0
    1.0

  diff' calculates the result and first derivative of a scalar-to-scalar
  function by forward-mode AD: diff' f = f &&& diff f.

    diff' sin 0
    (0.0,1.0)
    diff' exp 0
    (1.0,1.0)

  diffF calculates the first derivatives of a scalar-to-nonscalar function
  by forward-mode AD.

    diffF (\a -> [sin a, cos a]) 0
    [1.0,-0.0]

  diffF' calculates the result and first derivatives of a
  scalar-to-nonscalar function by forward-mode AD.

    diffF' (\a -> [sin a, cos a]) 0
    [(0.0,1.0),(1.0,-0.0)]

  jacobianT is a fast, simple, transposed Jacobian computed with
  forward-mode AD. jacobianWithT is the same, combining the output with
  the input.

  jacobian computes the Jacobian using forward-mode AD. This must
  transpose the result, so jacobianT is faster and allows more result
  types.

    jacobian (\[x,y] -> [y,x,x+y,x*y,exp x * sin y]) [pi,1]
    [[0.0,1.0],[1.0,0.0],[1.0,1.0],[1.0,3.141592653589793],[19.472221418841606,12.502969588876512]]

  jacobianWith computes the Jacobian using forward-mode AD and combines
  the output with the input. This must transpose the result, so
  jacobianWithT is faster and allows more result types.

  jacobian' computes the Jacobian using forward-mode AD along with the
  actual answer. jacobianWith' additionally combines it with the input
  using a user-specified function.

  grad computes the gradient of a function using forward-mode AD. Note:
  this performs O(n) worse than reverse-mode grad for n inputs, in
  exchange for better space utilization.

  grad' computes the gradient and answer of a function using forward-mode
  AD, with the same O(n) caveat.

  gradWith computes the gradient using forward-mode AD and combines the
  result with the input using a user-specified function, with the same
  O(n) caveat. gradWith' additionally returns the answer.

    gradWith' (,) sum [0..4]
    (10.0,[(0.0,1.0),(1.0,1.0),(2.0,1.0),(3.0,1.0),(4.0,1.0)])
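As a quick illustration, here is a minimal sketch using the exports listed
above; the expected results follow from the documented examples and basic
calculus.

  import Numeric.AD.Rank1.Forward.Double (diff, diff', diffF, du)

  main :: IO ()
  main = do
    print (diff (\x -> x * x + 3 * x) 2)     -- 7.0, since d/dx (x^2 + 3x) = 2x + 3
    print (diff' sin 0)                      -- (0.0,1.0): answer paired with derivative
    print (diffF (\a -> [sin a, cos a]) 0)   -- [1.0,-0.0]: one derivative per output
    -- directional derivative of x*y at (1,2) in direction (0,1):
    -- y*dx + x*dy = 2*0 + 1*1 = 1
    print (du (\[x, y] -> x * y) [(1, 0), (2, 1)])  -- 1.0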
Numeric.AD.Rank1.Tower and Numeric.AD.Mode.Tower

  The two modules document the same operations.

  diffs computes the answer and all derivatives of a function (a -> a).

  diffs0 computes the zero-padded derivatives of a function (a -> a).

  diffsF computes the answer and all derivatives of a function (a -> f a).

  diffs0F computes the zero-padded derivatives of a function (a -> f a).

  taylor f x computes the Taylor series of f around x.

  taylor0 f x computes the Taylor series of f around x, zero-padded.

  maclaurin f computes the Maclaurin series of f.

  maclaurin0 f computes the Maclaurin series of f, zero-padded.

  diff computes the first derivative of a function (a -> a).

  diff' computes the answer and first derivative of a function (a -> a).

  du computes a directional derivative of a function (f a -> a).

  du' computes the answer and a directional derivative of a function
  (f a -> a).

  duF computes a directional derivative of a function (f a -> g a).

  duF' computes the answer and a directional derivative of a function
  (f a -> g a).

  dus, given a function (f a -> a) and a tower of derivatives, computes
  the corresponding directional derivatives. dus0 does the same,
  zero-padded.

  dusF, given a function (f a -> g a) and a tower of derivatives, computes
  the corresponding directional derivatives. dus0F does the same,
  zero-padded.
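A minimal sketch of the tower operators, assuming the Rank1.Tower exports
above and treating taylor's output as the successive series terms (so its
partial sums approach f (x + dx)).

  import Numeric.AD.Rank1.Tower (diffs, taylor)

  main :: IO ()
  main = do
    -- answer and all derivatives of sin at 0: sin, cos, -sin, -cos, ...
    print (take 5 (diffs sin 0))             -- [0.0,1.0,0.0,-1.0,0.0]
    -- partial sums of the Taylor series of exp around 0 with displacement 1
    print (sum (take 20 (taylor exp 0 1)))   -- approximately e, 2.718281828...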
Numeric.AD.Internal.Sparse (with the Numeric.AD.Rank1.Sparse and
Numeric.AD.Mode.Sparse wrappers built on it)

  We only store partials in sorted order, so the map contained in a
  partial will only contain partials with keys equal to or greater than
  those of the map in which it was found. This should be key for
  efficiently computing sparse hessians: there are only
  (n + k - 1) choose (k - 1) distinct nth partial derivatives of a
  function with k inputs.

  A Monomial is used to indicate the order of differentiation. For a k-ary
  function, it is represented as a list of k non-negative Ints:
  MI [n_0, n_1, ..., n_{k-1}] denotes differentiation n_0 times with
  respect to variable 0, n_1 times with respect to variable 1, and so on.
  Trailing zeros are omitted for efficiency.

  incMonomial adds 1 to variable k (i.e. differentiates once more with
  respect to variable k).
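A small check of that count, as a sketch (the choose helper here is ad
hoc, not part of the library): for a 3-ary function (k = 3), the number of
distinct nth partials is (n + 2) choose 2.

  -- counts multisets of size n drawn from k = 3 variables
  choose :: Integer -> Integer -> Integer
  n `choose` k = product [n - k + 1 .. n] `div` product [1 .. k]

  main :: IO ()
  main = print [(n + 2) `choose` 2 | n <- [0 .. 4]]  -- [1,3,6,10,15]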
Numeric.AD.Internal.Reverse  ((c) Edward Kmett 2012-2015)

  unary is used to create a new entry on the chain given a unary function,
  its derivative with respect to its input, the variable ID of its input,
  and the value of its input. Used internally.

  binary is used to create a new entry on the chain given a binary
  function, its derivatives with respect to its inputs, and their variable
  IDs and values. Used internally.

  derivativeOf is a helper that extracts the derivative of a chain when
  the chain was constructed with one variable. derivativeOf' extracts both
  the primal and the derivative.

  backPropagate is used internally to push sensitivities down the chain.

  partials extracts the partials from the current chain for a given AD
  variable. partialArrayOf returns an Array of partials given bounds for
  the variable IDs; partialMapOf returns an IntMap of sparse partials.

  reifyTape constructs a tape that starts with n variables.

Numeric.AD.Internal.On

  The composition of two AD modes is an AD mode in its own right.

Numeric.AD.Mode.Reverse

  grad calculates the gradient of a non-scalar-to-scalar function with
  reverse-mode AD in a single pass.

    grad (\[x,y,z] -> x*y+z) [1,2,3]
    [2,1,1]

  grad' calculates the result and gradient of a non-scalar-to-scalar
  function with reverse-mode AD in a single pass.

    grad' (\[x,y,z] -> x*y+z) [1,2,3]
    (5,[2,1,1])

  gradWith g f calculates the gradient of a non-scalar-to-scalar function
  f with reverse-mode AD in a single pass; the gradient is combined
  element-wise with the argument using the function g.

    grad == gradWith (\_ dx -> dx)

  gradWith' g f calculates the result and gradient of a
  non-scalar-to-scalar function f with reverse-mode AD in a single pass;
  the gradient is combined element-wise with the argument using the
  function g.

    grad' == gradWith' (\_ dx -> dx)

  jacobian calculates the Jacobian of a non-scalar-to-non-scalar function
  with reverse AD lazily in m passes for m outputs.

    jacobian (\[x,y] -> [y,x,x*y]) [2,1]
    [[0,1],[1,0],[1,2]]

  jacobian' calculates both the result and the Jacobian of a
  non-scalar-to-non-scalar function, using m invocations of reverse AD,
  where m is the output dimensionality. Applying fmap snd to the result
  will recover the result of jacobian. (An alias for gradF'.)

    jacobian' (\[x,y] -> [y,x,x*y]) [2,1]
    [(1,[0,1]),(2,[1,0]),(2,[1,2])]

  jacobianWith g f calculates the Jacobian of a non-scalar-to-non-scalar
  function f with reverse AD lazily in m passes for m outputs. Instead of
  returning the Jacobian matrix, the elements of the matrix are combined
  with the input using g.

    jacobian == jacobianWith (\_ dx -> dx)

  jacobianWith' g f calculates both the result and the Jacobian of a
  non-scalar-to-non-scalar function f, using m invocations of reverse AD,
  where m is the output dimensionality; the elements of the matrix are
  combined with the input using g. Applying fmap snd to the result will
  recover the result of jacobianWith.

  diff computes the derivative of a function.

    diff sin 0
    1.0

  diff' calculates the result and derivative, as a pair, of a
  scalar-to-scalar function.

    diff' sin 0
    (0.0,1.0)
    diff' exp 0
    (1.0,1.0)

  diffF computes the derivatives of each result of a scalar-to-vector
  function with regard to its input.

    diffF (\a -> [sin a, cos a]) 0
    [1.0,0.0]

  diffF' does the same along with the answer.

    diffF' (\a -> [sin a, cos a]) 0
    [(0.0,1.0),(1.0,0.0)]

  hessian computes the Hessian via the Jacobian of the gradient: the
  gradient is computed in reverse mode and then the Jacobian is computed
  in reverse mode. However, since grad f :: f a -> f a is square, this is
  not as fast as the forward-mode Jacobian of a reverse-mode gradient
  provided by Numeric.AD.hessian.

    hessian (\[x,y] -> x*y) [1,2]
    [[0,1],[1,0]]

  hessianF computes the order 3 Hessian tensor on a
  non-scalar-to-non-scalar function via the reverse-mode Jacobian of the
  reverse-mode Jacobian of the function. Less efficient than
  Numeric.AD.hessianF.

    hessianF (\[x,y] -> [x*y,x+y,exp x*cos y]) [1,2]
    [[[0.0,1.0],[1.0,0.0]],[[0.0,0.0],[0.0,0.0]],[[-1.1312043837568135,-2.4717266720048188],[-2.4717266720048188,1.1312043837568135]]]
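A minimal sketch exercising the three most common entry points, assuming
the Numeric.AD.Mode.Reverse exports above; the expected outputs are the
documented ones.

  import Numeric.AD.Mode.Reverse (grad, hessian, jacobian')

  main :: IO ()
  main = do
    print (grad (\[x, y, z] -> x * y + z) [1, 2, 3])     -- [2,1,1]
    print (jacobian' (\[x, y] -> [y, x, x * y]) [2, 1])  -- [(1,[0,1]),(2,[1,0]),(2,[1,2])]
    print (hessian (\[x, y] -> x * y) [1, 2])            -- [[0,1],[1,0]]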
Numeric.AD.Internal.Kahn

  Kahn is a Mode using reverse-mode automatic differentiation that
  provides fast diffFU, diff2FU, grad, grad2 and a fast jacobian when you
  have a significantly smaller number of outputs than inputs.

  A Tape records the information needed to back propagate from the output
  to each input during reverse-mode AD.

  backPropagate back propagates sensitivities along a tape. It returns a
  list of contributions to the partials; the variable IDs returned in the
  list are likely not unique!

  partialArray returns an Array of partials given bounds for the variable
  IDs. partialMap returns an IntMap of sparse partials.
Numeric.AD.Rank1.Kahn and Numeric.AD.Mode.Kahn

  The two modules document the same operations; the examples below are
  shared.

  grad calculates the gradient of a non-scalar-to-scalar function with
  Kahn-mode AD in a single pass.

    grad (\[x,y,z] -> x*y+z) [1,2,3]
    [2,1,1]

  grad' calculates the result and gradient of a non-scalar-to-scalar
  function with Kahn-mode AD in a single pass.

    grad' (\[x,y,z] -> 4*x*exp y+cos z) [1,2,3]
    (28.566231899122155,[29.5562243957226,29.5562243957226,-0.1411200080598672])

  gradWith g f calculates the gradient of a non-scalar-to-scalar function
  f with Kahn-mode AD in a single pass; the gradient is combined
  element-wise with the argument using the function g.

    grad == gradWith (\_ dx -> dx)
    id == gradWith const

  gradWith' g f calculates the result and gradient of a
  non-scalar-to-scalar function f with Kahn-mode AD in a single pass; the
  gradient is combined element-wise with the argument using the function
  g.

    grad' == gradWith' (\_ dx -> dx)

  jacobian calculates the Jacobian of a non-scalar-to-non-scalar function
  with Kahn AD lazily in m passes for m outputs.

    jacobian (\[x,y] -> [y,x,x*y]) [2,1]
    [[0,1],[1,0],[1,2]]
    jacobian (\[x,y] -> [exp y,cos x,x+y]) [1,2]
    [[0.0,7.38905609893065],[-0.8414709848078965,0.0],[1.0,1.0]]

  jacobian' calculates both the result and the Jacobian of a
  non-scalar-to-non-scalar function, using m invocations of Kahn AD, where
  m is the output dimensionality. Applying fmap snd to the result will
  recover the result of jacobian. (An alias for gradF'.)

    jacobian' (\[x,y] -> [y,x,x*y]) [2,1]
    [(1,[0,1]),(2,[1,0]),(2,[1,2])]

  jacobianWith g f calculates the Jacobian of a non-scalar-to-non-scalar
  function f with Kahn AD lazily in m passes for m outputs; the elements
  of the matrix are combined with the input using g.

    jacobian == jacobianWith (\_ dx -> dx)

  jacobianWith' g f calculates both the result and the Jacobian of a
  non-scalar-to-non-scalar function f, using m invocations of Kahn AD,
  where m is the output dimensionality; the elements of the matrix are
  combined with the input using g. Applying fmap snd to the result will
  recover the result of jacobianWith.

    jacobian' == jacobianWith' (\_ dx -> dx)

  diff computes the derivative of a function.

    diff sin 0
    1.0

  diff' calculates the value and derivative, as a pair, of a
  scalar-to-scalar function.

    diff' sin 0
    (0.0,1.0)

  diffF computes the derivatives of a function that returns a vector with
  regard to its single input.

    diffF (\a -> [sin a, cos a]) 0
    [1.0,0.0]

  diffF' does the same as well as returning the primal answer.

    diffF' (\a -> [sin a, cos a]) 0
    [(0.0,1.0),(1.0,0.0)]

  hessian computes the Hessian via the Jacobian of the gradient: the
  gradient is computed in Kahn mode and then the Jacobian is computed in
  Kahn mode. However, since grad f :: f a -> f a is square, this is not as
  fast as the forward-mode Jacobian of a reverse-mode gradient provided by
  Numeric.AD.hessian.

    hessian (\[x,y] -> x*y) [1,2]
    [[0,1],[1,0]]

  hessianF computes the order 3 Hessian tensor on a
  non-scalar-to-non-scalar function via the Kahn-mode Jacobian of the
  Kahn-mode Jacobian of the function. Less efficient than
  Numeric.AD.hessianF.

    hessianF (\[x,y] -> [x*y,x+y,exp x*cos y]) [1,2]
    [[[0.0,1.0],[1.0,0.0]],[[0.0,0.0],[0.0,0.0]],[[-1.1312043837568135,-2.4717266720048188],[-2.4717266720048188,1.1312043837568135]]]
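A minimal sketch of the gradWith combinators above, assuming the
Rank1.Kahn exports; the results follow from the partials of x*y being
(y, x).

  import Numeric.AD.Rank1.Kahn (gradWith)

  main :: IO ()
  main = do
    -- pair each input with its partial derivative
    print (gradWith (,) (\[x, y] -> x * y) [1, 2])            -- [(1,2),(2,1)]
    -- grad == gradWith (\_ dx -> dx)
    print (gradWith (\_ dx -> dx) (\[x, y] -> x * y) [1, 2])  -- [2,1]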
!"#(c) Edward Kmett 2010-2015BSD3ekmett@gmail.com experimentalGHC onlyNone 3579IN$The $h function calculates the gradient of a non-scalar-to-scalar function with kahn-mode AD in a single pass. grad (\[x,y,z] -> x*y+z) [1,2,3][2,1,1]%The %s function calculates the result and gradient of a non-scalar-to-scalar function with kahn-mode AD in a single pass.+grad' (\[x,y,z] -> 4*x*exp y+cos z) [1,2,3]L(28.566231899122155,[29.5562243957226,29.5562243957226,-0.1411200080598672])&$ g fE function calculates the gradient of a non-scalar-to-scalar function fq with kahn-mode AD in a single pass. The gradient is combined element-wise with the argument using the function g. $ = & (_ dx -> dx) 2 = & const '% g fG calculates the result and gradient of a non-scalar-to-scalar function fp with kahn-mode AD in a single pass the gradient is combined element-wise with the argument using the function g. % == ' (_ dx -> dx)(The (` function calculates the jacobian of a non-scalar-to-non-scalar function with kahn AD lazily in m passes for m outputs.$jacobian (\[x,y] -> [y,x,x*y]) [2,1][[0,1],[1,0],[1,2]],jacobian (\[x,y] -> [exp y,cos x,x+y]) [1,2]<[[0.0,7.38905609893065],[-0.8414709848078965,0.0],[1.0,1.0]])The )b function calculates both the result and the Jacobian of a nonscalar-to-nonscalar function, using m invocations of kahn AD, where m( is the output dimensionality. Applying fmap snd* to the result will recover the result of ( | An alias for gradF'Kghci> jacobian' ([x,y] -> [y,x,x*y]) [2,1] [(1,[0,1]),(2,[1,0]),(2,[1,2])]*R'jacobianWith g f' calculates the Jacobian of a non-scalar-to-non-scalar function f with kahn AD lazily in m passes for m outputs.kInstead of returning the Jacobian matrix, the elements of the matrix are combined with the input using the g. ( = * (_ dx -> dx) * 3 = (f x -> 3 x 4 f x) +*W g f' calculates both the result and the Jacobian of a nonscalar-to-nonscalar function f, using m invocations of kahn AD, where m( is the output dimensionality. Applying fmap snd* to the result will recover the result of *kInstead of returning the Jacobian matrix, the elements of the matrix are combined with the input using the g. ) == + (_ dx -> dx),%Compute the derivative of a function. diff sin 01.0cos 01.0-The -Z function calculates the value and derivative, as a pair, of a scalar-to-scalar function. diff' sin 0 (0.0,1.0).]Compute the derivatives of a function that returns a vector with regards to its single input.diffF (\a -> [sin a, cos a]) 0 [1.0,0.0]/{Compute the derivatives of a function that returns a vector with regards to its single input as well as the primal answer.diffF' (\a -> [sin a, cos a]) 0[(0.0,1.0),(1.0,0.0)]0 Compute the 0 via the (* of the gradient. 
Numeric.AD.Rank1.Forward and Numeric.AD.Mode.Forward

  Forward-mode AD. Both modules provide the same operations as the
  Forward.Double modules above (du, du', duF, duF', diff, diff', diffF,
  diffF', jacobianT, jacobianWithT, jacobian, jacobianWith, jacobian',
  jacobianWith', grad, grad', gradWith, gradWith'), generalized from
  Double to any Num instance, with identical documentation and examples
  (the gradWith' example yields (10,[(0,1),(1,1),(2,1),(3,1),(4,1)]) at
  integral types). In addition:

  hessianProduct computes the product of a vector with the Hessian using
  forward-on-forward-mode AD.

  hessianProduct' computes the gradient and hessian product using
  forward-on-forward-mode AD.
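A minimal sketch of the transposed Jacobian and the space/time trade-off
of forward-mode gradients, assuming the Rank1.Forward exports above; the
partials follow from the outputs (y, x, x*y).

  import Numeric.AD.Rank1.Forward (grad, jacobianT)

  main :: IO ()
  main = do
    -- transposed Jacobian: one entry per input, each holding that
    -- input's partials across all outputs
    print (jacobianT (\[x, y] -> [y, x, x * y]) [2, 1])  -- [[0,1,1],[1,0,2]]
    -- forward-mode gradient: n sweeps for n inputs, minimal memory
    print (grad (\[x, y] -> x * y) [1, 2])               -- [2,1]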
Numeric.AD.Rank1.Newton

  findZero finds a zero of a scalar function using Newton's method; its
  output is a stream of increasingly accurate results. (Modulo the usual
  caveats.) If the stream becomes constant ("it converges"), no further
  elements are returned.

    take 10 $ findZero (\x->x^2-4) 1
    [1.0,2.5,2.05,2.000609756097561,2.0000000929222947,2.000000000000002,2.0]
    last $ take 10 $ findZero ((+1).(^2)) (1 :+ 1)
    0.0 :+ 1.0

  inverse inverts a scalar function using Newton's method; its output is a
  stream of increasingly accurate results. (Modulo the usual caveats.) If
  the stream becomes constant ("it converges"), no further elements are
  returned.

    last $ take 10 $ inverse sqrt 1 (sqrt 10)
    10.0

  fixedPoint finds a fixed point of a scalar function using Newton's
  method; its output is a stream of increasingly accurate results. (Modulo
  the usual caveats.) If the stream becomes constant ("it converges"), no
  further elements are returned.

    last $ take 10 $ fixedPoint cos 1
    0.7390851332151607

  extremum finds an extremum of a scalar function using Newton's method;
  it produces a stream of increasingly accurate results. (Modulo the usual
  caveats.) If the stream becomes constant ("it converges"), no further
  elements are returned.

    last $ take 10 $ extremum cos 1
    0.0

  gradientDescent performs a multivariate optimization, based on the
  naive-gradient-descent in the file
  stalingrad/examples/flow-tests/pre-saddle-1a.vlad from the VLAD compiler
  Stalingrad sources. Its output is a stream of increasingly accurate
  results. (Modulo the usual caveats.) It uses reverse-mode automatic
  differentiation to compute the gradient.

  gradientAscent is the ascending counterpart; it likewise uses
  reverse-mode automatic differentiation to compute the gradient.

Numeric.AD.Rank1.Newton.Double

  The same findZero, inverse, fixedPoint, and extremum, specialized to
  Double, with identical documentation and examples.
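A minimal sketch of gradientDescent, using the rank-2 variant from
Numeric.AD.Newton (Rank1.Newton exports the same name); the quadratic
objective here is illustrative, chosen so the minimum is known.

  import Numeric.AD.Newton (gradientDescent)

  main :: IO ()
  main =
    -- the stream approaches the minimum of (x-1)^2 + (y-2)^2 at [1.0,2.0]
    mapM_ print
      (take 3 (gradientDescent (\[x, y] -> (x - 1) ^ 2 + (y - 2) ^ 2) [0, 0]))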
Numeric.AD.Newton

  Provides the same findZero, inverse, fixedPoint, extremum,
  gradientDescent, and gradientAscent as Numeric.AD.Rank1.Newton (with
  rank-2 types), along with the following.

  Convex constraint: CC is a GADT wrapper that hides the existential (s)
  which is so prevalent in the rest of the API. This is an engineering
  convenience for managing the skolems.

  constrainedDescent obj fs env optimizes the convex function obj subject
  to the convex constraints f <= 0, where f is drawn from fs. This is done
  using a log barrier to model the constraints (cf. Boyd, Chapter 11.3).
  The returned optimal point for the objective function must satisfy fs,
  but the initial environment, env, needn't be feasible.
  constrainedConvex' is like constrainedDescent except the initial point
  must be feasible.

  stochasticGradientDescent approximates the true gradient of the cost
  function by the gradient at a single example. As the algorithm sweeps
  through the training set, it performs the update for each training
  example. It uses reverse-mode automatic differentiation to compute the
  gradient. The learning rate is constant throughout, and is set to
  0.001.

  conjugateGradientDescent performs a conjugate gradient descent using
  reverse-mode automatic differentiation to compute the gradient, and
  forward-on-forward mode for computing extrema.

    let sq x = x * x
    let rosenbrock [x,y] = sq (1 - x) + 100 * sq (y - sq x)
    rosenbrock [0,0]
    1
    rosenbrock (conjugateGradientDescent rosenbrock [0, 0] !! 5) < 0.1
    True

  conjugateGradientAscent performs a conjugate gradient ascent using
  reverse-mode automatic differentiation to compute the gradient.

Numeric.AD.Rank1.Halley and Numeric.AD.Halley

  findZero finds a zero of a scalar function using Halley's method; its
  output is a stream of increasingly accurate results. (Modulo the usual
  caveats.) If the stream becomes constant ("it converges"), no further
  elements are returned.

    take 10 $ findZero (\x->x^2-4) 1
    [1.0,1.8571428571428572,1.9997967892704736,1.9999999999994755,2.0]
    last $ take 10 $ findZero ((+1).(^2)) (1 :+ 1)
    0.0 :+ 1.0

  inverse inverts a scalar function using Halley's method; its output is a
  stream of increasingly accurate results. (Modulo the usual caveats.) If
  the stream becomes constant ("it converges"), no further elements are
  returned. Note: the take 10 $ inverse sqrt 1 (sqrt 10) example that
  works for Newton's method fails with Halley's method, because the
  preconditions do not hold.

  fixedPoint finds a fixed point of a scalar function using Halley's
  method; its output is a stream of increasingly accurate results. (Modulo
  the usual caveats.) If the stream becomes constant ("it converges"), no
  further elements are returned.

    last $ take 10 $ fixedPoint cos 1
    0.7390851332151607

  extremum finds an extremum of a scalar function using Halley's method;
  it produces a stream of increasingly accurate results. (Modulo the usual
  caveats.) If the stream becomes constant ("it converges"), no further
  elements are returned.

    take 10 $ extremum cos 1
    [1.0,0.29616942658570555,4.59979519460002e-3,1.6220740159042513e-8,0.0]
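A minimal side-by-side sketch, assuming both top-level modules; the
printed streams are prefixes of the documented examples and show Halley's
faster convergence.

  import qualified Numeric.AD.Newton as Newton
  import qualified Numeric.AD.Halley as Halley

  main :: IO ()
  main = do
    print (take 5 (Newton.findZero (\x -> x * x - 4) 1))
    -- [1.0,2.5,2.05,2.000609756097561,2.0000000929222947]
    print (take 5 (Halley.findZero (\x -> x * x - 4) 1))
    -- [1.0,1.8571428571428572,1.9997967892704736,1.9999999999994755,2.0]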
Numeric.AD.Newton.Double

  The same findZero, inverse, fixedPoint, and extremum as
  Numeric.AD.Newton, specialized to Double, along with
  conjugateGradientDescent and conjugateGradientAscent as documented
  above.

Numeric.AD

  hessianProduct f wv computes the product of the hessian H of a
  non-scalar-to-scalar function f at w = fst <$> wv with a vector
  v = snd <$> wv, using "Pearlmutter's method" from
  http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.29.6143, which
  states:

    H v = (d/dr) grad_w (w + r v) | r = 0

  In other words, we take the directional derivative of the gradient. The
  gradient is calculated in reverse mode, then the directional derivative
  is calculated in forward mode.

  hessianProduct' f wv computes both the gradient of a
  non-scalar-to-scalar f at w = fst <$> wv and the product of the hessian
  H at w with a vector v = snd <$> wv, using "Pearlmutter's method"; the
  outputs are returned wrapped in the same functor. In other words, we
  return the gradient and the directional derivative of the gradient.
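A worked instance of Pearlmutter's identity, assuming the Numeric.AD
export above: for f(x,y) = x*y at w = (1,2) with v = (1,0), we have
grad f (w + r v) = (2, 1 + r), so its derivative in r at 0 is (0,1),
which equals H v.

  import Numeric.AD (hessianProduct)

  main :: IO ()
  main =
    -- wv pairs each coordinate of w with the matching coordinate of v
    print (hessianProduct (\[x, y] -> x * y) [(1, 1), (2, 0)])  -- [0,1]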
  hessian computes the Hessian via the Jacobian of the gradient: the
  gradient is computed in reverse mode and then the Jacobian is computed
  in sparse (forward) mode.

    hessian (\[x,y] -> x*y) [1,2]
    [[0,1],[1,0]]

  hessianF computes the order 3 Hessian tensor on a
  non-scalar-to-non-scalar function using Sparse-on-Reverse.

    hessianF (\[x,y] -> [x*y,x+y,exp x*cos y]) [1,2]
    [[[0.0,1.0],[1.0,0.0]],[[0.0,0.0],[0.0,0.0]],[[-1.1312043837568135,-2.4717266720048188],[-2.4717266720048188,1.1312043837568135]]]

Module index: Numeric.AD, Numeric.AD.Halley, Numeric.AD.Internal.Combinators,
Numeric.AD.Internal.Dense, Numeric.AD.Internal.Forward,
Numeric.AD.Internal.Forward.Double, Numeric.AD.Internal.Identity,
Numeric.AD.Internal.Kahn, Numeric.AD.Internal.On, Numeric.AD.Internal.Or,
Numeric.AD.Internal.Reverse, Numeric.AD.Internal.Sparse,
Numeric.AD.Internal.Tower, Numeric.AD.Internal.Type, Numeric.AD.Jacobian,
Numeric.AD.Jet, Numeric.AD.Mode, Numeric.AD.Mode.Forward,
Numeric.AD.Mode.Forward.Double, Numeric.AD.Mode.Kahn,
Numeric.AD.Mode.Mixed, Numeric.AD.Mode.Reverse, Numeric.AD.Mode.Sparse,
Numeric.AD.Mode.Tower, Numeric.AD.Mode.Wengert, Numeric.AD.Newton,
Numeric.AD.Newton.Double, Numeric.AD.Rank1.Forward,
Numeric.AD.Rank1.Forward.Double, Numeric.AD.Rank1.Halley,
Numeric.AD.Rank1.Kahn, Numeric.AD.Rank1.Newton,
Numeric.AD.Rank1.Newton.Double, Numeric.AD.Rank1.Sparse,
Numeric.AD.Rank1.Tower.