optimization-0.1.2

Optimization.Constrained.ProjectedSubgradient  (Safe Haskell: Safe-Inferred)

  Constraint (constructor Constr)
    A linear constraint. For instance, Constr LT 2 (V2 1 3) defines the
    constraint x_1 + 3 x_2 <= 2.

  StepSched
    A step size schedule: a list of functions (one per step) which, given the
    function's gradient and value, provide the size of the next step.

  projSubgrad stepSizes proj df f x0
    Minimizes the objective f with subgradient df, potentially projecting
    iterates into the feasible space with proj, using the given step-size
    schedule. Arguments: step-size schedule, projection into the feasible
    space, gradient of the objective, the objective function, initial
    solution; returns the iterates.

  linearProjSubgrad stepSizes proj a b x0
    Minimizes the objective |A x - b|, potentially projecting iterates into
    the feasible space with proj, using the given step-size schedule.
    Arguments: step-size schedule, projection into the feasible space,
    coefficient vector A of the objective, constant b of the objective,
    initial solution; returns the iterates.

  optimalStepSched
    The optimal step-size schedule when the optimal value of the objective is
    known. Argument: the optimal value of the objective.

  constStepSched
    Constant step-size schedule. Argument: the step size.

  sqrtKStepSched
    A non-summable step-size schedule. Argument: the size of the first step.

  invKStepSched
    A square-summable step-size schedule. Argument: the size of the first step.

  linearProjection
    Project onto the space of feasible solutions defined by a set of linear
    constraints. Argument: a set of linear constraints.

Optimization.Constrained.Penalty  (Safe Haskell: None)

  Opt d f gs hs
    A Lagrangian optimization problem with objective f, equality constraints
    gs (g(x) == 0), and less-than constraints hs (h(x) < 0).

  minimize
    Minimize the given constrained optimization problem. This is a basic
    penalty-method approach. Arguments: primal minimizer, the optimization
    problem of interest, the penalty increase factor, the primal starting
    value, the dual starting value; returns the optimizing iterates.

  maximize
    Maximize the given constrained optimization problem. Arguments: primal
    minimizer, the optimization problem of interest, the penalty increase
    factor, the primal starting value, the dual starting value; returns the
    optimizing iterates.

  lagrangian
    The Lagrangian for the given constrained optimization.

  augLagrangian
    The augmented Lagrangian for the given constrained optimization.

Optimization.TrustRegion.Newton  (Safe Haskell: Safe-Inferred)

  newton
    Newton's method. Arguments: gradient of the function, inverse Hessian,
    starting point; returns the iterates.

  bicInv'
    Inverse by the iterative method of Ben-Israel and Cohen, with a given
    starting condition.

  bicInv
    Inverse by the iterative method of Ben-Israel and Cohen, starting from
    alpha A^T. Alpha should be set such that 0 < alpha < 2 / sigma^2, where
    sigma denotes the largest singular value of A.

Optimization.TrustRegion.Fista  (Safe Haskell: Safe-Inferred)

  fista
    Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) with constant
    step size. Arguments: Lipschitz constant l, gradient of the function,
    starting point; returns the iterates.

Optimization.TrustRegion.Nesterov1983  (Safe Haskell: Safe-Inferred)

  optimalGradient kappa l df alpha0 x0
    Nesterov's optimal gradient method, first described in 1983. This method
    requires knowledge of the Lipschitz constant l of the gradient and the
    condition number kappa, as well as an initial step size alpha0 in (0, 1).
    Arguments: condition number kappa, Lipschitz constant l, gradient of the
    function, initial step size alpha0, starting point; returns the iterates.

  quadratic a b c
    The real solutions of the quadratic equation a x^2 + b x + c == 0.
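Before moving on to the line search modules, here is a minimal, self-contained
sketch of the projected subgradient update described in
Optimization.Constrained.ProjectedSubgradient above, written over plain lists
of Doubles. Every name in it (projSubgradSketch, clamp, and the demo
objective) is illustrative and is not part of this package's API.

    -- Sketch of the projected subgradient iteration
    --   x_{k+1} = proj (x_k - alpha_k * g_k)
    -- over plain lists; not the library's implementation.
    projSubgradSketch
      :: [Double]                 -- step sizes alpha_k, one per iteration
      -> ([Double] -> [Double])   -- projection onto the feasible set
      -> ([Double] -> [Double])   -- (sub)gradient of the objective
      -> [Double]                 -- initial point x0
      -> [[Double]]               -- iterates
    projSubgradSketch alphas proj grad = go alphas
      where
        go []     x = [x]
        go (a:as) x =
          let g  = grad x
              x' = proj (zipWith (\xi gi -> xi - a * gi) x g)
          in x : go as x'

    -- Demo: minimize |x - (3,4)|^2 over the box [0,1]^2 with the
    -- square-summable schedule alpha_k = 1/k; the iterates approach (1,1).
    main :: IO ()
    main = print (last iterates)
      where
        iterates    = projSubgradSketch [1 / k | k <- [1 .. 100]] clamp grad [0.5, 0.5]
        clamp       = map (max 0 . min 1)
        grad [x, y] = [2 * (x - 3), 2 * (y - 4)]
        grad _      = error "two coordinates expected"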
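Similarly, the penalty-method approach behind minimize in
Optimization.Constrained.Penalty can be sketched in a few lines: solve a
sequence of unconstrained problems in which constraint violations are
penalized quadratically with a growing weight. The function below is a
hypothetical one-dimensional illustration, not the library's Opt/minimize API;
argmin stands for whatever unconstrained minimizer you care to plug in.

    -- Quadratic-penalty sketch; illustrative only.
    penaltySketch
      :: ((Double -> Double) -> Double -> Double)  -- unconstrained minimizer: objective -> start -> argmin
      -> (Double -> Double)                        -- objective f
      -> [Double -> Double]                        -- equality constraints, want g x == 0
      -> [Double -> Double]                        -- inequality constraints, want h x <= 0
      -> Double                                    -- penalty increase factor
      -> Double                                    -- primal starting value
      -> [Double]                                  -- iterates, one per penalty weight
    penaltySketch argmin f gs hs grow = go 1
      where
        go mu x =
          let penalized z = f z
                          + mu * sum [ g z ^ 2         | g <- gs ]
                          + mu * sum [ max 0 (h z) ^ 2 | h <- hs ]
              x' = argmin penalized x
          in x' : go (grow * mu) x'

The result is a lazy list, so take as many iterates as you need; each
successive element re-solves the penalized problem with a larger weight and,
under mild conditions, moves closer to the constrained optimum.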
Optimization.LineSearch  (portability: portable; stability: provisional;
maintainer: Ben Gamari <bgamari@gmail.com>; Safe Haskell: Safe-Inferred)

  LineSearch
    A line search method. search df p x determines a step size in direction p
    from point x for a function f with gradient df. Arguments: gradient of
    the function, search direction, starting point; returns the step size.

  armijo
    The Armijo condition. It captures the intuition that a step should move
    far enough from its starting point to change the function enough, as
    predicted by its gradient. This often finds its place as a criterion for
    line search. Arguments: Armijo condition strength c1, function value,
    gradient of the function, point to evaluate at, search direction, search
    step size; returns whether the Armijo condition is satisfied.

  curvature
    The curvature condition. Arguments: curvature condition strength c2,
    gradient of the function, point to evaluate at, search direction, search
    step size; returns whether the curvature condition is satisfied.

  backtrackingSearch
    Backtracking line search. This is a building block for line search
    algorithms which reduces the step size until a given condition is
    satisfied. backtrackingSearch gamma alpha pred starts with the given step
    size alpha and reduces it by a factor of gamma until the given condition
    is satisfied. Arguments: step size reduction factor gamma, initial step
    size alpha, search condition.

  armijoSearch
    Armijo backtracking line search. armijoSearch gamma alpha c1 starts with
    the given step size alpha and reduces it by a factor of gamma until the
    Armijo condition is satisfied. Arguments: step size reduction factor
    gamma, initial step size alpha, Armijo condition strength c1, function
    value.

  wolfeSearch
    Wolfe backtracking line search (satisfies both the Armijo and curvature
    conditions). wolfeSearch gamma alpha c1 c2 starts with the given step
    size alpha and reduces it by a factor of gamma until both the Armijo and
    curvature conditions are satisfied. Arguments: step size reduction factor
    gamma, initial step size alpha, Armijo condition strength c1, curvature
    condition strength c2, function value.

  newtonSearch
    Line search by Newton's method.

  secantSearch
    Line search by the secant method with a given tolerance.

  constantSearch
    Constant line search. constantSearch c always chooses the step size c.
    Argument: the step size.

Optimization.LineSearch.ConjugateGradient  (Safe Haskell: Safe-Inferred)

  Beta
    A beta expression. beta df0 df1 p is an expression for the conjugate
    direction contribution given the derivative df0 and direction p for
    iteration k, and df1 for iteration k+1.

  conjGrad
    Conjugate gradient method with the given beta expression and line search
    method. The conjugate gradient method avoids the trouble encountered by
    the steepest descent method on poorly conditioned problems (e.g. those
    with a wide range of eigenvalues). It does this by choosing directions
    which satisfy a condition of A-orthogonality (conjugacy): successive
    directions p and q obey p^T A q == 0, ensuring that steps in the
    unstretched search space are orthogonal. Arguments: line search method,
    beta expression, gradient of the function, starting point; returns the
    iterates.

  fletcherReeves
    Fletcher-Reeves expression for beta.

  polakRibiere
    Polak-Ribiere expression for beta.

  hestenesStiefel
    Hestenes-Stiefel expression for beta.

Optimization.LineSearch.BarzilaiBorwein  (Safe Haskell: Safe-Inferred)

  barzilaiBorwein
    The Barzilai-Borwein (1988) method, a non-monotonic optimization method.
    Arguments: gradient of the function, starting point x0, second starting
    point x1; returns the iterates.

Optimization.LineSearch.SteepestDescent  (Safe Haskell: Safe-Inferred)

  steepestDescent
    Steepest descent method. steepestDescent search df x0 optimizes a
    function f with gradient df using the step-size schedule search, starting
    from the initial point x0. The steepest descent method chooses the
    negative gradient of the function as its step direction. Arguments: line
    search method, gradient of the function, starting point; returns the
    iterates.
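To tie the line search pieces above together, here is a self-contained sketch
of backtracking under the Armijo condition driving steepest descent, again
over plain lists of Doubles. It mirrors the roles of armijo,
backtrackingSearch and steepestDescent documented above, but none of the names
or types below belong to this package.

    -- Self-contained sketch: Armijo backtracking driving steepest descent.
    dot :: [Double] -> [Double] -> Double
    dot a b = sum (zipWith (*) a b)

    -- Shrink the step by gamma until f (x + t p) <= f x + c1 * t * <grad f x, p>.
    armijoBacktrack :: Double -> Double -> Double
                    -> ([Double] -> Double) -> ([Double] -> [Double])
                    -> [Double] -> [Double] -> Double
    armijoBacktrack gamma t0 c1 f df x p = go t0
      where
        fx     = f x
        gTp    = dot (df x) p
        step t = zipWith (\xi di -> xi + t * di) x p
        go t | f (step t) <= fx + c1 * t * gTp = t
             | otherwise                       = go (gamma * t)

    -- Steepest descent: the search direction is the negative gradient.
    steepestSketch :: ([Double] -> Double) -> ([Double] -> [Double])
                   -> [Double] -> [[Double]]
    steepestSketch f df = iterate go
      where
        go x = let p = map negate (df x)
                   t = armijoBacktrack 0.5 1.0 1.0e-4 f df x p
               in zipWith (\xi di -> xi + t * di) x p

    -- Demo on a simple quadratic; the iterates approach [1, -2].
    main :: IO ()
    main = print (steepestSketch f df [3, -2] !! 50)
      where
        f  [x, y] = (x - 1) ^ 2 + 10 * (y + 2) ^ 2
        f  _      = error "two coordinates expected"
        df [x, y] = [2 * (x - 1), 20 * (y + 2)]
        df _      = error "two coordinates expected"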
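The three beta expressions named under Optimization.LineSearch.ConjugateGradient
above have short closed forms. The sketch below writes them over plain lists,
following the argument convention beta df0 df1 p from the Beta description
(df0 and p belong to iteration k, df1 to iteration k+1). These are the
standard textbook formulas; the library's own fletcherReeves, polakRibiere and
hestenesStiefel may differ in representation, so treat this as illustration
only.

    -- Standard closed forms of the conjugate gradient beta expressions,
    -- written over plain lists with the convention beta df0 df1 p.
    inner :: [Double] -> [Double] -> Double
    inner a b = sum (zipWith (*) a b)

    fletcherReevesB, polakRibiereB, hestenesStiefelB
      :: [Double] -> [Double] -> [Double] -> Double
    fletcherReevesB  df0 df1 _ = inner df1 df1 / inner df0 df0
    polakRibiereB    df0 df1 _ = inner df1 (zipWith (-) df1 df0) / inner df0 df0
    hestenesStiefelB df0 df1 p = inner df1 y / inner p y
      where y = zipWith (-) df1 df0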
Optimization.LineSearch.MirrorDescent  (Safe Haskell: Safe-Inferred)

  mirrorDescent
    Mirror descent method. Originally described by Nemirovsky and Yudin and
    later elucidated by Beck and Teboulle, the mirror descent method is a
    generalization of the projected subgradient method for convex
    optimization. Mirror descent requires the gradient of a strongly convex
    function psi (and its dual) as well as a way to get a subgradient at each
    point of the objective function f. Arguments: line search method,
    strongly convex function psi, dual of psi, gradient of the function,
    starting point; returns the iterates.

Optimization.LineSearch.BFGS  (Safe Haskell: Safe-Inferred)

  bfgs
    Broyden-Fletcher-Goldfarb-Shanno (BFGS) method. bfgs search df b0 x0,
    where b0 is the inverse Hessian (the identity is often a good initial
    value). Arguments: line search method, gradient of the function, inverse
    Hessian at the starting point, starting point; returns the iterates.

Index (package optimization-0.1.2)

  Exported identifiers: Constraint, Constr, StepSched, projSubgrad,
  linearProjSubgrad, optimalStepSched, constStepSched, sqrtKStepSched,
  invKStepSched, linearProjection, Opt, FU, runFU, optimize, constrainEQ,
  constrainLT, constrainGT, minimize, maximize, lagrangian, augLagrangian,
  augment, V, newton, bicInv', bicInv, fista, optimalGradient, quadratic,
  LineSearch, armijo, curvature, backtrackingSearch, armijoSearch,
  wolfeSearch, newtonSearch, secantSearch, constantSearch, Beta, conjGrad,
  fletcherReeves, polakRibiere, hestenesStiefel, barzilaiBorwein,
  steepestDescent, mirrorDescent, bfgs
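Finally, a short usage sketch. It assumes the argument orders documented above
for constantSearch and steepestDescent and the V2 vector type from the linear
package (the same type that appears in the Constr example); the exact
signatures and class constraints are not recorded in this file, so check them
against the installed version before relying on this.

    import Linear (V2 (..))

    import Optimization.LineSearch (constantSearch)
    import Optimization.LineSearch.SteepestDescent (steepestDescent)

    -- Minimize f (V2 x y) = (x - 1)^2 + 4 * (y + 3)^2 by steepest descent
    -- with a constant step size, following the documented argument order
    --   steepestDescent <line search method> <gradient> <starting point>.
    main :: IO ()
    main = do
      let df (V2 x y) = V2 (2 * (x - 1)) (8 * (y + 3))   -- gradient of f
          iterates    = steepestDescent (constantSearch 0.1) df (V2 0 0)
      print (iterates !! 100)   -- converges towards V2 1.0 (-3.0)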