Optimization.Constrained.ProjectedSubgradient

  Constraint (constructed with Constr): a linear constraint.  For instance,
  Constr LT 2 (V2 1 3) defines the constraint x_1 + 3 x_2 <= 2.

  StepSched: a step size schedule, i.e. a list of functions (one per step)
  which, given a function's gradient and value, provide a size for the next
  step.

  projSubgrad stepSizes proj df f x0 minimizes the objective f, potentially
  projecting iterates into a feasible space with proj, with the given
  step-size schedule.
    Arguments: a step size schedule; a function projecting into the feasible
    space; the gradient of the objective function; the objective function;
    the initial solution.

  linearProjSubgrad stepSizes proj a b x0 minimizes the objective A x - b,
  potentially projecting iterates into a feasible space with proj, with the
  given step-size schedule.
    Arguments: a step size schedule; a function projecting into the feasible
    space; the coefficient vector A of the objective; the constant b of the
    objective; the initial solution.

  optimalStepSched: the optimal step size schedule when the optimal value of
  the objective is known.  Argument: the optimal value of the objective.

  constStepSched: constant step size schedule.  Argument: the step size.

  sqrtKStepSched: a non-summable step size schedule.  Argument: the size of
  the first step.

  invKStepSched: a square-summable step size schedule.  Argument: the size of
  the first step.

  linearProjection: project onto the space of feasible solutions defined by a
  set of linear constraints.  Argument: a set of linear constraints.

Optimization.Constrained.Penalty

  Opt d f gs hs is a Lagrangian optimization problem with objective f,
  equality (g(x) == 0) constraints gs, and less-than (h(x) < 0) constraints
  hs.

  minimize: minimize the given constrained optimization problem.  This is a
  basic penalty method approach.

  maximize: maximize the given constrained optimization problem.

    Arguments (for both minimize and maximize): a primal minimizer; the
    optimization problem of interest; the penalty increase factor; the primal
    starting value; the dual starting value.  Result: the optimizing
    iterates.

  lagrangian: the Lagrangian for the given constrained optimization.

  augLagrangian: the augmented Lagrangian for the given constrained
  optimization.

Optimization.TrustRegion.Newton

  newton: Newton's method.
    Arguments: the gradient of the function; the inverse Hessian; the
    starting point.  Result: the iterates.

  bicInv': inverse by the iterative method of Ben-Israel and Cohen with a
  given starting condition.

  bicInv: inverse by the iterative method of Ben-Israel and Cohen starting
  from alpha A^T.  alpha should be set such that 0 < alpha < 2/sigma^2, where
  sigma denotes the largest singular value of A.

Optimization.TrustRegion.Fista

  fista: the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) with
  constant step size.
    Arguments: the Lipschitz constant l; the gradient of the function; the
    starting point.  Result: the iterates.

Optimization.TrustRegion.Nesterov1983

  optimalGradient kappa l df alpha0 x0 is Nesterov's optimal gradient method,
  first described in 1983.  This method requires knowledge of the Lipschitz
  constant l of the gradient, the condition number kappa, as well as an
  initial step size alpha0 in (0,1).
    Arguments: the condition number kappa; the Lipschitz constant l; the
    gradient of the function; the initial step size alpha0; the starting
    point.  Result: the iterates.

  quadratic a b c is the list of real solutions to the quadratic equation
  a x^2 + b x + c == 0.
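To make the calling conventions above concrete, here is a minimal usage sketch, not taken from the package's own documentation: it minimizes the quadratic f(x) = x_1^2 + 10 x_2^2 with fista and optimalGradient.  The argument order follows the lists above; the use of V2 from the linear package for points mirrors the Constr example, and the exact type signatures and class contexts are assumptions.

    import Linear (V2(..))
    import Optimization.TrustRegion.Fista (fista)
    import Optimization.TrustRegion.Nesterov1983 (optimalGradient)

    -- Gradient of f(x) = x_1^2 + 10 x_2^2; the Hessian is diag(2, 20), so the
    -- gradient's Lipschitz constant is 20 and the condition number is 10.
    gradF :: V2 Double -> V2 Double
    gradF (V2 x1 x2) = V2 (2 * x1) (20 * x2)

    -- fista l df x0
    fistaIters :: [V2 Double]
    fistaIters = fista 20 gradF (V2 1 1)

    -- optimalGradient kappa l df alpha0 x0, with alpha0 in (0,1)
    nesterovIters :: [V2 Double]
    nesterovIters = optimalGradient 10 20 gradF 0.5 (V2 1 1)

    main :: IO ()
    main = mapM_ print (take 10 fistaIters)   -- inspect the first few iterates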
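A similar sketch for the projected subgradient interface at the top of this section, again with hedges: the argument order follows the lists above, the constraint is the one from the Constr example, and it is an assumption that linearProjection takes a list of constraints and yields the projection function that linearProjSubgrad expects.

    import Linear (V2(..))
    import Optimization.Constrained.ProjectedSubgradient

    -- Projection onto the feasible half-plane {x : x_1 + 3 x_2 <= 2}.
    feasible :: V2 Double -> V2 Double
    feasible = linearProjection [Constr LT 2 (V2 1 3)]

    -- linearProjSubgrad stepSizes proj a b x0: minimize a . x - b over the region.
    -- With a = (-1, -3) the objective -(x_1 + 3 x_2) is minimized on the
    -- constraint boundary, where it attains the value -2.
    linearIters :: [V2 Double]
    linearIters = linearProjSubgrad (constStepSched 0.1) feasible
                                    (V2 (-1) (-3)) 0 (V2 0 0)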
Optimization.LineSearch
  Copyright: (c) 2012-2013 Ben Gamari
  License: BSD-style (see the file LICENSE)
  Maintainer: Ben Gamari <bgamari@gmail.com>
  Stability: provisional
  Portability: portable

  LineSearch: a line search method.  search df p x determines a step size in
  direction p from point x for a function f with gradient df.
    Arguments: the gradient of the function; the search direction; the
    starting point.  Result: the step size.

  armijo: the Armijo condition.  The Armijo condition captures the intuition
  that a step should move far enough from its starting point to change the
  function enough, as predicted by its gradient.  This often finds its place
  as a criterion for line search.
    Arguments: the Armijo condition strength c1; the function value; the
    gradient of the function; the point to evaluate at; the search direction;
    the search step size.  Result: is the Armijo condition satisfied?

  curvature: the curvature condition.
    Arguments: the curvature condition strength c2; the gradient of the
    function; the point to evaluate at; the search direction; the search step
    size.  Result: is the curvature condition satisfied?

  backtrackingSearch: backtracking line search algorithm.  This is a building
  block for line search algorithms which reduces its step size until the
  given condition is satisfied.  backtrackingSearch gamma alpha pred starts
  with the given step size alpha and reduces it by a factor of gamma until
  the given condition is satisfied.
    Arguments: the step size reduction factor gamma; the initial step size
    alpha; the search condition.

  armijoSearch: Armijo backtracking line search algorithm.
  armijoSearch gamma alpha c1 f starts with the given step size alpha and
  reduces it by a factor of gamma until the Armijo condition is satisfied.
    Arguments: the step size reduction factor gamma; the initial step size
    alpha; the Armijo condition strength c1; the function value.

  wolfeSearch: Wolfe backtracking line search algorithm (satisfies both the
  Armijo and curvature conditions).  wolfeSearch gamma alpha c1 c2 f starts
  with the given step size alpha and reduces it by a factor of gamma until
  both the Armijo and curvature conditions are satisfied.
    Arguments: the step size reduction factor gamma; the initial step size
    alpha; the Armijo condition strength c1; the curvature condition strength
    c2; the function value.

  newtonSearch: line search by Newton's method.

  secantSearch: line search by the secant method with a given tolerance.

  constantSearch: constant line search.  constantSearch c always chooses a
  step size c.

Optimization.LineSearch.ConjugateGradient

  Beta: a beta expression.  beta df0 df1 p is an expression for the conjugate
  direction contribution given the derivative df0 and direction p for
  iteration k, and df1 for iteration k+1.

  conjGrad: conjugate gradient method with the given beta expression and line
  search method.  The conjugate gradient method avoids the trouble
  encountered by the steepest descent method on poorly conditioned problems
  (e.g. those with a wide range of eigenvalues).  It does this by choosing
  directions which satisfy a condition of A-orthogonality, ensuring that
  steps in the "unstretched" search space are orthogonal.
    Arguments: the line search method; the beta expression; the gradient of
    the function; the starting point.  Result: the iterates.

  fletcherReeves: the Fletcher-Reeves expression for beta.
  polakRibiere: the Polak-Ribiere expression for beta.
  hestenesStiefel: the Hestenes-Stiefel expression for beta.

Optimization.LineSearch.BarzilaiBorwein

  barzilaiBorwein: the Barzilai-Borwein (1988) method, a non-monotonic
  optimization method.
    Arguments: the gradient of the function; the starting point x0; the
    second starting point x1.  Result: the iterates.

Optimization.LineSearch.SteepestDescent

  steepestDescent: the steepest descent method.  steepestDescent search df x0
  optimizes a function f with gradient df, using the line search method
  search, starting from the initial point x0.  The steepest descent method
  chooses the negative gradient of the function as its step direction.
    Arguments: the line search method; the gradient of the function; the
    starting point.  Result: the iterates.

Optimization.LineSearch.MirrorDescent

  mirrorDescent: the mirror descent method.  Originally described by
  Nemirovsky and Yudin and later elucidated by Beck and Teboulle, the mirror
  descent method is a generalization of the projected subgradient method for
  convex optimization.  Mirror descent requires the gradient of a strongly
  convex function psi (and its dual) as well as a way to get a subgradient
  for each point of the objective function f.
    Arguments: the line search method; the strongly convex function psi; the
    dual of psi; the gradient of the function; the starting point.  Result:
    the iterates.
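The following sketch, assembled from the argument lists above rather than from the package's own examples, shows how these pieces compose: armijoSearch builds a line search from the function value, which steepestDescent and conjGrad then consume; barzilaiBorwein needs no line search, only two starting points.  Exact type signatures and class contexts are assumptions.

    import Linear (V2(..))
    import Optimization.LineSearch (armijoSearch)
    import Optimization.LineSearch.SteepestDescent (steepestDescent)
    import Optimization.LineSearch.ConjugateGradient (conjGrad, fletcherReeves)
    import Optimization.LineSearch.BarzilaiBorwein (barzilaiBorwein)

    -- A mildly ill-conditioned quadratic and its gradient.
    f :: V2 Double -> Double
    f (V2 x1 x2) = x1^2 + 10 * x2^2

    gradF :: V2 Double -> V2 Double
    gradF (V2 x1 x2) = V2 (2 * x1) (20 * x2)

    -- armijoSearch gamma alpha c1 f: halve the trial step until the Armijo
    -- condition holds.  (No signature given: the exact form of the LineSearch
    -- type is not reproduced here.)
    search = armijoSearch 0.5 1 1.0e-4 f

    sdIters, cgIters, bbIters :: [V2 Double]
    sdIters = steepestDescent search gradF (V2 1 1)          -- steepest descent
    cgIters = conjGrad search fletcherReeves gradF (V2 1 1)  -- Fletcher-Reeves CG
    bbIters = barzilaiBorwein gradF (V2 1 1) (V2 0.9 0.9)    -- Barzilai-Borwein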
Optimization.LineSearch.BFGS

  bfgs: the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method.
  bfgs search df b0 x0, where b0 is the inverse Hessian (the identity is
  often a good initial value).
    Arguments: the line search method; the gradient of the function; the
    inverse Hessian at the starting point; the starting point.  Result: the
    iterates.

Modules

  Optimization.Constrained.ProjectedSubgradient
  Optimization.Constrained.Penalty
  Optimization.TrustRegion.Newton
  Optimization.TrustRegion.Fista
  Optimization.TrustRegion.Nesterov1983
  Optimization.LineSearch
  Optimization.LineSearch.ConjugateGradient
  Optimization.LineSearch.BarzilaiBorwein
  Optimization.LineSearch.SteepestDescent
  Optimization.LineSearch.MirrorDescent
  Optimization.LineSearch.BFGS

Exported identifiers

  Constraint, Constr, StepSched, projSubgrad, linearProjSubgrad,
  optimalStepSched, constStepSched, sqrtKStepSched, invKStepSched,
  linearProjection, Opt, FU, runFU, optimize, constrainEQ, constrainLT,
  constrainGT, minimize, maximize, lagrangian, newton, bicInv', bicInv,
  fista, optimalGradient, LineSearch, backtrackingSearch, armijoSearch,
  wolfeSearch, constantSearch, Beta, conjGrad, fletcherReeves, polakRibiere,
  hestenesStiefel, barzilaiBorwein, steepestDescent, mirrorDescent, bfgs,
  augLagrangian, augment, quadratic, armijo, curvature, newtonSearch,
  secantSearch
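To close, a sketch of bfgs under the same assumptions as the earlier examples: a Wolfe line search supplies the step sizes, and the inverse Hessian is started at the identity as the documentation above suggests.  The V2 (V2 ...) representation of the initial inverse Hessian, like the other types here, is an assumption rather than the package's stated API.

    import Linear (V2(..))
    import Optimization.LineSearch (wolfeSearch)
    import Optimization.LineSearch.BFGS (bfgs)

    f :: V2 Double -> Double
    f (V2 x1 x2) = x1^2 + 10 * x2^2

    gradF :: V2 Double -> V2 Double
    gradF (V2 x1 x2) = V2 (2 * x1) (20 * x2)

    -- wolfeSearch gamma alpha c1 c2 f
    search = wolfeSearch 0.5 1 1.0e-4 0.9 f

    -- bfgs search df b0 x0, with b0 = identity as the initial inverse Hessian
    bfgsIters :: [V2 Double]
    bfgsIters = bfgs search gradF (V2 (V2 1 0) (V2 0 1)) (V2 1 1)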