Monday, August 7, 2017

global optimization


Global optimization is a class of problems, most of which are NP-hard.
An algorithm is said to run in polynomial time if its running time is bounded above by a polynomial in the size of the input, i.e. T(n) = O(n^k) for some constant k. Polynomial time is used as a synonym for tractable, feasible, efficient, or fast.
Complexity class P: problems for which a deterministic polynomial time algorithm exists
NP: non-deterministic polynomial time. The answer is easy to verify, but may be difficult to compute.
e.g. given a set of integers, does some subset of them sum to 0? A brute-force search over all subsets takes exponential time (on the order of 2^n candidates), but checking whether a proposed subset sums to 0 is fast. Hence this problem is in NP (quickly checkable) but not known to be in P (quickly solvable).
if P = NP, every problem whose answer can be verified in polynomial time can also be solved in polynomial time.
if P != NP, there are problems in NP that are harder to compute than to verify.
A decision problem is NP-complete when it is both in NP and NP-hard. NP-complete problems are often addressed using heuristic methods and approximation algorithms.
NP-hard: the class of decision problems that are at least as hard as the hardest problems in NP. NP-hard problems do not have to be in NP.
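The subset-sum example above can be made concrete: verifying a proposed subset is a single polynomial-time pass, while the naive solver enumerates every subset. All names below are illustrative, not from any source:

```python
from itertools import combinations

def is_zero_sum(subset):
    # Verification is polynomial time: just add up the candidate subset.
    return sum(subset) == 0

def find_zero_subset(nums):
    # Brute-force search over all non-empty subsets: exponential time,
    # on the order of 2^n candidates.
    for r in range(1, len(nums) + 1):
        for subset in combinations(nums, r):
            if is_zero_sum(subset):
                return subset
    return None
```

For example, `find_zero_subset([3, -7, 1, 6, -2])` finds `(-7, 1, 6)`, while `find_zero_subset([1, 2, 4])` returns `None`.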

  • deterministic methods
  • stochastic methods
  • heuristics and metaheuristics: evolutionary algorithms

2 research groups

Ferenc Domes

Head of software development at DAGOPT in Austria; previously a professor at the University of Vienna.

Zabinsky, Z.B.

She is a professor of Industrial Engineering at the University of Washington. Her expertise is in operations research and global optimization.
Author of “Stochastic Adaptive Search for Global Optimization” (2003), which covers:
  • grid search
  • pure random search: samples from the domain independently
  • simulated annealing
  • pure adaptive search: samples the next point from the subset of the domain with strictly superior objective function values; the expected number of iterations is linear in the dimension.
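A minimal sketch contrasting the two sampling schemes on a toy objective. Pure adaptive search is realized here by rejection sampling, which is faithful to the definition but not efficient; all names and parameters are illustrative:

```python
import random

def sphere(x):
    # Toy objective: global minimum 0 at the origin.
    return sum(xi * xi for xi in x)

def pure_random_search(f, dim, n_iters, seed=0):
    # Samples the domain [-1, 1]^dim independently, keeping the best point.
    rng = random.Random(seed)
    best, best_f = None, float("inf")
    for _ in range(n_iters):
        x = [rng.uniform(-1, 1) for _ in range(dim)]
        fx = f(x)
        if fx < best_f:
            best, best_f = x, fx
    return best

def pure_adaptive_search(f, dim, n_iters, seed=0):
    # Each iterate is uniform on the strictly improving level set,
    # realized here by (inefficient) rejection sampling.
    rng = random.Random(seed)
    x = [rng.uniform(-1, 1) for _ in range(dim)]
    for _ in range(n_iters):
        while True:
            y = [rng.uniform(-1, 1) for _ in range(dim)]
            if f(y) < f(x):  # keep only strictly improving samples
                x = y
                break
    return x
```

In expectation, pure adaptive search needs far fewer accepted iterations than pure random search needs samples, but each accepted iterate hides the growing cost of the rejections.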
The book focuses on stochastic methods for global optimization and a theoretical understanding of their performance, consolidating a collection of papers on the analysis and development of stochastic adaptive search.
Ultimate goal (we are far from that!):
  • works for a large class of problems
  • finds the global optima with an absolute guarantee
  • uses very little computation
Typically the optimization technique is chosen to match the structure of the problem. For example, a practitioner solving a linear program would select a different algorithm than for a nonlinear convex problem, in order to take advantage of the linear structure. If the problem is nonlinear and multimodal, however, the practitioner would have difficulty selecting the best algorithm. Thus, for global optimization, the best choice of algorithm for a given problem remains an open research question.
Global optimization algorithms are often classified as either deterministic or stochastic. Stochastic methods can be applied to global optimization problems with little-known structure, such as ‘black box‘ functions.
Random search methods have been shown to have the potential to solve large problems efficiently in ways that are not possible for deterministic algorithms. Estimating the volume of a convex body takes an exponential number of function evaluations for any deterministic algorithm, but if one is willing to accept the weaker claim of an estimate that is correct with high probability, then a stochastic algorithm can provide such an estimate in polynomial time. Thus there is a trade-off between the amount of computation and the type of optimality guarantee. The book relaxes the requirement of an absolute guarantee of the global optimum and is instead satisfied with a probabilistic estimate.
Another advantage of stochastic methods is that they are relatively easy to implement on complex problems. Methods like simulated annealing, genetic algorithms, and tabu search are widely used. They rely only on function evaluations, rather than gradient and Hessian information, so they can be coded and run quickly. A disadvantage is that these methods are currently customized to each specific problem largely through trial and error, and there is little theory to support the quality of the solution.
The basic measure of performance is the number of iterations until a point is first sampled within ε of the global optimum.
Pure adaptive search and hesitant adaptive search use an underlying sampling distribution restricted to the improving level sets; annealing adaptive search always samples from the original feasible region, but modifies the sampling distribution so that the probability of sampling in the improving level set is high.

2 papers

2015

R. H. Abiyev and M. Tunay, “Optimization of High-Dimensional Functions through Hypercube Evaluation,” Computational Intelligence and Neuroscience, vol. 2015, p. 11, 2015.
Hypercube optimization includes an initialization and evaluation process, a displacement-shrink process, and a searching space process.
derivative-based vs. derivative-free learning algorithms; drawbacks of derivative-based methods:
  • can get trapped in local minima
  • computationally expensive
HO is an evolutionary algorithm: like a dove, it flies up, chooses the previously marked areas, then changes and shrinks the size of the search area.
  • a hypercube describes the search area, represented by its center and size (radii)
  • evaluates a set of points randomly distributed in an m-dimensional hypercube
  • the hypercube then shifts and contracts
  • intense stochastic search
  • the initial sampling has to be sufficiently dense to probe all possible zones of higher and lower values
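The shift-and-shrink idea above can be sketched as follows. This is a simplified reading of the process, not the exact HO algorithm from the paper; all parameter names and defaults are assumptions:

```python
import math
import random

def objective(x):
    # Toy multimodal test function (Rastrigin); global minimum 0 at the origin.
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)

def hypercube_search(f, dim, n_points=200, n_iters=50, shrink=0.9, seed=0):
    # Sample a cloud of points inside a hypercube (center + radius), move the
    # center to the best point found so far, and contract the radius.
    rng = random.Random(seed)
    center = [0.0] * dim          # hypercube center
    radius = 5.12                 # hypercube half-width (radii)
    best_x, best_f = None, float("inf")
    for _ in range(n_iters):
        for _ in range(n_points):
            x = [c + rng.uniform(-radius, radius) for c in center]
            fx = f(x)
            if fx < best_f:
                best_x, best_f = x, fx
        center = best_x           # displacement: re-center on the best point
        radius *= shrink          # shrink: contract the search hypercube
    return best_x, best_f
```

The initial radius must cover the whole domain, echoing the note that the initial sampling has to be dense enough to probe all zones.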

1997

R. Storn and K. Price, “Differential Evolution – A Simple and Efficient Heuristic for global Optimization over Continuous Spaces,” Journal of Global Optimization, vol. 11, pp. 341-359, 1997/12/01 1997.
The objective function is more accurately called a cost function here.
Central to every direct search method is a strategy that generates variations of the parameter vectors.
Most standard direct search methods use the greedy criterion: a new parameter vector is accepted if and only if it reduces the value of the cost function. This converges fairly fast but risks becoming trapped in a local minimum.
Parallel search techniques like genetic algorithms and evolution strategies can help escape local minima.
Annealing algorithms relax the greedy criterion by occasionally permitting an uphill move, which can help a parameter vector climb out of a local minimum. The uphill probability decreases as the iterations proceed, so the rule reduces to the greedy criterion in the long run.
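A minimal sketch of this relaxed acceptance rule (a Metropolis-style test with a geometric cooling schedule; all names and parameters are illustrative):

```python
import math
import random

def anneal(f, x0, n_iters=5000, t0=1.0, cooling=0.999, step=0.5, seed=0):
    # Simulated annealing on a 1-D function: uphill moves are accepted with
    # probability exp(-delta / t); as t decays, this tends to the greedy rule.
    rng = random.Random(seed)
    x, t = x0, t0
    for _ in range(n_iters):
        y = x + rng.uniform(-step, step)
        delta = f(y) - f(x)
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x = y                 # accept: downhill always, uphill sometimes
        t *= cooling              # cooling shrinks the uphill probability
    return x
```

Started in the poorer basin of a double-well cost function, the early uphill moves give the walk a chance to cross the barrier while the temperature is still high.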
Differential Evolution (DE) is a parallel direct search method. Each generation applies 3 steps:
  1. mutation
  2. crossover (forming a trial vector)
  3. selection
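The three steps above can be sketched as the classic DE/rand/1/bin variant. This is a simplified reading of Storn and Price's scheme; the parameter defaults are illustrative and bounds are not re-enforced after mutation:

```python
import random

def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9,
                           n_gens=100, seed=0):
    # Mutation builds a donor vector from three distinct population members,
    # crossover mixes it with the target to form a trial vector, and greedy
    # selection keeps whichever has the lower cost.
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [f(x) for x in pop]
    for _ in range(n_gens):
        for i in range(pop_size):
            # mutation: donor = a + F * (b - c), with a, b, c distinct from i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            donor = [pop[a][d] + F * (pop[b][d] - pop[c][d]) for d in range(dim)]
            # binomial crossover: at least one donor component is inherited
            jrand = rng.randrange(dim)
            trial = [donor[d] if (rng.random() < CR or d == jrand) else pop[i][d]
                     for d in range(dim)]
            # selection: greedy comparison against the target vector
            fc = f(trial)
            if fc <= costs[i]:
                pop[i], costs[i] = trial, fc
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]
```

The selection step is greedy per individual, but the population-wide differences used in mutation keep the search parallel and help it escape local minima.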