Line search in D

Author: Dr Chibisi Chima-Okereke Created: October 20, 2020 20:57:44 GMT Published: October 24, 2020 15:30:00 GMT

Introduction

This article is a follow-up to a previous article on bracketing functions in numerical optimization problems, in which two techniques for creating initial brackets or sections from a point were presented, along with several methods for refining the brackets to give progressively narrower sections. In this article, we take this one step further and show how to incorporate the interpolation methods discussed in the previous article into line search methods.

Line search is a technique used to optimize steps in iterative multivariable optimization problems, where \(f(\mathbf{x})\) is the function to be minimized and \(\mathbf{x}\) is the vector of parameters we are concerned with. The step length \(\alpha\) is applied to the function at step \(k\) as \(f(\mathbf{x}_{k} + \alpha \mathbf{d}_{k})\), where \(\mathbf{d}_{k}\) is the search direction at that step. We can represent the line search problem as minimizing \(\phi(\alpha)\) with respect to the step length \(\alpha\), where \(\phi(\alpha) = f(\mathbf{x}_{k} + \alpha \mathbf{d}_{k})\). Rather than treating this process as a standard optimization problem, which would be computationally expensive, we use an inexact line search, which combines the sufficient decrease condition and the strong curvature condition to ensure that the chosen step length lies close to the actual minimum. These conditions are presented in the Wikipedia article on the Wolfe conditions and are given below:
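
In the notation above, the sufficient decrease condition and the strong curvature condition read:

$$ f(\mathbf{x}_{k} + \alpha \mathbf{d}_{k}) \leq f(\mathbf{x}_{k}) + c_{1} \alpha \nabla f(\mathbf{x}_{k})^{T} \mathbf{d}_{k} $$

$$ \left| \nabla f(\mathbf{x}_{k} + \alpha \mathbf{d}_{k})^{T} \mathbf{d}_{k} \right| \leq c_{2} \left| \nabla f(\mathbf{x}_{k})^{T} \mathbf{d}_{k} \right| $$

where \(0 \lt c_{1} \lt c_{2} \lt 1\).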

We shall present two line search techniques in this article. The first is the backtracking line search, which fulfills only the sufficient decrease condition, and the other is the strong Wolfe line search, which fulfills both the sufficient decrease and strong curvature conditions. The implementation of the latter is a modified form of algorithm 4.3 in Algorithms for Optimization by M. J. Kochenderfer and T. A. Wheeler.

Preliminaries

First we load math functions and a print function:

import std.stdio: writeln;
import core.stdc.math: isnan, 
                       pow = powf, pow, pow = powl,
                       abs = fabsf, abs = fabs, abs = fabsl,
                       sqrt = sqrtf, sqrt, sqrt = sqrtl;

Then we implement a simple floating-point type check template, a map, and a reduce function. The map function takes two arrays, and the input function is a function of elements of the two arrays. I am in the habit of implementing these, but similar functions can be found in D's standard library. The map function here is quite different from that in the standard library and may have more in common with R's mapply function, but with only two arrays.

template isFloatingPoint(T)
{
  enum isFloatingPoint = is(T == float) || is(T == double) || is(T == real);
}

// Applies a binary function elementwise over two arrays (optionally passing extra arguments)
auto map(alias fun, T, U...)(T[] arr1, T[] arr2, U u)
{
  size_t n = arr1.length;
  auto first = fun(arr1[0], arr2[0], u);
  alias R = typeof(first);
  R[] ret = new R[n];
  ret[0] = first;
  if(n > 1)
  {
    for(long i = 1; i < n; ++i)
    {
      ret[i] = fun(arr1[i], arr2[i], u);
    }
  }
  return ret;
}
// Folds an array to a single value with a binary function, starting from its first two elements
auto reduce(alias fun, T, U...)(T[] arr, U u)
{
  size_t n = arr.length;
  auto ret = fun(arr[0], arr[1], u);
  if(n > 2)
  {
    for(long i = 2; i < n; ++i)
    {
      ret = fun(ret, arr[i]);
    }
  }
  return ret;
}
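
The pattern that recurs throughout this article is an elementwise product followed by a sum reduction, which gives the dot product of two vectors. A minimal usage sketch (the arrays here are just illustrative values):

double[] a = [1.0, 2.0, 3.0];
double[] b = [4.0, 5.0, 6.0];
// Elementwise product then sum: the dot product of a and b
double dot = reduce!((x, y) => x + y)(map!((x, y) => x*y)(a, b));
writeln(dot); // 32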

Below is an enumeration to distinguish implementations of backtracking and strong Wolfe line searches:

enum Search {Backtracking, StrongWolfe}
alias Backtracking = Search.Backtracking;
alias StrongWolfe = Search.StrongWolfe;

The candidate functions we will be working with are:

$$ f(\mathbf{x}) = 5 + x_1^2 + x_2^2 $$

$$ \nabla{f(\mathbf{x})} = [2x_{1}, 2x_{2}] $$

auto f1 = (double[] x) => 5 + pow(x[0], 2.0) + pow(x[1], 2.0);
auto df1 = (double[] x) => [2*x[0], 2*x[1]];

and

$$ f(\mathbf{x}) = x_1^4 + x_1^2 + x_2^2 $$

$$ \nabla{f(\mathbf{x})} = [4x_{1}^3 + 2x_{1}, 2x_{2}] $$

auto f2 = (double[] x) => x[0]^^4 + x[0]^^2 + x[1]^^2;
auto df2 = (double[] x) => [4*(x[0]^^3) + 2*x[0], 2*x[1]];

The backtracking line search is fairly simple: we check whether the sufficient decrease condition is met and, if not, we decrease the value of \(\alpha\) by the factor \(c_{2}\). The implementation is given below:

T lineSearch(Search search, alias fun, alias dfun, alias c1 = 1E-4, alias c2 = 0.5, T, U...)(T alpha, T[] x, T[] d, U u)
if((search == Backtracking) && isFloatingPoint!(T) && isFloatingPoint!(typeof(c1)) && isFloatingPoint!(typeof(c2)))
{
  T f0 = fun(x, u);
  T[] g0 = dfun(x, u);
  // Directional derivative at x: the dot product of the gradient with the direction d
  T gd = reduce!((a, b) => a + b)(map!((a, b) => a*b)(g0, d));
  
  T[] x_new = x.dup;
  x_new[] += alpha*d[];
  T f_new = fun(x_new, u);
  // Right-hand side of the sufficient decrease condition at the current alpha
  T wolfe1 = f0 + c1*alpha*gd;
  
  // Shrink alpha by the factor c2 while sufficient decrease is not met
  while(f_new > wolfe1)
  {
    x_new = x.dup;
    x_new[] += alpha*d[];
    f_new = fun(x_new, u);
    wolfe1 = f0 + c1*alpha*gd;
    alpha *= c2;
  }
  return alpha;
}

\(c_{1} = 10^{-4}\) and \(c_{2} = 0.5\) are standard values for the backtracking line search. In this implementation, \(c_{1}\) corresponds to the same constant as in the sufficient decrease condition, but \(c_{2}\) *is not the same as the constant in the curvature condition*: here it has been repurposed as the decrease multiplier for the next candidate value of \(\alpha\). Both constants lie in \((0, 1)\).
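
These defaults can be overridden through the template parameters when different constants are preferred; a small usage sketch:

// Backtracking with a gentler decrease multiplier of 0.8 (c1 left at its default of 1E-4)
auto alphaCustom = lineSearch!(Backtracking, f1, df1, 1E-4, 0.8)(1.0, [-1.0, -1.0], [1.0, 0.0]);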

The calls and outputs for the given objective functions with backtracking line search are:

auto alpha = lineSearch!(Backtracking, f1, df1)(1.0, [-1.0, -1.0], [1.0, 0.0]);
writeln("Backtracking line search alpha: ", alpha);
alpha = lineSearch!(Backtracking, f2, df2)(1.0, [1.0, 1.0], [-3.0, -1.0]);
writeln("Backtracking line search alpha: ", alpha);

outputs:

Backtracking line search alpha: 1
Backtracking line search alpha: 0.25
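
As a quick sanity check (a small sketch using the functions defined above), the step length returned for f2 satisfies the sufficient decrease condition:

double[] x0 = [1.0, 1.0], dir = [-3.0, -1.0];
double[] xNew = x0.dup;
xNew[] += alpha*dir[];
// Sufficient decrease: f(x + alpha*d) <= f(x) + c1*alpha*(gradient . d)
double lhs = f2(xNew);
double rhs = f2(x0) + 1E-4*alpha*reduce!((a, b) => a + b)(map!((a, b) => a*b)(df2(x0), dir));
writeln(lhs <= rhs); // should print true for the alpha obtained above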

The strong Wolfe line search was adapted from the algorithm referenced in the introduction section. The major difference is that we include a choice of interpolation methods as discussed in the bracketing article:

enum Interpolation {Bisection, Quadratic, Cubic}
alias Bisection  =  Interpolation.Bisection;
alias Quadratic  =  Interpolation.Quadratic;
alias Cubic      =  Interpolation.Cubic;

The implementation of the interpolation functions is given below; they are adapted from the previous article:

T interpolation(Interpolation method, alias phi, alias dphi, T, U...)(T[] x, T[] d, T a, T b, U u)
if(isFloatingPoint!(T) && (method == Bisection))
{
  debug writeln("Interpolation type: ", method);
  // Bisection simply returns the midpoint of the bracket [a, b]
  return (a + b)/2;
}

T interpolation(Interpolation method, alias phi, alias dphi, T, U...)(T[] x, T[] d, T a, T b, U u)
if(isFloatingPoint!(T) && (method == Quadratic))
{
  debug writeln("Interpolation type: ", method);
  // Minimizer of the quadratic fitted to phi(a), phi(b) and the derivative phi'(b)
  T fa = phi(x, d, a, u);
  T fb = phi(x, d, b, u);
  T db = dphi(x, d, b, u);
  return b - ((b - a)*(db*0.5))/(db - ((fb - fa)/(b - a)));
}

T interpolation(Interpolation method, alias phi, alias dphi, T, U...)(T[] x, T[] d, T a, T b, U u)
if(isFloatingPoint!(T) && (method == Cubic))
{
  debug writeln("Interpolation type: ", method);
  // Minimizer of the cubic fitted to the values and derivatives of phi at both a and b
  auto fa = phi(x, d, a, u); auto fb = phi(x, d, b, u);
  auto da = dphi(x, d, a, u); auto db = dphi(x, d, b, u);
  T s = (3*(fb - fa))/(b - a);
  T z = s - da - db;
  T w = sqrt(z*z - da*db);
  return a + ((b - a)*(w - da - z))/(db - da + 2*w);
}
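
As an illustration (a small sketch assuming the f1 and df1 functions above), we can define \(\phi(\alpha)\) and \(\phi'(\alpha)\) along a direction and call the quadratic interpolation directly; since \(\phi\) is quadratic along this line, the interpolation recovers its exact minimizer:

// phi(alpha) = f1(x + alpha*d) and its derivative along d, as used inside the line search below
auto phi1 = (double[] x, double[] d, double a){
  double[] xa = x.dup;
  xa[] += a*d[];
  return f1(xa);
};
auto dphi1 = (double[] x, double[] d, double a){
  double[] xa = x.dup;
  xa[] += a*d[];
  return reduce!((p, q) => p + q)(map!((p, q) => p*q)(df1(xa), d));
};
auto aq = interpolation!(Quadratic, phi1, dphi1)([-1.0, -1.0], [1.0, 0.0], 0.0, 2.0);
writeln(aq); // phi has its minimum at alpha = 1 along this direction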

The implementation of the strong Wolfe line search is given below:

T lineSearch(Interpolation interp, Search search, alias fun, alias dfun, alias c1 = 1E-4, alias c2 = 0.9, T, U...)(T alpha, T[] x, T[] d, U u)
if((search == StrongWolfe) && isFloatingPoint!(T) && isFloatingPoint!(typeof(c1)) && isFloatingPoint!(typeof(c2)))
{
  T f0 = fun(x, u);
  T[] df0 = dfun(x, u);
  T g0 = reduce!((a, b) => a + b)(map!((a, b) => a*b)(df0, d));
  T f_prev = T.nan;
  T alpha_prev = 0;
  
  T alpha_lo = T.nan;
  T alpha_hi = T.nan;
  T wolfe2 = c2*abs(g0);
  // Right-hand side of the sufficient decrease condition, recomputed for each candidate alpha
  T wolfe1;
  T[] x_new; T f, g;
  
  // Bracketing phase: expand alpha until a bracket [alpha_lo, alpha_hi] containing an
  // acceptable step is found, or the strong Wolfe conditions are already satisfied
  while(true)
  {
    x_new = x.dup;
    x_new[] += alpha*d[];
    f = fun(x_new, u);
    wolfe1 = f0 + c1*alpha*g0;
    if((f > wolfe1) || (!isnan(f_prev) && (f >= f_prev)))
    {
      alpha_lo = alpha_prev;
      alpha_hi = alpha;
      break;
    }
    g = reduce!((a, b) => a + b)(map!((a, b) => a*b)(dfun(x_new, u), d));
    if(abs(g) <= wolfe2)
    {
      return alpha;
    }else if(g >= 0)
    {
      alpha_lo = alpha;
      alpha_hi = alpha_prev;
      break;
    }
    f_prev = f;
    alpha_prev = alpha;
    alpha = 2*alpha;
  }
  
  T[] x_lo = x.dup;
  x_lo[] += alpha_lo*d[];
  T f_lo = fun(x_lo, u);
  
  //Functions for phi(alpha) declared here
  auto phi = (T[] x, T[] d, T alpha){
    T[] x_alpha = x.dup;
    x_alpha[] += alpha*d[];
    return fun(x_alpha, u);
  };
  
  auto dphi = (T[] x, T[] d, T alpha){
    T[] x_alpha = x.dup;
    x_alpha[] += alpha*d[];
    return reduce!((a, b) => a + b)(map!((a, b) => a*b)(dfun(x_alpha, u), d));
  };

  
  // Zoom phase: interpolate within [alpha_lo, alpha_hi] until the strong Wolfe conditions hold
  while(true)
  {
    // interpolation functions called here
    alpha = interpolation!(interp, phi, dphi)(x, d, alpha_lo, alpha_hi, u);
    x_new = x.dup;
    x_new[] += alpha*d[];
    f = fun(x_new, u);
    wolfe1 = f0 + c1*alpha*g0;
    if((f > wolfe1) || (f > f_lo))
    {
      alpha_hi = alpha;
    }else{
      g = reduce!((a, b) => a + b)(map!((a, b) => a*b)(dfun(x_new, u), d));
      if(abs(g) <= wolfe2)
      {
        return alpha;
      }else if(g*(alpha_hi - alpha_lo) >= 0)
      {
        alpha_hi = alpha_lo;
      }
      alpha_lo = alpha;
    }
  }
}

In this case \(0 \lt c_{1} \lt c_{2} \lt 1\), and both \(c_{1}\) and \(c_{2}\) correspond to the constants in the sufficient decrease and curvature conditions given in the introduction. The calls and outputs for the line search are given below:

alpha = lineSearch!(Quadratic, StrongWolfe, f1, df1)(2.0, [-1.0, -1.0], [1.0, 0.0]);
writeln("StrongWolfe line search alpha: ", alpha);
alpha = lineSearch!(Cubic, StrongWolfe, f2, df2)(1.0, [1.0, 1.0], [-3.0, -1.0]);
writeln("StrongWolfe line search alpha: ", alpha);

output:

StrongWolfe line search alpha: 1
StrongWolfe line search alpha: 0.471638

Conclusion

Line search can be a tricky algorithm to implement; it is a confluence of interpolation, bracketing, and the Wolfe conditions. This article presents one possible approach to implementing two inexact line searches: the simple backtracking line search, and the more involved standard strong Wolfe line search.

Implementing these line searches lays the foundation for implementing certain optimizers, for example Newton and quasi-Newton methods. Since these articles on optimization and other numerical analysis topics are going quite well and are enjoyable to implement and write about, future articles will hopefully cover the implementation of various numerical optimizers in D.

Thank you for your time.