Multipliers correction methods

In this page, a class of feasible algorithms is presented, called multipliers correction methods, for solve the following optimization problems on Stiefel manifold.

\[ \begin{aligned} \min\limits_{X \in \mathbb{R}^{n \times p}} \hspace{2mm} & f (X) \\ \mathrm{s.t.} \hspace{3.5mm} & X \in \mathcal{S}_{n, p}. \end{aligned} \qquad \qquad \mathrm{(OCP)} \]

Here \(f(X)\) is a locally Lipschitz smooth function in \(\mathbb{R}^{n\times p}\).

ProxOrth 

The motivation of multipliers correction methods comes from the optimality condition of (OCP), which can be expressed as

\[ \begin{equation*} \left\{ \begin{aligned} & (I_n - XX^{\top}) \nabla f (X) = 0, \\ & X^{\top} \nabla f(X) = \nabla f (X)^{\top} X, \\ & X^{\top} X = I_p. \end{aligned} \right. \end{equation*} \]

  • The first equality stands for the stationarity of the gradient in the null space of \(X^{\top}\).

  • The second equality determines the symmetry of Lagrangian multipliers associated with orthogonality constraints.

  • The third equality represents the feasibility of \(X\).

For convenience, these three equalities are called ‘‘sub-stationarity“, ‘‘symmetry” and ‘‘feasibility", respectively.

FOForth

Inspired by the above optimality condition, the authors in [1] propose a firsr-order algorithmic framework FOForth to solve the problem (OCP), which is tailored for the following structured objective functions.

\(f (X) = h (X) + \mathrm{tr} (G^{\top} X)\).

  • \(G \in \mathbb{R}^{n \times p}\) is a constant matrix.

  • \(h (X)\) is invariant invariant under the orthogonal transformation, i.e., \(h(XQ) = h(X)\) holds for any \(Q \in \mathcal{S}_{p,p}\).

  • \(\nabla h(X) = H(X) X\), where \(H:\mathbb{R}^{n \times p} \to \mathbb{S}^{n}\) and \(\mathbb{S}^{n}\) refers to the set of \(n \times n\) symmetric matrices.

Algorithm Development

FOForth consists of two steps: reducing the function value in proportion to the ‘‘sub-stationarity“ violation and preserving the ‘‘symmetry”. During the calculations of these two steps, the ‘‘feasibility" is maintained all the time.

Function value reduction step

Let \(X_{k} \in \mathcal{S}_{n, p}\) be the current iterate. The function value reduction step is trying to find a feasible intermediate point \(Y_{k} \in \mathcal{S}_{n, p}\) satisfying the following sufficient function value reduction condition:

\[ \begin{equation*} f ( X_{k} ) - f ( Y_{k} ) \geq c \| ( I_n - X_{k} X_{k}^{\top} ) \nabla f ( X_{k}) \|_{\mathrm{F}}^2, \end{equation*} \]

where \(c > 0\) is a constant. The right hand side is in proportion to the squared Frobenius norm of ‘‘sub-stationarity" violation at \(X_{k}\). In [1], the authors introduce two algorithms to achieve the sufficient function value reduction.

FOForth 
  • Gradient Reflection (GR). GR takes the reflection point of the current iterate \(X_{k}\) on the null space of \(X_{k} - \tau \nabla f(X_{k})\), which can be calculated by the Householder transformation.

\[ \begin{equation*} \left\{ \begin{aligned} & V_{k} = X_{k} - \tau \nabla f (X_{k}) \mbox{  for a fixed chosen step size  } \tau > 0, \\ & Y_{k} = (- I_n + 2 V_{k} (V_{k}^{\top} V_{k})^{\dagger} V_{k}^{\top}) X_{k}. \end{aligned} \right. \end{equation*} \]

  • Gradient Projection (GP). GP directly projects \(X_{k} - \tau \nabla f (X_{k})\) onto the Stiefel manifold, which can be calculated by the following projection.

\[ \begin{equation*} %\textbf{GP:} \quad \left\{ \begin{aligned} & V_{k} = X_{k} - \tau \nabla f (X_{k}) \mbox{  for a fixed chosen step size  } \tau > 0, \\ & Y_{k} = {\cal P}_{\mathcal{S}_{n, p}} (V_{k}). \end{aligned} \right. \end{equation*} \]

Correction step

The intermediate point \(Y_{k} \in \mathcal{S}_{n, p}\) obtained in the previous step does not necessarily satisfy the ‘‘symmetry" equality \(Y_{k}^{\top} \nabla f(Y_{k}) = \nabla f(Y_{k})^{\top} Y_{k}\). Therefore, the authors in [1] introduce a correction step for FOForth.

To be specific, the next iterate \(X_{k + 1}\) is obtained through a rotation on \(Y_{k}\). Namely, \(X_{k + 1} = Y_{k} Q^{\ast}\) and \(Q^{\ast} \in \mathcal{S}_{p, p}\) is the global minimizer of the following subproblem:

\[ \begin{equation*} \min_{Q \in \mathcal{S}_{p, p}} \hspace{2mm} \mathrm{tr} \left( (Y_{k}Q)^{\top} G \right), \end{equation*} \]

It is straightforward to obtain that \(Q^{\ast} := -U V^{\top}\), where \(U \in \mathbb{R}^{p \times p}\) and \(V \in \mathbb{R}^{p \times p}\) come from the singular value decomposition \(Y_{k}^{\top} G = U \Sigma V^{\top}\).

Code

You can find the description of codes here.

ProxOrth

Since FOForth can only deal with a specific class of objective functions, the authors in [2] propose an improved algorithmic framework ProxOrth for generic smooth objectives.

Algorithm Development

ProxOrth also consists of two steps: function value reduction step and proximal correction step. The former one searches along an arbitrary descent direction in the Euclidean space instead of a vector in the tangent space of the Stiefel manifold. Meanwhile, the latter one minimizes a first-order proximal approximation of the objective function in the range space of the current iterate to make Lagrangian multipliers associated with orthogonality constraints symmetric at any accumulation point.

Function value reduction step

Let \(X_{k} \in \mathcal{S}_{n, p}\) be the current iterate. The function value reduction step aims to find a feasible intermediate point \(Y_{k} \in \mathcal{S}_{n, p}\) such that the following sufficient function value reduction condition holds:

\[ \begin{equation*} f ( X_{k} ) - f ( Y_{k} ) \geq c \| ( I_n - X_{k} X_{k}^{\top} ) \nabla f ( X_{k}) \|_{\mathrm{F}}^2, \end{equation*} \]

where \(c > 0\) is a constant.

ProxOrth shares the same function value reduction step as FOForth. The details are omited here.

Proximal correction step

For generic smooth objectives, the ‘‘symmetry’’ condition can not be satisfied at each iterate. Alternatively, the authors in [2] design a proximal correction step to have the ‘‘symmetry’’ condition being achieved asymptotically. The next iterate \(X_{k + 1}\) is still obtained through a rotation on \(Y_{k}\). Namely, \(X_{k + 1} = Y_{k} Q^{\ast}\) and \(Q^{\ast} \in \mathcal{S}_{p, p}\) is the global minimizer of the following subproblem:

\[ \begin{equation*} \min_{Q \in \mathcal{S}_{p, p}} \hspace{2mm} \tilde{f}(Y_{k} Q), \end{equation*} \]

where

\[ \begin{equation*} \tilde{f}(X) := f(Y_{k}) + \left\langle \nabla f(Y_{k}), X - Y_{k} \right\rangle + \dfrac{\gamma}{2} \left\| X - Y_{k} \right\|_{\mathrm{F}}^2, \end{equation*} \]

and \(\gamma > 0\) is a constant. The above subproblem is equivalent to:

\[ \begin{equation*} \min_{Q \in \mathcal{S}_{p, p}} \hspace{2mm} g(Q) := \mathrm{tr} \left( Q^{\top} Z_{k} \right), \end{equation*} \]

where \(Z_{k} := Y_{k}^{\top} \nabla f(Y_{k}) - \gamma I_p\). If \(Z_{k} = 0\), the above subproblem is trivial and we choose \(X_{k + 1} = Y_{k}\). Otherwise, the global minimizer is \(Q^{\ast} := -U V^{\top}\), where \(U \in \mathbb{R}^{p \times p}\) and \(V \in \mathbb{R}^{p \times p}\) come from the singular value decomposition \(Z_{k} = U \Sigma V^{\top}\).

Code

You can find the description of codes here.

References

  1. B. Gao, X. Liu, X. Chen, and Y. Yuan, A New First-order Algorithmic Framework for Optimization Problems with Orthogonality Constraints, SIAM Journal on Optimization, 28-1(2018), 302–332.

  2. L. Wang, B. Gao, and X. Liu, Multipliers Correction Methods for Optimization Problems over the Stiefel Manifold, CSIAM Transactions on Applied Mathematics, 2-3(2021), 508-531.