Multipliers correction methodsIn this page, a class of feasible algorithms is presented, called multipliers correction methods, for solve the following optimization problems on Stiefel manifold. \[ \begin{aligned} \min\limits_{X \in \mathbb{R}^{n \times p}} \hspace{2mm} & f (X) \\ \mathrm{s.t.} \hspace{3.5mm} & X \in \mathcal{S}_{n, p}. \end{aligned} \qquad \qquad \mathrm{(OCP)} \] Here \(f(X)\) is a locally Lipschitz smooth function in \(\mathbb{R}^{n\times p}\).
FOForthInspired by the above optimality condition, the authors in [1] propose a firsr-order algorithmic framework FOForth to solve the problem (OCP), which is tailored for the following structured objective functions. \(f (X) = h (X) + \mathrm{tr} (G^{\top} X)\).
Algorithm DevelopmentFOForth consists of two steps: reducing the function value in proportion to the ‘‘sub-stationarity“ violation and preserving the ‘‘symmetry”. During the calculations of these two steps, the ‘‘feasibility" is maintained all the time. Function value reduction stepLet \(X_{k} \in \mathcal{S}_{n, p}\) be the current iterate. The function value reduction step is trying to find a feasible intermediate point \(Y_{k} \in \mathcal{S}_{n, p}\) satisfying the following sufficient function value reduction condition: \[ \begin{equation*} f ( X_{k} ) - f ( Y_{k} ) \geq c \| ( I_n - X_{k} X_{k}^{\top} ) \nabla f ( X_{k}) \|_{\mathrm{F}}^2, \end{equation*} \] where \(c > 0\) is a constant. The right hand side is in proportion to the squared Frobenius norm of ‘‘sub-stationarity" violation at \(X_{k}\). In [1], the authors introduce two algorithms to achieve the sufficient function value reduction.
Correction stepThe intermediate point \(Y_{k} \in \mathcal{S}_{n, p}\) obtained in the previous step does not necessarily satisfy the ‘‘symmetry" equality \(Y_{k}^{\top} \nabla f(Y_{k}) = \nabla f(Y_{k})^{\top} Y_{k}\). Therefore, the authors in [1] introduce a correction step for FOForth. To be specific, the next iterate \(X_{k + 1}\) is obtained through a rotation on \(Y_{k}\). Namely, \(X_{k + 1} = Y_{k} Q^{\ast}\) and \(Q^{\ast} \in \mathcal{S}_{p, p}\) is the global minimizer of the following subproblem: \[ \begin{equation*} \min_{Q \in \mathcal{S}_{p, p}} \hspace{2mm} \mathrm{tr} \left( (Y_{k}Q)^{\top} G \right), \end{equation*} \] It is straightforward to obtain that \(Q^{\ast} := -U V^{\top}\), where \(U \in \mathbb{R}^{p \times p}\) and \(V \in \mathbb{R}^{p \times p}\) come from the singular value decomposition \(Y_{k}^{\top} G = U \Sigma V^{\top}\). CodeYou can find the description of codes here. ProxOrthSince FOForth can only deal with a specific class of objective functions, the authors in [2] propose an improved algorithmic framework ProxOrth for generic smooth objectives. Algorithm DevelopmentProxOrth also consists of two steps: function value reduction step and proximal correction step. The former one searches along an arbitrary descent direction in the Euclidean space instead of a vector in the tangent space of the Stiefel manifold. Meanwhile, the latter one minimizes a first-order proximal approximation of the objective function in the range space of the current iterate to make Lagrangian multipliers associated with orthogonality constraints symmetric at any accumulation point. Function value reduction stepLet \(X_{k} \in \mathcal{S}_{n, p}\) be the current iterate. The function value reduction step aims to find a feasible intermediate point \(Y_{k} \in \mathcal{S}_{n, p}\) such that the following sufficient function value reduction condition holds: \[ \begin{equation*} f ( X_{k} ) - f ( Y_{k} ) \geq c \| ( I_n - X_{k} X_{k}^{\top} ) \nabla f ( X_{k}) \|_{\mathrm{F}}^2, \end{equation*} \] where \(c > 0\) is a constant. ProxOrth shares the same function value reduction step as FOForth. The details are omited here. Proximal correction stepFor generic smooth objectives, the ‘‘symmetry’’ condition can not be satisfied at each iterate. Alternatively, the authors in [2] design a proximal correction step to have the ‘‘symmetry’’ condition being achieved asymptotically. The next iterate \(X_{k + 1}\) is still obtained through a rotation on \(Y_{k}\). Namely, \(X_{k + 1} = Y_{k} Q^{\ast}\) and \(Q^{\ast} \in \mathcal{S}_{p, p}\) is the global minimizer of the following subproblem: \[ \begin{equation*} \min_{Q \in \mathcal{S}_{p, p}} \hspace{2mm} \tilde{f}(Y_{k} Q), \end{equation*} \] where \[ \begin{equation*} \tilde{f}(X) := f(Y_{k}) + \left\langle \nabla f(Y_{k}), X - Y_{k} \right\rangle + \dfrac{\gamma}{2} \left\| X - Y_{k} \right\|_{\mathrm{F}}^2, \end{equation*} \] and \(\gamma > 0\) is a constant. The above subproblem is equivalent to: \[ \begin{equation*} \min_{Q \in \mathcal{S}_{p, p}} \hspace{2mm} g(Q) := \mathrm{tr} \left( Q^{\top} Z_{k} \right), \end{equation*} \] where \(Z_{k} := Y_{k}^{\top} \nabla f(Y_{k}) - \gamma I_p\). If \(Z_{k} = 0\), the above subproblem is trivial and we choose \(X_{k + 1} = Y_{k}\). Otherwise, the global minimizer is \(Q^{\ast} := -U V^{\top}\), where \(U \in \mathbb{R}^{p \times p}\) and \(V \in \mathbb{R}^{p \times p}\) come from the singular value decomposition \(Z_{k} = U \Sigma V^{\top}\). CodeYou can find the description of codes here. References
|