CVX notes

2018/07/31 21:54

[TOC]


Preliminaries

1. PSD

$M$ is a positive semidefinite (PSD) matrix $\iff$ every principal submatrix $P$ of $M$ is PSD.

Note: One direction follows by considering the quadratic form $x^T M x$ and restricting $x$ to the components corresponding to the index set defining the principal submatrix. The converse is trivially true (take $P = M$).

$M$ is PSD $\iff$ all principal minors are non-negative.

Write $M$ as a quadratic form: $$ x^T M x = \sum_{i,j}M_{ij}x_ix_j $$ Taking $x$ to be a standard basis vector $e_i$ gives $M_{ii} \ge 0$, hence $\mathbf{tr}(M) \ge 0$. Next, for any pair $i \neq j$, the $2\times 2$ principal submatrix on rows and columns $i,j$ is PSD, so its determinant is non-negative: $$ \begin{gathered} M_{ii}M_{jj} - M_{ij}^2 \ge 0 ~~(\text{PSD}) \\ \implies |M_{ij}| \le \sqrt{M_{ii}M_{jj}} \le \frac{M_{ii} + M_{jj}}{2} \end{gathered} $$ where the last step is the AM-GM inequality.
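
These entry-wise bounds are easy to sanity-check numerically; a minimal numpy sketch (the random construction $M = B^TB$, which is always PSD, is just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
M = B.T @ B  # B^T B is always PSD

# diagonal entries are non-negative, hence so is the trace
assert np.all(np.diag(M) >= 0)

# every entry obeys |M_ij| <= sqrt(M_ii M_jj) <= (M_ii + M_jj)/2
for i in range(5):
    for j in range(5):
        assert abs(M[i, j]) <= np.sqrt(M[i, i] * M[j, j]) + 1e-12
```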


2. Matrix norm

General definition of a norm: a function $\lVert\cdot\rVert: V \mapsto \mathbb{R}$ is a norm if (i) $\lVert x \rVert \ge 0$, with equality $\iff x = 0$; (ii) $\lVert \alpha x \rVert = |\alpha|\,\lVert x \rVert$; (iii) $\lVert x + y \rVert \le \lVert x \rVert + \lVert y \rVert$.

Matrix norm: a norm on the space of matrices; it is called submultiplicative if additionally $\lVert AB \rVert \le \lVert A \rVert \lVert B \rVert$.

  • Frobenius norm: $\lVert A\rVert_F := \sqrt{\langle A,A\rangle_F} = \sqrt{\mathbf{tr}(A^*A)}$
  • Induced norm: $\lVert A\rVert_p := \sup\limits_{\lVert x\rVert_p = 1} \lVert Ax\rVert_p$
  • Nuclear norm: $\lVert A\rVert_{\text{nuclear}} := \sum_i \sigma_i(A)$ (the sum of singular values)
  • Spectral norm: $\lVert A\rVert_{\text{spectral}} := \sigma_1(A)$ (the largest singular value; for symmetric PSD $A$ this equals the largest eigenvalue $\lambda_1$)
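
These norms can be cross-checked against the singular values with numpy (a minimal sketch; `np.linalg.norm` accepts the `'fro'`, `'nuc'`, and `2` arguments):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
s = np.linalg.svd(A, compute_uv=False)   # singular values, descending

fro = np.linalg.norm(A, 'fro')           # Frobenius norm
nuc = np.linalg.norm(A, 'nuc')           # nuclear norm
spec = np.linalg.norm(A, 2)              # spectral norm (induced 2-norm)

assert np.isclose(fro, np.sqrt(np.sum(s ** 2)))  # sqrt of sum of sigma_i^2
assert np.isclose(nuc, s.sum())                  # sum of singular values
assert np.isclose(spec, s[0])                    # largest singular value
```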

Spectral radius

The spectral radius of $A \in \mathbb{C}^{n\times n}$ is $$ \rho(A) := \max_i |\lambda_i(A)| $$ the largest modulus among its eigenvalues. For every induced matrix norm, $\rho(A) \le \lVert A \rVert$.


3. Duality

Two ==equivalent== ways to represent a convex set:

  • standard representation: the family of points in the set
  • dual representation: the family of halfspaces containing the set (the set is recovered as the intersection of these halfspaces)

A closed convex set $S$ is the intersection of all closed halfspaces $H$ containing it.

<u>Polar</u> Let $S \subseteq \mathbb{R}^n$ be a convex set containing the origin. The polar of $S$ is defined as follows $$ S^{\circ} := \{y ~|~ y^Tx \le 1, ~\forall x \in S\} $$

<u>Note</u>

  • the polar is one way of representing the set of all halfspaces containing a convex set
  • every halfspace $a^Tx \le b$ with $b > 0$ can be written as a “normalized” inequality $y^T x \le 1$ by dividing by $b$
  • $S^{\circ}$ can be thought of as the set of normalized representations of the halfspaces containing $S$

Properties of the polar:

  1. $S^{\circ\circ} = S$
  2. $S^{\circ}$ is a closed convex set containing the origin
  3. When 0 is in the interior of $S$, then $S^{\circ}$ is bounded
  4. When $S$ is non-convex, $S^{\circ} = (\mathbf{conv}(S))^{\circ}$, and $S^{\circ\circ} = \mathbf{conv}(S)$
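
A standard worked example, using the fact that $\sup_{\lVert x \rVert_1 \le 1} y^Tx = \lVert y \rVert_\infty$: the polar of the $\ell_1$-ball is the $\ell_\infty$-ball,

$$ S = \{x ~|~ \lVert x \rVert_1 \le 1\} \implies S^{\circ} = \{y ~|~ \lVert y \rVert_\infty \le 1\}, $$

and by property 1, taking the polar again recovers the $\ell_1$-ball.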


Polar duality of convex cones For a cone $K \subseteq \mathbb{R}^n$, the polar cone is $$ K^{\circ} := \{y ~|~ y^Tx \le 0, ~\forall x \in K\} $$ (for cones, the normalization $y^Tx \le 1$ in the polar can be replaced by $y^Tx \le 0$).

<u>Notes</u>

  1. $K^{\circ\circ} = K$
  2. $K^{\circ}$ is closed and convex

Conjugation of convex functions Let $f: \mathbb{R}^n \mapsto \mathbb{R}\cup\{\infty\}$ be a convex function. The ==conjugate== of $f$ is $$ f^*(y) := \sup_{x}\,(y^Tx - f(x)) $$

Properties of the conjugate

  1. $f^{**} = f$ (when $f$ is closed convex)
  2. $f^*$ is always convex (it is a supremum of affine functions of $y$)
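
A worked example: for a positive definite quadratic, the conjugate is the quadratic in the inverse matrix,

$$ f(x) = \tfrac{1}{2}x^TQx, ~Q \succ 0: \qquad f^*(y) = \sup_x\left(y^Tx - \tfrac{1}{2}x^TQx\right) = \tfrac{1}{2}y^TQ^{-1}y, $$

where the supremum is attained at $x = Q^{-1}y$ (set the gradient $y - Qx$ to zero). Note $f^{**} = f$, as expected for a closed convex $f$.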



Convex sets


Convex functions

A function $f$ is convex if $\mathbf{dom}f$ is a convex set and $$ f(\lambda x + (1-\lambda)y) \le \lambda f(x) + (1-\lambda)f(y), ~\forall x,y \in \mathbf{dom}f, ~\lambda \in [0,1] $$

  1. <u>affine</u> is convex: $f(x) = a^T x+b$

    an affine function is both convex and concave

  2. every <u>norm</u> is convex

    Proof: let $\pi(x)$ be a norm; for any $\lambda \in [0,1]$, $$ \pi(\lambda x + (1-\lambda)y) \le \pi(\lambda x) + \pi((1-\lambda)y) = \lambda\pi(x) + (1-\lambda)\pi(y) $$ by the triangle inequality and absolute homogeneity.

  3. $f$ is convex $\iff$ epi($f$) is convex


1. Closed convex

A convex function $f$ is called closed if its epigraph is a closed set.

  1. a convex function that is continuous on a closed domain is a closed function (e.g., norms)
  2. every differentiable convex function with dom$f = \mathbb{R}^n$ is closed
  3. when working with a convex function, one usually takes $f(x) = \infty$ outside dom$f$
  4. <u>Jensen's inequality</u>: if $f$ is convex, $\alpha_i \ge 0$, and $\sum_i \alpha_i = 1$, then $$ f\Big(\sum_i \alpha_i x_i\Big) \le \sum_i \alpha_i f(x_i) $$
  5. Corollary: if $x \in \mathbf{conv}\{x_1,\dots,x_m\}$, then $f(x) \le \max\limits_{i} f(x_i)$

    pf: $f(x) = f(\sum_i\alpha_i x_i) \le \sum_i \alpha_i f(x_i) \le \max\limits_{i} f(x_i)$
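
A quick numerical illustration of Jensen's inequality and the corollary, with $f = \exp$ as the convex function (a minimal numpy sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
f = np.exp                       # a convex function
x = rng.standard_normal(10)      # arbitrary points
a = rng.random(10)
a /= a.sum()                     # convex-combination weights

lhs = f(np.dot(a, x))            # f(sum_i a_i x_i)
rhs = np.dot(a, f(x))            # sum_i a_i f(x_i)
assert lhs <= rhs <= f(x).max()  # Jensen, then the corollary
```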


2. Level sets

The level sets of $f$ are the sets $\{x \in \mathbf{dom}f ~|~ f(x) \le \alpha\}$. If $f$ is convex, then every level set is convex.

Note: the convexity of level sets does not characterize convex functions, but quasiconvex functions.

  1. convex $f$ is closed $\implies$ all its level sets are closed

Some convex sets

  1. the norm ball $\{x\in \mathbb{R}^n ~|~ \lVert x\rVert \le 1\}$ is convex and closed
  2. the ellipsoid $\{x ~|~ (x-a)^T Q (x-a) \le r^2\}$ (with $Q$ positive definite) is convex and closed

    pf: $\langle x, y \rangle := x^TQy$ satisfies the three defining properties of an inner product:

    • bilinearity
    • symmetry
    • positivity

    and these three properties hold $\iff$ $Q$ is positive definite
  3. the $\epsilon$-neighborhood $\{x ~|~ \mathbf{dist}(x, S) \le \epsilon\}$ of a convex set $S$ is convex and closed

3. Operations preserving convexity of functions

  1. stability under taking weighted sums: $f,g \mapsto \lambda f + \mu g, ~\lambda,\mu \ge 0$
  2. stability under affine substitution of the argument: $x \mapsto Ax+b$, i.e., $f(x) \mapsto \phi(x) = f(Ax+b)$
  3. stability under taking pointwise suprema: $\{f_i\}_{i \in \mathcal{I}} \mapsto g(x) := \sup\limits_{i \in \mathcal{I}}f_i(x)$; the pointwise supremum of a family of convex functions is convex
  4. stability under partial minimization: if $f(x,y)$ is jointly convex in $(x,y)$, then $g(x) = \inf\limits_{y} f(x,y)$ is convex (assuming $g$ is proper, i.e., $> -\infty$ everywhere and finite at least at one point)
  5. stability under the perspective: $f(x) \mapsto g(x,t) = tf(x/t)$, $\mathbf{dom}\,g = \{(x,t) ~|~ x/t \in \mathbf{dom}f, ~t > 0\}$ (see the worked example below)
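
A worked instance of the perspective operation: taking $f(x) = x^2$ yields the quadratic-over-linear function,

$$ g(x,t) = t\left(\frac{x}{t}\right)^2 = \frac{x^2}{t}, \qquad t > 0, $$

which is therefore jointly convex in $(x,t)$.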

4. Detect convexity

<u>Necessary and Sufficient Convexity Condition for smooth function:</u>

  • a once differentiable smooth $f$ is convex $\iff$ $f'(x)$ is monotone non-decreasing; in $\mathbb{R}^n$: $\langle \nabla f(x) - \nabla f(y), x-y \rangle \ge 0$ (see the numerical check below)
  • a twice differentiable smooth $f$ is convex $\iff$ $f''(x) \ge 0$, i.e., $\nabla^2 f(x) \succeq 0$
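
A minimal numpy sketch of the monotone-gradient criterion for $f(x) = e^x$ (the random sampling is just for illustration; it checks the inequality on sampled pairs, not a proof):

```python
import numpy as np

rng = np.random.default_rng(2)
fprime = np.exp   # derivative of the convex function f(x) = e^x

x, y = rng.standard_normal(1000), rng.standard_normal(1000)
# monotonicity of the gradient: (f'(x) - f'(y)) * (x - y) >= 0 for all pairs
assert np.all((fprime(x) - fprime(y)) * (x - y) >= 0)
```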

<u>subgradient property is characteristic of convex functions:</u> $f$ is convex $\iff$ at every $x \in \mathbf{dom}f$ there exists $g$ with $$ f(y) \ge f(x) + g^T(y-x), ~\forall y $$ Such a $g$ is called a subgradient of $f$ at $x$, and the set of all subgradients at $x$ is the subdifferential $\partial f(x)$.


5. Subgradient

Examples

  • $f(x) = |x|$ on $\mathbb{R}$: $\partial f(x) = \{\mathrm{sign}(x)\}$ for $x \neq 0$, and $\partial f(0) = [-1, 1]$
  • $f(x) = \lVert x \rVert_2$ on $\mathbb{R}^n$: $\partial f(x) = \{x/\lVert x \rVert_2\}$ for $x \neq 0$, and $\partial f(0) = \{g ~|~ \lVert g \rVert_2 \le 1\}$

6. Optimality conditions

For a convex function, every local minimizer is a global minimizer.

<u>First-order necessary and sufficient condition (convex $f$)</u>: $x^* \in \mathbf{dom}f$ is a minimizer $\iff$ $0 \in \partial f(x^*)$
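
A one-line worked example of this condition: for $f(x) = |x| + \frac{1}{2}x^2$,

$$ \partial f(0) = [-1, 1] + 0 = [-1, 1] \ni 0, $$

so $x^* = 0$ is the global minimizer even though $f$ is not differentiable there.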


7. Strong convexity

A differentiable function $f$ is $\mu$-strongly convex if $$ f(y) \ge f(x) + \nabla f(x)^T(y-x) + \frac{\mu}{2} \lVert y-x \rVert^2 $$

<u>Note</u>

  1. $f$ is not necessarily differentiable (see the equivalent definitions below)
  2. if $f$ is non-smooth, replace the gradient by a subgradient
  3. strong convexity $\implies$ strict convexity

Note: Intuitively speaking, strong convexity means that there exists a quadratic lower bound on the growth of the function.

Equivalent definition $$ \begin{aligned} &(i)~f(y)\ge f(x)+\nabla f(x)^T(y-x)+\frac{\mu}{2}\lVert y-x \rVert^2,~\forall x, y. \\ &(ii)~g(x) = f(x)-\frac{\mu}{2}\lVert x \rVert^2~\text{is convex}. \\ &(iii)~\langle \nabla f(x) - \nabla f(y),x-y \rangle \ge \mu \lVert x-y\rVert^2,~\forall x, y. \\ &(iv)~f(\alpha x+ (1-\alpha) y) \le \alpha f(x) + (1-\alpha) f(y) - \frac{\alpha (1-\alpha)\mu}{2}\lVert x-y\rVert^2,~\alpha \in [0,1]. \\ &(v)~\nabla^2 f(x) \succeq \mu \boldsymbol{I} \end{aligned} $$
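
For a quadratic $f(x) = \frac{1}{2}x^TQx$ with $Q \succ 0$, definition $(i)$ holds with $\mu = \lambda_{\min}(Q)$; a minimal numpy sketch verifying this on random points (the random quadratic is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
Q = B.T @ B + np.eye(4)            # positive definite
mu = np.linalg.eigvalsh(Q).min()   # strong convexity parameter

f = lambda x: 0.5 * x @ Q @ x
grad = lambda x: Q @ x

for _ in range(100):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    lower = f(x) + grad(x) @ (y - x) + 0.5 * mu * np.sum((y - x) ** 2)
    assert f(y) >= lower - 1e-9    # definition (i)
```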


Lagrange Duality

Consider an optimization problem in standard form (not necessarily convex) $$ \begin{array}{ll} \underset{x}{\text{minimize}} & f_0(x) \\ \text{subject to} & f_i(x) \le 0, ~i=1,\cdots,m \\ & h_i(x) = 0, ~i=1,\cdots,p \end{array} $$

The Lagrangian is $$ L(x,\boldsymbol{\lambda},\boldsymbol{\mu}) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \mu_i h_i(x) $$

The Lagrange dual function is defined as $$ g(\lambda, \mu) = \inf_{x} L(x,\lambda,\mu) $$
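
A standard worked example of the dual function: the equality-constrained least-norm problem,

$$ \min_x ~x^Tx \quad \text{s.t.} \quad Ax = b: \qquad L(x,\mu) = x^Tx + \mu^T(Ax - b). $$

Setting $\nabla_x L = 2x + A^T\mu = 0$ gives $x = -\frac{1}{2}A^T\mu$, so

$$ g(\mu) = -\frac{1}{4}\mu^T AA^T \mu - b^T\mu, $$

a concave quadratic that lower-bounds $p^*$ for every $\mu$.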

Lower bound property: for every $\boldsymbol{\lambda} \succeq \mathbf{0}$ and every $\boldsymbol{\mu}$, $g(\lambda,\mu) \le p^*$, where $p^*$ is the optimal value of the primal problem.

Lagrange dual problem $$ \begin{array}{ll} \underset{\lambda, \mu}{\text{maximize}} & g(\lambda, \mu) \\ \text{subject to} & \boldsymbol{\lambda} \succeq \mathbf{0} \end{array} $$

Weak duality $$ d^* \le p^* $$

  • $d^*$: optimal value of the dual problem
  • $p^*$: optimal value of the primal problem
  • duality gap: $p^* - d^*$
  • weak duality always holds, even when the primal problem is non-convex

Strong duality $$ d^* = p^* $$

  • constraint qualifications $\implies$ strong duality
  • Slater’s Constraint Qualification: the problem is convex and strictly feasible, i.e., $\exists ~x \in \mathbf{int}\mathcal{D}$ with $f_i(x) < 0$ for all $i$ and $h_i(x) = 0$
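
To see strong duality numerically, here is a minimal sketch using the cvxpy package (the toy problem, minimizing $\lVert x \rVert^2$ subject to $\mathbf{1}^Tx \ge 1$, is an assumption for illustration):

```python
import cvxpy as cp

# minimize ||x||^2  subject to  sum(x) >= 1   (convex, strictly feasible,
# so Slater's condition guarantees strong duality)
x = cp.Variable(3)
constraints = [cp.sum(x) >= 1]
prob = cp.Problem(cp.Minimize(cp.sum_squares(x)), constraints)
p_star = prob.solve()

# analytically: x* = (1/3, 1/3, 1/3), p* = 1/3, lambda* = 2/3
print(p_star)                     # ~0.3333
print(x.value)                    # ~[0.3333, 0.3333, 0.3333]
print(constraints[0].dual_value)  # ~0.6667; the constraint is active, so
                                  # complementary slackness holds
```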

Complementary slackness If strong duality holds and $x^*$, $(\lambda^*, \mu^*)$ are primal and dual optimal, then $$ \lambda_i^* f_i(x^*) = 0, ~i=1,\dots,m $$ i.e., $\lambda_i^* > 0 \implies f_i(x^*) = 0$, and $f_i(x^*) < 0 \implies \lambda_i^* = 0$.

KKT conditions For differentiable $f_i$, $h_i$:

  1. primal feasibility: $f_i(x) \le 0$, $h_i(x) = 0$
  2. dual feasibility: $\boldsymbol{\lambda} \succeq \mathbf{0}$
  3. complementary slackness: $\lambda_i f_i(x) = 0$
  4. stationarity: $\nabla f_0(x) + \sum_i \lambda_i \nabla f_i(x) + \sum_i \mu_i \nabla h_i(x) = 0$

If strong duality holds, any pair of primal and dual optimal points must satisfy the KKT conditions; conversely, for a convex problem, any points satisfying the KKT conditions are primal and dual optimal with zero duality gap.


Cones

Tangent cone Let $M$ be a (nonempty) convex set and $x^* \in M$. The tangent cone of $M$ at $x^*$ is the cone $$ \begin{split} T_M(x^*) &= \{h \in \mathbb{R}^n ~|~ x^* + th \in M ~\text{for some}~ t > 0 \} \\ &= \{t(y - x^*) ~|~ y \in M, ~t \ge 0\} \end{split} $$

<u>Note:</u>

  • Geometrically, this is the set of all directions leading from $x^*$ into $M$
  • convex, but not necessarily closed
  • fact: if $x^*$ is a minimizer, then $h^T \nabla f(x^*) \ge 0$ for all $h \in T_M(x^*)$ (every direction in the tangent cone is feasible, so none of them can be a descent direction)
  • $T_M(x^*) = \mathbb{R}^n \iff x^* \in \mathbf{int}M$

e.g. for a polyhedron $$ M = \{x ~|~ Ax \le b\} = \{x ~|~ a_i^Tx \le b_i, ~i = 1,\dots,m\} $$ the tangent cone at $x^*$ is $$ T_M(x^*) = \{h ~|~ a_i^T h \le 0, ~\forall i ~\text{with}~ a_i^T x^* = b_i\} $$ (only the constraints active at $x^*$ matter).
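
A quick worked instance of this formula: at the origin of the non-negative orthant, every constraint is active,

$$ M = \{x ~|~ -x_i \le 0\}, ~x^* = 0 \implies T_M(0) = \{h ~|~ h_i \ge 0, ~\forall i\} = M, $$

while at any interior point ($x^* > 0$, no active constraints) $T_M(x^*) = \mathbb{R}^n$.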


Normal cone: the polar cone of the tangent cone $$ N_M(x^*) = \{g \in \mathbb{R}^n ~|~ \langle g, y-x^*\rangle \le 0, ~\forall y \in M\} $$

<u>Note:</u>

  • the normal cone is the polar of the tangent cone, i.e., $$ N_M(x^*) = (T_M(x^*))^{\circ} = \{g \in \mathbb{R}^n ~|~ \langle g, h\rangle \le 0, ~\forall h \in T_M(x^*)\} = \{g ~|~ \langle g, y-x^*\rangle \le 0, ~\forall y \in M\} $$
  • fact: if $x^*$ is a minimizer, then $-\nabla f(x^*) \in N_M(x^*)$.

Algorithm convergence

| Method | Setting | Stepsize Rule | Convergence Rate | Iteration Complexity |
| --- | --- | --- | --- | --- |
| Gradient descent | strongly convex & smooth | $\eta_t = \frac{2}{\mu + L}$ | $O\left(\left(\frac{\kappa -1}{\kappa +1}\right)^t\right)$ | $O\left(\frac{\log\frac{1}{\epsilon}}{\log\frac{\kappa+1}{\kappa-1}}\right)$ |
| Gradient descent | convex & smooth | $\eta_t = \frac{1}{L}$ | $O(\frac{1}{t})$ | $O(\frac{1}{\epsilon})$ |
| Frank-Wolfe | (strongly) convex & smooth | $\eta_t = \frac{1}{t}$ | $O(\frac{1}{t})$ | $O(\frac{1}{\epsilon})$ |
| Projected GD | convex & smooth | $\eta_t = \frac{1}{L}$ | $O(\frac{1}{t})$ | $O(\frac{1}{\epsilon})$ |
| Projected GD | strongly convex & smooth | $\eta_t = \frac{1}{L}$ | $O\left((1-\frac{1}{\kappa})^t\right)$ | $O(\kappa\log\frac{1}{\epsilon})$ |
| Subgradient method | convex & Lipschitz | $\eta_t = \frac{1}{\sqrt{t}}$ | $O(\frac{1}{\sqrt{t}})$ | $O(\frac{1}{\epsilon^2})$ |
| Subgradient method | strongly convex & Lipschitz | $\eta_t = \frac{1}{t}$ | $O\left(\frac{1}{t}\right)$ | $O(\frac{1}{\epsilon})$ |
| Proximal GD | convex & smooth (w.r.t. $f$) | $\eta_t = \frac{1}{L}$ | $O(\frac{1}{t})$ | $O(\frac{1}{\epsilon})$ |
| Proximal GD | strongly convex & smooth (w.r.t. $f$) | $\eta_t = \frac{1}{L}$ | $O\left((1-\frac{1}{\kappa})^t\right)$ | $O(\kappa\log\frac{1}{\epsilon})$ |
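
A minimal numpy sketch illustrating the first row: gradient descent on a strongly convex quadratic contracts at the factor $\frac{\kappa-1}{\kappa+1}$ per step (the specific quadratic and constants are assumptions for illustration):

```python
import numpy as np

# f(x) = 1/2 x^T Q x with eigenvalues mu = 1 and L = 10, so kappa = 10
Q = np.diag([1.0, 10.0])
mu, L = 1.0, 10.0
eta = 2.0 / (mu + L)                # stepsize from the table
rate = (L / mu - 1) / (L / mu + 1)  # predicted contraction factor 9/11

x = np.array([1.0, 1.0])
for t in range(50):
    x = x - eta * (Q @ x)           # gradient step: grad f(x) = Q x

# after t steps the error is bounded by rate**t * ||x_0||
assert np.linalg.norm(x) <= rate ** 50 * np.sqrt(2.0) + 1e-12
```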