高級優化理論與方法(二)
- 上節回顧
- Constrained
- Unconstrained
- FONC
- SONC
- example
- 這節課的內容
- SOSC
- 定理敘述
- 證明
- 例子
- One-dimensional Search Methods
- Iterative Method
- Golden Section Search
- Method
- Issues
- 方法推理
- 算法描述
- Time
- Example
- Fibonacci Method
- Bisection Method
- Newton Method
- Example
- Secant Method
- Bracketing
- 總結
上節回顧
Constrained
m i n f ( x ) s . t . x ∈ Ω min f(x)\\ s.t. x\in \Omega minf(x)s.t.x∈Ω
Unconstrained
m i n f ( x ) min f(x) minf(x)
FONC
x ? x^* x? is optimal, ? d , ? f ( x ? ) T d ≥ 0 \forall d, \nabla f(x^*)^Td \geq 0 ?d,?f(x?)Td≥0
(interior) ? f ( x ? ) = 0 \nabla f(x^*)=0 ?f(x?)=0
SONC
x ? x^* x? local optimal, ? d , d T ? F ( x ) T d ≥ 0 \forall d, d^T\nabla F(x)^Td \geq 0 ?d,dT?F(x)Td≥0
(interior) ? f ( x ? ) = 0 , F ( x ? ) ≥ 0 \nabla f(x^*)=0,F(x^*)\geq0 ?f(x?)=0,F(x?)≥0
example
m i n f ( x 1 , x 2 ) = x 1 2 ? x 2 2 min f(x_1,x_2)=x_1^2-x_2^2 minf(x1?,x2?)=x12??x22?
x ? = [ 0 , 0 ] T x^*=[0,0]^T x?=[0,0]T
? f ( x ) = [ 2 x 1 , ? 2 x 2 ] T \nabla f(x)=[2x_1,-2x_2]^T ?f(x)=[2x1?,?2x2?]T
? f ( x ? ) = [ 0 , 0 ] T \nabla f(x^*)=[0,0]^T ?f(x?)=[0,0]T
H ( x ) = [ 2 0 0 ? 2 ] > 0 H(x)=\begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix}>0 H(x)=[20?0?2?]>0
d 1 = [ 1 , 0 ] T d_1=[1,0]^T d1?=[1,0]T
d 1 T F ( x ? ) d 1 = [ 2 , 0 ] [ 1 , 0 ] T = 2 > 0 d_1^TF(x^*)d_1=[2,0][1,0]^T=2>0 d1T?F(x?)d1?=[2,0][1,0]T=2>0
d 2 = [ 0 , 1 ] T d_2=[0,1]^T d2?=[0,1]T
d 2 T F ( x ? ) d 2 = ? 2 < 0 d_2^TF(x^*)d_2=-2<0 d2T?F(x?)d2?=?2<0
根據SONC, [ 0 , 0 ] T [0,0]^T [0,0]T not local minimizer.
這節課的內容
SOSC
定理敘述
【Second-order Sufficient Condition]
Let f ∈ C 2 f\in C^2 f∈C2 be defined on a region in which x ? x^* x? is an interior point.Suppose that:
① ? f ( x ? ) = 0 \nabla f(x^*)=0 ?f(x?)=0
② F ( x ? ) > 0 F(x^*)>0 F(x?)>0
Then, x ? x^* x? is a strict local minimizer of f. ? x ∈ N ? ( x ? ) : f ( x ? ) < f ( x ) \forall x\in N_{\epsilon}(x^*):f(x^*)<f(x) ?x∈N??(x?):f(x?)<f(x)
注:對于無約束優化問題,我們只能給出一些充分條件或者必要條件,充要條件是數學界的一個公開問題,目前還沒有答案。
證明
證:
f ∈ C 2 ? F ( x ? ) = F ( x ? ) T f \in C^2 \Rightarrow F(x^*)=F(x^*)^T f∈C2?F(x?)=F(x?)T
(由Clairaut’s Theorem and Schwarz’s Therem, ? i , j ∈ [ 1 , n ] , ? 2 f ( x ? ) ? x i ? x j = ? 2 f ( x ? ) ? x j ? x i \forall i,j \in [1,n],\frac{\partial^2 f(x^*)}{\partial x_i \partial x_j}=\frac{\partial^2 f(x^*)}{\partial x_j \partial x_i} ?i,j∈[1,n],?xi??xj??2f(x?)?=?xj??xi??2f(x?)?)
Rayleigh’s Inequality:for a P ∈ R n × n P \in \mathbb{R}^{n \times n} P∈Rn×n,symmetric, positive definite:
λ m i n ( P ) ∣ ∣ x ∣ ∣ 2 ≤ x T P x ≤ λ m a x ( P ) ∣ ∣ x ∣ ∣ 2 \lambda_{min}(P)||x||^2\leq x^TPx \leq \lambda_{max}(P)||x||^2 λmin?(P)∣∣x∣∣2≤xTPx≤λmax?(P)∣∣x∣∣2
where λ m i n ( P ) \lambda_{min}(P) λmin?(P) and λ m a x ( P ) \lambda_{max}(P) λmax?(P) are the minmal and maximal eigenvalue value of P, respectively.
a symmetric matrix is positive definite ? \Leftrightarrow ?all its eigenvalues are positive.
∵ d T F ( x ? ) d ≥ λ m i n ( F ( x ? ) ) ∣ ∣ d ∣ ∣ 2 > 0 \because d^TF(x^*)d \geq \lambda_{min}(F(x^*))||d||^2>0 ∵dTF(x?)d≥λmin?(F(x?))∣∣d∣∣2>0
∴ f ( x ? + d ) ? f ( x ? ) = 1 2 d T F ( x ? ) d + o ( ∣ ∣ d ∣ ∣ 2 ) > 0 \therefore f(x^*+d)-f(x^*)=\frac{1}{2}d^TF(x^*)d+o(||d||^2)>0 ∴f(x?+d)?f(x?)=21?dTF(x?)d+o(∣∣d∣∣2)>0
例子
f ( x ) = x 1 2 + x 2 2 f(x)=x_1^2+x_2^2 f(x)=x12?+x22?
? f ( x ) = [ 2 x 1 , 2 x 2 ] T \nabla f(x)=[2x_1,2x_2]^T ?f(x)=[2x1?,2x2?]T
H ( x ) = [ 2 0 0 2 ] > 0 H(x)=\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}>0 H(x)=[20?02?]>0
x ? = [ 0 , 0 ] T x^*=[0,0]^T x?=[0,0]T
One-dimensional Search Methods
Iterative Method
Iterative Method意為迭代算法。此處算法用algorithm其實不太嚴謹,因為要設計到算法的復雜度證明、正確性證明、能否停止等等的算法嚴謹性問題,而method這個詞則不用考慮這么多。迭代意為由某個初始點出發,找一些方向,往某些方向更新的過程。
Golden Section Search
Assume f: unimodular on [ a 0 , b 0 ] [a_0,b_0] [a0?,b0?] (only one minimizer in [ a 0 , b 0 ] [a_0,b_0] [a0?,b0?])
Basic Idea: “Narrow Down”
Binary Search does not work out.
Pick two instead of one points.
Method
input: a 0 , b 0 , f , ? a_0,b_0,f,\epsilon a0?,b0?,f,?
1. i = 0 i=0 i=0
2.while b i ? a i ≥ ? b_i-a_i\geq \epsilon bi??ai?≥? do
3.Pick x < y x<y x<y from [a_i,b_i]
4.If f ( x ) < f ( y ) f(x)<f(y) f(x)<f(y) then a i + 1 = a i , b i + 1 = y a_{i+1}=a_i,b_{i+1}=y ai+1?=ai?,bi+1?=y;
else b i + 1 = b i , a i + 1 = x b_{i+1}=b_i,a_{i+1}=x bi+1?=bi?,ai+1?=x
5.i++
6.END while
Issues
1.# while-loop
2.# computation of f ( ? ) f(\cdot) f(?)
方法推理
W.O.L.G.(Without Loss of Generality)
Assume b 0 ? a 0 = 1 b_0-a_0=1 b0??a0?=1
a 1 ? a 0 = b 1 ? b 0 = ρ < 1 2 a_1-a_0=b_1-b_0=\rho<\frac{1}{2} a1??a0?=b1??b0?=ρ<21?
? i : b i + 1 ? a i + 1 = ( 1 ? ρ ) ( b i ? a i ) \forall i: b_{i+1}-a_{i+1}=(1-\rho)(b_i-a_i) ?i:bi+1??ai+1?=(1?ρ)(bi??ai?)
b 1 ? a 1 = 1 ? 2 ρ b_1-a_1=1-2\rho b1??a1?=1?2ρ
b 1 ? a 1 = ρ ( b 1 ? a 0 ) = ρ ( 1 ? ρ ) ? 1 ? 2 ρ = ρ ? ρ 2 ? ρ 2 ? 3 ρ + 1 = 0 b_1-a_1=\rho(b_1-a_0)=\rho(1-\rho) \Rightarrow 1-2\rho=\rho-\rho^2 \Rightarrow \rho^2-3\rho+1=0 b1??a1?=ρ(b1??a0?)=ρ(1?ρ)?1?2ρ=ρ?ρ2?ρ2?3ρ+1=0
ρ 1 = 3 + 5 2 > 1 2 \rho_1=\frac{3+\sqrt{5}}{2}>\frac{1}{2} ρ1?=23+5??>21?(舍去), ρ 2 = 3 ? 5 2 < 1 2 \rho_2=\frac{3-\sqrt{5}}{2}<\frac{1}{2} ρ2?=23?5??<21?
算法描述
1.compile b 1 = a 0 + ( 1 ? ρ ) ( b 0 ? a 0 ) , a 1 = a 0 + ρ ( b 0 ? a 0 ) , f ( a 1 ) , f ( b 1 ) b_1=a_0+(1-\rho)(b_0-a_0),a_1=a_0+\rho(b_0-a_0),f(a_1),f(b_1) b1?=a0?+(1?ρ)(b0??a0?),a1?=a0?+ρ(b0??a0?),f(a1?),f(b1?)
2.i=0
3.while b i ? a i ≥ ? b_i-a_i\geq \epsilon bi??ai?≥? do
if f ( a i + 1 ) < f ( b i + 1 ) f(a_{i+1})<f(b_{i+1}) f(ai+1?)<f(bi+1?) then
b i + 2 = a i + 1 , a i + 2 = a i + ρ ( b i + 1 ? a i ) , a i + 1 = a i b_{i+2}=a_{i+1},a_{i+2}=a_i+\rho(b_{i+1}-a_i),a_{i+1}=a_i bi+2?=ai+1?,ai+2?=ai?+ρ(bi+1??ai?),ai+1?=ai?
else
a i + 2 = b i + 1 , b i + 2 = b i ? ρ ( b i ? a i + 1 ) , b i + 1 = b i a_{i+2}=b_{i+1},b_{i+2}=b_i-\rho(b_i-a_{i+1}),b_{i+1}=b_i ai+2?=bi+1?,bi+2?=bi??ρ(bi??ai+1?),bi+1?=bi?
4.i++
5.END while
Time
1.While-Loop: time of f ( ? ) f(\cdot) f(?)+O(1)
2.Loop: ( 1 ? ρ ) N ( b 0 ? a 0 ) < ? (1-\rho)^N(b_0-a_0)<\epsilon (1?ρ)N(b0??a0?)<?
N= a r g m i n ( l o g 1 ? ρ ? b 0 ? a 0 ) argmin(log_{1-\rho}\frac{\epsilon}{b_0-a_0}) argmin(log1?ρ?b0??a0???)
Example
? = 0.3 \epsilon=0.3 ?=0.3
f ( x ) = x 4 ? 14 x 3 + 60 x 2 ? 70 x f(x)=x^4-14x^3+60x^2-70x f(x)=x4?14x3+60x2?70x
[0,2]
( 1 ? ρ ) N < 0.3 2 = 0.15 ? N = 4 (1-\rho)^N<\frac{0.3}{2}=0.15\Rightarrow N=4 (1?ρ)N<20.3?=0.15?N=4
1. a 1 = a 0 + ρ ( b 0 ? a 0 ) = 0.7633 a_1=a_0+\rho(b_0-a_0)=0.7633 a1?=a0?+ρ(b0??a0?)=0.7633
b 1 = a 0 + ( 1 ? ρ ) ( b 0 ? a 0 ) = 1.236 b_1=a_0+(1-\rho)(b_0-a_0)=1.236 b1?=a0?+(1?ρ)(b0??a0?)=1.236
f ( a 1 ) = ? 24.36 f(a_1)=-24.36 f(a1?)=?24.36
f ( b 1 ) = ? 18.96 f(b_1)=-18.96 f(b1?)=?18.96
2.[0,1.236]
b 2 = a 1 = 0.7639 b_2=a_1=0.7639 b2?=a1?=0.7639
a 1 = a 0 + ρ ( 1.236 ? 0 ) = 0.4721 a_1=a_0+\rho(1.236-0)=0.4721 a1?=a0?+ρ(1.236?0)=0.4721
f ( b 2 ) = ? 24.36 f(b_2)=-24.36 f(b2?)=?24.36
KaTeX parse error: Expected 'EOF', got '}' at position 6: f(a_2}?=-21.10
3.[0.4721,1.236]
a 3 = b 2 = 0.7639 a_3=b_2=0.7639 a3?=b2?=0.7639
b 3 = a 2 + ( 1 ? ρ ) ( 1.236 ? 0.4721 ) = 0.9443 b_3=a_2+(1-\rho)(1.236-0.4721)=0.9443 b3?=a2?+(1?ρ)(1.236?0.4721)=0.9443
f ( a 3 ) = ? 24.36 f(a_3)=-24.36 f(a3?)=?24.36
f ( b 3 ) = ? 23.59 f(b_3)=-23.59 f(b3?)=?23.59
4.[0.4721,0.9443]
b 4 = a 3 = 0.7639 b_4=a_3=0.7639 b4?=a3?=0.7639
a 4 = 0.4721 + ρ ( 0.7443 ? 0.4721 ) = 0.6525 a_4=0.4721+\rho(0.7443-0.4721)=0.6525 a4?=0.4721+ρ(0.7443?0.4721)=0.6525
f ( b 4 ) = ? 24.36 f(b_4)=-24.36 f(b4?)=?24.36
$f(a_4)=-23.86
5.[0.6525,09443]
0.9443 ? 0.6525 < 0.3 = ? 0.9443-0.6525<0.3=\epsilon 0.9443?0.6525<0.3=?
算法終止
Fibonacci Method
事實上,每一輪的 ρ \rho ρ不一定要固定,也可以變化。假設 ρ \rho ρ會變化,我們來推導一下每一輪之間 ρ \rho ρ的關系。
ρ 1 ( 1 ? ρ 0 ) = 1 ? 2 ρ 0 \rho_1(1-\rho_0)=1-2\rho_0 ρ1?(1?ρ0?)=1?2ρ0?
ρ k + 1 ( 1 ? ρ k ) = 1 ? 2 ρ k \rho_{k+1}(1-\rho_k)=1-2\rho_k ρk+1?(1?ρk?)=1?2ρk?
ρ k + 1 = 1 ? ρ k 1 ? ρ k \rho_{k+1}=1-\frac{\rho_k}{1-\rho_k} ρk+1?=1?1?ρk?ρk??
問題轉化為
min ( 1 ? ρ 0 ) ( 1 ? ρ 1 ) ? ( 1 ? ρ k ) (1-\rho_0)(1-\rho_1)\cdots (1-\rho_k) (1?ρ0?)(1?ρ1?)?(1?ρk?)
s.t. ρ k + 1 = 1 ? ρ k 1 ? ρ k \rho_{k+1}=1-\frac{\rho_k}{1-\rho_k} ρk+1?=1?1?ρk?ρk??
結論為 ρ 0 = 1 ? F N F N + 1 , ρ N ? 1 = 1 ? F 1 F 2 \rho_0=1-\frac{F_N}{F_{N+1}},\rho_{N-1}=1-\frac{F_1}{F_2} ρ0?=1?FN+1?FN??,ρN?1?=1?F2?F1??
F k F_k Fk?為Fibonacci數列的第 k k k項, F 0 = 0 , F 1 = 1 , F k + 2 = F k + F k + 1 F_0=0,F_1=1,F_{k+2}=F_k+F_{k+1} F0?=0,F1?=1,Fk+2?=Fk?+Fk+1?
注:用該方法來做比黃金分割法要快。
Bisection Method
Assume:f: unimodular on [ a 0 , b 0 ] [a_0,b_0] [a0?,b0?], f continuously differentiable.
f ′ ( c ) < 0 : [ c , b 0 ] f'(c)<0:[c,b_0] f′(c)<0:[c,b0?]
f ′ ( c ) > 0 : [ a 0 , c ] f'(c)>0:[a_0,c] f′(c)>0:[a0?,c]
f ′ ( c ) = 0 : f'(c)=0: f′(c)=0:return c c c
( 1 2 ) N < ? (\frac{1}{2})^N<\epsilon (21?)N<?
Newton Method
Assume: f ∈ C 2 ? x ? ∈ [ a , b ] : f ′ ( x ? ) = 0 f \in C^2\Rightarrow x^*\in [a,b]: f'(x^*)=0 f∈C2?x?∈[a,b]:f′(x?)=0
x k + 1 = x k ? f ( x k ) f ′ ( x k ) x_{k+1}=x_k-\frac{f(x_k)}{f'(x_k)} xk+1?=xk??f′(xk?)f(xk?)?或 x k + 1 = x k ? f ′ ( x k ) f ′ ′ ( x k ) x_{k+1}=x_k-\frac{f'(x_k)}{f''(x_k)} xk+1?=xk??f′′(xk?)f′(xk?)?
該方法只有在初始點選的比較好的時候才管用,若初始點選的不好,可能產生振蕩不收斂的問題。
Example
f ( x ) = 1 2 x 2 ? s i n x f(x)=\frac{1}{2}x^2-sinx f(x)=21?x2?sinx
x 0 = 0.5 x_0=0.5 x0?=0.5
? = 1 0 ? 5 \epsilon=10^{-5} ?=10?5
f ′ ( x ) = x ? c o s x f'(x)=x-cosx f′(x)=x?cosx
f ′ ′ ( x ) = 1 + s i n x f''(x)=1+sinx f′′(x)=1+sinx
x 1 = 0.5 ? 0.5 ? c o s 0 , 5 1 + s i n 0.5 = 0.7552 x_1=0.5-\frac{0.5-cos0,5}{1+sin0.5}=0.7552 x1?=0.5?1+sin0.50.5?cos0,5?=0.7552
x 2 = 0.7391 x_2=0.7391 x2?=0.7391
x 3 = 0.7390 x_3=0.7390 x3?=0.7390
x 4 = 0.7390 x_4=0.7390 x4?=0.7390
Secant Method
secant意為切線。
f ∈ C 1 f \in C^1 f∈C1
f ′ ′ ≈ f ′ ( x k + 1 ) ? f ′ ( x k ) x k + 1 ? x k f''\approx\frac{f'(x_{k+1})-f'(x_k)}{x_{k+1}-x_k} f′′≈xk+1??xk?f′(xk+1?)?f′(xk?)?
x k + 1 = x k ? f ′ ( x k ) ( x k ? x k ? 1 ) f ′ ( x k ) ? f ′ ( x k ? 1 ) x_{k+1}=x_k-\frac{f'(x_k)(x_k-x_{k-1})}{f'(x_k)-f'(x_{k-1})} xk+1?=xk??f′(xk?)?f′(xk?1?)f′(xk?)(xk??xk?1?)?
Bracketing
Find the initial a 0 , b 0 a_0,b_0 a0?,b0?
Suffice: a 0 , c , b 0 ← f ( a 0 ) > f ( c ) , f ( b 0 ) > f ( c ) a_0,c,b_0\leftarrow f(a_0)>f(c),f(b_0)>f(c) a0?,c,b0?←f(a0?)>f(c),f(b0?)>f(c)
該方法用于求得一個理想的區間,然后使用其它算法來做,但在實際應用中比較少見,且不太好用。
總結
本節課先回顧了FONC和SONC這兩個找最值點的必要條件,然后給出了SOSC這個找最值點的充分條件。雖然看上去比較簡單,但是關于無約束優化的定理目前也只發展到這種程度。目前數學界還沒有找出一個充分必要條件。然后介紹了一維搜索方法中的迭代方法。重點介紹了黃金分割法,簡略介紹了斐波那契法、二分法、牛頓法、割線法等方法。