令 A , B ∈ R n × n {\displaystyle A,B\in R^{n\times n}\!\,} 为对称正定矩阵,令 b ∈ R n {\displaystyle b\in R^{n}\!\,} 。考虑二次函数 Q ( x ) = 1 2 x T A x − x T b {\displaystyle Q(x)={\frac {1}{2}}x^{T}Ax-x^{T}b\!\,} ,其中 x ∈ R n {\displaystyle x\in R^{n}\!\,} ,并使用一种下降方法来逼近 A x = b {\displaystyle Ax=b\!\,} 的解
x k + 1 = x k + α k d k {\displaystyle x_{k+1}=x_{k}+\alpha _{k}d_{k}\!\,}
定义最速下降 d k {\displaystyle d_{k}\!\,} 的概念,并说明如何计算最佳步长 α k {\displaystyle \alpha _{k}\!\,}
∇ Q ( x k ) ⋅ d k < 0 {\displaystyle \nabla Q(x_{k})\cdot d_{k}<0\!\,}
选择 α k {\displaystyle \alpha _{k}\!\,} 使 Q ( x k + α k d k ) {\displaystyle Q(x_{k}+\alpha _{k}d_{k})\!\,} 最小化,即
α k : min α k Q ( x k + α k d k ) {\displaystyle \alpha _{k}:\min _{\alpha _{k}}Q(x_{k}+\alpha _{k}d_{k})\!\,}
Q ( x k + α k d k ) = ⟨ x k + α k d k , A x k + α k A d k ⟩ − ⟨ x k + α k d k , b ⟩ {\displaystyle Q(x_{k}+\alpha _{k}d_{k})=\langle x_{k}+\alpha _{k}d_{k},Ax_{k}+\alpha _{k}Ad_{k}\rangle -\langle x_{k}+\alpha _{k}d_{k},b\rangle \!\,}
d d α Q ( x k + α k d k ) = ⟨ A d k , x k ⟩ + α k ⟨ A d k , d k ⟩ − ⟨ d k , b ⟩ {\displaystyle {\frac {d}{d\alpha }}Q(x_{k}+\alpha _{k}d_{k})=\langle Ad_{k},x_{k}\rangle +\alpha _{k}\langle Ad_{k},d_{k}\rangle -\langle d_{k},b\rangle \!\,}
将上述表达式设为零,得到最优 α k {\displaystyle \alpha _{k}\!\,}
α k = ⟨ d k , b − A x k ⟩ ⟨ A d k , d k ⟩ {\displaystyle \alpha _{k}={\frac {\langle d_{k},b-Ax_{k}\rangle }{\langle Ad_{k},d_{k}\rangle }}\!\,}
注意,由于 A {\displaystyle A\!\,} 是对称的
⟨ d k , A x k ⟩ = ⟨ A d k , x k ⟩ {\displaystyle \langle d_{k},Ax_{k}\rangle =\langle Ad_{k},x_{k}\rangle \!\,}
制定最速下降法(或梯度法),并编写实现该方法的伪代码。
请注意 d k = − ∇ Q ( x k ) = b − A x k = r k {\displaystyle d_{k}=-\nabla Q(x_{k})=b-Ax_{k}=r_{k}\!\,} . 然后,最小 α k {\displaystyle \alpha _{k}\!\,} 由 ⟨ r k , r k ⟩ ⟨ r k , r k ⟩ A {\displaystyle {\frac {\langle r_{k},r_{k}\rangle }{\langle r_{k},r_{k}\rangle _{A}}}\!\,} 给出。
给定 x 0 ∈ R n {\displaystyle x_{0}\in R^{n}\!\,}
For k = 0 , 1 , 2 , … {\displaystyle k=0,1,2,\ldots \!\,} r k = b − A x k If r k = 0 , then quit d k = r k α k = ⟨ r k , r k ⟩ ⟨ r k , r k ⟩ A x k + 1 = x k + α k d k {\displaystyle {\begin{aligned}r_{k}&=b-Ax_{k}\\&{\mbox{If }}r_{k}=0,{\mbox{ then quit }}\\d_{k}&=r_{k}\\\alpha _{k}&={\frac {\langle r_{k},r_{k}\rangle }{\langle r_{k},r_{k}\rangle _{A}}}\\x_{k+1}&=x_{k}+\alpha _{k}d_{k}\end{aligned}}}
令 B − 1 {\displaystyle B^{-1}\!\,} 是 A {\displaystyle A\!\,} 的预处理器。展示如何修改最速下降法使其适用于 B − 1 A x = B − 1 b {\displaystyle B^{-1}Ax=B^{-1}b\!\,} 并编写伪代码。请注意, B − 1 A {\displaystyle B^{-1}A\!\,} 可能不是对称的。(提示:按照共轭梯度法进行)。
由于 B {\displaystyle B\!\,} 是对称的,正定的, B = E T E {\displaystyle B=E^{T}E\!\,} ,其中 E {\displaystyle E\!\,} 是上三角矩阵(Cholesky 分解)。
那么 B − 1 = E − 1 E − T {\displaystyle B^{-1}=E^{-1}E^{-T}\!\,}
因此,
B − 1 A x = B − 1 b E − 1 E − T A x = E − 1 E − T b E − T A x = E − T b E − T A E − 1 ⏟ A ~ E x ⏟ x ~ = E − T b ⏟ b ~ {\displaystyle {\begin{aligned}B^{-1}Ax&=B^{-1}b\\E^{-1}E^{-T}Ax&=E^{-1}E^{-T}b\\E^{-T}Ax&=E^{-T}b\\\underbrace {E^{-T}AE^{-1}} _{\tilde {A}}\underbrace {Ex} _{\tilde {x}}&=\underbrace {E^{-T}b} _{\tilde {b}}\end{aligned}}\!\,}
A ~ {\displaystyle {\tilde {A}}\!\,} 是对称的
( E − T A E − 1 ) T = E − T A E − 1 {\displaystyle (E^{-T}AE^{-1})^{T}=E^{-T}AE^{-1}\!\,} 因为 A {\displaystyle A\!\,} 是对称的
A ~ {\displaystyle {\tilde {A}}\!\,} 是正定的
x T E − T A E − 1 x = ( E − 1 x ) T A ( E − 1 x ) > 0 {\displaystyle x^{T}E^{-T}AE^{-1}x=(E^{-1}x)^{T}A(E^{-1}x)>0\!\,} 因为 A {\displaystyle A\!\,} 是正定的
For k = 0 , 1 , 2 , … {\displaystyle k=0,1,2,\ldots \!\,} x k ~ = E x k r k = b ~ − A ~ x k ~ If r k = 0 , then quit d k = r k α k = ⟨ r k , r k ⟩ ⟨ r k , r k ⟩ A ~ x k + 1 = x k + α k d k {\displaystyle {\begin{aligned}{\tilde {x_{k}}}&=Ex_{k}\\r_{k}&={\tilde {b}}-{\tilde {A}}{\tilde {x_{k}}}\\&{\mbox{If }}r_{k}=0,{\mbox{ then quit }}\\d_{k}&=r_{k}\\\alpha _{k}&={\frac {\langle r_{k},r_{k}\rangle }{\langle r_{k},r_{k}\rangle _{\tilde {A}}}}\\x_{k+1}&=x_{k}+\alpha _{k}d_{k}\end{aligned}}}