Sherman-Morrison公式
Sherman-Morrison公式以 Jack Sherman 和 Winifred J. Morrison命名,在线性代数中,是求解逆矩阵的一种方法。本篇博客将介绍该公式及其应用,首先我们来看一下该公式的内容及其证明。
(Sherman-Morrison公式)假设$A\in\mathbb{R}^{n\times n}$为可逆矩阵,$u,v\in\mathbb{R}^{n}$为列向量,则$A+uv^{T}$可逆当且仅当$1+v^{T}A^{-1}u\neq 0$, 且当$A+uv^{T}$可逆时,该逆矩阵由以下公式给出:
$$(A+uv^{T})^{-1}=A^{-1}-{A^{-1}uv^{T}A^{-1} \over 1+v^{T}A^{-1}u}.$$
证明:
$(\Leftarrow)$当$1+v^{T}A^{-1}u\neq 0$时,令$X=A+uv^{T}, Y=A^{-1}-{A^{-1}uv^{T}A^{-1} \over 1+v^{T}A^{-1}u}$,则只需证明$XY=YX=I$即可,其中$I$为n阶单位矩阵。
$$ {\displaystyle {\begin{aligned} XY&=(A+uv^{T})\left(A^{-1}-{A^{-1}uv^{T}A^{-1} \over 1+v^{T}A^{-1}u}\right)\\ &=AA^{-1}+uv^{T}A^{-1}-{AA^{-1}uv^{T}A^{-1}+uv^{T}A^{-1}uv^{T}A^{-1} \over 1+v^{T}A^{-1}u}\\ &=I+uv^{T}A^{-1}-{uv^{T}A^{-1}+uv^{T}A^{-1}uv^{T}A^{-1} \over 1+v^{T}A^{-1}u}\\ &=I+uv^{T}A^{-1}-{u(1+v^{T}A^{-1}u)v^{T}A^{-1} \over 1+v^{T}A^{-1}u}\\ &=I+uv^{T}A^{-1}-uv^{T}A^{-1}\\ &=I\end{aligned}}} $$
同理,有$YX=I$.因此,当$1+v^{T}A^{-1}u\neq 0$时,$(A+uv^{T})^{-1}=A^{-1}-{A^{-1}uv^{T}A^{-1} \over 1+v^{T}A^{-1}u}.$
$(\Rightarrow)$当$u=0$时,显然有$1+v^{T}A^{-1}u=1\neq 0.$当$u\neq0$时,用反正法证明该命题成立。假设$A+uv^{T}$可逆,但$1+v^{T}A^{-1}u = 0$,则有
$$(A+uv^{T})A^{-1}u=u+u(v^{T}A^{-1}u)=(1+v^{T}A^{-1}u)u=0.$$
因为$A+uv^{T}$可逆,故$A^{-1}$u=0,又因为$A^{-1}$可逆,故$u=0$,此与假设$u\neq 0$矛盾。因此,当$A+uv^{T}$可逆时,有$1+v^{T}A^{-1}u \neq 0.$
Sherman-Morrison公式的应用
应用1:$A=I$时的Sherman-Morrison公式
在Sherman-Morrison公式中,令$A=I$,则有:$I+uv^{T}$可逆当且仅当$1+v^{T}u\neq 0$, 且当$I+uv^{T}$可逆时,该逆矩阵由以下公式给出:
$$(I+uv^{T})^{-1}=I-{uv^{T} \over 1+v^{T}u}.$$
再令$v=u$,则$1+u^{T}u > 0$, 因此,$I+uu^{T}$可逆,且
$$(I+uu^{T})^{-1}=I-{uu^{T} \over 1+u^{T}u}.$$
应用2:BFGS算法
Sherman-Morrison公式在BFGS算法中的应用,可用来求解BFGS算法中近似Hessian矩阵的逆。本篇博客并不打算给出Sherman-Morrison公式在BFGS算法中的应用,将会再写篇博客介绍BFGS算法,到时再给出该公式的应用,并会在之后补上该博客的链接(因为笔者还没写)。
应用3:循环三对角线性方程组的求解
本篇博客将详细讲述Sherman-Morrison公式在循环三对角线性方程组的求解中的应用。
首先给给出理论知识介绍部分。
对于$A\in\mathbb{R}^{n\times n}$为可逆矩阵,$u,v\in\mathbb{R}^{n}$为列向量,$1+v^{T}A^{-1}u\neq 0$,需要求解方程$(A+uv^{T})x=b.$对此,我们可以先求解以下两个方程:
$$Ay=b,\qquad Az=u$$.
然后令$x=y-\frac{v^{T}y}{1+v^{T}z}z$,该解即为原方程的解,验证如下:
$$ {\displaystyle {\begin{aligned} (A+uv^{T})x&=(A+uv^{T})(y-\frac{v^{T}y}{1+v^{T}z}z)\\ &=Ay+uv^{T}y-\frac{v^{T}y}{1+v^{T}z}Az-\frac{v^{T}y}{1+v^{T}z}uv^{T}z\\ &=b+uv^{T}y-\frac{v^{T}yu+v^{T}yuv^{T}z}{1+v^{T}z}\\ &=b+uv^{T}y-\frac{(1+v^{T}z)v^{T}yu}{1+v^{T}z}\\ &=b+uv^{T}y-uv^{T}y\\ &=b\end{aligned}}} $$
这样将原方程$ (A+uv^{T})x=b$分成两个方程,可以在一定程度上简化原方程。接下来,我们将介绍循环三对角线性方程组的求解。
所谓循环三对角线性方程组,指的是系数矩阵为如下形式:
$$ A=\begin{bmatrix} b_1&c_1&0&\cdots&0&a_1\\ a_2&b_2&c_2&0&\vdots&0\\ 0&\ddots&\ddots&\ddots&0&\vdots\\ \vdots&\vdots&a_{n-2}&b_{n-2}&c_{n-2}&0\\ 0&\cdots&\cdots&a_{n-1}&b_{n-1}&c_{n-1}\\ c_n&0&\cdots&0&a_n&b_n\end{bmatrix} $$
循环三对角线性方程组可写成$Ax=d$,其中$d=(d_{1},d_2,…,d_{n})^{T}.$
对于此方程的求解,我们令$u=(\gamma, 0,0,…,c_{n})^{T}, v=(1,0,0,…,\frac{a_1}{\gamma})^{T}$, 且$A=A^{‘}+uv^{T}$,其中$A^{‘}$如下:
$$ A^{‘}=\begin{bmatrix} b_1-\gamma&c_1&0&\cdots&0&0\\ a_2&b_2&c_2&0&\vdots&0\\ 0&\ddots&\ddots&\ddots&0&\vdots\\ \vdots&\vdots&a_{n-2}&b_{n-2}&c_{n-2}&0\\ 0&\cdots&\cdots&a_{n-1}&b_{n-1}&c_{n-1}\\ 0&0&\cdots&0&a_n&b_n-\frac{a_1c_n}{\gamma}\end{bmatrix} $$
$A^{‘}$为三对角矩阵。根据以上的理论知识,我们只需要求解以下两个方程
$$A^{‘}y=d,\qquad A^{‘}z=u,$$
然后,就能根据$y,z$求出$x$.而以上两个方程为三对角线性方程组,可以用追赶法(或Thomas法)求解,具体算法可以参考博客:三对角线性方程组(tridiagonal systems of equations)的求解 。
综上,我们利用Sherman-Morrison公式的思想,可以将循环三对角线
性方程组转化为三对角线性方程组求解。我们将会在下面给出该算法的Python语言实现。
Python实现
我们要解的循环三对角线性方程组如下:
$$ {\begin{bmatrix} 4&1&{0}&{0}&{2}\\ {1}&{4}&{1}&{0}&{0}\\ {0}&{1}&{4}&{1}&{0}\\ {0}&{0}&{1}&{4}&{1}\\ {3}&{0}&{0}&{1}&{4}\\ \end{bmatrix}} {\begin{bmatrix}{x_{1}}\\{x_{2}}\\{x_{3}}\\{x_{4}} \\{x_{5}}\\\end{bmatrix}}={\begin{bmatrix}{7\\6\\ 6\\6\\8}\\\end{bmatrix}} $$
用Python实现解该方程的Python完整代码如下:
# use Sherman-Morrison Formula and Thomas Method to solve cyclic tridiagonal linear equation
import numpy as np
# Thomas Method for soling tridiagonal linear equation Ax=d
# parameter: a,b,c,d are list-like of same length
# b: main diagonal of matrix A
# a: main diagonal below of matrix A
# c: main diagonal upper of matrix A
# d: Ax=d
# return: x(type=list), the solution of Ax=d
def TDMA(a,b,c,d):
try:
n = len(d) # order of tridiagonal square matrix
# use a,b,c to create matrix A, which is not necessary in the algorithm
A = np.array([[0]*n]*n, dtype='float64')
for i in range(n):
A[i,i] = b[i]
if i > 0:
A[i, i-1] = a[i]
if i < n-1:
A[i, i+1] = c[i]
# new list of modified coefficients
c_1 = [0]*n
d_1 = [0]*n
for i in range(n):
if not i:
c_1[i] = c[i]/b[i]
d_1[i] = d[i] / b[i]
else:
c_1[i] = c[i]/(b[i]-c_1[i-1]*a[i])
d_1[i] = (d[i]-d_1[i-1]*a[i])/(b[i]-c_1[i-1] * a[i])
# x: solution of Ax=d
x = [0]*n
for i in range(n-1, -1, -1):
if i == n-1:
x[i] = d_1[i]
else:
x[i] = d_1[i]-c_1[i]*x[i+1]
x = [round(_, 4) for _ in x]
return x
except Exception as e:
return e
# Sherman-Morrison Fomula for soling cyclic tridiagonal linear equation Ax=d
# parameter: a,b,c,d are list-like of same length
# b: main diagonal of matrix A
# a: main diagonal below of matrix A
# c: main diagonal upper of matrix A
# d: Ax=d
# return: x(type=list), the solution of Ax=d
def Cyclic_Tridiagnoal_Linear_Equation(a,b,c,d):
try:
# use a,b,c to create cyclic tridiagonal matrix A
n = len(d)
A = np.array([[0] * n] * n, dtype='float64')
for i in range(n):
A[i, i] = b[i]
if i > 0:
A[i, i - 1] = a[i]
if i < n - 1:
A[i, i + 1] = c[i]
A[0, n - 1] = a[0]
A[n - 1, 0] = c[n - 1]
gamma = 1 # gamma can be set freely
u = [gamma] + [0] * (n - 2) + [c[n - 1]]
v = [1] + [0] * (n - 2) + [a[0] / gamma]
# modify the coefficient to form A'
b[0] -= gamma
b[n - 1] -= a[0] * c[n - 1] / gamma
a[0] = 0
c[n - 1] = 0
# solve A'y=d, A'z=u by using Thomas Method
y = np.array(TDMA(a, b, c, d))
z = np.array(TDMA(a, b, c, u))
# use y and z to calculate x
# x = y-(v·y)/(1+v·z) *z
# x is the solution of Ax=d
x = y - (np.dot(np.array(v), y)) / (1 + np.dot(np.array(v), z)) * z
x = [round(_, 3) for _ in x]
return x
except Exception as e:
return e
def main():
'''
equation:
A = [[4,1,0,0,2],
[1,4,1,0,0],
[0,1,4,1,0],
[0,0,1,4,1],
[3,0,0,1,4]]
d = [7,6,6,6,8]
solution x should be [1,1,1,1,1]
'''
a = [2, 1, 1, 1, 1]
b = [4, 4, 4, 4, 4]
c = [1, 1, 1, 1, 3]
d = [7, 6, 6, 6, 8]
x = Cyclic_Tridiagnoal_Linear_Equation(a,b,c,d)
print('The solution is %s'%x)
main()
输出结果如下:
The solution is [1.0, 1.0, 1.0, 1.0, 1.0]