
Lesson-6

Gradient

When differentiating between scalars and vectors, the shape of the result is determined as follows.

(1) Derivative of a scalar with respect to a vector: if the vector is a column vector, the result is a row vector, given by:

\[x = [x_1,x_2,...,x_n]^T,\frac{\partial y}{\partial x}=[\frac{\partial y}{\partial x_1},\frac{\partial y}{\partial x_2},...,\frac{\partial y}{\partial x_n}]\]

Example: let \(y=x_1^2+2x_2^2\) and \(x=[x_1,x_2]^T\); then \(\frac{\partial y}{\partial x}=[2x_1,4x_2]\). Intuition: the level sets of \(y\) can be viewed as contour lines. If we take the point \((x_1,x_2)=(1,1)\), the gradient direction \((2,4)\) is orthogonal to the contour line through that point; the gradient is the direction in which the value of \(y\) changes fastest. (A short autograd check of these shapes follows case (3) below.)

(2) Derivative of a vector with respect to a scalar: if the vector is a column vector, the result is a column vector, given by:

\[y=[y_1,y_2,...,y_n]^T,\frac{\partial y}{\partial x}=[\frac{\partial y_1}{\partial x},\frac{\partial y_2}{\partial x},...,\frac{\partial y_n}{\partial x}]^T\]
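For example, with a scalar \(x\) and \(y=[x^2,\sin x]^T\):

\[\frac{\partial y}{\partial x}=[2x,\cos x]^T\]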

(3) Derivative of a vector with respect to a vector: the result is a matrix (the Jacobian), given by:

\[x = [x_1,x_2,...,x_n]^T,y=[y_1,y_2,...,y_m]^T\]
\[\frac{\partial y}{\partial x}=[\frac{\partial y_1}{\partial x},\frac{\partial y_2}{\partial x},...,\frac{\partial y_m}{\partial x}]^T= \left[ \begin{matrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n}\\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n}\\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n} \end{matrix} \right] \]
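To make these shapes concrete, here is a minimal PyTorch sketch (an added illustration; the function `g` is an arbitrary example) that checks the gradient from case (1) and computes a full Jacobian as in case (3):

```python
import torch
from torch.autograd.functional import jacobian

# Case (1): gradient of a scalar y with respect to a vector x.
x = torch.tensor([1.0, 1.0], requires_grad=True)
y = x[0] ** 2 + 2 * x[1] ** 2
y.backward()
print(x.grad)   # tensor([2., 4.]) -- matches [2*x_1, 4*x_2] at (1, 1);
                # PyTorch stores the gradient with the same shape as x.

# Case (3): Jacobian of a vector-valued g: R^2 -> R^3.
def g(x):
    return torch.stack([x[0] ** 2, x[0] * x[1], torch.sin(x[1])])

J = jacobian(g, torch.tensor([1.0, 2.0]))
print(J.shape)  # torch.Size([3, 2]): one row per output y_i, one column per input x_j
```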

Chain Rule

The chain rule carries over to vectors and matrices as long as the shape of each factor is tracked. Worked example: given \(X\in R^{m \times n},w \in R^n,y\in R^m,z = ||Xw-y||^2\), find \(\frac{\partial z}{\partial w}\). Solution: let \(a=Xw\) and \(b=a-y\), so that \(z=||b||^2\). Then \(\frac{\partial z}{\partial w}=\frac{\partial z}{\partial b}\frac{\partial b}{\partial a}\frac{\partial a}{\partial w}=\frac{\partial ||b||^2}{\partial b}\cdot\frac{\partial (a-y)}{\partial a}\cdot\frac{\partial Xw}{\partial w}=2b^T \cdot I \cdot X=2(Xw-y)^TX\)
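As a sanity check on this derivation, the following PyTorch sketch (an added illustration with arbitrary random data) compares the analytic gradient \(2(Xw-y)^TX\) with the result produced by autograd:

```python
import torch

torch.manual_seed(0)
m, n = 5, 3
X = torch.randn(m, n)
y = torch.randn(m)
w = torch.randn(n, requires_grad=True)

z = (X @ w - y).pow(2).sum()              # z = ||Xw - y||^2
z.backward()                              # autograd fills w.grad with dz/dw

analytic = 2 * (X @ w.detach() - y) @ X   # 2 (Xw - y)^T X, a length-n vector
print(torch.allclose(w.grad, analytic))   # True
```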

Automatic Differentiation

Automatic differentiation computes the derivative of a function at a specified value; it differs from both symbolic differentiation and numerical differentiation. This is where the concept of a computational graph comes in:

(1) Decompose the code into elementary operators.

(2) Represent the computation as an acyclic graph. PyTorch constructs this computational graph implicitly.
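A minimal sketch of the implicit construction (an added illustration): every operation on a tensor with `requires_grad=True` is recorded as it executes, and the recorded operator is visible in the result's `grad_fn`:

```python
import torch

x = torch.arange(4.0, requires_grad=True)  # leaf node of the computation graph
u = 2 * torch.dot(x, x)                    # each operation is recorded as it runs
print(u.grad_fn)                           # e.g. <MulBackward0 ...>: the last recorded operator
u.backward()                               # traverse the recorded graph in reverse
print(x.grad)                              # 4 * x, i.e. tensor([ 0.,  4.,  8., 12.])
```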

Depending on the order in which the factors are multiplied, the chain rule can be evaluated in two different ways:

(1) Forward accumulation: \(\frac{\partial y}{\partial x}=\frac{\partial y}{\partial u_n}(\frac{\partial u_n}{\partial u_{n-1}}(...(\frac{\partial u_2}{\partial u_1}\frac{\partial u_1}{\partial x})))\), i.e., evaluation starts from \(\frac{\partial u_1}{\partial x}\) and proceeds in the same direction as the original computation.

(2) Reverse accumulation (also called the backward pass): \(\frac{\partial y}{\partial x}=(((\frac{\partial y}{\partial u_n}\frac{\partial u_n}{\partial u_{n-1}})...)\frac{\partial u_2}{\partial u_1})\frac{\partial u_1}{\partial x}\), i.e., evaluation starts from \(\frac{\partial y}{\partial u_n}\). This requires storing all intermediate results of the forward pass, but branches of the graph that are not needed for the requested gradient can be pruned. A comparison of the two patterns is sketched below.
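The two accumulation orders can be compared with `torch.autograd.functional.jvp` and `vjp` (an added illustration; `f` is an arbitrary example, and PyTorch may realize `jvp` internally via reverse mode, but the quantity it returns matches the forward-accumulation pattern):

```python
import torch
from torch.autograd.functional import jvp, vjp

def f(x):
    # arbitrary vector-valued example f: R^3 -> R^2
    return torch.stack([x.pow(2).sum(), x.prod()])

x = torch.tensor([1.0, 2.0, 3.0])

# Forward-accumulation pattern: push a tangent v through the chain,
# producing (df/dx) v -- one column of the Jacobian per call.
_, Jv = jvp(f, x, v=torch.tensor([1.0, 0.0, 0.0]))
print(Jv)   # tensor([2., 6.])

# Reverse-accumulation pattern: pull a cotangent u back through the chain,
# producing u^T (df/dx) -- one row of the Jacobian per call;
# this is what .backward() does for a scalar output.
_, uJ = vjp(f, x, v=torch.tensor([1.0, 0.0]))
print(uJ)   # tensor([2., 4., 6.])
```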