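As before, start by drawing the dependency graph.
- Solution 1: local to global
- Solution 2: backward recursion
From between neurons to within a neuron: the network is built out of neurons, so we first let z stand for a whole neuron and differentiate with respect to z, leaving w and b aside for now.
Between neurons
Solution 1: local to global (numerator layout)
First compute the derivative on each edge, then enumerate the paths: multiply along each path and sum across paths.
Output layer to the loss function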
$$
\frac{\partial C}{\partial z_i^{(L)}} = \frac{\partial C}{\partial a_i^{(L)}} \frac{\partial a_i^{(L)}}{\partial z_i^{(L)}}
= \frac{\partial C}{\partial a_i^{(L)}} f'(z_i^{(L)})
$$
In numerator layout the matrix form is as follows (each $a_i$ depends only on its own $z_i$, so the Jacobian is diagonal):
$$
\frac{\partial C}{\partial \mathbf z^{(L)}} = \frac{\partial C}{\partial \mathbf a^{(L)}} \frac{\partial \mathbf a^{(L)}}{\partial \mathbf z^{(L)}}
= \frac{\partial C}{\partial \mathbf a^{(L)}} \operatorname{diag}(f'(\mathbf z^{(L)}))
$$
Hadamard product form (mixed layout, keeping the shape of the main variable):
$$
\frac{\partial C}{\partial \mathbf z^{(L)}} = C'(\mathbf a^{(L)}) \odot f'(\mathbf z^{(L)})
$$
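The equivalence of the diagonal-matrix form and the Hadamard form can be checked numerically. The sketch below is only an illustration: the sigmoid activation, the quadratic loss $C = \frac12\lVert \mathbf a - \mathbf y\rVert^2$, the layer width and the targets `y` are all assumptions that do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
z_L = rng.normal(size=4)   # pre-activations of the output layer (made-up values)
y = rng.normal(size=4)     # hypothetical targets

def f(z):
    """Assumed activation: logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def f_prime(z):
    """Derivative of the sigmoid."""
    s = f(z)
    return s * (1.0 - s)

a_L = f(z_L)
dC_da = a_L - y            # gradient of the assumed loss 0.5 * ||a - y||^2

# Numerator layout: row vector times the diagonal Jacobian da/dz.
dC_dz_matrix = dC_da @ np.diag(f_prime(z_L))

# Hadamard form: plain element-wise product.
dC_dz_hadamard = dC_da * f_prime(z_L)

same = np.allclose(dC_dz_matrix, dC_dz_hadamard)
print(same)
```

The Hadamard form avoids building the diagonal matrix, which is why it is usually the one that ends up in code.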
Adjacent layers
$$
\frac{\partial z^{(l+1)}_j}{\partial z_i^{(l)}} = \frac{\partial z^{(l+1)}_j}{\partial a_i^{(l)}} \frac{\partial a_i^{(l)}}{\partial z_i^{(l)}}
= w^{(l+1)}_{j \leftarrow i} f'(z_i^{(l)})
$$
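Matrix form: the $w^{(l+1)}_{j \leftarrow i}$ are collected as the entries of the matrix $\mathbf W^{(l+1)}$, which has $N_{l+1}$ rows and $N_l$ columns (one row per neuron in layer $l+1$, one column per neuron in layer $l$). Each row is the weight vector of one neuron, and $w^{(l+1)}_{j \leftarrow i} = w^{(l+1)}_{ji}$.
$$
\frac{\partial \mathbf z^{(l+1)}}{\partial \mathbf z^{(l)}} = \frac{\partial \mathbf z^{(l+1)}}{\partial \mathbf a^{(l)}} \frac{\partial \mathbf a^{(l)}}{\partial \mathbf z^{(l)}}
= \mathbf W^{(l+1)} \operatorname{diag}(f'(\mathbf z^{(l)}))
$$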
This Jacobian cannot be written as a Hadamard product.
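A finite-difference check is a quick way to confirm the adjacent-layer Jacobian $\mathbf W^{(l+1)} \operatorname{diag}(f'(\mathbf z^{(l)}))$. The following is a minimal sketch: the layer widths, the random weights and the sigmoid activation are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_l, n_next = 3, 2                         # hypothetical layer widths N_l, N_{l+1}
W_next = rng.normal(size=(n_next, n_l))    # W^(l+1): N_{l+1} rows, N_l columns
b_next = rng.normal(size=n_next)
z_l = rng.normal(size=n_l)

def f(z):
    """Assumed activation: logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def f_prime(z):
    s = f(z)
    return s * (1.0 - s)

def next_z(z):
    """z^(l+1) as a function of z^(l)."""
    return W_next @ f(z) + b_next

# Analytic Jacobian in numerator layout: entry (j, i) is dz_j^(l+1) / dz_i^(l).
J_analytic = W_next @ np.diag(f_prime(z_l))

# Central finite differences, one input coordinate at a time.
eps = 1e-6
J_numeric = np.zeros((n_next, n_l))
for i in range(n_l):
    step = np.zeros(n_l)
    step[i] = eps
    J_numeric[:, i] = (next_z(z_l + step) - next_z(z_l - step)) / (2 * eps)

max_err = np.max(np.abs(J_analytic - J_numeric))
print(max_err)
```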
Paths
$$
\frac{\partial C}{\partial z_i^{(l)}} = \sum_{m,n,k,\ldots,p} \frac{\partial C}{\partial z_m^{(L)}} \frac{\partial z_m^{(L)}}{\partial z_n^{(L-1)}} \frac{\partial z_n^{(L-1)}}{\partial z_k^{(L-2)}} \cdots \frac{\partial z_p^{(l+1)}}{\partial z_i^{(l)}}
$$
Summing over paths explicitly is unwieldy, so go straight to the matrix form:
$$
\begin{align}
\frac{\partial C}{\partial \mathbf z^{(l)}}
=& \frac{\partial C}{\partial \mathbf z^{(L)}} \frac{\partial \mathbf z^{(L)}}{\partial \mathbf z^{(L-1)}} \frac{\partial \mathbf z^{(L-1)}}{\partial \mathbf z^{(L-2)}} \cdots \frac{\partial \mathbf z^{(l+1)}}{\partial \mathbf z^{(l)}}
\\
=&
\frac{\partial C}{\partial \mathbf a^{(L)}} \operatorname{diag}(f'(\mathbf z^{(L)}))
\mathbf W^{(L)} \operatorname{diag}(f'(\mathbf z^{(L-1)}))
\mathbf W^{(L-1)} \operatorname{diag}(f'(\mathbf z^{(L-2)}))
\cdots \\
& \mathbf W^{(l+1)} \operatorname{diag}(f'(\mathbf z^{(l)}))
\end{align}
$$
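To see that the matrix product really is the "multiply along paths, sum across paths" rule, the sketch below enumerates every path in a tiny three-layer slice of a network and compares the result with the matrix chain. The widths, weights, targets, sigmoid activation and quadratic loss are assumptions chosen for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
sizes = [2, 3, 2]   # hypothetical widths of layers l, l+1 and L

def f(z):
    """Assumed activation: logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def f_prime(z):
    s = f(z)
    return s * (1.0 - s)

z0 = rng.normal(size=sizes[0])                   # z^(l)
W1 = rng.normal(size=(sizes[1], sizes[0]))       # W^(l+1)
b1 = rng.normal(size=sizes[1])
W2 = rng.normal(size=(sizes[2], sizes[1]))       # W^(L)
b2 = rng.normal(size=sizes[2])
y = rng.normal(size=sizes[2])                    # hypothetical targets

z1 = W1 @ f(z0) + b1                             # z^(l+1)
z2 = W2 @ f(z1) + b2                             # z^(L)

g = (f(z2) - y) * f_prime(z2)                    # dC/dz^(L) for C = 0.5 * ||a - y||^2
E2 = W2 @ np.diag(f_prime(z1))                   # edges from layer l+1 to layer L
E1 = W1 @ np.diag(f_prime(z0))                   # edges from layer l to layer l+1

# Explicit path sum: for each start node i, walk every (m, p) path to the loss.
path_sum = np.zeros(sizes[0])
for i in range(sizes[0]):
    for m, p in itertools.product(range(sizes[2]), range(sizes[1])):
        path_sum[i] += g[m] * E2[m, p] * E1[p, i]

# Matrix form: the same sum, organized as a row vector times two Jacobians.
matrix_form = g @ E2 @ E1

print(np.allclose(path_sum, matrix_form))
```

The number of paths grows as the product of the layer widths, while the matrix chain costs only one matrix-vector product per layer when evaluated from the left.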
Hadamard product form (evaluated left to right, so every $\odot$ acts on the row vector accumulated so far, never on a weight matrix by itself):
$$
\begin{align}
\frac{\partial C}{\partial \mathbf z^{(l)}}
=& \frac{\partial C}{\partial \mathbf z^{(L)}} \frac{\partial \mathbf z^{(L)}}{\partial \mathbf z^{(L-1)}} \frac{\partial \mathbf z^{(L-1)}}{\partial \mathbf z^{(L-2)}} \cdots \frac{\partial \mathbf z^{(l+1)}}{\partial \mathbf z^{(l)}}
\\
=&
C'(\mathbf a^{(L)}) \odot f'(\mathbf z^{(L)})
\mathbf W^{(L)} \odot f'(\mathbf z^{(L-1)})
\mathbf W^{(L-1)} \odot f'(\mathbf z^{(L-2)})
\cdots \\
& \mathbf W^{(l+1)} \odot f'(\mathbf z^{(l)})
\end{align}
$$
Recurrence relation:
$$
\begin{align}
\frac{\partial C}{\partial \mathbf z^{(l)}}
=& \frac{\partial C}{\partial \mathbf z^{(l+1)}} \frac{\partial \mathbf z^{(l+1)}}{\partial \mathbf z^{(l)}}
\\
=&
\frac{\partial C}{\partial \mathbf z^{(l+1)}} \mathbf W^{(l+1)} \operatorname{diag}(f'(\mathbf z^{(l)}))
\end{align}
$$
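The numerator-layout recurrence can be run as a simple loop over row vectors and compared against finite differences of $C$ with respect to $\mathbf z$ at the earliest layer. This is a sketch under assumptions: the widths in `sizes`, the random parameters, the sigmoid activation and the quadratic loss are all made up for the example, and `Ws[k]` is used to store $\mathbf W^{(k+1)}$.

```python
import numpy as np

rng = np.random.default_rng(3)
sizes = [3, 4, 3, 2]   # hypothetical widths; index k refers to z^(k), the last one is z^(L)

def f(z):
    """Assumed activation: logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def f_prime(z):
    s = f(z)
    return s * (1.0 - s)

# Ws[k] plays the role of W^(k+1): it maps a^(k) to z^(k+1).
Ws = [rng.normal(size=(sizes[k + 1], sizes[k])) for k in range(len(sizes) - 1)]
bs = [rng.normal(size=sizes[k + 1]) for k in range(len(sizes) - 1)]
y = rng.normal(size=sizes[-1])   # hypothetical targets

def forward_from(k, z_k):
    """Return [z^(k), ..., z^(L)] given z^(k)."""
    zs = [z_k]
    for j in range(k, len(Ws)):
        zs.append(Ws[j] @ f(zs[-1]) + bs[j])
    return zs

def cost_from(k, z_k):
    """Assumed loss C = 0.5 * ||f(z^(L)) - y||^2 as a function of z^(k)."""
    z_L = forward_from(k, z_k)[-1]
    return 0.5 * np.sum((f(z_L) - y) ** 2)

z0 = rng.normal(size=sizes[0])
zs = forward_from(0, z0)

# Start at the output layer, then apply the recurrence once per layer.
grad = (f(zs[-1]) - y) * f_prime(zs[-1])          # dC/dz^(L), treated as a row vector
for k in range(len(Ws) - 1, -1, -1):
    # Same as grad @ Ws[k] @ np.diag(f_prime(zs[k])), without building the diagonal.
    grad = (grad @ Ws[k]) * f_prime(zs[k])

# Central finite differences on z^(0) for comparison.
eps = 1e-6
numeric = np.array([
    (cost_from(0, z0 + eps * e) - cost_from(0, z0 - eps * e)) / (2 * eps)
    for e in np.eye(sizes[0])
])
max_err = np.max(np.abs(grad - numeric))
print(max_err)
```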
Solution 2: backward recursion (denominator layout)
Definition
$$
\delta_i^{(l)} = \frac{ \partial C}{\partial z_i^{(l)}}
$$
As a column vector:
$$
\boldsymbol \delta^{(l)} = \frac{ \partial C}{\partial \mathbf z^{(l)}}
$$
Base case of the recursion, $\delta_i^{(L)}$:
$$
\delta_i^{(L)} = \frac{\partial a_i^{(L)}}{\partial z_i^{(L)}} \frac{\partial C}{\partial a_i^{(L)}}
= f'(z_i^{(L)}) \frac{\partial C}{\partial a_i^{(L)}}
$$
Matrix form:
$$
\boldsymbol \delta^{(L)} = \frac{\partial \mathbf a^{(L)}}{\partial \mathbf z^{(L)}} \frac{\partial C}{\partial \mathbf a^{(L)}}
= \operatorname{diag}(f'(\mathbf z^{(L)})) \frac{\partial C}{\partial \mathbf a^{(L)}}
$$
Hadamard product form:
$$
\boldsymbol \delta^{(L)} = f'(\mathbf z^{(L)}) \odot C'(\mathbf a^{(L)})
$$
Recurrence relating $\delta_i^{(l)}$ to $\delta_j^{(l+1)}$:
$$
\delta_i^{(l)}
= \frac{ \partial C}{\partial z_i^{(l)}} = \sum_j \frac{ \partial z_j^{(l+1)}}{\partial z_i^{(l)}} \frac{ \partial C}{\partial z_j^{(l+1)}}
= \sum_j \frac{ \partial z_j^{(l+1)}}{\partial z_i^{(l)}} \delta_j^{(l+1)}
$$
$$
\delta_i^{(l)} = \sum_j f'(z_i^{(l)}) \, w^{(l+1)}_{j \leftarrow i} \, \delta_j^{(l+1)}
$$
Matrix form:
$$
\boldsymbol \delta^{(l)} =
\operatorname{diag}(f'(\mathbf z^{(l)})) (\mathbf W^{(l+1)})^T \boldsymbol \delta^{(l+1)}
$$
Unrolled, this differs from the result of Solution 1 only by a transpose.
Hadamard product form:
$$
\boldsymbol \delta^{(l)} = f'(\mathbf z^{(l)}) \odot ((\mathbf W^{(l+1)})^T \boldsymbol \delta^{(l+1)})
$$
The recursive form is clearly the more compact of the two.
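The three claims of this part (the diagonal-matrix and Hadamard forms of the recurrence agree, and the column vector $\boldsymbol\delta^{(l)}$ is the transpose of the numerator-layout row vector) can be checked together. The sketch below uses explicit column vectors of shape `(n, 1)`; the widths, parameters, sigmoid activation and quadratic loss are assumptions for illustration, and `Ws[k]` stores $\mathbf W^{(k+1)}$.

```python
import numpy as np

rng = np.random.default_rng(4)
sizes = [3, 4, 2]   # hypothetical layer widths; the last layer is L

def f(z):
    """Assumed activation: logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def f_prime(z):
    s = f(z)
    return s * (1.0 - s)

L = len(sizes) - 1
Ws = [rng.normal(size=(sizes[k + 1], sizes[k])) for k in range(L)]   # Ws[k] = W^(k+1)
bs = [rng.normal(size=(sizes[k + 1], 1)) for k in range(L)]
y = rng.normal(size=(sizes[-1], 1))                                  # hypothetical targets

# Forward pass with column vectors: z[k] holds z^(k).
z = [rng.normal(size=(sizes[0], 1))]
for W, b in zip(Ws, bs):
    z.append(W @ f(z[-1]) + b)

delta = f_prime(z[L]) * (f(z[L]) - y)   # base case delta^(L), shape (N_L, 1)
row = delta.T                           # numerator-layout counterpart, shape (1, N_L)

forms_agree = True
transpose_agree = True
for k in range(L - 1, -1, -1):
    # Matrix form with an explicit diagonal matrix.
    delta_diag = np.diag(f_prime(z[k]).ravel()) @ Ws[k].T @ delta
    # Hadamard form: element-wise product with the back-propagated vector.
    delta = f_prime(z[k]) * (Ws[k].T @ delta)
    forms_agree = forms_agree and np.allclose(delta, delta_diag)
    # Solution 1 recurrence on the row vector.
    row = row @ Ws[k] @ np.diag(f_prime(z[k]).ravel())
    transpose_agree = transpose_agree and np.allclose(row.T, delta)

print(forms_agree, transpose_agree)
```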
Within a neuron
Now bring back w and b:
$$
\begin{align}
a^{(l)}_i &= f(z^{(l)}_i) \\
z^{(l)}_i &= \sum_{j} w^{(l)}_{i \leftarrow j} \times a^{(l-1)}_j + b^{(l)}_i
\end{align}
$$
Partial derivatives:
$$
\begin{align}
\frac{\partial C}{\partial w_{ij}^{(l)}} &= \delta_i^{(l)} a_j^{(l-1)} \\
\frac{\partial C}{\partial b_i^{(l)}} &= \delta_i^{(l)}
\end{align}
$$
Matrix form, with $\mathbf{w}^{(l)}_i$ of size $1 \times N_{l-1}$ and $\mathbf{W}^{(l)}$ of size $N_l \times N_{l-1}$:
$$
\begin{align}
z^{(l)}_i &= \mathbf{w}^{(l)}_i \mathbf{a}^{(l-1)} + b^{(l)}_i \\
\mathbf{z}^{(l)} &= \mathbf{W}^{(l)} \mathbf{a}^{(l-1)} + \mathbf{b}^{(l)}
\end{align}
$$
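Partial derivatives (numerator layout). The second line is shorthand: $z^{(l)}_i$ depends only on row $i$ of $\mathbf W^{(l)}$, and its derivative with respect to that row is $(\mathbf{a}^{(l-1)})^T$.
$$
\begin{align}
\frac{\partial \mathbf z^{(l)}}{\partial \mathbf a^{(l-1)}} &= \mathbf{W}^{(l)} \\
\frac{\partial \mathbf z^{(l)}}{\partial \mathbf W^{(l)}} &= (\mathbf{a}^{(l-1)})^T \\
\frac{\partial \mathbf z^{(l)}}{\partial \mathbf b^{(l)}} &= \mathbf I_{N_l \times N_l}
\end{align}
$$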
Conclusion, in terms of the recursively computed $\boldsymbol \delta^{(l)}$:
$$
\begin{align}
\frac{\partial C}{\partial \mathbf W^{(l)}} &= \boldsymbol \delta^{(l)} (\mathbf{a}^{(l-1)})^T \\
\frac{\partial C}{\partial \mathbf b^{(l)}} &= \boldsymbol \delta^{(l)}
\end{align}
$$
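Putting the pieces together, the weight and bias gradients follow from one forward pass and one backward pass. The sketch below is not a full training implementation; it only computes $\boldsymbol\delta^{(l)} (\mathbf a^{(l-1)})^T$ and $\boldsymbol\delta^{(l)}$ for a small network and spot-checks one weight and one bias by finite differences. The widths, input `x`, targets `y`, sigmoid activation and quadratic loss are assumptions, and `Ws[k]` stores $\mathbf W^{(k+1)}$.

```python
import numpy as np

rng = np.random.default_rng(5)
sizes = [3, 4, 2]   # hypothetical widths: input a^(0), then layers 1 and 2

def f(z):
    """Assumed activation: logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def f_prime(z):
    s = f(z)
    return s * (1.0 - s)

Ws = [rng.normal(size=(sizes[k + 1], sizes[k])) for k in range(2)]   # Ws[k] = W^(k+1)
bs = [rng.normal(size=sizes[k + 1]) for k in range(2)]
x = rng.normal(size=sizes[0])    # hypothetical input
y = rng.normal(size=sizes[-1])   # hypothetical targets

def cost(weights, biases):
    """Assumed loss C = 0.5 * ||a^(L) - y||^2."""
    a = x
    for W, b in zip(weights, biases):
        a = f(W @ a + b)
    return 0.5 * np.sum((a - y) ** 2)

# Forward pass: activations[k] = a^(k), zs[k] = z^(k+1).
activations, zs = [x], []
for W, b in zip(Ws, bs):
    zs.append(W @ activations[-1] + b)
    activations.append(f(zs[-1]))

# Backward pass: delta starts at the output layer and moves toward the input.
delta = f_prime(zs[-1]) * (activations[-1] - y)
grads_W, grads_b = [None, None], [None, None]
for k in range(1, -1, -1):
    grads_W[k] = np.outer(delta, activations[k])   # delta^(l) (a^(l-1))^T with l = k + 1
    grads_b[k] = delta
    if k > 0:
        delta = f_prime(zs[k - 1]) * (Ws[k].T @ delta)

# Finite-difference spot checks on one weight and one bias of the first layer.
eps = 1e-6
W_plus = [W.copy() for W in Ws]
W_minus = [W.copy() for W in Ws]
W_plus[0][1, 2] += eps
W_minus[0][1, 2] -= eps
err_W = abs((cost(W_plus, bs) - cost(W_minus, bs)) / (2 * eps) - grads_W[0][1, 2])

b_plus = [b.copy() for b in bs]
b_minus = [b.copy() for b in bs]
b_plus[0][0] += eps
b_minus[0][0] -= eps
err_b = abs((cost(Ws, b_plus) - cost(Ws, b_minus)) / (2 * eps) - grads_b[0][0])

print(err_W, err_b)
```

Note that each `grads_W[k]` has the same shape as `Ws[k]`, which is what makes this layout convenient for a gradient-descent update.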
Vanishing and exploding gradients
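The recurrence $\boldsymbol \delta^{(l)} = f'(\mathbf z^{(l)}) \odot ((\mathbf W^{(l+1)})^T \boldsymbol \delta^{(l+1)})$ multiplies one factor per layer, which is where vanishing gradients can come from. The sketch below is a deliberately artificial illustration, not a statement about any particular network: it assumes a sigmoid activation (whose derivative is at most 0.25), orthogonal weight matrices (which preserve the norm), and random stand-in pre-activations rather than the result of a real forward pass.

```python
import numpy as np

rng = np.random.default_rng(6)
depth, width = 20, 8   # hypothetical network depth and layer width

def f_prime(z):
    """Derivative of the assumed sigmoid activation; never larger than 0.25."""
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

delta = rng.normal(size=width)          # stand-in for delta at the output layer
norms = [np.linalg.norm(delta)]

for layer in range(depth):
    # Orthogonal weights (an assumption): Q.T @ delta has the same norm as delta.
    Q, _r = np.linalg.qr(rng.normal(size=(width, width)))
    z = rng.normal(size=width)          # stand-in pre-activations for this layer
    delta = f_prime(z) * (Q.T @ delta)  # one step of the backward recurrence
    norms.append(np.linalg.norm(delta))

shrink_ratio = norms[-1] / norms[0]
print(shrink_ratio)
```

Under these assumptions each step shrinks the norm by at least a factor of four. With weight matrices whose norms are large enough to outweigh the factor $f'$, the same product can grow instead, which is the exploding case.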
Next lecture: implementation in code.