As before, start by drawing the dependency graph.
- Method 1: local to global
- Method 2: backward recursion
 
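From between neurons to within a neuron: because the network is built out of neurons, let z stand for one neuron and differentiate with respect to z first, leaving w and b aside for now.
Between neurons
Method 1: local to global (numerator layout)
First compute the derivative on every edge, then enumerate the paths: multiply along each path and add across paths.
Output layer to loss function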
$$
\frac{\partial C}{\partial z_i^{(L)}} = \frac{\partial C}{\partial a_i^{(L)}} \frac{\partial a_i^{(L)}}{\partial z_i^{(L)}}
= \frac{\partial C}{\partial a_i^{(L)}} f'(z_i^{(L)})
$$
In numerator layout the matrix form is as follows (each $a_i$ depends only on its own $z_i$, so the Jacobian is diagonal):
$$
\frac{\partial C}{\partial \mathbf z^{(L)}} = \frac{\partial C}{\partial \mathbf a^{(L)}} \frac{\partial \mathbf a^{(L)}}{\partial \mathbf z^{(L)}}
= \frac{\partial C}{\partial \mathbf a^{(L)}} \operatorname{diag}(f'(\mathbf z^{(L)}))
$$
Hadamard product form (mixed layout: the result keeps the shape of $\mathbf z^{(L)}$):
$$
\frac{\partial C}{\partial \mathbf z^{(L)}} = C'(\mathbf a^{(L)}) \odot f'(\mathbf z^{(L)})
$$
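As a quick numerical check that the diagonal-matrix form and the Hadamard form agree, the following sketch computes the output-layer derivative both ways. The sigmoid activation, the quadratic cost $C = \frac{1}{2}\sum_i (a_i - y_i)^2$, and every number in it are assumptions made only for illustration; plain Python lists are used so that nothing beyond the standard library is needed.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Hypothetical output-layer pre-activations and targets (illustrative values).
z_L = [0.5, -1.0, 2.0]
y = [1.0, 0.0, 1.0]

a_L = [sigmoid(z) for z in z_L]
# Assumed quadratic cost C = 1/2 * sum_i (a_i - y_i)^2, so dC/da_i = a_i - y_i.
dC_da = [a - t for a, t in zip(a_L, y)]
f_prime = [sigmoid_prime(z) for z in z_L]
n = len(z_L)

# Matrix form: row vector dC/da times diag(f'(z)).
diag = [[f_prime[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
dC_dz_matrix = [sum(dC_da[i] * diag[i][j] for i in range(n)) for j in range(n)]

# Hadamard form: elementwise product of the same two vectors.
dC_dz_hadamard = [g * d for g, d in zip(dC_da, f_prime)]

print(dC_dz_matrix)
print(dC_dz_hadamard)
```

The two printed vectors coincide; the Hadamard form simply skips the zeros that the diagonal matrix would multiply by.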
Adjacent layers
$$
\frac{\partial z^{(l+1)}_j}{\partial z_i^{(l)}} = \frac{\partial z^{(l+1)}_j}{\partial a_i^{(l)}} \frac{\partial a_i^{(l)}}{\partial z_i^{(l)}}
= w^{(l+1)}_{j \leftarrow i} f'(z_i^{(l)})
$$
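Matrix form: the $w^{(l+1)}_{j \leftarrow i}$ are collected as the entries of the matrix $\mathbf W^{(l+1)}$, which has $N_{l+1}$ rows and $N_l$ columns; each row is the weight vector of one neuron in layer $l+1$, and $w^{(l+1)}_{j \leftarrow i} = w^{(l+1)}_{ji}$.
$$
\frac{\partial \mathbf z^{(l+1)}}{\partial \mathbf z^{(l)}} = \frac{\partial \mathbf z^{(l+1)}}{\partial \mathbf a^{(l)}} \frac{\partial \mathbf a^{(l)}}{\partial \mathbf z^{(l)}}
= \mathbf W^{(l+1)} \operatorname{diag}(f'(\mathbf z^{(l)}))
$$
This Jacobian cannot be written as a plain Hadamard product of two vectors.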
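The adjacent-layer Jacobian $\mathbf W^{(l+1)} \operatorname{diag}(f'(\mathbf z^{(l)}))$ can be sanity-checked against central finite differences. This is only a sketch: the sigmoid activation, the layer sizes, the bias vector `b`, and all numeric values are hypothetical choices, not something fixed by the derivation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def next_z(W, b, z):
    """Map z^(l) to z^(l+1) = W a^(l) + b, with a^(l) = f(z^(l))."""
    a = [sigmoid(v) for v in z]
    return [sum(w * x for w, x in zip(row, a)) + bias for row, bias in zip(W, b)]

# Hypothetical layer: N_l = 3 inputs, N_{l+1} = 2 outputs, so W is 2 x 3.
W = [[0.2, -0.5, 1.0],
     [1.5, 0.3, -0.7]]
b = [0.1, -0.2]
z = [0.5, -1.0, 2.0]
rows, cols = len(W), len(z)

# Analytic Jacobian W diag(f'(z)): entry (j, i) is w_ji * f'(z_i).
J = [[W[j][i] * sigmoid_prime(z[i]) for i in range(cols)] for j in range(rows)]

# Central finite differences, perturbing one input coordinate at a time.
eps = 1e-6
J_num = [[0.0] * cols for _ in range(rows)]
for i in range(cols):
    z_plus = list(z)
    z_minus = list(z)
    z_plus[i] += eps
    z_minus[i] -= eps
    out_plus = next_z(W, b, z_plus)
    out_minus = next_z(W, b, z_minus)
    for j in range(rows):
        J_num[j][i] = (out_plus[j] - out_minus[j]) / (2 * eps)

max_err = max(abs(J[j][i] - J_num[j][i]) for j in range(rows) for i in range(cols))
print(max_err)
```

Column $i$ of the Jacobian is column $i$ of $\mathbf W^{(l+1)}$ scaled by $f'(z_i^{(l)})$, which is why the diagonal matrix multiplies from the right.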
Paths
$$
\frac{\partial C}{\partial z_i^{(l)}} = \sum_{m, n, k, \dots, p} \frac{\partial C}{\partial z_m^{(L)}} \frac{\partial z_m^{(L)}}{\partial z_n^{(L-1)}} \frac{\partial z_n^{(L-1)}}{\partial z_k^{(L-2)}} \cdots \frac{\partial z_p^{(l+1)}}{\partial z_i^{(l)}}
$$
This sum is awkward to evaluate term by term, so go straight to the matrix form:
$$
\begin{align}
\frac{\partial C}{\partial \mathbf z^{(l)}}
=& \frac{\partial C}{\partial \mathbf z^{(L)}} \frac{\partial \mathbf z^{(L)}}{\partial \mathbf z^{(L-1)}} \frac{\partial \mathbf z^{(L-1)}}{\partial \mathbf z^{(L-2)}} \cdots \frac{\partial \mathbf z^{(l+1)}}{\partial \mathbf z^{(l)}}
\\
=&
\frac{\partial C}{\partial \mathbf a^{(L)}} \operatorname{diag}(f'(\mathbf z^{(L)}))
\mathbf W^{(L)} \operatorname{diag}(f'(\mathbf z^{(L-1)}))
\mathbf W^{(L-1)} \operatorname{diag}(f'(\mathbf z^{(L-2)}))
\cdots \\
& \mathbf W^{(l+1)} \operatorname{diag}(f'(\mathbf z^{(l)}))
\end{align}
$$
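The rule "multiply along a path, add across paths" and the matrix chain are the same computation, which the following sketch illustrates on a tiny chain of layers. The Jacobian entries `J2`, `J1` and the vector `g` are made-up numbers standing in for the edge derivatives; no activation function or cost is assumed here.

```python
import itertools

def matmul(A, B):
    """Plain-list matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Hypothetical edge derivatives for a chain of layers with 2, 3 and 2 units.
g = [0.3, -1.2]                      # dC/dz_m at the last layer (row vector)
J2 = [[0.5, -0.1, 0.2],              # J2[m][n] = dz_m^(L) / dz_n^(L-1), 2 x 3
      [0.4, 0.7, -0.3]]
J1 = [[1.0, 0.2],                    # J1[n][i] = dz_n^(L-1) / dz_i^(L-2), 3 x 2
      [-0.6, 0.9],
      [0.3, 0.8]]

# Path view: for each unit i, add up over every path (m, n) the product
# of the edge derivatives along that path.
by_paths = []
for i in range(2):
    total = 0.0
    for m, n in itertools.product(range(2), range(3)):
        total += g[m] * J2[m][n] * J1[n][i]
    by_paths.append(total)

# Matrix view: the same quantity as the row-vector chain g J2 J1.
by_matrices = matmul(matmul([g], J2), J1)[0]

print(by_paths)
print(by_matrices)
```

The path enumeration touches every index combination explicitly, so its cost grows with the number of paths, whereas the matrix chain reuses the partial products layer by layer.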
Hadamard product form (every vector is a row vector and the expression is evaluated strictly left to right, so each $\odot$ multiplies the row vector accumulated so far by $f'$ elementwise and never acts on a $\mathbf W$ directly):
$$
\begin{align}
\frac{\partial C}{\partial \mathbf z^{(l)}}
=& \frac{\partial C}{\partial \mathbf z^{(L)}} \frac{\partial \mathbf z^{(L)}}{\partial \mathbf z^{(L-1)}} \frac{\partial \mathbf z^{(L-1)}}{\partial \mathbf z^{(L-2)}} \cdots \frac{\partial \mathbf z^{(l+1)}}{\partial \mathbf z^{(l)}}
\\
=&
C'(\mathbf a^{(L)}) \odot f'(\mathbf z^{(L)})
\mathbf W^{(L)} \odot f'(\mathbf z^{(L-1)})
\mathbf W^{(L-1)} \odot f'(\mathbf z^{(L-2)})
\cdots \\
& \mathbf W^{(l+1)} \odot f'(\mathbf z^{(l)})
\end{align}
$$
Recurrence:
$$
\begin{align}
\frac{\partial C}{\partial \mathbf z^{(l)}}
=& \frac{\partial C}{\partial \mathbf z^{(l+1)}} \frac{\partial \mathbf z^{(l+1)}}{\partial \mathbf z^{(l)}}
\\
=&
\frac{\partial C}{\partial \mathbf z^{(l+1)}} \mathbf W^{(l+1)} \operatorname{diag}(f'(\mathbf z^{(l)}))
\end{align}
$$
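The numerator-layout recurrence can be run layer by layer as a row vector moving from the output back toward layer $l$. The sketch below does this for a small hypothetical network and compares the result with a finite-difference estimate; the sigmoid activation, the quadratic cost, the biases `b2`, `b3`, and all numbers are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def affine(W, b, a):
    """Return W a + b for a plain-list matrix W and vectors a, b."""
    return [sum(w * x for w, x in zip(row, a)) + bias for row, bias in zip(W, b)]

def vec_mat(v, M):
    """Row vector times matrix: (v M)_i = sum_j v_j * M[j][i]."""
    return [sum(v[j] * M[j][i] for j in range(len(v))) for i in range(len(M[0]))]

# Hypothetical network: z1 (2 units) -> z2 (3 units) -> z3 (2 units, output layer).
W2 = [[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]]   # 3 x 2
b2 = [0.0, 0.1, -0.1]
W3 = [[0.6, -0.2, 0.3], [-0.5, 0.4, 0.8]]     # 2 x 3
b3 = [0.05, -0.05]
y = [1.0, 0.0]

def forward_from_z1(z1):
    """Forward pass starting at z^(1); returns the assumed quadratic cost, z2 and z3."""
    z2 = affine(W2, b2, [sigmoid(v) for v in z1])
    z3 = affine(W3, b3, [sigmoid(v) for v in z2])
    cost = 0.5 * sum((sigmoid(v) - t) ** 2 for v, t in zip(z3, y))
    return cost, z2, z3

z1 = [0.3, -0.8]
_, z2, z3 = forward_from_z1(z1)

# Output layer: dC/dz3 = dC/da3 times diag(f'(z3)), done elementwise.
g3 = [(sigmoid(v) - t) * sigmoid_prime(v) for v, t in zip(z3, y)]
# One recurrence step per layer: g_l = g_{l+1} W_{l+1} diag(f'(z_l)).
g2 = [u * sigmoid_prime(v) for u, v in zip(vec_mat(g3, W3), z2)]
g1 = [u * sigmoid_prime(v) for u, v in zip(vec_mat(g2, W2), z1)]

# Central-difference estimate of dC/dz1 for comparison.
eps = 1e-6
g1_num = []
for i in range(len(z1)):
    z_plus = list(z1)
    z_minus = list(z1)
    z_plus[i] += eps
    z_minus[i] -= eps
    g1_num.append((forward_from_z1(z_plus)[0] - forward_from_z1(z_minus)[0]) / (2 * eps))

max_err = max(abs(p - q) for p, q in zip(g1, g1_num))
print(g1, max_err)
```

Right-multiplying a row vector by a diagonal matrix is the same as scaling it elementwise, so the diagonal matrices never need to be built.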
Method 2: backward recursion (denominator layout)
Definition
$$
\delta_i^{(l)} = \frac{\partial C}{\partial z_i^{(l)}}
$$
As a column vector
$$
\boldsymbol \delta^{(l)} = \frac{\partial C}{\partial \mathbf z^{(l)}}
$$
Base case of the recursion, $\delta_i^{(L)}$
$$
\delta_i^{(L)} = \frac{\partial a_i^{(L)}}{\partial z_i^{(L)}} \frac{\partial C}{\partial a_i^{(L)}}
= f'(z_i^{(L)}) \frac{\partial C}{\partial a_i^{(L)}}
$$
Matrix form
$$
\boldsymbol \delta^{(L)} = \frac{\partial \mathbf a^{(L)}}{\partial \mathbf z^{(L)}} \frac{\partial C}{\partial \mathbf a^{(L)}}
= \operatorname{diag}(f'(\mathbf z^{(L)})) \frac{\partial C}{\partial \mathbf a^{(L)}}
$$
Hadamard product form:
$$
\boldsymbol \delta^{(L)} = f'(\mathbf z^{(L)}) \odot C'(\mathbf a^{(L)})
$$
Recurrence relating $\delta_i^{(l)}$ to $\delta_j^{(l+1)}$
$$
\delta_i^{(l)}
= \frac{\partial C}{\partial z_i^{(l)}} = \sum_j \frac{\partial z_j^{(l+1)}}{\partial z_i^{(l)}} \frac{\partial C}{\partial z_j^{(l+1)}}
= \sum_j \frac{\partial z_j^{(l+1)}}{\partial z_i^{(l)}} \delta_j^{(l+1)}
$$
$$
\delta_i^{(l)} = \sum_j f'(z_i^{(l)}) w^{(l+1)}_{j \leftarrow i} \delta_j^{(l+1)}
$$
Matrix form:
$$
\boldsymbol \delta^{(l)} =
\operatorname{diag}(f'(\mathbf z^{(l)})) (\mathbf W^{(l+1)})^T \boldsymbol \delta^{(l+1)}
$$
Unrolled, this differs from the Method 1 result only by a transpose.
Hadamard product form:
$$
\boldsymbol \delta^{(l)} = f'(\mathbf z^{(l)}) \odot ((\mathbf W^{(l+1)})^T \boldsymbol \delta^{(l+1)})
$$
The recursive formulation is clearly the more compact of the two.
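A single step of the backward recursion can be written three equivalent ways: index form, diagonal-matrix form, and Hadamard form. The sketch below evaluates all three for one hypothetical layer; the sigmoid activation, the weights, and the incoming `delta_next` are illustrative assumptions.

```python
import math

def sigmoid_prime(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def mat_vec(M, v):
    """Matrix times column vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

# Hypothetical layer: W^(l+1) is N_{l+1} x N_l = 2 x 3.
W_next = [[0.2, -0.5, 1.0],
          [1.5, 0.3, -0.7]]
z_l = [0.5, -1.0, 2.0]          # pre-activations of layer l
delta_next = [0.1, -0.4]        # delta^(l+1), assumed already computed

f_prime = [sigmoid_prime(v) for v in z_l]
back = mat_vec(transpose(W_next), delta_next)   # (W^(l+1))^T delta^(l+1), length 3

# Matrix form: diag(f'(z^(l))) times the back-propagated vector.
n = len(z_l)
D = [[f_prime[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
delta_matrix = mat_vec(D, back)

# Hadamard form: same result without building the diagonal matrix.
delta_hadamard = [d * u for d, u in zip(f_prime, back)]

# Index form: delta_i = sum_j f'(z_i) * w_ji * delta_j.
delta_index = [sum(f_prime[i] * W_next[j][i] * delta_next[j]
                   for j in range(len(delta_next)))
               for i in range(n)]

print(delta_matrix)
print(delta_hadamard)
print(delta_index)
```

The transpose appears because $\boldsymbol\delta$ is a column vector here, whereas Method 1 carried a row vector and multiplied by $\mathbf W^{(l+1)}$ from the right.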
Within a neuron
Now consider w and b:
$$
\begin{align}
a^{(l)}_i &= f(z^{(l)}_i) \\
z^{(l)}_i &= \sum_{j} w^{(l)}_{i \leftarrow j} \times a^{(l-1)}_j + b^{(l)}_i
\end{align}
$$
Partial derivatives:
$$
\begin{align}
\frac{\partial C}{\partial w_{ij}^{(l)}} &= \delta_i^{(l)} a_j^{(l-1)} \\
\frac{\partial C}{\partial b_i^{(l)}} &= \delta_i^{(l)}
\end{align}
$$
Matrix form, with $\mathbf{w}^{(l)}_i : 1 \times N_{l-1}$ and $\mathbf{W}^{(l)} : N_l \times N_{l-1}$:
$$
\begin{align}
z^{(l)}_i &= \mathbf{w}^{(l)}_i \mathbf{a}^{(l-1)} + b^{(l)}_i \\
\mathbf{z}^{(l)} &= \mathbf{W}^{(l)} \mathbf{a}^{(l-1)} + \mathbf{b}^{(l)}
\end{align}
$$
Partial derivatives (numerator layout; the second line is row-wise shorthand for $\partial z_i^{(l)} / \partial \mathbf w_i^{(l)} = (\mathbf a^{(l-1)})^T$):
$$
\begin{align}
\frac{\partial \mathbf z^{(l)}}{\partial \mathbf a^{(l-1)}} &= \mathbf{W}^{(l)} \\
\frac{\partial \mathbf z^{(l)}}{\partial \mathbf W^{(l)}} &= (\mathbf{a}^{(l-1)})^T \\
\frac{\partial \mathbf z^{(l)}}{\partial \mathbf b^{(l)}} &= \mathbf I_{N_l \times N_l}
\end{align}
$$
Result, in recursive form:
$$
\begin{align}
\frac{\partial C}{\partial \mathbf W^{(l)}} &= \boldsymbol \delta^{(l)} (\mathbf{a}^{(l-1)})^T \\
\frac{\partial C}{\partial \mathbf b^{(l)}} &= \boldsymbol \delta^{(l)}
\end{align}
$$
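Putting the base case, the recurrence, and the two gradient formulas together gives a complete backward pass. What follows is a minimal sketch rather than a reference implementation: the sigmoid activation, the quadratic cost, the helper names (`forward`, `backprop`, `numeric_grad`), and the network values are all assumptions, and the result is checked against central finite differences.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def forward(Ws, bs, x):
    """Return pre-activations zs and activations acts (acts[0] is the input)."""
    acts, zs = [x], []
    for W, b in zip(Ws, bs):
        zs.append([sum(w * a for w, a in zip(row, acts[-1])) + bias
                   for row, bias in zip(W, b)])
        acts.append([sigmoid(v) for v in zs[-1]])
    return zs, acts

def cost(Ws, bs, x, y):
    """Assumed quadratic cost C = 1/2 * sum_i (a_i - y_i)^2."""
    _, acts = forward(Ws, bs, x)
    return 0.5 * sum((a - t) ** 2 for a, t in zip(acts[-1], y))

def backprop(Ws, bs, x, y):
    zs, acts = forward(Ws, bs, x)
    # Base case: delta^(L) = f'(z^(L)) (elementwise) dC/da^(L).
    delta = [sigmoid_prime(z) * (a - t) for z, a, t in zip(zs[-1], acts[-1], y)]
    grads_W, grads_b = [], []
    for l in range(len(Ws) - 1, -1, -1):
        # dC/dW = delta (a_prev)^T as an outer product; dC/db = delta.
        grads_W.insert(0, [[d * a for a in acts[l]] for d in delta])
        grads_b.insert(0, list(delta))
        if l > 0:
            # Previous delta = f'(z_prev) (elementwise) W^T delta.
            back = [sum(Ws[l][j][i] * delta[j] for j in range(len(delta)))
                    for i in range(len(Ws[l][0]))]
            delta = [sigmoid_prime(z) * u for z, u in zip(zs[l - 1], back)]
    return grads_W, grads_b

# Hypothetical network with 2 inputs, 3 hidden units and 2 outputs.
Ws = [[[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]],
      [[0.6, -0.2, 0.3], [-0.5, 0.4, 0.8]]]
bs = [[0.0, 0.1, -0.1], [0.05, -0.05]]
x, y = [0.9, -0.3], [1.0, 0.0]

grads_W, grads_b = backprop(Ws, bs, x, y)

def numeric_grad(container, idx, eps=1e-6):
    """Central difference of the cost w.r.t. container[idx]; restores the value."""
    orig = container[idx]
    container[idx] = orig + eps
    c_plus = cost(Ws, bs, x, y)
    container[idx] = orig - eps
    c_minus = cost(Ws, bs, x, y)
    container[idx] = orig
    return (c_plus - c_minus) / (2 * eps)

max_err_W = max(abs(grads_W[l][i][j] - numeric_grad(Ws[l][i], j))
                for l in range(len(Ws))
                for i in range(len(Ws[l]))
                for j in range(len(Ws[l][i])))
max_err_b = max(abs(grads_b[l][i] - numeric_grad(bs[l], i))
                for l in range(len(bs))
                for i in range(len(bs[l])))
print(max_err_W, max_err_b)
```

Each $\partial C / \partial \mathbf W^{(l)}$ has the same shape as $\mathbf W^{(l)}$, because it is the outer product of a length-$N_l$ column and a length-$N_{l-1}$ row.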
Vanishing and exploding gradients
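One way to see where both effects come from is to take norms in the backward recurrence $\boldsymbol \delta^{(l)} = \operatorname{diag}(f'(\mathbf z^{(l)})) (\mathbf W^{(l+1)})^T \boldsymbol \delta^{(l+1)}$. The bound is only a sketch, and it assumes the Euclidean vector norm and the matrix spectral norm, a choice the notes do not fix:

$$
\| \boldsymbol \delta^{(l)} \|_2 \le \max_i |f'(z_i^{(l)})| \, \| \mathbf W^{(l+1)} \|_2 \, \| \boldsymbol \delta^{(l+1)} \|_2
$$

Applying it repeatedly from layer $L$ down to layer $l$ gives

$$
\| \boldsymbol \delta^{(l)} \|_2 \le \left( \prod_{k=l}^{L-1} \max_i |f'(z_i^{(k)})| \, \| \mathbf W^{(k+1)} \|_2 \right) \| \boldsymbol \delta^{(L)} \|_2
$$

If every factor in the product is at most some $q < 1$, the bound decays like $q^{L-l}$ and the gradient vanishes with depth. For the sigmoid, $f'(z) \le 1/4$, so this happens whenever every $\| \mathbf W^{(k+1)} \|_2$ is below 4. When the factors exceed 1 the bound no longer prevents growth, which is the regime in which gradients can explode.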
Next lecture: implementation in code.