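The previous two notes concentrated on the principles of convolution in convolutional neural networks: they analyzed two-dimensional and three-dimensional convolution in depth and covered the key CNN concepts of convolution, pooling, fully connected layers, filters, and receptive fields. This note follows the same learning path as the earlier DNN notes: before building a network with Tensorflow, we first build a convolutional neural network by hand with numpy, in order to understand more deeply the convolution mechanism and the principles and process of forward and backward propagation.
Before building the CNN proper, we first define a single convolution step with numpy, based on the view from the earlier notes that convolution is a linear computation. The code is as follows: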
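    import numpy as np

    def conv_single_step(a_slice_prev, W, b):
        # Element-wise product of the receptive field and the filter.
        s = a_slice_prev * W
        # Sum over all entries of the volume s.
        Z = np.sum(s)
        # Add bias b to Z. Squeeze b to a scalar first so that Z results in a scalar value
        # (calling float() on a multi-dimensional array is deprecated in recent numpy versions).
        Z = float(Z + np.squeeze(b))
        return Z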
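In the single-step definition above, we pass in the region of the previous layer's output that is to be convolved, i.e. the receptive field a_slice_prev, the filter W, i.e. the weights of the convolutional layer, and the bias b. Performing the linear computation Z = Wx + b on them, where Wx is here the sum of the element-wise products of W and the receptive field, gives one convolution step.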
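A quick numeric check can make the single step concrete. The sketch below re-declares conv_single_step so that it runs on its own, using np.squeeze as one way to turn the bias into a scalar; the 2x2x1 input, the filter values, and the bias are made-up illustrative numbers, not data from the original notes.

```python
import numpy as np

def conv_single_step(a_slice_prev, W, b):
    # Element-wise product, summed over the whole volume, plus the scalar bias.
    s = a_slice_prev * W
    return float(np.sum(s) + np.squeeze(b))

# Illustrative 2x2 receptive field with a single channel.
a_slice_prev = np.array([[1.0, 2.0], [3.0, 4.0]]).reshape(2, 2, 1)
# Illustrative filter of the same shape.
W = np.array([[1.0, 0.0], [0.0, -1.0]]).reshape(2, 2, 1)
# Bias stored with shape (1, 1, 1), as a slice b[:, :, :, c] would be.
b = np.array([[[0.5]]])

Z = conv_single_step(a_slice_prev, W, b)
print(Z)  # 1*1 + 2*0 + 3*0 + 4*(-1) + 0.5 = -2.5
```

The result is a plain Python float, which is what allows it to be written into a single cell of the output array.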
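As with a DNN, a CNN is still trained by forward and backward propagation, even with the added convolution and pooling steps. The CNN forward pass consists of two steps, convolution and pooling. We first look at how to implement the full convolution with numpy on top of the single step defined above. The convolution computation itself is not hard, since the single step already implements it; the difficulty lies in implementing how the filter scans and moves across the input image matrix.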
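The convolution code that follows calls a zero_pad helper that is not defined in this section. A minimal sketch of what such a helper might look like is given here, assuming it pads only the height and width axes of an array of shape (m, n_H, n_W, n_C) with zeros via np.pad. The name and signature are inferred from how the helper is called, not taken from the original source.

```python
import numpy as np

def zero_pad(X, pad):
    # Assumed behavior: pad height and width with `pad` zeros on each side,
    # leaving the batch axis and the channel axis untouched.
    return np.pad(
        X,
        ((0, 0), (pad, pad), (pad, pad), (0, 0)),
        mode="constant",
        constant_values=0,
    )

X = np.ones((2, 3, 3, 1))
X_pad = zero_pad(X, 1)
print(X_pad.shape)  # (2, 5, 5, 1)
```

With pad = 0 the array comes back with its shape unchanged, so the same call also covers the unpadded ("valid") case.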
    def conv_forward(A_prev, W, b, hparameters):
        """
        Arguments:
        A_prev -- output activations of the previous layer, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
        W -- Weights, numpy array of shape (f, f, n_C_prev, n_C)
        b -- Biases, numpy array of shape (1, 1, 1, n_C)
        hparameters -- python dictionary containing "stride" and "pad"

        Returns:
        Z -- conv output, numpy array of shape (m, n_H, n_W, n_C)
        cache -- cache of values needed for the conv_backward() function
        """
        # Shape of the previous layer's output
        (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
        # Shape of the filter weights
        (f, f, n_C_prev, n_C) = W.shape
        # Stride
        stride = hparameters['stride']
        # Padding
        pad = hparameters['pad']
        # Height and width of the output
        n_H = int((n_H_prev + 2 * pad - f) / stride + 1)
        n_W = int((n_W_prev + 2 * pad - f) / stride + 1)
        # Initialize the output
        Z = np.zeros((m, n_H, n_W, n_C))
        # Zero-pad the borders of the input (zero_pad is a helper not shown in this section)
        A_prev_pad = zero_pad(A_prev, pad)

        for i in range(m):
            a_prev_pad = A_prev_pad[i, :, :, :]
            for h in range(n_H):
                for w in range(n_W):
                    for c in range(n_C):
                        # Position of the filter as it scans the input
                        vert_start = h * stride
                        vert_end = vert_start + f
                        horiz_start = w * stride
                        horiz_end = horiz_start + f
                        # Receptive field at this position
                        a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                        # Single convolution step on the receptive field
                        Z[i, h, w, c] = conv_single_step(a_slice_prev, W[:, :, :, c], b[:, :, :, c])

        assert(Z.shape == (m, n_H, n_W, n_C))
        cache = (A_prev, W, b, hparameters)
        return Z, cache
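The output height and width computed in conv_forward follow the usual formula floor((n_prev + 2 * pad - f) / stride) + 1. The sketch below isolates that arithmetic in a small function, named conv_output_size here purely for illustration, and evaluates it for a few made-up settings.

```python
def conv_output_size(n_prev, f, pad, stride):
    # Integer floor division matches int(...) truncation for non-negative values.
    return (n_prev + 2 * pad - f) // stride + 1

no_pad = conv_output_size(5, 3, 0, 1)    # 5x5 input, 3x3 filter, no padding
same_pad = conv_output_size(5, 3, 1, 1)  # padding of 1 keeps the size
strided = conv_output_size(7, 3, 0, 2)   # stride 2 roughly halves the size

print(no_pad, same_pad, strided)  # 3 5 3
```

The same formula with pad = 0 gives the output size of the pooling layer.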
    def pool_forward(A_prev, hparameters, mode = "max"):
        """
        Arguments:
        A_prev -- Input data, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
        hparameters -- python dictionary containing "f" and "stride"
        mode -- the pooling mode you would like to use, defined as a string ("max" or "average")

        Returns:
        A -- output of the pool layer, a numpy array of shape (m, n_H, n_W, n_C)
        cache -- cache used in the backward pass of the pooling layer, contains the input and hparameters
        """
        # Shape of the previous layer's output
        (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
        # Pooling window size and stride
        f = hparameters["f"]
        stride = hparameters["stride"]
        # Height and width of the output
        n_H = int(1 + (n_H_prev - f) / stride)
        n_W = int(1 + (n_W_prev - f) / stride)
        n_C = n_C_prev
        # Initialize the output
        A = np.zeros((m, n_H, n_W, n_C))

        for i in range(m):
            for h in range(n_H):
                for w in range(n_W):
                    for c in range(n_C):
                        # Position of the pooling window as it scans the input
                        vert_start = h * stride
                        vert_end = vert_start + f
                        horiz_start = w * stride
                        horiz_end = horiz_start + f
                        # Pooling region
                        a_prev_slice = A_prev[i, vert_start:vert_end, horiz_start:horiz_end, c]
                        # Apply the selected pooling type
                        if mode == "max":
                            A[i, h, w, c] = np.max(a_prev_slice)
                        elif mode == "average":
                            A[i, h, w, c] = np.mean(a_prev_slice)

        cache = (A_prev, hparameters)
        assert(A.shape == (m, n_H, n_W, n_C))
        return A, cache
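To see what the pooling loop produces, it can help to work a tiny case by hand. The sketch below does not call pool_forward; it uses a reshape-based shortcut that is equivalent only for non-overlapping windows (stride equal to f) on a single 4x4 channel, with made-up values, to show the expected max and average results for f = 2.

```python
import numpy as np

x = np.array([[1.0, 2.0, 5.0, 6.0],
              [3.0, 4.0, 7.0, 8.0],
              [9.0, 10.0, 13.0, 14.0],
              [11.0, 12.0, 15.0, 16.0]])
f = 2

# Axes: (row block, row inside block, column block, column inside block).
blocks = x.reshape(2, f, 2, f)

# Reduce over the two "inside block" axes to pool each 2x2 window.
max_pooled = blocks.max(axis=(1, 3))
avg_pooled = blocks.mean(axis=(1, 3))

print(max_pooled)  # [[ 4.  8.] [12. 16.]]
print(avg_pooled)  # [[ 2.5  6.5] [10.5 14.5]]
```

The explicit loops in pool_forward are slower but handle overlapping windows and arbitrary strides, which the reshape shortcut does not.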
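With forward propagation defined, the hard part and the key point is defining backward propagation for the convolution and pooling steps. Backpropagation through a convolutional layer is a complicated process. In Tensorflow we only need to define the forward pass and the backward pass is computed automatically, but when building a CNN with numpy we have to define the backward pass ourselves. The key is still to define accurately the gradient of the loss function with respect to each variable: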
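The gradients that the backward-pass loops accumulate can be written out as follows. This is a sketch in the notation of the code rather than a derivation from the original notes: $W_c$ and $b_c$ denote the c-th filter and bias, $a^{(i)}_{\text{slice}}(h,w)$ the receptive field of example $i$ at output position $(h, w)$, and the first line is applied once for every $(h, w, c)$.

```latex
\begin{aligned}
dA^{(i)}_{\text{prev,pad}}\big[\text{slice}(h,w)\big] &\mathrel{+}= W_c \, dZ^{(i)}_{h,w,c} \\
dW_c &= \sum_{i=1}^{m} \sum_{h=1}^{n_H} \sum_{w=1}^{n_W} a^{(i)}_{\text{slice}}(h,w) \, dZ^{(i)}_{h,w,c} \\
db_c &= \sum_{i=1}^{m} \sum_{h=1}^{n_H} \sum_{w=1}^{n_W} dZ^{(i)}_{h,w,c}
\end{aligned}
```

Each output value $Z^{(i)}_{h,w,c}$ depends linearly on one receptive field, one filter, and one bias, so its gradient flows back to exactly those three places.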
    def conv_backward(dZ, cache):
        """
        Arguments:
        dZ -- gradient of the cost with respect to the output of the conv layer (Z), numpy array of shape (m, n_H, n_W, n_C)
        cache -- cache of values needed for the conv_backward(), output of conv_forward()

        Returns:
        dA_prev -- gradient of the cost with respect to the input of the conv layer (A_prev),
                   numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
        dW -- gradient of the cost with respect to the weights of the conv layer (W)
              numpy array of shape (f, f, n_C_prev, n_C)
        db -- gradient of the cost with respect to the biases of the conv layer (b)
              numpy array of shape (1, 1, 1, n_C)
        """
        # Retrieve the cache stored during forward propagation
        (A_prev, W, b, hparameters) = cache
        # Shape of the previous layer's output
        (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
        # Shape of the filter
        (f, f, n_C_prev, n_C) = W.shape
        # Stride and padding
        stride = hparameters['stride']
        pad = hparameters['pad']
        # Shape of dZ
        (m, n_H, n_W, n_C) = dZ.shape
        # Initialize dA_prev, dW, db
        dA_prev = np.zeros((m, n_H_prev, n_W_prev, n_C_prev))
        dW = np.zeros((f, f, n_C_prev, n_C))
        db = np.zeros((1, 1, 1, n_C))
        # Zero-pad A_prev and dA_prev (zero_pad is a helper not shown in this section)
        A_prev_pad = zero_pad(A_prev, pad)
        dA_prev_pad = zero_pad(dA_prev, pad)

        for i in range(m):
            # select ith training example from A_prev_pad and dA_prev_pad
            a_prev_pad = A_prev_pad[i, :, :, :]
            da_prev_pad = dA_prev_pad[i, :, :, :]
            for h in range(n_H):
                for w in range(n_W):
                    for c in range(n_C):
                        # Corners of the current receptive field
                        vert_start = h * stride
                        vert_end = vert_start + f
                        horiz_start = w * stride
                        horiz_end = horiz_start + f
                        # Slice of the padded input covered by the filter
                        a_slice = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                        # Accumulate the gradients
                        da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:, :, :, c] * dZ[i, h, w, c]
                        dW[:, :, :, c] += a_slice * dZ[i, h, w, c]
                        db[:, :, :, c] += dZ[i, h, w, c]
            # Crop the padding off again. Explicit end indices are used because
            # a slice of the form pad:-pad is empty when pad == 0.
            dA_prev[i, :, :, :] = da_prev_pad[pad:pad + n_H_prev, pad:pad + n_W_prev, :]

        assert(dA_prev.shape == (m, n_H_prev, n_W_prev, n_C_prev))
        return dA_prev, dW, db
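One detail is easy to get wrong when cropping the padding back off a gradient array in a backward pass like this one: a slice of the form pad:-pad is empty when pad is 0, because -0 is just 0. The sketch below illustrates the pitfall and one possible way around it using explicit end indices; crop_padding is a hypothetical helper name introduced only for this example.

```python
import numpy as np

def crop_padding(x_pad, pad, n_H, n_W):
    # Explicit end indices work for every pad >= 0, including pad == 0.
    return x_pad[pad:pad + n_H, pad:pad + n_W, :]

x = np.arange(12.0).reshape(2, 2, 3)

# With pad == 0 the negative-index form silently returns an empty array.
pad = 0
empty = x[pad:-pad, pad:-pad, :]
print(empty.shape)  # (0, 0, 3)

# The explicit form returns the full array for pad == 0 ...
same = crop_padding(x, 0, 2, 2)

# ... and recovers the original after one pixel of zero padding.
x_pad = np.pad(x, ((1, 1), (1, 1), (0, 0)), mode="constant")
recovered = crop_padding(x_pad, 1, 2, 2)
print(np.array_equal(recovered, x))  # True
```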
    def create_mask_from_window(x):
        """
        Creates a mask from an input matrix x, to identify the max entry of x.

        Arguments:
        x -- Array of shape (f, f)

        Returns:
        mask -- Array of the same shape as window, contains a True at the position corresponding to the max entry of x.
        """
        mask = (x == np.max(x))
        return mask
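A small example shows how the mask routes the gradient in max pooling. The sketch re-declares create_mask_from_window so that it runs on its own; the window values and the upstream gradient of 5.0 are made-up illustrative numbers.

```python
import numpy as np

def create_mask_from_window(x):
    # True where x equals its maximum, False elsewhere.
    return x == np.max(x)

x = np.array([[1.0, 3.0],
              [4.0, 2.0]])
mask = create_mask_from_window(x)
print(mask)  # [[False False] [ True False]]

# Multiplying by the mask sends the whole upstream gradient to the max entry.
upstream = 5.0
grad = mask * upstream
print(grad)  # [[0. 0.] [5. 0.]]
```

If several entries tie for the maximum, every one of them is marked True, so each tied position receives the full upstream gradient.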
    def distribute_value(dz, shape):
        """
        Distributes the input value in the matrix of dimension shape

        Arguments:
        dz -- input scalar
        shape -- the shape (n_H, n_W) of the output matrix for which we want to distribute the value of dz

        Returns:
        a -- Array of size (n_H, n_W) for which we distributed the value of dz
        """
        (n_H, n_W) = shape
        # Compute the value to distribute on the matrix
        average = dz / (n_H * n_W)
        # Create a matrix where every entry is the "average" value
        a = np.full(shape, average)
        return a
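For average pooling, every input in the window contributes equally to the output, so the upstream gradient is split evenly. The sketch below re-declares distribute_value in a compact form so that it runs on its own; the gradient value 2.0 and the 2x2 window are made-up illustrative numbers.

```python
import numpy as np

def distribute_value(dz, shape):
    # Spread dz evenly over a matrix of the given shape.
    n_H, n_W = shape
    return np.full(shape, dz / (n_H * n_W))

a = distribute_value(2.0, (2, 2))
print(a)        # every entry is 0.5
print(a.sum())  # 2.0, the distributed pieces add back up to dz
```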
    def pool_backward(dA, cache, mode = "max"):
        """
        Arguments:
        dA -- gradient of cost with respect to the output of the pooling layer, same shape as A
        cache -- cache output from the forward pass of the pooling layer, contains the layer's input and hparameters
        mode -- the pooling mode you would like to use, defined as a string ("max" or "average")

        Returns:
        dA_prev -- gradient of cost with respect to the input of the pooling layer, same shape as A_prev
        """
        # Retrieve information from cache
        (A_prev, hparameters) = cache
        # Retrieve hyperparameters from "hparameters"
        stride = hparameters['stride']
        f = hparameters['f']
        # Retrieve dimensions from A_prev's shape and dA's shape
        m, n_H_prev, n_W_prev, n_C_prev = A_prev.shape
        m, n_H, n_W, n_C = dA.shape
        # Initialize dA_prev with zeros
        dA_prev = np.zeros((m, n_H_prev, n_W_prev, n_C_prev))

        for i in range(m):
            # select training example from A_prev
            a_prev = A_prev[i, :, :, :]
            for h in range(n_H):
                for w in range(n_W):
                    for c in range(n_C):
                        # Find the corners of the current "slice"
                        vert_start = h * stride
                        vert_end = vert_start + f
                        horiz_start = w * stride
                        horiz_end = horiz_start + f
                        # Compute the backward propagation in both modes.
                        if mode == "max":
                            a_prev_slice = a_prev[vert_start:vert_end, horiz_start:horiz_end, c]
                            mask = create_mask_from_window(a_prev_slice)
                            dA_prev[i, vert_start:vert_end, horiz_start:horiz_end, c] += np.multiply(mask, dA[i, h, w, c])
                        elif mode == "average":
                            # Get the value da from dA
                            da = dA[i, h, w, c]
                            # Define the shape of the filter as fxf
                            shape = (f, f)
                            # Distribute it to get the correct slice of dA_prev, i.e. add the distributed value of da.
                            dA_prev[i, vert_start:vert_end, horiz_start:horiz_end, c] += distribute_value(da, shape)

        # Making sure your output shape is correct
        assert(dA_prev.shape == A_prev.shape)
        return dA_prev
