Data scientist Prakash Jay introduces the principles behind transfer learning, a Keras-based implementation, and the common transfer learning scenarios.
(Figure: the Inception-V3 architecture)
What is transfer learning?
Transfer learning in machine learning is concerned with storing the knowledge gained while solving one problem and applying it to a different but related problem.
Why transfer learning?
In practice, hardly anyone trains a convolutional network from scratch, because it is rarely possible to obtain a large enough dataset. Using a pre-trained network helps solve most of the problems at hand.
Training deep networks is expensive. Even with hundreds of machines equipped with expensive GPUs, training the most complex models takes many weeks.
Deciding a deep network's topology / features / training method / hyperparameters is black magic with little theoretical guidance.
My experience
Don't try to be a hero.
—— Andrej Karpathy
Most of the computer vision problems I face do not come with very large datasets (5,000-40,000 images). Even with aggressive data augmentation it is hard to reach decent accuracy, and training a network with millions of parameters on a small dataset usually leads to overfitting. Transfer learning is my savior.
Why does transfer learning work?
Let's look at what a deep network learns: the early layers try to detect edges, the middle layers try to detect shapes, and the later layers try to detect high-level, task-specific features. These trained layers are generally useful for other computer vision problems as well.
Below, let's see how to implement transfer learning with Keras, and the common scenarios in which it applies.
A simple implementation with Keras
from keras import applications
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
from keras.models import Sequential, Model
from keras.layers import Dropout, Flatten, Dense, GlobalAveragePooling2D
from keras import backend as k
from keras.callbacks import ModelCheckpoint, LearningRateScheduler, TensorBoard, EarlyStopping
img_width, img_height = 256, 256
train_data_dir = "data/train"
validation_data_dir = "data/val"
nb_train_samples = 4125
nb_validation_samples = 466
batch_size = 16
epochs = 50
model = applications.VGG19(weights = "imagenet", include_top=False, input_shape = (img_width, img_height, 3))
"""
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer) (None, 256, 256, 3) 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 256, 256, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 256, 256, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 128, 128, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 128, 128, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 128, 128, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 64, 64, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 64, 64, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 64, 64, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 64, 64, 256) 590080
_________________________________________________________________
block3_conv4 (Conv2D) (None, 64, 64, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 32, 32, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 32, 32, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 32, 32, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 32, 32, 512) 2359808
_________________________________________________________________
block4_conv4 (Conv2D) (None, 32, 32, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 16, 16, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 16, 16, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 16, 16, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 16, 16, 512) 2359808
_________________________________________________________________
block5_conv4 (Conv2D) (None, 16, 16, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 8, 8, 512) 0
=================================================================
Total params: 20,024,384.0
Trainable params: 20,024,384.0
Non-trainable params: 0.0
"""
# Freeze the layers we don't want to train. Here I freeze the first 5 layers.
for layer in model.layers[:5]:
    layer.trainable = False
# Add custom layers
x = model.output
x = Flatten()(x)
x = Dense(1024, activation="relu")(x)
x = Dropout(0.5)(x)
x = Dense(1024, activation="relu")(x)
predictions = Dense(16, activation="softmax")(x)
# Create the final model
model_final = Model(inputs = model.input, outputs = predictions)
# Compile the final model
model_final.compile(loss = "categorical_crossentropy", optimizer = optimizers.SGD(lr=0.0001, momentum=0.9), metrics=["accuracy"])
# Data augmentation
train_datagen = ImageDataGenerator(
rescale = 1./255,
horizontal_flip = True,
fill_mode = "nearest",
zoom_range = 0.3,
width_shift_range = 0.3,
height_shift_range=0.3,
rotation_range=30)
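# Note: the original code mirrors the training augmentation on the validation
# generator below; in practice many pipelines pass only rescale=1./255 here.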
test_datagen = ImageDataGenerator(
rescale = 1./255,
horizontal_flip = True,
fill_mode = "nearest",
zoom_range = 0.3,
width_shift_range = 0.3,
height_shift_range=0.3,
rotation_range=30)
train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size = (img_height, img_width),
    batch_size = batch_size,
    class_mode = "categorical")
validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size = (img_height, img_width),
    batch_size = batch_size,
    class_mode = "categorical")
# Save the model according to the checkpoint conditions
checkpoint = ModelCheckpoint("vgg19_1.h5", monitor='val_acc', verbose=1, save_best_only=True, save_weights_only=False, mode='auto', period=1)
early = EarlyStopping(monitor='val_acc', min_delta=0, patience=10, verbose=1, mode='auto')
# Train the model
model_final.fit_generator(
    train_generator,
    steps_per_epoch = nb_train_samples // batch_size,
    epochs = epochs,
    validation_data = validation_generator,
    validation_steps = nb_validation_samples // batch_size,
    callbacks = [checkpoint, early])
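Once training finishes, the best checkpoint can be reloaded for inference. A minimal sketch (the test image path is a hypothetical placeholder):
from keras.models import load_model
from keras.preprocessing import image
import numpy as np
# Reload the best model saved by the ModelCheckpoint callback above
model_best = load_model("vgg19_1.h5")
# Preprocess one image the same way as the generators (rescale only)
img = image.load_img("data/test/sample.jpg", target_size=(img_height, img_width))  # hypothetical path
arr = np.expand_dims(image.img_to_array(img) / 255.0, axis=0)
# Index of the predicted class among the 16 outputs
print(np.argmax(model_best.predict(arr), axis=1))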
Common transfer learning scenarios
Remember that convolutional features in the earlier layers are more generic, while those in the later layers are more specific to the original dataset. There are four main transfer learning scenarios:
1. The new dataset is small and similar to the original dataset
If we try to train the entire network, we easily run into overfitting. Since the new data is similar to the original data, we expect the higher-level features in the network to be relevant to the new dataset as well. Hence the advice: freeze all convolutional layers and train only the classifier (for example, a linear classifier):
for layer in model.layers:
    layer.trainable = False
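A minimal sketch of that setup, reusing the VGG19 base loaded earlier (the 16-class softmax head is an assumption carried over from the example above, standing in for the linear classifier):
# Freeze every layer of the pre-trained base
for layer in model.layers:
    layer.trainable = False
# Train only a softmax classifier on top of the frozen features
feats = Flatten()(model.output)
preds = Dense(16, activation="softmax")(feats)
clf = Model(inputs=model.input, outputs=preds)
clf.compile(loss="categorical_crossentropy",
            optimizer=optimizers.SGD(lr=0.0001, momentum=0.9),
            metrics=["accuracy"])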
2. The new dataset is large and similar to the original dataset
Since we have more data, we can be more confident that fine-tuning the entire network will not lead to overfitting.
for layer in model.layers:
    layer.trainable = True
True is in fact the default; the code above marks every layer trainable explicitly only to make the point clear.
Since the first few layers detect edges, you may also choose to freeze them. For example, the following code freezes the first 5 layers of VGG19:
for layer in model.layers[:5]:
    layer.trainable = False
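When fine-tuning most of a pre-trained network, a common precaution (standard practice, not something the original text specifies) is to lower the learning rate so the pre-trained weights shift only gradually. A minimal sketch combining both ideas:
# Freeze the edge-detecting first 5 layers, fine-tune the rest with a small learning rate
for layer in model.layers[:5]:
    layer.trainable = False
for layer in model.layers[5:]:
    layer.trainable = True
model_final.compile(loss="categorical_crossentropy",
                    optimizer=optimizers.SGD(lr=1e-5, momentum=0.9),  # smaller lr than before
                    metrics=["accuracy"])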
3. The new dataset is small and very different from the original dataset
Since the dataset is small, we probably want to extract features from an earlier layer and train a classifier on top of them. (This assumes some familiarity with h5py.)
from keras import applications
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
from keras.models import Sequential, Model
from keras.layers import Input, Conv2D, MaxPooling2D, Dropout, Flatten, Dense, GlobalAveragePooling2D
from keras import backend as k
from keras.callbacks import ModelCheckpoint, LearningRateScheduler, TensorBoard, EarlyStopping
img_width, img_height = 256, 256
### Build the network
img_input = Input(shape=(256, 256, 3))
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input)
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)
# Block 2
x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)
model = Model(inputs = img_input, outputs = x)
model.summary()
"""
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer) (None, 256, 256, 3) 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 256, 256, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 256, 256, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 128, 128, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 128, 128, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 128, 128, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 64, 64, 128) 0
=================================================================
Total params: 260,160.0
Trainable params: 260,160.0
Non-trainable params: 0.0
"""
layer_dict = dict([(layer.name, layer) for layer in model.layers])
[layer.name for layer in model.layers]
"""
['input_1',
'block1_conv1',
'block1_conv2',
'block1_pool',
'block2_conv1',
'block2_conv2',
'block2_pool']
"""
import h5py
weights_path = 'vgg19_weights.h5'  # https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg19_weights_tf_dim_ordering_tf_kernels.h5
f = h5py.File(weights_path, 'r')
list(f["model_weights"].keys())
"""
['block1_conv1',
'block1_conv2',
'block1_pool',
'block2_conv1',
'block2_conv2',
'block2_pool',
'block3_conv1',
'block3_conv2',
'block3_conv3',
'block3_conv4',
'block3_pool',
'block4_conv1',
'block4_conv2',
'block4_conv3',
'block4_conv4',
'block4_pool',
'block5_conv1',
'block5_conv2',
'block5_conv3',
'block5_conv4',
'block5_pool',
'dense_1',
'dense_2',
'dense_3',
'dropout_1',
'global_average_pooling2d_1',
'input_1']
"""
# List the names of all layers in the model
layer_names = [layer.name for layer in model.layers]
"""
# 提取`.h5`文件中每層的模型權重
>>> f["model_weights"]["block1_conv1"].attrs["weight_names"]
array([b'block1_conv1/kernel:0', b'block1_conv1/bias:0'],
dtype='|S21')
# 將這一數(shù)組分配給weight_names
>>> f["model_weights"]["block1_conv1"]["block1_conv1/kernel:0]
# 列表推導(weights)儲存層的權重和偏置
>>>layer_names.index("block1_conv1")
1
>>> model.layers[1].set_weights(weights)
# 為特定層設置權重。
使用for循環(huán)我們可以為整個網(wǎng)絡設置權重。
"""
for i in layer_dict.keys():
    weight_names = f["model_weights"][i].attrs["weight_names"]
    weights = [f["model_weights"][i][j] for j in weight_names]
    index = layer_names.index(i)
    model.layers[index].set_weights(weights)
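To sanity-check the copy (a quick verification, not part of the original code), compare one loaded kernel against the corresponding dataset in the file:
import numpy as np
loaded_kernel = model.get_layer("block1_conv1").get_weights()[0]
file_kernel = np.asarray(f["model_weights"]["block1_conv1"]["block1_conv1/kernel:0"])
assert np.allclose(loaded_kernel, file_kernel)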
import cv2
import numpy as np
import pandas as pd
from tqdm import tqdm
import itertools
import glob
# Collect the image paths; the glob pattern below is a placeholder, adjust it to your dataset layout
files_location = glob.glob("data/train/*/*.jpg")
features = []
for i in tqdm(files_location):
    im = cv2.imread(i)
    im = cv2.resize(cv2.cvtColor(im, cv2.COLOR_BGR2RGB), (256, 256)).astype(np.float32) / 255.0
    im = np.expand_dims(im, axis=0)
    outcome = model.predict(im)
    features.append(outcome)
## Collect these features, create a dataframe, and train a classifier on top of it
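As a sketch of that last step, here is one way to fit a simple classifier on the collected features, assuming scikit-learn is available and that labels holds one integer class label per image path (both are assumptions, not part of the original code):
from sklearn.linear_model import LogisticRegression
# Flatten each (1, 64, 64, 128) block2_pool feature map into a single row
X = np.vstack([feat.reshape(1, -1) for feat in features])
clf = LogisticRegression(max_iter=1000)
clf.fit(X, labels)  # labels: hypothetical integer class labels, one per image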
The code above extracts block2_pool features. On its own this is of limited use, because that layer outputs 64 x 64 x 128 features and training a classifier directly on top of them may not get you far. Instead, we can add a few fully connected layers and train a neural network on top of them (see the sketch after this list):
Add a few fully connected layers and an output layer.
Set the weights of the earlier layers, then freeze them.
Train the network.
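A minimal sketch of those three steps, building on the two-block model and the weight-copying loop above (the 16-class output size is an assumption carried over from the earlier example):
# 1. Add a few fully connected layers and an output layer on top of block2_pool
y = Flatten()(model.output)
y = Dense(256, activation="relu")(y)
y = Dropout(0.5)(y)
out = Dense(16, activation="softmax")(y)
head_model = Model(inputs=model.input, outputs=out)
# 2. The convolutional layers already carry the copied VGG19 weights; freeze them
for layer in model.layers:
    layer.trainable = False
# 3. Compile, then train only the new fully connected head
head_model.compile(loss="categorical_crossentropy",
                   optimizer=optimizers.SGD(lr=0.0001, momentum=0.9),
                   metrics=["accuracy"])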
4. The new dataset is large and very different from the original dataset
Since you have a large dataset, you can design your own network or use an existing one.
You can initialize training with random weights or with the weights of a pre-trained network. The latter is the usual choice.
You can use a different network, or modify an existing one.
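A sketch of the usual choice (initialize from pre-trained weights and fine-tune everything; the learning rate and the 16-class head are assumptions, not from the original text):
base = applications.VGG19(weights="imagenet", include_top=False, input_shape=(img_width, img_height, 3))
for layer in base.layers:
    layer.trainable = True  # fine-tune every layer on the large dataset
z = Flatten()(base.output)
z = Dense(1024, activation="relu")(z)
out = Dense(16, activation="softmax")(z)
full_model = Model(inputs=base.input, outputs=out)
full_model.compile(loss="categorical_crossentropy",
                   optimizer=optimizers.SGD(lr=0.001, momentum=0.9),
                   metrics=["accuracy"])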
Original title: 基于Keras進行遷移學習 (Transfer learning with Keras)
Source: WeChat official account 論智 (ID: jqr_AI). Please credit the source when reposting.