by Shashank Pandey
There have been many attempts to combine CNNs and RNNs for image-based sequence recognition and video classification tasks. LRCN (Long-term Recurrent Convolutional Network) was proposed by Jeff Donahue et al. in 2015. It combines a CNN and an RNN in a single end-to-end trainable architecture suited to large-scale visual understanding tasks such as video description, activity recognition, and image captioning. LRCN works by passing each visual input (an image or video frame) through a parameterized feature transformation, usually a CNN, to produce a fixed-length vector representation.
The resulting sequence of feature vectors is then passed into a recurrent sequence-learning module (an LSTM). Because these components slot easily into existing visual recognition pipelines, LRCN is a natural choice for perceptual problems with time-varying visual input or sequential outputs, and it handles them with little input preprocessing and no hand-designed features.
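Before the full model, here is a minimal sketch of the core mechanism, assuming TensorFlow's bundled Keras: the TimeDistributed wrapper applies the same per-frame CNN to every time step, turning a clip into a sequence of fixed-length vectors that an LSTM can consume. The layer sizes and shapes below are illustrative, not from the original implementation.

import numpy as np
from tensorflow.keras import layers, models

# A toy batch: 2 clips, 10 frames each, 64x64 RGB (illustrative shapes).
clips = np.random.rand(2, 10, 64, 64, 3).astype('float32')

# The per-frame feature extractor: any CNN that ends in a fixed-length vector.
per_frame = models.Sequential([
    layers.Conv2D(8, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    layers.GlobalAveragePooling2D(),  # one 8-dim vector per frame
])

inputs = layers.Input(shape=(10, 64, 64, 3))          # (frames, H, W, C)
features = layers.TimeDistributed(per_frame)(inputs)  # -> (batch, 10, 8)
outputs = layers.LSTM(16)(features)                   # -> (batch, 16)
model = models.Model(inputs, outputs)
print(model.predict(clips).shape)                     # (2, 16)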
Implementation Code:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten,
                                     Dropout, LSTM, Dense, TimeDistributed)

def LRCN(self):
    # TimeDistributed applies the same CNN to every frame, so
    # self.input_shape is (frames, height, width, channels).
    model = Sequential()
    model.add(TimeDistributed(Conv2D(32, (7, 7), strides=(2, 2),
              padding='same', activation='relu'), input_shape=self.input_shape))
    model.add(TimeDistributed(Conv2D(32, (3, 3),
              kernel_initializer='he_normal', activation='relu')))
    model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
    model.add(TimeDistributed(Conv2D(64, (3, 3),
              padding='same', activation='relu')))
    model.add(TimeDistributed(Conv2D(64, (3, 3),
              padding='same', activation='relu')))
    model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
    model.add(TimeDistributed(Conv2D(128, (3, 3),
              padding='same', activation='relu')))
    model.add(TimeDistributed(Conv2D(128, (3, 3),
              padding='same', activation='relu')))
    model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
    model.add(TimeDistributed(Conv2D(256, (3, 3),
              padding='same', activation='relu')))
    model.add(TimeDistributed(Conv2D(256, (3, 3),
              padding='same', activation='relu')))
    model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
    # Flatten each frame's feature map into a vector, then let the LSTM
    # model the temporal dependencies across frames.
    model.add(TimeDistributed(Flatten()))
    model.add(Dropout(0.7))
    model.add(LSTM(512, return_sequences=False, dropout=0.5))
    model.add(Dense(self.nb_classes, activation='softmax'))
    return model
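The method above assumes it is defined inside a class that provides self.input_shape, a tuple (frames, height, width, channels), and self.nb_classes. A hypothetical usage sketch follows; the VideoModel wrapper, shapes, and dummy training data are illustrative, not part of the original code.

import numpy as np

class VideoModel:
    def __init__(self, input_shape, nb_classes):
        self.input_shape = input_shape  # (frames, H, W, C)
        self.nb_classes = nb_classes

VideoModel.LRCN = LRCN  # attach the builder defined above as a method

builder = VideoModel(input_shape=(16, 112, 112, 3), nb_classes=5)
model = builder.LRCN()
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

# Dummy data: 4 clips of 16 frames each, one-hot labels over 5 classes.
x = np.random.rand(4, 16, 112, 112, 3).astype('float32')
y = np.eye(5)[np.random.randint(0, 5, size=4)]
model.fit(x, y, epochs=1, batch_size=2)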
References
Donahue, J., Hendricks, L. A., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., and Darrell, T. "Long-term Recurrent Convolutional Networks for Visual Recognition and Description." CVPR 2015.