This project explores recurrent neural networks (RNNs) by implementing six models (three pure RNN models and three RNN + CNN models) designed to classify 12 human actions recorded at night. Class names: ['Drink', 'Picking', 'Push', 'Run', 'Throwing objects', 'boxing', 'lifting weights', 'receiving the phone', 'stand', 'walking on stairs', 'walking with flashlight', 'waving']
The dataset used in this project, 'https://www.kaggle.com/api/v1/datasets/download/lakavathakshay/noctact-har', contains 6613 .mp4 videos of up to 15 seconds each (downloaded in download_data.ipynb).
Data preparation is performed in the video_to_dataset.ipynb file using the video_convert class. This class provides the following features:
- creating the database, i.e., retrieving information from the directory structure containing the videos (indexing features and indexing labels)
- changing the image size in two steps:
  - setting the desired width-to-height ratio by zero-padding, as needed, left and right or top and bottom
  - rescaling the images to the desired resolution
- extracting the desired number of frames so that the captured frames are distributed evenly over the length of the video
- generating the training dataset
- generating the validation dataset
- generating a small test dataset (normally the test set should be much larger, but here it is only used to verify functionality)
- a method for saving the datasets to physical media
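The even frame sampling described above can be sketched as follows; the function name and the use of NumPy are assumptions, and video_convert may implement this step differently:

```python
import numpy as np

def evenly_spaced_indices(total_frames: int, num_frames: int) -> np.ndarray:
    """Pick num_frames frame indices spread evenly across a clip of total_frames."""
    return np.linspace(0, total_frames - 1, num_frames).astype(int)

# e.g. a 15 s clip at 30 fps -> 450 frames, sampled down to 10
idx = evenly_spaced_indices(450, 10)
print(idx)  # [  0  49  99 149 199 249 299 349 399 449]
```

The first and last frames are always included, and the gaps between sampled frames are as equal as integer indices allow.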
Recurrent neural networks (RNNs) are neural networks used especially for time series, or for models whose predictions are based on data with the character of a chain of successive phenomena, such as video, audio, or text sequences.
Recurrent neural networks (RNNs) have an architecture similar to artificial neural networks (ANNs), but unlike ANNs, an RNN reuses the same weights at every step of the sequence, while also passing along the hidden state emitted by the previous step, i.e., h_{t-1} is combined with the input features of the current step x_t.
Taking the idea illustrated on the right side of the diagram above, the distribution of the time-ordered dataset (x_t) across the cells of the RNN network is illustrated in the diagram below. Note that, with multiple stacked RNN layers, y'_t (the prediction of the current layer) becomes the input feature x_t of the layer above.
SimpleRNN is the simplest form of recurrent neural network. It combines the input features of step t with the hidden state emitted by the previous step t-1 and activates the result with a hyperbolic tangent, tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)).
Equation for implementing the SimpleRNN block:
h_t = tanh(x_t · W_x + h_{t-1} · W_h + b)
A simple approach:
import tensorflow as tf
from tensorflow.keras.layers import SimpleRNN

# Define sample input shape (e.g., 16 sequences, 5 time steps, 3 features)
batch_size = 16
time_steps = 5
features = 3
input_data = tf.random.normal((batch_size, time_steps, features), dtype=tf.float32)
print(input_data.shape)
# number of units per layer
units_size_per_layer = 4
SimpleRNN_whole_sequence_output, SimpleRNN_final_memory_state = SimpleRNN(units_size_per_layer, return_sequences=True, return_state=True)(input_data)
print(SimpleRNN_whole_sequence_output.shape)
print(SimpleRNN_final_memory_state.shape)
(16, 5, 3)
(16, 5, 4)
(16, 4)
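To make the recurrence explicit, the same computation can be reproduced step by step in NumPy for a single sequence; the names W_x, W_h, and b are placeholders standing in for the layer's learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
time_steps, features, units = 5, 3, 4

# Placeholder (random) weights standing in for learned parameters
W_x = rng.normal(size=(features, units))   # input kernel
W_h = rng.normal(size=(units, units))      # recurrent kernel
b = np.zeros(units)

x = rng.normal(size=(time_steps, features))  # one input sequence
h = np.zeros(units)                          # initial hidden state
outputs = []
for t in range(time_steps):
    # h_t = tanh(x_t . W_x + h_{t-1} . W_h + b), same weights at every step
    h = np.tanh(x[t] @ W_x + h @ W_h + b)
    outputs.append(h)

outputs = np.stack(outputs)
print(outputs.shape)  # (5, 4): one hidden state per time step
```

The final hidden state equals the last row of the whole-sequence output, which is exactly the relationship between the two tensors returned by SimpleRNN above.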
Long Short-Term Memory (LSTM) is an improved version of the simple recurrent neural network. The main difference between SimpleRNN and LSTM is that, in addition to the hidden state taken from the previous step and concatenated with the input features of the current step, LSTM networks have a memory cell that carries information over extended periods (the cell state).
The architecture of LSTM networks consists of three gates:
- Forget gate: determines what information is deleted from the cell memory
- Input gate: controls what information is added to the cell memory
- Output gate: controls what information comes out of the cell memory
Equations for implementing the LSTM block (σ is the sigmoid function, [h_{t-1}, x_t] the concatenated previous hidden state and current input, ⊙ element-wise multiplication):
- Forget gate: f_t = σ([h_{t-1}, x_t] · W_f + b_f)
- Input gate: i_t = σ([h_{t-1}, x_t] · W_i + b_i)
- Cell state: C̃_t = tanh([h_{t-1}, x_t] · W_C + b_C), then C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t
- Output gate: o_t = σ([h_{t-1}, x_t] · W_o + b_o), then h_t = o_t ⊙ tanh(C_t)
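A single LSTM step following the standard gate equations can be sketched in NumPy; the random weights are placeholders for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
features, units = 3, 4

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Placeholder (random) weights: each gate has its own kernel acting on [h_{t-1}, x_t]
W_f, W_i, W_c, W_o = (rng.normal(size=(units + features, units)) for _ in range(4))
b_f = b_i = b_c = b_o = np.zeros(units)

x_t = rng.normal(size=features)
h_prev, c_prev = np.zeros(units), np.zeros(units)

concat = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]
f = sigmoid(concat @ W_f + b_f)          # forget gate: what to drop from the cell
i = sigmoid(concat @ W_i + b_i)          # input gate: what to add to the cell
c_tilde = np.tanh(concat @ W_c + b_c)    # candidate cell state
c = f * c_prev + i * c_tilde             # new cell state (long-term memory)
o = sigmoid(concat @ W_o + b_o)          # output gate: what to expose
h = o * np.tanh(c)                       # new hidden state
print(h.shape, c.shape)  # (4,) (4,)
```

Because f, i, and o are sigmoid outputs in (0, 1), each gate acts as a soft mask over the cell and hidden states.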
A simple approach:
import tensorflow as tf
from tensorflow.keras.layers import LSTM

# Define sample input shape (e.g., 16 sequences, 5 time steps, 3 features)
batch_size = 16
time_steps = 5
features = 3
input_data = tf.random.normal((batch_size, time_steps, features), dtype=tf.float32)
print(input_data.shape)
# number of units per layer
units_size_per_layer = 4
LSTM_whole_sequence_output, LSTM_final_memory_state, LSTM_final_carry_state = LSTM(units_size_per_layer, return_sequences=True, return_state=True)(input_data)
print(LSTM_whole_sequence_output.shape)
print(LSTM_final_memory_state.shape)
print(LSTM_final_carry_state.shape)
(16, 5, 3)
(16, 5, 4)
(16, 4)
(16, 4)
Gated recurrent units (GRUs) are a type of RNN that uses gate mechanisms to selectively update the hidden state at each time step, allowing them to retain important information and discard irrelevant details. The GRU is a simplified version of the LSTM architecture and consists of two main gates: the update gate and the reset gate.
- Update gate z_t: decides how much information from the previous hidden state h_{t-1} should be carried over to the current hidden state.
- Reset gate r_t: determines how much of the previous hidden state h_{t-1} should be forgotten when computing the candidate state.
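A single GRU step can be sketched in NumPy. This follows the convention used in the d2l.ai reference linked below (z_t controls how much of the old state is kept; note that some sources swap the roles of z_t and 1 − z_t), and the random weights are placeholders for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
features, units = 3, 4

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Placeholder (random) weights standing in for learned parameters
W_z, W_r, W_h = (rng.normal(size=(units + features, units)) for _ in range(3))

x_t = rng.normal(size=features)
h_prev = np.zeros(units)

z = sigmoid(np.concatenate([h_prev, x_t]) @ W_z)            # update gate z_t
r = sigmoid(np.concatenate([h_prev, x_t]) @ W_r)            # reset gate r_t
h_tilde = np.tanh(np.concatenate([r * h_prev, x_t]) @ W_h)  # candidate state
h = z * h_prev + (1.0 - z) * h_tilde                        # new hidden state
print(h.shape)  # (4,)
```

Unlike the LSTM, there is no separate cell state: the single hidden state h plays both roles, which is what makes the GRU cheaper in parameters.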
A simple approach:
import tensorflow as tf
from tensorflow.keras.layers import GRU

# Define sample input shape (e.g., 16 sequences, 5 time steps, 3 features)
batch_size = 16
time_steps = 5
features = 3
input_data = tf.random.normal((batch_size, time_steps, features), dtype=tf.float32)
print(input_data.shape)
# number of units per layer
units_size_per_layer = 4
GRU_whole_sequence_output, GRU_final_memory_state = GRU(units_size_per_layer, return_sequences=True, return_state=True, unroll=True)(input_data)
print(GRU_whole_sequence_output.shape)
print(GRU_final_memory_state.shape)
(16, 5, 3)
(16, 5, 4)
(16, 4)
TimeDistributed is a wrapper whereby a given layer (or function) is applied successively to every time step of the input, returning one result per step. It is used, for example, when processing data series, video frames, audio sequences, etc., where each time step is processed independently with the same layer and the same weights. For example, the figure below shows a CNN layer applied step by step using TimeDistributed.
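A minimal sketch of this behavior, assuming small illustrative frame sizes rather than the project's 240x320 frames:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Batch of 2 clips, 10 frames per clip, 64x64 RGB frames (illustrative sizes)
clips = tf.random.normal((2, 10, 64, 64, 3))

# The same Conv2D (shared weights) is applied to every frame independently;
# the time dimension (10) is preserved in the output
frame_conv = layers.TimeDistributed(layers.Conv2D(8, 3, activation="relu"))
out = frame_conv(clips)
print(out.shape)  # (2, 10, 62, 62, 8)
```

The wrapped Conv2D sees each frame as an ordinary 4D batch entry, so only one set of convolution weights exists regardless of the number of time steps.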
The six models were designed to highlight the functionality of RNNs in the context of their use in models involving the analysis of actions in a video recording. To this end, three models were created that use only the three types of RNN, namely SimpleRNN, LSTM, and GRU. As can be seen in the diagram below, these models are composed of three RNN layers and a final Dense layer.
SimpleRNN_model - Trainable params: 29,523,564 (112.62 MB)
LSTM_model - Trainable params: 118,093,068 (450.49 MB)
GRU_model - Trainable params: 88,570,572 (337.87 MB)
Due to the large input features, i.e., 10 time steps each containing a 240x320x3 frame, the parameter matrices are very large, as can be seen above. One way to improve the efficiency of RNN-based models for predicting on video data is to include CNN layers in the model. Thus, three further models were created by extending the models above with a CNN layer integrated via the TimeDistributed method. The diagram below shows the approach of the three new models.
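The CNN + RNN pattern can be sketched as follows; the layer sizes here are illustrative assumptions, not the exact architecture of the six models:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Illustrative CNN + RNN sketch: the CNN shrinks each 240x320x3 frame to a
# small feature vector before the RNN sees the sequence, so the RNN kernels
# stay small (layer sizes are assumptions, not the project's exact models)
model = models.Sequential([
    layers.Input(shape=(10, 240, 320, 3)),  # 10 frames per clip
    layers.TimeDistributed(layers.Conv2D(16, 3, activation="relu")),
    layers.TimeDistributed(layers.MaxPooling2D()),
    layers.TimeDistributed(layers.GlobalAveragePooling2D()),  # 16 features/frame
    layers.LSTM(64),                        # sequence of frame features -> vector
    layers.Dense(12, activation="softmax"), # 12 action classes
])
model.summary()
```

Feeding the LSTM 16 features per frame instead of 240·320·3 raw pixels is what collapses the parameter count from hundreds of millions to a few million, as the numbers below show.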
CNN_SimpleRNN_model - Trainable params: 3,089,004 (11.78 MB)
CNN_LSTM_model - Trainable params: 11,835,660 (45.15 MB)
CNN_GRU_model - Trainable params: 8,920,780 (34.03 MB)
It should be noted that the descriptions of the Input, MaxPooling2D, and other layers have been omitted!
- https://www.datacamp.com/tutorial/tutorial-for-recurrent-neural-network
- https://www.exxactcorp.com/blog/Deep-Learning/recurrent-neural-networks-rnn-deep-learning-for-sequential-data
- https://medium.com/analytics-vidhya/what-is-rnn-a157d903a88
- https://medium.com/analytics-vidhya/lstms-explained-a-complete-technically-accurate-conceptual-guide-with-keras-2a650327e8f2
- https://www.geeksforgeeks.org/deep-learning/deep-learning-introduction-to-long-short-term-memory/
- https://www.geeksforgeeks.org/machine-learning/gated-recurrent-unit-networks/
- https://d2l.ai/chapter_recurrent-modern/gru.html
- https://www.detailedpedia.com/wiki-Gated_recurrent_unit
- https://medium.com/smileinnovation/how-to-work-with-time-distributed-data-in-a-neural-network-b8b39aa4ce00
- https://levelup.gitconnected.com/hands-on-practice-with-time-distributed-layers-using-tensorflow-c776a5d78e7e
- https://colah.github.io/posts/2015-08-Understanding-LSTMs/