Understanding CNN (Convolutional Neural Network)

Naman Sharma
3 min read · Apr 6, 2024
[Figure: Layers in a CNN]

What is a CNN?

A Convolutional Neural Network (CNN) is a type of artificial neural network that is commonly used for image processing and analysis. It is designed to automatically learn and extract features from images, reducing the need for manual feature engineering. CNNs consist of layers of convolutional filters that scan over the input image, detecting patterns such as edges, shapes, and textures. These filters help to identify important features that can be used for image classification, object detection, and other tasks. CNNs can also be trained to recognize patterns in large datasets, making them useful for a variety of applications such as medical imaging, facial recognition, and self-driving cars. Overall, CNNs are a powerful tool for image analysis and have revolutionized the field of computer vision.
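To make the idea of a convolutional filter concrete, here is a minimal NumPy sketch (not part of the project below) that slides a hypothetical 3x3 vertical-edge filter over a tiny grayscale image. In a real CNN the filter values are learned during training rather than hand-crafted.

import numpy as np

# A tiny 6x6 grayscale "image": dark on the left, bright on the right
image = np.array([[0, 0, 0, 10, 10, 10]] * 6, dtype=float)

# A hand-crafted 3x3 filter that responds to vertical edges
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

# Slide the filter over the image (stride 1, no padding)
out = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(out)  # large magnitudes only where the dark/bright boundary is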

Layers in CNN

  • Layers are the building blocks of a CNN and perform computations on the input data.
  • Convolutional layers apply a set of filters to the input image to extract features such as edges, corners, and textures.
  • Pooling layers reduce the spatial dimensions of the feature maps by taking the maximum or average value within a sliding window (see the small max-pooling sketch after this list).
  • Fully connected layers connect all the neurons in the previous layer to the neurons in the next layer and perform the final classification task.
  • Layers in a CNN work together to automatically learn and extract features from images, reducing the need for manual feature engineering.
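As a quick illustration of pooling, the sketch below (plain NumPy, separate from the project that follows) applies 2x2 max pooling with stride 2 to a 4x4 feature map, halving each spatial dimension.

import numpy as np

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 1],
                        [3, 4, 0, 8]], dtype=float)

# 2x2 max pooling with stride 2: keep the largest value in each 2x2 block
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6. 4.]
#  [7. 9.]]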

Here is a small project built using a CNN.

Step 1: Data Preprocessing

# Install the dependencies first (run in a terminal):
#   pip install tensorflow
# (Keras ships with TensorFlow, so a separate "pip install keras" is not required.)

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

ImageDataGenerator creates modified versions of the images in the training data by randomly rotating, shifting, shearing, zooming, and flipping them (here we use shearing, zooming, and horizontal flips). These modified images are then used to train the model, which helps it generalize better and reduces overfitting.

train_datagen = ImageDataGenerator(rescale=1./255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)
training_set = train_datagen.flow_from_directory(r"path of training dataset",
                                                 target_size=(64, 64),
                                                 batch_size=32,
                                                 class_mode='binary')

test_datagen = ImageDataGenerator(rescale=1./255)  # only rescaling for the test set, no augmentation
test_set = test_datagen.flow_from_directory(r"path of test dataset",
                                            target_size=(64, 64),
                                            batch_size=32,
                                            class_mode='binary')

target_size is a tuple that specifies the desired dimensions of the input images. For example, if you set target_size=(64,64) then all input images will be resized to have a height and width of 64 pixels. This is important to do because neural networks require fixed-size input.
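As a self-contained illustration (flow_from_directory already does this resizing for you), resizing an arbitrarily sized image to the fixed 64x64 input shape might look like this:

import tensorflow as tf

# A fake 200x150 RGB image (batch of 1), standing in for an arbitrarily sized photo
img = tf.random.uniform([1, 200, 150, 3])

# Resize to the fixed input size the network expects
resized = tf.image.resize(img, [64, 64])
print(resized.shape)  # (1, 64, 64, 3)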

batch_size is an integer that specifies the number of images to process at once. This is a trade-off between memory usage and training speed: larger batch sizes make more efficient use of GPU resources but require more memory. For example, with 8,000 training images and batch_size=32, one epoch processes 250 batches.

class_mode is a string that specifies the type of label to use. For binary classification, you can set class_mode='binary'. For multi-class classification, you can set class_mode='categorical' or class_mode='sparse'.
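With class_mode='binary', the labels are 0/1 values derived from the subdirectory names. You can check which folder maps to which label (the exact names below depend on how your dataset directories are named):

# Prints something like {'cats': 0, 'dogs': 1}, depending on your folder names
print(training_set.class_indices)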

shear_range, zoom_range, and horizontal_flip are parameters that control data augmentation. Data augmentation is a technique that generates new training examples by applying random transformations to the existing data. This can help improve the generalization performance of the model.

shear_range is a float that specifies the shear intensity (the shear angle, in degrees). Shear transformations distort the image by slanting it in a given direction.

zoom_range can be a float or a pair [lower, upper] specifying the range of random zoom; a single float z means the zoom factor is sampled from [1-z, 1+z]. Zooming in can help the model learn to recognize fine details.

horizontal_flip is a boolean that specifies whether to flip the image horizontally. This can help the model learn to recognize symmetrical features.
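If you want to see what the augmentation actually produces, here is an optional sketch (assuming matplotlib is installed) that pulls one batch from training_set and displays a few augmented images:

import matplotlib.pyplot as plt

# Grab one augmented batch from the generator defined above
images, labels = next(training_set)

plt.figure(figsize=(8, 2))
for i in range(4):
    plt.subplot(1, 4, i + 1)
    plt.imshow(images[i])      # pixel values are already rescaled to [0, 1]
    plt.title(int(labels[i]))  # 0 or 1, per class_indices
    plt.axis('off')
plt.show()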

Step 2: Building the CNN

cnn = tf.keras.models.Sequential()  # Initializing the CNN

# First convolution + pooling
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu', input_shape=[64, 64, 3]))
cnn.add(tf.keras.layers.MaxPooling2D(pool_size=2, strides=2))

# Second convolution + pooling (input_shape is only needed on the first layer)
cnn.add(tf.keras.layers.Conv2D(filters=32, kernel_size=3, activation='relu'))
cnn.add(tf.keras.layers.MaxPooling2D(pool_size=2, strides=2))

# Flattening layer
cnn.add(tf.keras.layers.Flatten())

# Fully connected layer
cnn.add(tf.keras.layers.Dense(units=128, activation='relu'))

# Output layer (a single sigmoid unit for binary classification)
cnn.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

# Compiling the CNN
cnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Training the CNN on the training set and evaluating it on the test set
cnn.fit(x=training_set, validation_data=test_set, epochs=25)

# Predicting a single image
import numpy as np
from tensorflow.keras.preprocessing import image

test_image = image.load_img(r'add the path of the image', target_size=(64, 64))
test_image = image.img_to_array(test_image)
test_image = test_image / 255.0                  # apply the same rescaling used during training
test_image = np.expand_dims(test_image, axis=0)  # add a batch dimension

result = cnn.predict(test_image)
print(training_set.class_indices)                # e.g. {'cats': 0, 'dogs': 1}, depending on folder names

# The sigmoid output is a probability, so threshold it at 0.5
if result[0][0] > 0.5:
    prediction = 'dog'
else:
    prediction = 'cat'

print(prediction)

This was a small project that predicts whether an image is of a dog or a cat using a CNN. I have also uploaded the project, along with the dataset, to my GitHub account (https://github.com/namanupmanyu).

[Figure: prediction result]

Thank you for reading!
