Handwritten digit classification (DL)
- Chaitanya Singh
- Jul 25
- 2 min read

Introduction
In this tutorial, we build a convolutional neural network (CNN) that learns to recognize handwritten digits from a custom image dataset. You will see how to load image files and labels, preprocess the data, define and train a compact CNN in Keras, and finally evaluate its performance on unseen examples.
Data Loading
First, we unzip the archive containing 49,000 PNG images and a CSV file mapping filenames to digit labels (0–9). We read the CSV into a pandas DataFrame so that each row gives us an image filename and its corresponding label. Using Keras's load_img and img_to_array, we load every 28×28 grayscale image from the Images/train folder into a NumPy array X of shape (49000, 28, 28, 1). The labels go into a vector y of length 49,000.
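The loading step can be sketched as follows. The exact layout here is an assumption based on the description above: a `train.csv` with `filename` and `label` columns, and images under `Images/train`.

```python
import numpy as np
import pandas as pd
from tensorflow.keras.utils import load_img, img_to_array

def load_dataset(csv_path, img_dir):
    """Load the 28x28 grayscale images listed in a CSV into arrays X and y."""
    df = pd.read_csv(csv_path)  # assumed columns: filename, label
    images = []
    for fname in df["filename"]:
        img = load_img(f"{img_dir}/{fname}", color_mode="grayscale",
                       target_size=(28, 28))
        images.append(img_to_array(img))  # each array has shape (28, 28, 1)
    X = np.stack(images)                  # shape (N, 28, 28, 1)
    y = df["label"].to_numpy()            # digit labels 0-9
    return X, y

# Assumed paths from the unzipped archive:
# X, y = load_dataset("train.csv", "Images/train")
```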
Train‑Test Split and Normalization
We split X and y into an 80% training set and a 20% test set, stratifying by digit label to keep class proportions consistent. Every pixel value, originally an integer from 0 to 255, is divided by 255 to bring it into the 0.0–1.0 range. This normalization helps the network converge more quickly during training.
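A minimal sketch of this step, using scikit-learn's train_test_split (the fixed random seed is an assumption added for reproducibility):

```python
from sklearn.model_selection import train_test_split

def split_and_normalize(X, y, test_size=0.2, seed=42):
    """Stratified 80/20 split, then scale pixels from [0, 255] to [0.0, 1.0]."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    return X_train / 255.0, X_test / 255.0, y_train, y_test
```

Passing `stratify=y` is what keeps each digit's share of the data the same in both splits.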
Model Architecture
We define a Keras Sequential model with the following stages:
- A 2D convolution layer with 28 filters of size 3×3, which learns local patterns in the image.
- A 2×2 max-pooling layer that halves the spatial dimensions.
- A flatten operation that turns the remaining feature maps into a single vector.
- A dense (fully connected) layer of 128 units with ReLU activation, introducing nonlinearity.
- A dropout layer with rate 0.3, which randomly deactivates neurons during training to reduce overfitting.
- A final dense layer of 10 units with softmax activation, producing a probability distribution over the ten digit classes.
We compile the model using the Adam optimizer and sparse categorical cross‑entropy loss, and we track accuracy as the key metric.
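Putting the stages above together, the model might look like this; the ReLU activation on the convolution layer is an assumption (only the dense layer's activation is stated above):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(28, (3, 3), activation="relu"),  # learns local patterns
    layers.MaxPooling2D((2, 2)),                   # halves spatial dimensions
    layers.Flatten(),                              # feature maps -> one vector
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),                           # reduces overfitting
    layers.Dense(10, activation="softmax"),        # one probability per digit
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Sparse categorical cross-entropy lets us pass the labels as plain integers 0–9 rather than one-hot vectors.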
Training
We train for 30 epochs with a batch size of 32, feeding the model our normalized training data and validating on the test set at the end of each epoch. During the first few epochs the loss falls rapidly and accuracy climbs above ninety percent. As training continues the model reaches the high nineties in accuracy on the training set while validation accuracy stabilizes slightly lower, indicating good generalization without severe overfitting.
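The training loop reduces to a single fit call; wrapping it in a helper makes the hyperparameters from the text explicit:

```python
def train(model, X_train, y_train, X_val, y_val, epochs=30, batch_size=32):
    """Fit the model, validating at the end of every epoch."""
    return model.fit(X_train, y_train,
                     epochs=epochs,
                     batch_size=batch_size,
                     validation_data=(X_val, y_val))
```

The returned History object records per-epoch loss and accuracy for both splits, which is how you would observe the curves described above.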
Evaluation and Predictions
After training completes, we evaluate on the held‑out test set to obtain the final accuracy score. To visualize individual predictions, we pick a test example, display its 28×28 grayscale image with Matplotlib, and run model.predict on that single sample. The network outputs probabilities for digits 0–9, and we select the class with the highest probability. Displaying the predicted digit alongside the image confirms the network’s ability to recognize handwritten numbers.
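A sketch of the single-sample prediction step, assuming `model` and `X_test` from the earlier snippets; the helper name is hypothetical:

```python
import numpy as np
import matplotlib.pyplot as plt

def show_prediction(model, X_test, index):
    """Display one test image and return the digit the model predicts for it."""
    sample = X_test[index:index + 1]      # keep the batch dimension
    probs = model.predict(sample)[0]      # probabilities for digits 0-9
    digit = int(np.argmax(probs))         # class with the highest probability
    plt.imshow(X_test[index].squeeze(), cmap="gray")
    plt.title(f"Predicted: {digit}")
    plt.show()
    return digit
```

Note the `index:index + 1` slice: `model.predict` expects a batch, so a single image must keep its leading batch axis.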
Extensions
You can build on this pipeline by adding more convolutional layers or varying their filter sizes, adjusting dropout rates, or applying data augmentation such as random rotations and translations. These modifications often boost accuracy even further. Feel free to experiment and observe how each change affects your model’s learning and performance.
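As one illustration, random rotations and translations can be added with Keras preprocessing layers; the specific factors below are assumptions, not tuned values:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical augmentation block: small random rotations and shifts,
# active only when the model is called with training=True.
augment = keras.Sequential([
    layers.RandomRotation(0.05),            # rotate up to ±5% of a full turn
    layers.RandomTranslation(0.1, 0.1),     # shift up to 10% in each direction
])
```

Placing this block before the first Conv2D layer of the model means every training batch is perturbed slightly, while inference sees the images unchanged.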