Tayyabah-Rehman/Image-Classification

Image-Classification

Sign language digit classification comparing a custom CNN with a pretrained ResNet-34. Both models are trained on hand-gesture images of the digits 0-9 and evaluated on train and test accuracy, with and without layer freezing.

📁 Dataset

Sign Language Digits Dataset - Hand gesture images for digits 0 through 9.

Split | Number of Images
Train | 1,649 (80%)
Test  | 413 (20%)
Total | 2,062
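
The split sizes above can be reproduced with a simple seeded shuffle. The README does not say how the split was actually made, so the following is only a sketch: the function name `split_indices`, the seed, and the choice to round the test count up are all assumptions, picked because rounding 20% of 2,062 up gives the reported 413 test images.

```python
import math
import random


def split_indices(n_total, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) for a shuffled hold-out split.

    Assumption: the test count is rounded up, which matches the
    reported 1,649 / 413 split for 2,062 images.
    """
    n_test = math.ceil(n_total * test_fraction)
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(2062)
print(len(train_idx), len(test_idx))
```

The two index lists can then be used to select files or to build subsets of a dataset object.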

Preprocessing:

  • Resize to 224×224 pixels
  • Convert grayscale to RGB (3 channels)
  • Random horizontal flip and rotation (±10°)
  • Normalization (mean=0.5, std=0.5)

🏗️ Models Compared

Custom CNN Architecture

Layer | Details
Conv1 | 3→6 channels, kernel 5×5 + ReLU + MaxPool
Conv2 | 6→16 channels, kernel 5×5 + ReLU + MaxPool
FC1   | 16×53×53 → 120 neurons + ReLU
FC2   | 120 → 84 neurons + ReLU
FC3   | 84 → 10 neurons (digits 0-9)
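
The 16×53×53 input size of FC1 follows from the 224×224 input and the two conv/pool stages. The arithmetic below is a sketch that assumes unpadded convolutions with stride 1 and 2×2 max pooling with stride 2 (LeNet-style settings); the README does not state these values, but they reproduce the 53×53 feature map. The helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial output size of a max-pooling layer along one dimension."""
    return (size - kernel) // stride + 1


def fc1_input_shape(input_size=224, out_channels=16):
    """Trace the feature-map size through Conv1/Pool and Conv2/Pool."""
    size = pool_out(conv_out(input_size, kernel=5))  # 224 -> 220 -> 110
    size = pool_out(conv_out(size, kernel=5))        # 110 -> 106 -> 53
    return size, out_channels * size * size


side, flat = fc1_input_shape()
print(side, flat)  # 53 44944
```

Under these assumptions FC1 receives 16 × 53 × 53 = 44,944 features per image.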

ResNet-34 (Pretrained)

  • Pretrained on ImageNet
  • Modified final fully connected layer for 10 classes
  • Two training approaches tested:
    • No Freezing: All layers trainable
    • With Freezing: Only final FC layer + layer4 trainable

📊 Results Comparison

Model | Approach | Train Accuracy | Test Accuracy
ResNet-34 | No Freezing | 98.85% | 99.03%
ResNet-34 | With Freezing | 97.88% | 96.37%
Custom CNN | No Freezing | 97.21% | 89.83%
Custom CNN | With Freezing | 96.60% | 89.10%

🎯 Per-Class Accuracy (Best Model: ResNet-34 - No Freezing)

Digit | Accuracy
0 | 100.00%
1 | 97.44%
2 | 98.18%
3 | 100.00%
4 | 100.00%
5 | 97.67%
6 | 97.62%
7 | 100.00%
8 | 100.00%
9 | 100.00%
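
Per-class accuracy of this kind is the share of test images of each digit that the model labels correctly. A small sketch of the calculation, assuming the true labels and predictions are available as plain sequences of integers (the function name and the toy inputs are illustrative, not taken from the repository):

```python
from collections import Counter


def per_class_accuracy(labels, predictions):
    """Return {class: accuracy in percent} for each class present in labels."""
    totals = Counter(labels)
    correct = Counter(y for y, p in zip(labels, predictions) if y == p)
    return {c: 100.0 * correct[c] / totals[c] for c in sorted(totals)}


# Toy example with three classes; real inputs would come from the test set.
labels = [0, 0, 1, 1, 1, 2]
predictions = [0, 1, 1, 1, 0, 2]
for digit, acc in per_class_accuracy(labels, predictions).items():
    print(f"{digit} {acc:.2f}%")
```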

📈 Key Findings

  • Best overall: ResNet-34 without freezing - 99.03% test accuracy
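  • ✅ Unfreezing all ResNet-34 layers outperformed the frozen version by 2.66 percentage points in test accuracy (99.03% vs. 96.37%)
  • ✅ Custom CNN reached about 89-90% test accuracy in both configurations (89.83% and 89.10%), well below its ~97% train accuracy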
  • ✅ 6 out of 10 digits achieved 100% accuracy with best model

🚀 Getting Started

Prerequisites

pip install torch torchvision numpy matplotlib pillow rarfile

Conclusion

ResNet-34 with transfer learning outperforms the custom CNN for sign language digit classification by about 9 percentage points in test accuracy (99.03% vs. 89.83%). Fine-tuning all layers gives better results than training only layer4 and the final FC layer (99.03% vs. 96.37%). The best model achieves near-perfect classification on the test set.
