Sign language digit classification comparing a custom CNN with a pretrained ResNet-34. Both models are trained on hand gesture images of the digits 0-9 and evaluated on train and test accuracy, with and without layer freezing.
Sign Language Digits Dataset - Hand gesture images for digits 0 through 9.
| Split | Number of Images |
|---|---|
| Train | 1,649 (80%) |
| Test | 413 (20%) |
| Total | 2,062 |
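
The split sizes in the table can be reproduced with a seeded random split. The snippet below is a minimal sketch, not the project's actual loading code: the `TensorDataset` is a hypothetical stand-in for the real image dataset, and the seed value is an arbitrary assumption.

```python
import torch
from torch.utils.data import TensorDataset, random_split

total = 2062
n_train = int(0.8 * total)  # 80% of the images, rounded down
n_test = total - n_train    # the remaining 20%

# Hypothetical stand-in for the image dataset: one integer index per image.
dataset = TensorDataset(torch.arange(total))

# A fixed seed (arbitrary choice) keeps the split reproducible across runs.
generator = torch.Generator().manual_seed(42)
train_set, test_set = random_split(dataset, [n_train, n_test], generator=generator)

print(len(train_set), len(test_set))  # 1649 413
```
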
Preprocessing:
- Resize to 224×224 pixels
- Convert grayscale to RGB (3 channels)
- Random horizontal flip and rotation (±10°)
- Normalization (mean=0.5, std=0.5)
| Layer | Details |
|---|---|
| Conv1 | 3→6 channels, kernel 5×5 + ReLU + MaxPool |
| Conv2 | 6→16 channels, kernel 5×5 + ReLU + MaxPool |
| FC1 | 16×53×53 → 120 neurons + ReLU |
| FC2 | 120 → 84 neurons + ReLU |
| FC3 | 84 → 10 neurons (digits 0-9) |
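
The architecture in the table can be sketched as a PyTorch module as shown below. The class name `CustomCNN` is hypothetical, and the unpadded stride-1 convolutions and the 2×2 max pooling with stride 2 are assumptions; they are chosen because they reproduce the 16×53×53 input size of FC1 for a 224×224 image.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CustomCNN(nn.Module):
    """LeNet-style network matching the layer table (hypothetical class name)."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, kernel_size=5)   # no padding, stride 1 (assumed)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        self.pool = nn.MaxPool2d(2, 2)                # 2x2 pool, stride 2 (assumed)
        self.fc1 = nn.Linear(16 * 53 * 53, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # 224 -> 220 -> 110
        x = self.pool(F.relu(self.conv2(x)))  # 110 -> 106 -> 53
        x = torch.flatten(x, 1)               # (N, 16 * 53 * 53)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)                    # raw logits, one per digit


model = CustomCNN()
logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 10])
```
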
- Pretrained on ImageNet
- Modified final fully connected layer for 10 classes
- Two training approaches tested:
  - No Freezing: All layers trainable
  - With Freezing: Only final FC layer + layer4 trainable
| Model | Approach | Train Accuracy | Test Accuracy |
|---|---|---|---|
| ResNet-34 | No Freezing | 98.85% | 99.03% |
| ResNet-34 | With Freezing | 97.88% | 96.37% |
| Custom CNN | No Freezing | 97.21% | 89.83% |
| Custom CNN | With Freezing | 96.60% | 89.10% |
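
Accuracy figures like those in the table are typically computed by counting correct top-1 predictions over a data loader. The function below is a minimal sketch of that calculation; the name `accuracy` and the toy model and data are illustrative assumptions, not the project's evaluation code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def accuracy(model, loader, device="cpu"):
    """Return top-1 accuracy in percent over all batches of `loader`."""
    model.eval()
    correct = 0
    total = 0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        preds = model(inputs).argmax(dim=1)  # index of the highest logit
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return 100.0 * correct / total


# Toy check: each row of the identity matrix has its maximum at its own index,
# so an identity "model" classifies every sample correctly.
inputs = torch.eye(10)
labels = torch.arange(10)
loader = DataLoader(TensorDataset(inputs, labels), batch_size=4)
print(accuracy(nn.Identity(), loader))  # 100.0
```
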
| Digit | Accuracy |
|---|---|
| 0 | 100.00% |
| 1 | 97.44% |
| 2 | 98.18% |
| 3 | 100.00% |
| 4 | 100.00% |
| 5 | 97.67% |
| 6 | 97.62% |
| 7 | 100.00% |
| 8 | 100.00% |
| 9 | 100.00% |
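
A per-digit breakdown like the one above can be derived from the true and predicted labels by counting correct predictions per class. The following is a small sketch; the function name `per_class_accuracy` and the toy labels are illustrative and are not taken from the project's results.

```python
from collections import Counter


def per_class_accuracy(labels, predictions, num_classes=10):
    """Return {class: accuracy in percent} for every class that occurs in labels."""
    correct = Counter()
    total = Counter()
    for y, p in zip(labels, predictions):
        total[y] += 1
        if y == p:
            correct[y] += 1
    return {c: 100.0 * correct[c] / total[c] for c in range(num_classes) if total[c]}


# Toy example with three classes; one sample of class 1 is misclassified as 2.
labels = [0, 0, 1, 1, 1, 2]
preds = [0, 0, 1, 1, 2, 2]
acc = per_class_accuracy(labels, preds, num_classes=3)
for digit, value in acc.items():
    print(f"{digit}: {value:.2f}%")
```
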
- ✅ Best overall: ResNet-34 without freezing - 99.03% test accuracy
- ✅ Unfreezing all ResNet-34 layers outperformed the frozen version by 2.66 percentage points in test accuracy (99.03% vs 96.37%)
- ✅ Custom CNN achieved a consistent ~89% test accuracy (89.83% and 89.10%)
- ✅ 6 out of 10 digits achieved 100% accuracy with best model
`pip install torch torchvision numpy matplotlib pillow rarfile`

ResNet-34 with transfer learning clearly outperforms the custom CNN for sign language digit classification (99.03% vs 89.83% test accuracy). Unfreezing all layers during fine-tuning gives better results than training only layer4 and the final FC layer. The best model achieves near-perfect classification (99.03%) on the test set.