Hierarchical Relational Networks for Group Activity Recognition
A PyTorch implementation of Hierarchical Relational Networks for Group Activity Recognition, based on the ECCV 2018 paper by Ibrahim & Mori. This implementation extends the original work with modern training practices, ResNet50 backbone, and Graph Attention Networks.
This project addresses the challenge of understanding collective behavior from individual person features and their relationships in volleyball game scenarios.
The relational layer is the core building block. Given K people and a relationship graph G:
Input: K person feature vectors + relationship graph encoding player connections
Processing: Shared neural network F maps connected person pairs to relational representations
Aggregation: Messages from neighbors are summed to create new representations
Output: K relational feature vectors encoding individual features and relationships
Relational Unit
Mathematical formulation:
P_i^l = Σ F(P_i^(l-1) ⊕ P_j^(l-1); θ) for all j ∈ neighbors(i)
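The formulation above can be sketched as a small PyTorch module. This is a minimal illustration under stated assumptions, not the repository's actual code: the class name `RelationalLayer`, the single linear layer plus ReLU standing in for the shared network F, and the dense 0/1 adjacency matrix `adj` used to encode the graph G are all hypothetical choices.

```python
import torch
import torch.nn as nn


class RelationalLayer(nn.Module):
    """Sketch of one relational layer: P_i^l = sum over neighbors j of F(P_i ⊕ P_j)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        # Shared network F, applied to every (i, j) pair (assumed: Linear + ReLU).
        self.pair_mlp = nn.Sequential(nn.Linear(2 * in_dim, out_dim), nn.ReLU())

    def forward(self, feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # feats: (B, K, in_dim); adj: (K, K), adj[i, j] = 1 if j is a neighbor of i.
        B, K, D = feats.shape
        p_i = feats.unsqueeze(2).expand(B, K, K, D)  # receiver i along dim 1
        p_j = feats.unsqueeze(1).expand(B, K, K, D)  # sender j along dim 2
        pairs = torch.cat([p_i, p_j], dim=-1)        # concatenation ⊕: (B, K, K, 2D)
        messages = self.pair_mlp(pairs)              # (B, K, K, out_dim)
        mask = adj.to(messages.dtype).view(1, K, K, 1)
        # Zero out non-neighbors, then sum the remaining messages over j.
        return (messages * mask).sum(dim=2)          # (B, K, out_dim)


# Usage: 12 players, fully connected graph without self-loops.
feats = torch.randn(2, 12, 2048)
adj = torch.ones(12, 12) - torch.eye(12)
out = RelationalLayer(2048, 512)(feats, adj)
print(out.shape)  # torch.Size([2, 12, 512])
```

Computing all K x K pairs and masking afterwards keeps the sketch short; for K = 12 the wasted work is negligible, though a sparse edge list would scale better to larger graphs.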
Complete Pipeline
| Stage | Operation | Output Dimension |
|---|---|---|
| Input | 12 players with CNN features | 2048-D |
| Layer 1 | 4 cliques (3 players each) | 512-D |
| Layer 2 | 2 cliques (teams) | 256-D |
| Layer 3 | 1 clique (all players) | 128-D |
| Pooling | Team-aware max pooling | 256-D |
| Output | Softmax classification | 8 classes |
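The clique structure and the pooling step in the pipeline can be illustrated with a short PyTorch sketch. It assumes players are ordered so that indices 0 to 5 form one team and 6 to 11 the other, with consecutive indices sharing a clique; the function names `clique_adjacency` and `team_aware_max_pool` are hypothetical and not taken from the repository.

```python
import torch


def clique_adjacency(num_players: int, num_cliques: int) -> torch.Tensor:
    """0/1 matrix linking players in the same clique (no self-loops)."""
    clique_size = num_players // num_cliques
    # Assumption: consecutive player indices belong to the same clique.
    clique_id = torch.arange(num_players) // clique_size
    adj = (clique_id.unsqueeze(0) == clique_id.unsqueeze(1)).float()
    adj.fill_diagonal_(0)
    return adj


def team_aware_max_pool(feats: torch.Tensor) -> torch.Tensor:
    """Max-pool each team separately, then concatenate: (B, K, D) -> (B, 2D)."""
    half = feats.shape[1] // 2
    left = feats[:, :half].max(dim=1).values   # first half of the players
    right = feats[:, half:].max(dim=1).values  # second half of the players
    return torch.cat([left, right], dim=-1)


# Neighbors per player at each layer of the 4 -> 2 -> 1 clique hierarchy.
for cliques in (4, 2, 1):
    adj = clique_adjacency(12, cliques)
    print(cliques, "cliques ->", int(adj[0].sum()), "neighbors per player")

pooled = team_aware_max_pool(torch.randn(2, 12, 128))
print(pooled.shape)  # torch.Size([2, 256])
```

Pooling the two teams separately is what turns the 128-D person features of Layer 3 into the 256-D scene descriptor: the two 128-D team vectors are concatenated rather than merged, so the classifier can still tell left from right.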
Results
Comparison with Original Paper
Original ECCV 2018 results (VGG19 backbone, Lasagne framework)
Our Scores
Stage 1: Person Action Classification
| Model | Backbone | Accuracy |
|---|---|---|
| Person Classifier | ResNet50 | 80.95% |
Stage 2: Non-Temporal Models
| Model | Paper | Ours | Δ (percentage points) |
|---|---|---|---|
| B1-NoRelations | 85.1% | 90.06% | +4.96 |
| RCRG-1R-1C | 86.5% | 90.82% | +4.32 |
| RCRG-2R-11C | 86.1% | 90.28% | +4.18 |
| RCRG-2R-11C-conc | 88.3% | 90.15% | +1.85 |
| RCRG-2R-21C | 87.2% | 90.54% | +3.34 |
| RCRG-3R-421C | 86.4% | 89.97% | +3.57 |
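The Δ column is an absolute difference in percentage points, not a relative improvement. A small plain-Python sketch, using only the Stage 2 numbers listed above, shows both quantities side by side; the helper names `delta_pp` and `relative_gain` are illustrative.

```python
# Stage 2 accuracies in percent, copied from the table above: (paper, ours).
stage2 = {
    "B1-NoRelations": (85.1, 90.06),
    "RCRG-1R-1C": (86.5, 90.82),
    "RCRG-2R-11C": (86.1, 90.28),
    "RCRG-2R-11C-conc": (88.3, 90.15),
    "RCRG-2R-21C": (87.2, 90.54),
    "RCRG-3R-421C": (86.4, 89.97),
}


def delta_pp(paper: float, ours: float) -> float:
    """Absolute difference in percentage points, rounded to 2 decimals."""
    return round(ours - paper, 2)


def relative_gain(paper: float, ours: float) -> float:
    """Improvement relative to the paper's accuracy, in percent."""
    return round(100 * (ours - paper) / paper, 2)


for name, (paper, ours) in stage2.items():
    print(f"{name}: {delta_pp(paper, ours):+.2f} pp, "
          f"{relative_gain(paper, ours):+.2f}% relative")
```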
Stage 3: Temporal Models
| Model | Paper | Ours | Δ (percentage points) |
|---|---|---|---|
| RCRG-2R-11C-conc-Temporal | 89.5% | 91.02% | +1.52 |
| RCRG-2R-21C-Temporal | 89.4% | 91.32% | +1.92 |
Extended: Attention Models (Our Contribution)
| Model | Accuracy |
|---|---|
| RCRG-2R-21C-GAT | 90.92% |
| RCRG-2R-11C-conc-Temp-GAT | 91.85% ⭐ |
Best Model: RCRG-2R-11C-conc-Temp-GAT
Our best performing model combines Graph Attention Networks with temporal LSTM modeling, achieving 91.85% accuracy on the test set.
Per-Class Performance
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| l-pass | 0.923 | 0.951 | 0.937 | 226 |
| r-pass | 0.900 | 0.900 | 0.900 | 210 |
| l-spike | 0.954 | 0.927 | 0.941 | 179 |
| r-spike | 0.910 | 0.936 | 0.923 | 173 |
| l-set | 0.927 | 0.905 | 0.916 | 168 |
| r-set | 0.913 | 0.875 | 0.894 | 192 |
| l-winpoint | 0.898 | 0.951 | 0.924 | 102 |
| r-winpoint | 0.919 | 0.908 | 0.913 | 87 |
| Weighted Avg | 0.919 | 0.919 | 0.918 | 1337 |
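The F1 and weighted-average entries follow from the precision, recall, and support columns. The plain-Python sketch below recomputes them from the rounded values listed above, so the last digit can differ slightly from the reported figures; the helper names are illustrative. The support-weighted recall is by definition the overall accuracy, which is why it lines up with the 91.85% test accuracy.

```python
# (precision, recall, support) per class, copied from the table above.
per_class = {
    "l-pass": (0.923, 0.951, 226),
    "r-pass": (0.900, 0.900, 210),
    "l-spike": (0.954, 0.927, 179),
    "r-spike": (0.910, 0.936, 173),
    "l-set": (0.927, 0.905, 168),
    "r-set": (0.913, 0.875, 192),
    "l-winpoint": (0.898, 0.951, 102),
    "r-winpoint": (0.919, 0.908, 87),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def weighted_average(values, supports) -> float:
    """Average of per-class values, weighted by the number of test samples."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


supports = [s for _, _, s in per_class.values()]
weighted_precision = weighted_average([p for p, _, _ in per_class.values()], supports)
weighted_recall = weighted_average([r for _, r, _ in per_class.values()], supports)

for name, (p, r, _) in per_class.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")
print(f"weighted precision = {weighted_precision:.3f}")
print(f"weighted recall    = {weighted_recall:.3f}")
```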
Confusion Matrix
The confusion matrix is strongly diagonal, with few misclassifications. Precision is highest for l-spike (95.4%), and recall is highest for l-pass and l-winpoint (both 95.1%). Most confusion occurs between the same activity on opposite sides of the court (e.g., l-set vs r-set).
Key Findings
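ResNet50 > VGG19: with the ResNet50 backbone, every variant scores roughly 1.5 to 5 percentage points above the accuracy reported in the paper
Relational layers help: the single-layer RCRG-1R-1C (90.82%) already outperforms the B1-NoRelations baseline (90.06%), although the three-layer RCRG-3R-421C (89.97%) falls slightly below it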
✅ Full PyTorch re-implementation with modern practices
✅ ResNet50 backbone (+1.5 to 5 percentage points over the paper's reported accuracy)
✅ Graph Attention Network extension
✅ Multi-GPU distributed training (DDP)
✅ Automatic Mixed Precision (AMP)
✅ TensorBoard logging