CartPole Environment with Reinforcement Learning

📋 Table of Contents

Team Members
Project Overview
Environment Analysis
Implementation
Project Structure
Installation
Usage
Results
Models and Architectures
Monitoring and Visualization
Contributing

👥 Team Members

First Name	Last Name	Student ID	Email
Antonis	Zikas	1115202100038	sdi2100038@di.uoa.gr
Panagiotis	Papapostolou	1115202100142	sdi2100142@di.uoa.gr

🎯 Project Overview

This project implements and compares several Reinforcement Learning algorithms for training agents on the CartPole-v1 environment from Gymnasium (the maintained successor to OpenAI Gym). The main focus is on Deep Q-Network (DQN) implementations with different architectural variations, together with a comparison against other RL algorithms.

Key Features

🧠 Multiple DQN Implementations: Standard DQN, Dueling Architecture, and Transformer-based Q-Networks
📊 Comprehensive Analysis: Performance comparison with random actions and sensitivity studies
🔧 Modular Design: Clean, well-documented code structure
📈 Visualization: Detailed plotting and analysis of training results
🎮 Environment Testing: Baseline performance analysis with random actions
🏆 State-of-the-art Algorithms: Integration with Stable-Baselines3 (PPO, A2C)

🎮 Environment Analysis

CartPole-v1 Environment

The CartPole-v1 environment is a classic control problem where the goal is to balance a pole on a cart by moving the cart left or right.

Action Space

Discrete: 2 possible actions
- 0: Move cart to the left
- 1: Move cart to the right

Observation Space

Continuous: 4-dimensional state vector
- [0]: Cart Position (range: -4.8 to 4.8)
- [1]: Cart Velocity (range: -∞ to +∞)
- [2]: Pole Angle (range: ~-0.418 to 0.418 radians)
- [3]: Pole Angular Velocity (range: -∞ to +∞)

Reward System

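+1 for each timestep the pole remains upright
Reward threshold: 475 (considered solved)
Maximum episode length: 500 steps (the episode is truncated), so the maximum score is 500
Episode terminates when the pole angle exceeds ±12° or the cart position exceeds ±2.4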
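
The termination rule above can be sketched as a small predicate. This is a minimal illustration rather than the Gymnasium source: the function name `is_terminated` and the constant names are hypothetical, and the 12° limit is converted to radians because the observation reports the pole angle in radians.

```python
import math

# Thresholds taken from the termination rule above.
CART_POSITION_LIMIT = 2.4               # same units as observation[0]
POLE_ANGLE_LIMIT = math.radians(12.0)   # about 0.2095 rad


def is_terminated(observation):
    """Return True if the state violates either failure condition.

    `observation` is the 4-element state
    [cart position, cart velocity, pole angle, pole angular velocity].
    """
    cart_position, _, pole_angle, _ = observation
    return (
        abs(cart_position) > CART_POSITION_LIMIT
        or abs(pole_angle) > POLE_ANGLE_LIMIT
    )
```

Note that the observation bounds listed earlier (±4.8 and about ±0.418 rad) are twice these failure limits, so an episode ends well before the observation reaches its bounds.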

🚀 Implementation

Core Algorithms Implemented

Deep Q-Network (DQN)
- Experience replay buffer
- Target network for stable learning
- ε-greedy exploration strategy
Dueling Architecture DQN
- Separate value and advantage streams
- Improved learning efficiency
Transformer-based Q-Network
- Sequential state processing
- Attention mechanism for temporal dependencies
Stable-Baselines3 Integration
- Proximal Policy Optimization (PPO)
- Advantage Actor-Critic (A2C)
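
To illustrate the role of the target network in the DQN variants, the sketch below computes the standard one-step Q-learning target, y = r + γ · max over a' of Q_target(s', a'), with no bootstrapping on terminal transitions. It uses plain Python lists instead of tensors; `td_target` is a hypothetical helper and is not taken from `src/agents.py`.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target for a single transition.

    next_q_values: Q-values of the next state from the *target* network,
                   one entry per action.
    done:          True if the episode terminated on this transition.
    """
    if done:
        # No future return to bootstrap from after a terminal state.
        return reward
    return reward + gamma * max(next_q_values)
```

The online network is then regressed toward this target for the action that was actually taken, while the target network's weights stay fixed between periodic updates.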

Key Components

Neural Networks: Fully connected layers with ReLU activations
Replay Buffer: Experience replay for stable training
Target Networks: Periodic updates for learning stability
Exploration Strategy: ε-greedy with exponential decay
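
A minimal sketch of two of the components above, assuming a uniform-sampling FIFO buffer and plain ε-greedy action selection. The class and function names are hypothetical and may differ from those in `src/replay_buffers.py` and `src/agents.py`.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO store of (state, action, reward, next_state, done)."""

    def __init__(self, capacity):
        # A deque with maxlen silently discards the oldest transition when full.
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks temporal correlation.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Training would typically start sampling minibatches only once `len(buffer)` is at least the batch size.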

📁 Project Structure

Reinforcement-Learning-Assignment/
│
├── notebooks/
│   └── cart_pole.ipynb          # Main Jupyter notebook with experiments
│
├── src/
│   ├── agents.py                # DQN Agent implementation
│   ├── networks.py              # Neural network architectures
│   ├── trainers.py              # Training logic and utilities
│   ├── replay_buffers.py        # Experience replay buffer
│   ├── testing.py               # Model testing and evaluation
│   ├── plotting.py              # Visualization utilities
│   ├── utils.py                 # Helper functions and hyperparameters
│   ├── dqn.py                   # Main DQN training script
│   ├── env_showcase.py          # Environment demonstration
│   ├── stable_baselines_a2c.py  # A2C training with Stable-Baselines3
│   └── stable_baselines_ppo.py  # PPO training with Stable-Baselines3
│
├── models/                      # Saved trained models
│   ├── dqn_model.pth
│   ├── dueling_arc_dqn_model.pth
│   ├── transformer_model.pth
│   ├── ppo_*.pth
│   └── a2c_*.pth
│
├── reports/
│   ├── figs/                    # Generated plots and visualizations
│   └── PDFs/                    # Final report documents
│
├── logs/
│   └── tensorboard/             # TensorBoard logging for training metrics
│
├── assets/
│   └── imgs/                    # Images and diagrams
│
├── docs/                        # Assignment documentation
├── requirements.txt             # Python dependencies
└── README.md                    # This file

🔧 Installation

Prerequisites

Python 3.8+
CUDA-compatible GPU (optional, for faster training)

Setup Instructions

Clone the repository

```
git clone <repository-url>
cd Reinforcement-Learning-Assignment
```

Create virtual environment (recommended)

```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install dependencies
```
pip install -r requirements.txt
```

Key Dependencies

torch: Deep learning framework
gymnasium: RL environments (maintained successor to OpenAI Gym)
stable-baselines3: State-of-the-art RL algorithms
matplotlib: Plotting and visualization
numpy: Numerical computations
jupyter: Interactive notebooks

🎮 Usage

Quick Start

Run Environment Showcase
```
python src/env_showcase.py
```
Train DQN Agent
```
python src/dqn.py
```

Train with Stable-Baselines3

```
python src/stable_baselines_ppo.py  # For PPO
python src/stable_baselines_a2c.py  # For A2C
```

Interactive Analysis

```
jupyter notebook notebooks/cart_pole.ipynb
```

Hyperparameter Configuration

Key hyperparameters are defined in src/utils.py:

```
GAMMA = 0.99          # Discount factor
LR = 1e-3             # Learning rate
BATCH_SIZE = 64       # Minibatch size
MEMORY_SIZE = 10000   # Replay buffer size
EPSILON_START = 1.0   # Starting exploration probability
EPSILON_END = 0.01    # Minimum exploration probability
EPSILON_DECAY = 0.995 # Epsilon decay rate
TARGET_UPDATE = 10    # Target network update frequency
```
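
The exploration values above imply a decay schedule, which the sketch below works out. It assumes that epsilon is multiplied by `EPSILON_DECAY` once per episode and clipped at `EPSILON_END`; the README does not say whether decay is applied per step or per episode, and the helper names are hypothetical.

```python
import math

EPSILON_START = 1.0
EPSILON_END = 0.01
EPSILON_DECAY = 0.995


def epsilon_at(episode):
    """Epsilon after `episode` multiplicative decay steps, floored at EPSILON_END."""
    return max(EPSILON_END, EPSILON_START * EPSILON_DECAY ** episode)


def episodes_to_floor():
    """Smallest number of decay steps after which epsilon reaches EPSILON_END."""
    return math.ceil(
        math.log(EPSILON_END / EPSILON_START) / math.log(EPSILON_DECAY)
    )
```

Under the per-episode assumption, epsilon would still be about 0.08 after 500 episodes and would reach the 0.01 floor only after 919 episodes.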

📊 Results

Performance Comparison

Algorithm	Average Score	Success Rate	Training Episodes
Random Actions	~22	~10%	N/A
DQN	~475+	~95%+	500
Dueling DQN	~480+	~96%+	500
Transformer DQN	~450+	~90%+	500
PPO (Stable-Baselines3)	~500	~99%	Variable
A2C (Stable-Baselines3)	~495+	~98%	Variable
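
The README does not define how "Average Score" and "Success Rate" are computed. One plausible reading, sketched below, is the mean episode score and the fraction of evaluation episodes whose score reaches a success threshold. The threshold is an assumption, set here to 475 (the reward threshold Gymnasium registers for CartPole-v1), and `summarize` is a hypothetical helper rather than a function from `src/testing.py`.

```python
def summarize(scores, success_threshold=475):
    """Return (average score, success rate) for a list of episode scores."""
    if not scores:
        raise ValueError("scores must not be empty")
    average = sum(scores) / len(scores)
    successes = sum(1 for s in scores if s >= success_threshold)
    return average, successes / len(scores)
```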

Key Findings

✅ All implemented algorithms significantly outperform random actions
✅ Dueling architecture shows slight improvement over standard DQN
✅ Stable-Baselines3 implementations achieve near-optimal performance
✅ Transformer-based approach shows promise but requires tuning

🧠 Models and Architectures

Standard DQN Architecture

Input Layer (4 nodes) → Hidden Layer (128) → Hidden Layer (128) → Hidden Layer (128) → Output Layer (2 nodes)
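
As a quick size check for the layer widths above, the sketch below counts the trainable parameters of a fully connected network, assuming every layer has a bias term. The helper `count_parameters` is hypothetical.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network with the given widths."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix + bias vector
    return total


print(count_parameters([4, 128, 128, 128, 2]))  # 33922
```

The breakdown is 640 parameters for the input layer, 16,512 for each of the two 128-to-128 layers, and 258 for the output layer.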

Dueling Architecture

Shared layers: 4 → 128 → 128 → 128 → 128
Value stream: 128 → 64 → 1
Advantage stream: 128 → 64 → 2
Combination: Q(s,a) = V(s) + (A(s,a) - mean(A(s,·)))
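
The combination rule above can be written out for a single state as follows. This is a plain-Python sketch of the formula only, with a hypothetical function name; in the actual network the same operation would be applied to batched tensors.

```python
def dueling_q_values(value, advantages):
    """Combine V(s) and A(s, .) as Q(s, a) = V(s) + (A(s, a) - mean(A(s, .)))."""
    mean_advantage = sum(advantages) / len(advantages)
    return [value + (a - mean_advantage) for a in advantages]
```

For example, `dueling_q_values(1.0, [2.0, 4.0])` returns `[0.0, 2.0]`. Subtracting the mean advantage makes the average of the resulting Q-values equal to V(s), which keeps the value and advantage streams identifiable.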

Transformer Architecture

Sequence length: 10 timesteps
Embedding dimension: 64
Attention heads: 4
Encoder layers: 2
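
A sequence model needs the last 10 observations as input rather than a single state. The sketch below shows one way such a window could be maintained. The class `StateWindow` is hypothetical, and padding the start of an episode by repeating the first observation is an assumption, since the README does not describe how short histories are handled.

```python
from collections import deque


class StateWindow:
    """Keeps the most recent `length` observations for a sequence model."""

    def __init__(self, length=10):
        self.length = length
        self.frames = deque(maxlen=length)

    def reset(self, first_observation):
        # Assumption: pad a new episode by repeating its first observation.
        self.frames.clear()
        for _ in range(self.length):
            self.frames.append(list(first_observation))

    def push(self, observation):
        # The deque drops the oldest frame automatically once it is full.
        self.frames.append(list(observation))

    def as_sequence(self):
        # Oldest first, newest last: shape (length, observation_dim).
        return [list(frame) for frame in self.frames]
```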

📈 Monitoring and Visualization

The project includes comprehensive visualization tools:

Training Progress: Score and epsilon decay over episodes
Performance Comparison: Trained agents vs. random actions
Sensitivity Analysis: Hyperparameter impact studies
TensorBoard Integration: Real-time training metrics

🤝 Contributing

This is an academic project for coursework. The implementation follows best practices for:

Code Organization: Modular, well-documented structure
Reproducibility: Seed setting for consistent results
Experimentation: Comprehensive sensitivity studies
Visualization: Clear, informative plots and metrics
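
For the reproducibility point above, a seeding helper might look like the sketch below. The function `set_seed` is hypothetical and not taken from `src/utils.py`; it seeds Python's RNG and, when they are installed, NumPy and PyTorch.

```python
import random


def set_seed(seed):
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass
```

The environment is seeded separately in Gymnasium, by passing `seed=` to `env.reset()`.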

This project is part of the coursework for Reinforcement Learning & Stochastic Games

National and Kapodistrian University of Athens

Department of Informatics and Telecommunications
