This repository contains my solutions and implementations for a university-level Natural Language Processing (NLP) course.
The project consists of 10 exercise sheets, each covering fundamental and advanced NLP concepts using Python, NLTK, and related libraries.
Throughout the exercises, the following NLP concepts were explored:
- Tokenization
- Normalization
- Stopword removal
- Stemming & Lemmatization
- Regular expressions
- N-gram models
- Probability estimation
- Perplexity evaluation
- POS tagging with NLTK
- Tagging accuracy evaluation
- Context-Free Grammars (CFG)
- Constituency parsing
- Tree representations
- Word frequency analysis
- Distributional semantics
- Vector representations
- Text classification
- Feature extraction
- Evaluation metrics (accuracy, precision, recall, F1)
- Named Entity Recognition (NER)
- Sequence labeling
- Corpus processing
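
To make the n-gram, probability estimation, and perplexity topics above more concrete, here is a minimal sketch of a bigram language model in plain Python. It is not taken from the exercise sheets: the function names (`train_bigram`, `bigram_prob`, `perplexity`), the toy corpus, and the choice of add-one (Laplace) smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one (Laplace) smoothed estimate of P(word | prev).

    The vocabulary here includes the padding symbols, which is a
    simplification chosen for this sketch.
    """
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def perplexity(tokens, unigrams, bigrams):
    """Perplexity of one sentence: exp of the average negative log-probability."""
    padded = ["<s>"] + tokens + ["</s>"]
    pairs = list(zip(padded, padded[1:]))
    log_prob = sum(math.log(bigram_prob(p, w, unigrams, bigrams)) for p, w in pairs)
    return math.exp(-log_prob / len(pairs))


# Hypothetical toy corpus, tokenized by whitespace.
corpus = [s.lower().split() for s in ["the cat sat", "the dog sat", "the cat ran"]]
unigrams, bigrams = train_bigram(corpus)

seen = perplexity("the cat sat".split(), unigrams, bigrams)
unseen = perplexity("sat the dog".split(), unigrams, bigrams)
print(f"seen sentence: {seen:.2f}, unseen word order: {unseen:.2f}")
```

A sentence that appears in the training data should receive a lower perplexity than one with an unseen word order, which is the behaviour a perplexity evaluation is meant to expose.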
Each exercise sheet is provided in two formats:
- 📓 Notebook version (.ipynb) – interactive exploration
- 🐍 Python script version (.py) – standalone implementation
- Python 3.9+
- Jupyter Notebook
- Required libraries:
  `pip install nltk numpy pandas scikit-learn matplotlib`

If needed, download NLTK resources:

    import nltk
    nltk.download('all')

Start Jupyter:

    jupyter notebook

Open any Exercise_Sheet_X.ipynb.
- Practical experience with core NLP pipelines
- Understanding of probabilistic language models
- Working with real corpora
- Implementing ML models for text classification
- Evaluating NLP systems properly
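
As an illustration of what evaluating a classifier involves, the following sketch computes accuracy, precision, recall, and F1 by hand for a single positive class. The function names and the spam/ham labels are hypothetical and are not taken from the exercise sheets; in practice, scikit-learn's `sklearn.metrics` module provides equivalent functions.

```python
def accuracy(gold, predicted):
    """Fraction of predictions that match the gold labels."""
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


def precision_recall_f1(gold, predicted, positive):
    """Precision, recall, and F1 for one class treated as the positive label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold labels and classifier output.
gold = ["spam", "ham", "spam", "ham", "spam", "ham"]
pred = ["spam", "spam", "spam", "ham", "ham", "ham"]

acc = accuracy(gold, pred)
precision, recall, f1 = precision_recall_f1(gold, pred, positive="spam")
print(f"accuracy={acc:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Reporting precision and recall alongside accuracy matters most when the classes are imbalanced, because accuracy alone can look high for a classifier that rarely predicts the minority class.
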
This repository contains coursework implementations and is shared for educational purposes only.