Titanic Survival Prediction (AI Capstone Project)

Overview

This project presents an end-to-end machine learning pipeline to predict passenger survival on the Titanic dataset. It demonstrates the complete workflow from data preprocessing and exploratory analysis to model building, evaluation, and prediction.

Objective

To build a robust classification model that predicts whether a passenger survived the Titanic disaster based on demographic and travel-related features.

Dataset

The dataset includes passenger information such as:

Age
Gender (Sex)
Passenger Class (Pclass)
Fare
Port of Embarkation (Embarked)

Note: Due to file size and repository best practices, the datasets are not included. They can be downloaded from the Kaggle competition "Titanic - Machine Learning from Disaster" (train.csv and test.csv).

Workflow

1. Data Preprocessing

Handled missing values using median and mode imputation
Removed irrelevant features (Cabin, Name, Ticket)
Encoded categorical variables (Sex, Embarked)
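The preprocessing steps above could look roughly like the following pandas sketch. It assumes the Kaggle column names (Age, Fare, Embarked, Sex, Cabin, Name, Ticket). It also assumes median imputation for Age and Fare, mode imputation for Embarked, a 0/1 mapping for Sex, and one-hot encoding for Embarked. The notebook's exact choices may differ, and `preprocess` is a hypothetical helper name.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Impute, drop, and encode columns of a Kaggle-style Titanic frame (sketch)."""
    df = df.copy()
    # Median imputation for numeric columns with gaps (Age; Fare has a gap in test.csv)
    for col in ["Age", "Fare"]:
        df[col] = df[col].fillna(df[col].median())
    # Mode imputation for the categorical Embarked column
    df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])
    # Drop columns that are not used as model inputs
    df = df.drop(columns=["Cabin", "Name", "Ticket"])
    # Encode Sex as 0/1 and one-hot encode the port of embarkation
    df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
    df = pd.get_dummies(df, columns=["Embarked"], prefix="Embarked", dtype=int)
    return df
```

For example, `train = preprocess(pd.read_csv("train.csv"))`, assuming the Kaggle file is in the working directory. PassengerId is left in place and should be dropped before fitting. In a full pipeline the medians and mode should be computed on the training data and reused for test.csv, so that test statistics do not leak into preprocessing. This sketch recomputes them per frame for brevity.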

2. Feature Engineering

Created new features such as:
- FamilySize
- IsAlone
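A common way to derive these two features from the Kaggle SibSp and Parch columns is sketched below. The notebook's exact definitions are not stated, so FamilySize = SibSp + Parch + 1 and IsAlone = 1 when FamilySize is 1 are assumptions. `add_family_features` is a hypothetical helper.

```python
import pandas as pd


def add_family_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add FamilySize and IsAlone using a common (assumed) definition."""
    df = df.copy()
    # Siblings/spouses + parents/children + the passenger themself
    df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
    # 1 if travelling without any family aboard, else 0
    df["IsAlone"] = (df["FamilySize"] == 1).astype(int)
    return df


example = pd.DataFrame({"SibSp": [1, 0, 3], "Parch": [0, 0, 2]})
print(add_family_features(example))
```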

3. Exploratory Data Analysis (EDA)

Analyzed the overall survival distribution
Examined relationships between survival and key features such as gender and passenger class
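The survival breakdowns described above can be computed with plain pandas group-bys. This is a sketch assuming a frame with the Kaggle Survived, Sex, and Pclass columns; `survival_rates` and the demo rows are hypothetical and are not the project's data.

```python
import pandas as pd


def survival_rates(df: pd.DataFrame) -> dict:
    """Survival rate overall and broken down by Sex and Pclass (sketch)."""
    return {
        "overall": df["Survived"].mean(),
        "by_sex": df.groupby("Sex")["Survived"].mean(),
        "by_class": df.groupby("Pclass")["Survived"].mean(),
        # Survival rate for each Sex x Pclass combination (NaN where a cell is empty)
        "by_sex_and_class": df.pivot_table(
            index="Sex", columns="Pclass", values="Survived", aggfunc="mean"
        ),
    }


# Hypothetical demo rows, not real passengers
demo = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Pclass": [3, 1, 3, 2],
    "Survived": [0, 1, 1, 0],
})
rates = survival_rates(demo)
print(rates["by_sex"])
print(rates["by_sex_and_class"])
```

The same breakdown can be plotted with seaborn, for example `sns.barplot(data=train, x="Pclass", y="Survived", hue="Sex")`, which draws the mean of Survived for each group.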

4. Model Development

Trained and compared multiple machine learning models:

Logistic Regression
Decision Tree
Random Forest
Gradient Boosting
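One way to compare the four models side by side is scikit-learn's `cross_val_score`. This sketch assumes 5-fold cross-validated accuracy as the comparison metric and uses synthetic data from `make_classification` as a stand-in for the preprocessed Titanic features; the hyperparameters are illustrative, not the project's. Logistic Regression is wrapped in a `StandardScaler` pipeline because it is sensitive to feature scale, while the tree-based models are not.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, cv=5, random_state=42):
    """Return the mean cross-validated accuracy of each candidate model."""
    models = {
        "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=random_state),
        "Gradient Boosting": GradientBoostingClassifier(random_state=random_state),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


# Synthetic stand-in for the preprocessed Titanic features (hypothetical data)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
scores = compare_models(X_demo, y_demo)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```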

5. Model Evaluation

Performed a train-test split for validation
Evaluated models using:
- Accuracy
- Confusion Matrix
- Classification Report
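The evaluation steps above map directly onto scikit-learn's `train_test_split`, `accuracy_score`, `confusion_matrix`, and `classification_report`. The sketch below assumes an 80/20 stratified split and uses synthetic data in place of the Titanic features; `evaluate_holdout` is a hypothetical helper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split


def evaluate_holdout(model, X, y, test_size=0.2, random_state=42):
    """Fit on a training split and score on the held-out validation split."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_val)
    return {
        "accuracy": accuracy_score(y_val, y_pred),
        # Rows are true classes, columns are predicted classes (0 = died, 1 = survived)
        "confusion_matrix": confusion_matrix(y_val, y_pred),
        "report": classification_report(y_val, y_pred, target_names=["Died", "Survived"]),
    }


# Synthetic stand-in for the preprocessed Titanic features (hypothetical data)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
result = evaluate_holdout(RandomForestClassifier(random_state=0), X_demo, y_demo)
print(f"Accuracy: {result['accuracy']:.3f}")
print(result["confusion_matrix"])
print(result["report"])
```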

6. Hyperparameter Tuning

Used GridSearchCV to tune Random Forest hyperparameters
Improved model performance by selecting the parameter combination with the best cross-validated score
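A minimal GridSearchCV setup for the Random Forest might look like the following. The search space is a hypothetical example, since the grid used in the project is not documented, and the data is synthetic. GridSearchCV scores every parameter combination with k-fold cross-validation and, with the default `refit=True`, refits the best combination on the full data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical search space; the project's actual grid is not documented
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [4, None],
    "min_samples_leaf": [1, 3],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,               # 5-fold cross-validation for each combination
    scoring="accuracy",
)

# Synthetic stand-in for the preprocessed Titanic features (hypothetical data)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
search.fit(X_demo, y_demo)
print(search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")

# Best combination, refit on all of X_demo because refit=True by default
best_model = search.best_estimator_
```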

Results

Best Model: Random Forest
Validation Accuracy: ~82%
Key Predictors: Gender, Passenger Class, Fare
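Predictor rankings like the one above are typically read from a fitted Random Forest's `feature_importances_` attribute. This sketch uses hypothetical feature names and synthetic data, so the printed ranking carries no meaning about the Titanic; it only illustrates the mechanics. `ranked_importances` is a hypothetical helper.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def ranked_importances(model, feature_names):
    """Impurity-based importances from a fitted tree ensemble, highest first."""
    return pd.Series(model.feature_importances_, index=feature_names).sort_values(ascending=False)


# Hypothetical column names standing in for the preprocessed Titanic features
names = ["Pclass", "Sex", "Age", "Fare", "FamilySize", "IsAlone"]
X_demo, y_demo = make_classification(n_samples=300, n_features=len(names), random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X_demo, y_demo)
print(ranked_importances(rf, names))
```

Impurity-based importances can favor continuous, high-cardinality features such as Fare, so `sklearn.inspection.permutation_importance` on a validation split is a common cross-check.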

Tech Stack

Python
Pandas
NumPy
Scikit-learn
Matplotlib
Seaborn

How to Run

Install dependencies:

pip install pandas numpy scikit-learn matplotlib seaborn

Open and run the notebook in Jupyter:

Titanic_Survival_Prediction.ipynb

Project Structure

Titanic_Survival_Prediction.ipynb → Main notebook
submission.csv → Model predictions
README.md → Project documentation
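submission.csv follows Kaggle's two-column submission format (PassengerId, Survived). A hedged sketch of writing it with pandas is below; `write_submission` is a hypothetical helper, and its inputs are assumed to be the PassengerId column of test.csv and the model's 0/1 predictions.

```python
import numpy as np
import pandas as pd


def write_submission(passenger_ids, predictions, path="submission.csv"):
    """Write predictions in the two-column Kaggle Titanic submission format."""
    submission = pd.DataFrame({
        "PassengerId": np.asarray(passenger_ids),
        # Kaggle expects integer 0/1 labels, not probabilities or booleans
        "Survived": np.asarray(predictions).astype(int),
    })
    submission.to_csv(path, index=False)
    return submission
```

Usage, with hypothetical variable names: `write_submission(test["PassengerId"], best_model.predict(X_test))`.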

Key Insights

Survival probability is strongly influenced by gender and passenger class
Engineered features (FamilySize, IsAlone) improved model performance
Ensemble models (Random Forest, Gradient Boosting) outperformed simpler models

Conclusion

This project demonstrates a complete machine learning workflow, highlighting the importance of preprocessing, feature engineering, model comparison, and evaluation in building effective predictive models.

Future Improvements

Incorporate advanced feature engineering (Title extraction, family grouping)
Experiment with boosting algorithms such as XGBoost or LightGBM
Improve model interpretability
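For the Title extraction idea, one possible starting point is to pull the honorific out of the Name column, which in the Kaggle data follows the pattern "Surname, Title. Given names". The regex, the merging of French variants, and the grouping of rare titles below are assumptions, not part of the current project; `extract_title` is a hypothetical helper.

```python
import pandas as pd


def extract_title(names: pd.Series) -> pd.Series:
    """Pull the honorific between the comma and the next period of each Name."""
    titles = names.str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
    # Merge French variants into their English equivalents (assumed mapping)
    titles = titles.replace({"Mlle": "Miss", "Ms": "Miss", "Mme": "Mrs"})
    # Collapse everything outside the four common titles into one bucket
    common = {"Mr", "Mrs", "Miss", "Master"}
    return titles.where(titles.isin(common), "Rare")


names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss. Laina",
    "Uruchurtu, Don. Manuel E",
])
print(extract_title(names).tolist())
```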

Author

Lavanya Dive, BSc Data Science student
