jiegzhan/multi-class-text-classification-cnn


Project: Classify Kaggle Consumer Finance Complaints

Highlights:

  • This is a multi-class text classification (sentence classification) problem.
  • The goal is to classify Kaggle Consumer Finance Complaints into 11 classes.
  • The model is a convolutional neural network (CNN) with word embeddings, built on TensorFlow 2 / Keras.
  • Input: consumer_complaint_narrative

    • Example: "someone in north Carolina has stolen my identity information and has purchased items including XXXX cell phones thru XXXX on XXXX/XXXX/2015. A police report was filed as soon as I found out about it on XXXX/XXXX/2015. A investigation from XXXX is under way thru there fraud department and our local police department.\n"
  • Output: product

    • Example: Credit reporting
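
To make the "CNN and word embeddings" idea concrete, here is a minimal pure-Python sketch of one forward step: embedding lookup, a width-2 convolution filter, ReLU, and max-over-time pooling. It is not the repository's model code. The vocabulary, embedding values, and filter weights are made-up toy numbers, and the helper names (`embed`, `conv1d_max_pool`) are hypothetical.

```python
# Toy forward pass: word embeddings -> 1D convolution -> ReLU -> max-over-time pooling.
# All values are illustrative assumptions, not weights from the trained model.

def embed(tokens, vocab, table):
    """Look up one embedding vector per token; unknown words map to index 0."""
    return [table[vocab.get(tok, 0)] for tok in tokens]

def conv1d_max_pool(vectors, kernel, bias=0.0):
    """Slide a (width x dim) kernel over the sequence, apply ReLU, keep the maximum.

    Assumes the sequence has at least as many tokens as the kernel is wide.
    """
    width = len(kernel)
    activations = []
    for start in range(len(vectors) - width + 1):
        window = vectors[start:start + width]
        total = bias + sum(
            w * x
            for k_row, v_row in zip(kernel, window)
            for w, x in zip(k_row, v_row)
        )
        activations.append(max(0.0, total))  # ReLU
    return max(activations)  # max-over-time pooling: one feature per filter

vocab = {"<unk>": 0, "stolen": 1, "identity": 2, "police": 3, "report": 4}
table = [
    [0.0, 0.0],  # <unk>
    [1.0, 0.0],  # stolen
    [0.0, 1.0],  # identity
    [0.5, 0.5],  # police
    [0.2, 0.1],  # report
]
# A width-2 filter that responds most strongly to "stolen" followed by "identity".
kernel = [[1.0, 0.0], [0.0, 1.0]]

tokens = "a police report about my stolen identity".split()
feature = conv1d_max_pool(embed(tokens, vocab, table), kernel)
print(feature)
```

A real model learns many such filters of several widths, concatenates their pooled features, and feeds them to a softmax layer over the 11 classes.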

Setup:

python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt

Train:

  • Command: python3 train.py <data_file> <params_file>
  • Example: python3 train.py ./data/consumer_complaints.csv.zip ./parameters.json

Training creates a directory (trained_model_<timestamp>/) containing:

  • best_model.keras — model with best validation accuracy
  • train_config.json — training metadata, label mapping, and vocabulary
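
As a rough illustration of what a label mapping and a vocabulary are, the sketch below builds both from a few (narrative, product) pairs and round-trips them through JSON. The key names (`label_mapping`, `vocabulary`), the helper functions, and the sample rows are assumptions for illustration. The actual schema of train_config.json is defined by train.py and may differ.

```python
import json
from collections import Counter

def build_label_mapping(labels):
    """Map each distinct label to an integer index, in sorted order."""
    return {label: idx for idx, label in enumerate(sorted(set(labels)))}

def build_vocabulary(texts, min_count=1):
    """Map each word seen at least min_count times to an index.

    Index 0 is left free here for padding/unknown words (an assumption).
    """
    counts = Counter(word for text in texts for word in text.lower().split())
    words = sorted(w for w, c in counts.items() if c >= min_count)
    return {word: idx for idx, word in enumerate(words, start=1)}

# Made-up sample rows: (consumer_complaint_narrative, product)
rows = [
    ("someone has stolen my identity", "Credit reporting"),
    ("my mortgage payment was misapplied", "Mortgage"),
    ("a debt collector keeps calling", "Debt collection"),
]

config = {
    "label_mapping": build_label_mapping(product for _, product in rows),
    "vocabulary": build_vocabulary(text for text, _ in rows),
}

# Round-trip through JSON, as a config file on disk would be.
restored = json.loads(json.dumps(config, indent=2))
print(restored["label_mapping"])
```

Storing both mappings next to the model matters because prediction has to turn words into the same indices, and indices back into the same labels, that were used during training.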

Predict:

Pass the model directory (created by train.py) and the new data to predict.py.

  • Command: python3 predict.py <model_directory> <test_data.json>
  • Example: python3 predict.py ./trained_model_1780290823/ ./data/small_samples.json

Predictions are saved to ./data/predictions_output.json.
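
Since each training run creates a new trained_model_<timestamp>/ directory, a small helper can pick the most recent one before calling predict.py. This is a sketch, not part of the repository: it assumes the timestamp suffix is purely numeric (as in the example above), and `latest_model_dir` and `predict_command` are hypothetical names.

```python
import re
from pathlib import Path

# Assumption: directory names look like trained_model_<digits>.
MODEL_DIR_PATTERN = re.compile(r"^trained_model_(\d+)$")

def latest_model_dir(root="."):
    """Return the trained_model_<timestamp> directory with the largest timestamp, or None."""
    candidates = []
    for path in Path(root).iterdir():
        match = MODEL_DIR_PATTERN.match(path.name)
        if match and path.is_dir():
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]

def predict_command(model_dir, data_file):
    """Build the documented predict.py invocation as an argument list."""
    return ["python3", "predict.py", str(model_dir), str(data_file)]

if __name__ == "__main__":
    model_dir = latest_model_dir(".")
    if model_dir is None:
        print("No trained_model_<timestamp> directory found; run train.py first.")
    else:
        print(" ".join(predict_command(model_dir, "./data/small_samples.json")))
```

The resulting argument list can be passed to `subprocess.run` if the prediction step should be launched from a script.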

