julidelgado/ai-data-analyst-agent

AI Data Analyst Agent

An end-to-end, free AI data analyst system that:

  • cleans uploaded datasets automatically
  • generates insights and trend analysis
  • builds visualizations
  • writes downloadable reports (HTML, Markdown, JSON, cleaned CSV)

It includes:

  • a Streamlit web app for no-code usage
  • a Python CLI for scriptable usage
  • modular pipeline components for extension
  • automated tests

Project Phases (Planning + Build)

Phase 1 - Foundation and Structure

  • Define architecture and data flow.
  • Scaffold package/module layout.
  • Add dependencies, sample data, and output directories.

Phase 2 - Core Agent Engine

  • Implement auto-cleaning (column normalization, typing, missing values, duplicates, outliers).
  • Implement insights and trend detection.
  • Implement recommendation generation logic.
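Three of the cleaning steps listed above (column normalization, duplicate removal, outlier detection) can be sketched as follows. This is a minimal illustration using only the standard library, not the project's cleaning.py, which presumably operates on a dataframe. The function names `normalize_column`, `drop_duplicates`, and `iqr_outliers` are hypothetical, and the IQR rule with k = 1.5 is an assumed outlier criterion.

```python
import re
import statistics


def normalize_column(name: str) -> str:
    """Lowercase a header and collapse non-alphanumeric runs into underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def drop_duplicates(rows: list[dict]) -> list[dict]:
    """Keep the first occurrence of each identical row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed criterion)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


print(normalize_column(" Unit Price ($) "))
print(drop_duplicates([{"a": 1}, {"a": 1}, {"a": 2}]))
print(iqr_outliers([10, 12, 11, 13, 12, 11, 100]))
```

Typing and missing-value handling are left out here because the right choice (coercion rules, imputation versus dropping) depends on the dataset.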

Phase 3 - Visualization and Reporting

  • Create Plotly chart generation layer.
  • Generate HTML and Markdown reports.
  • Persist artifacts (cleaned CSV, JSON insights, reports).

Phase 4 - Product Interfaces

  • Build Streamlit app for upload -> one-click analysis -> downloads.
  • Add CLI for batch usage.

Phase 5 - Validation and Quality

  • Add unit/integration tests for core behavior.
  • Run test suite and verify outputs.

Architecture

Upload CSV/XLSX/XLS
      |
      v
DataAnalystAgent (pipeline.py)
  |- cleaning.py       -> cleaned dataframe + quality metrics
  |- insights.py       -> insights + trends + recommendations + metrics
  |- visualization.py  -> Plotly figures
  |- reporting.py      -> CSV/JSON/MD/HTML artifacts
      |
      v
UI (app.py) and CLI (cli.py)
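
The data flow in the diagram above amounts to four stages run in sequence, each consuming the output of the previous ones. The sketch below shows that shape only. `PipelineSketch`, `AnalysisResult`, and the stub stages are hypothetical stand-ins and do not reflect the actual `DataAnalystAgent` interface.

```python
from dataclasses import dataclass
from typing import Callable

Rows = list[dict]


@dataclass
class AnalysisResult:
    """Hypothetical container for what each stage produces."""
    cleaned: Rows
    insights: list[str]
    figures: list[str]
    artifacts: list[str]


@dataclass
class PipelineSketch:
    """Runs clean -> analyze -> visualize -> report in the diagram's order."""
    clean: Callable[[Rows], Rows]
    analyze: Callable[[Rows], list[str]]
    visualize: Callable[[Rows, list[str]], list[str]]
    report: Callable[[Rows, list[str], list[str]], list[str]]

    def run(self, rows: Rows) -> AnalysisResult:
        cleaned = self.clean(rows)
        insights = self.analyze(cleaned)
        figures = self.visualize(cleaned, insights)
        artifacts = self.report(cleaned, insights, figures)
        return AnalysisResult(cleaned, insights, figures, artifacts)


# Stub stages, purely for illustration.
pipeline = PipelineSketch(
    clean=lambda rows: [r for r in rows if r],  # drop empty rows
    analyze=lambda rows: [f"{len(rows)} rows after cleaning"],
    visualize=lambda rows, insights: ["bar_chart"],
    report=lambda rows, insights, figures: ["report.md", "report.html"],
)
result = pipeline.run([{"region": "north", "sales": 10}, {}])
print(result.insights)
```

Passing the stages in as callables keeps each one replaceable, which matches the README's description of modular pipeline components.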

Folder Structure

Project 4 - AI Data Analyst Agent/
  app.py
  requirements.txt
  pyproject.toml
  README.md
  data/
    sample_sales.csv
  outputs/
  src/
    ai_data_analyst/
      __init__.py
      cleaning.py
      insights.py
      io_utils.py
      pipeline.py
      reporting.py
      types.py
      visualization.py
      cli.py
  tests/
    test_cleaning.py
    test_insights.py
    test_pipeline.py

Quick Start

1. Install dependencies

pip install -r requirements.txt

2. Run the web app

Windows (py launcher):

py -3.12 -m streamlit run app.py

Linux/macOS (bash):

python3 -m streamlit run app.py

3. Run the CLI

PowerShell:

$env:PYTHONPATH="src"; py -3.12 -m ai_data_analyst.cli --input data/sample_sales.csv --output outputs

Linux/macOS (bash):

export PYTHONPATH=src
python3 -m ai_data_analyst.cli --input data/sample_sales.csv --output outputs
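
For readers extending the CLI, the two flags used above (`--input` and `--output`) could be declared with `argparse` roughly as follows. This is a sketch, not the contents of cli.py. The `build_parser` name, the help text, and the `outputs` default are assumptions.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the commands above."""
    parser = argparse.ArgumentParser(prog="ai_data_analyst.cli")
    parser.add_argument(
        "--input", required=True, type=Path,
        help="dataset to analyze (.csv, .xlsx, or .xls)",
    )
    parser.add_argument(
        "--output", type=Path, default=Path("outputs"),  # default is assumed
        help="directory that receives the generated artifacts",
    )
    return parser


# Parse the same arguments as the Quick Start command.
args = build_parser().parse_args(
    ["--input", "data/sample_sales.csv", "--output", "outputs"]
)
print(args.input, args.output)
```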

Example Output Artifacts

Each run generates timestamped files in outputs/:

  • *_cleaned.csv
  • *_insights.json
  • *_report.md
  • *_report.html
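
One way to derive the four artifact names from a single timestamp is sketched below. The `%Y%m%d_%H%M%S` format and the `artifact_paths` helper are assumptions, since the README only states that files are timestamped. The project's actual prefix may differ.

```python
from datetime import datetime
from pathlib import Path

# Suffixes taken from the artifact list above.
SUFFIXES = ("_cleaned.csv", "_insights.json", "_report.md", "_report.html")


def artifact_paths(output_dir: Path, now: datetime) -> dict[str, Path]:
    """Map each suffix to a path sharing one timestamp prefix (assumed format)."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return {suffix: output_dir / f"{stamp}{suffix}" for suffix in SUFFIXES}


paths = artifact_paths(Path("outputs"), datetime(2024, 1, 2, 3, 4, 5))
for path in paths.values():
    print(path)
```

Using one timestamp per run keeps the four files grouped together when the directory is sorted by name.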

Notes

  • Supported input formats: .csv, .xlsx, .xls
  • The system is fully local and free to run.
  • Recommendation text is rule-based and deterministic (no paid LLM required).
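
To make "rule-based and deterministic" concrete: recommendations of this kind are typically produced by fixed threshold checks over computed metrics, so the same input always yields the same text. The sketch below illustrates the idea. The metric names (`missing_ratio`, `duplicate_ratio`, `trend_slope`), the thresholds, and the wording are all hypothetical and are not taken from insights.py.

```python
def recommend(metrics: dict[str, float]) -> list[str]:
    """Apply fixed rules to metrics; no randomness and no model calls."""
    recs = []
    if metrics.get("missing_ratio", 0.0) > 0.10:  # assumed threshold
        recs.append("More than 10% of values are missing; review data collection.")
    if metrics.get("duplicate_ratio", 0.0) > 0.0:
        recs.append("Duplicate rows were found; check the upstream export.")
    if metrics.get("trend_slope", 0.0) < 0:
        recs.append("The main metric is trending down; investigate recent periods.")
    if not recs:
        recs.append("No issues flagged by the current rules.")
    return recs


sample = {"missing_ratio": 0.25, "duplicate_ratio": 0.0, "trend_slope": -1.2}
for line in recommend(sample):
    print(line)
```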

License

This project is released under the MIT License. See LICENSE.
