An end-to-end, free AI data analyst system that:
- cleans uploaded datasets automatically
- generates insights and trend analysis
- builds visualizations
- writes downloadable reports (HTML, Markdown, JSON, cleaned CSV)
It includes:
- a Streamlit web app for no-code usage
- a Python CLI for scriptable usage
- modular pipeline components for extension
- automated tests
- Define architecture and data flow.
- Scaffold package/module layout.
- Add dependencies, sample data, and output directories.
- Implement auto-cleaning (column normalization, typing, missing values, duplicates, outliers).
- Implement insights and trend detection.
- Implement recommendation generation logic.
- Create Plotly chart generation layer.
- Generate HTML and Markdown reports.
- Persist artifacts (cleaned CSV, JSON insights, reports).
- Build Streamlit app for upload -> one-click analysis -> downloads.
- Add CLI for batch usage.
- Add unit/integration tests for core behavior.
- Run test suite and verify outputs.
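
The auto-cleaning step listed above (column normalization, typing, missing values, duplicates, outliers) could look roughly like the sketch below. This is an illustration, not the contents of cleaning.py: the function name `clean_dataframe`, the `iqr_factor` parameter, the median/"unknown" fill values, and the IQR clipping rule are all assumptions, and non-numeric columns are assumed to hold text.

```python
import pandas as pd


def clean_dataframe(df: pd.DataFrame, iqr_factor: float = 1.5) -> pd.DataFrame:
    """Hypothetical cleaning pass; the real cleaning.py may differ."""
    out = df.copy()

    # 1. Column normalization: trim, lowercase, collapse non-alphanumerics to "_".
    out.columns = (
        out.columns.astype(str)
        .str.strip()
        .str.lower()
        .str.replace(r"[^0-9a-z]+", "_", regex=True)
        .str.strip("_")
    )

    # 2. Duplicates: drop fully identical rows.
    out = out.drop_duplicates().reset_index(drop=True)

    # 3. Typing: convert a column to numeric only if every non-null value parses.
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            continue
        converted = pd.to_numeric(out[col], errors="coerce")
        has_values = out[col].notna().any()
        if has_values and converted.notna().sum() == out[col].notna().sum():
            out[col] = converted

    # 4. Missing values: median for numeric columns, "unknown" for text columns.
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna("unknown")

    # 5. Outliers: clip numeric columns to the IQR fences (an assumed rule).
    for col in out.select_dtypes(include="number").columns:
        q1, q3 = out[col].quantile([0.25, 0.75])
        spread = q3 - q1
        out[col] = out[col].clip(q1 - iqr_factor * spread, q3 + iqr_factor * spread)

    return out
```

Clipping keeps every row and only caps extreme values; dropping outlier rows instead would be an equally reasonable choice, depending on the dataset.
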
    Upload CSV/XLSX
          |
          v
    DataAnalystAgent (pipeline.py)
      |- cleaning.py      -> cleaned dataframe + quality metrics
      |- insights.py      -> insights + trends + recommendations + metrics
      |- visualization.py -> Plotly figures
      |- reporting.py     -> CSV/JSON/MD/HTML artifacts
          |
          v
    UI (app.py) and CLI (cli.py)
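
The data flow above, with one agent calling the four stages in order, can be sketched in plain Python as follows. Everything here is hypothetical: `SketchAgent`, `AnalysisResult`, and the stand-in stage functions are illustrative names and do not reflect the actual signatures in pipeline.py. Rows are modeled as plain dictionaries rather than a dataframe to keep the example free of dependencies.

```python
from dataclasses import dataclass, field
from typing import Callable

Rows = list[dict[str, object]]


@dataclass
class AnalysisResult:
    cleaned: Rows
    insights: list[str] = field(default_factory=list)
    charts: list[str] = field(default_factory=list)
    artifacts: dict[str, str] = field(default_factory=dict)


@dataclass
class SketchAgent:
    """Hypothetical orchestrator: each stage is an injected callable."""

    clean: Callable[[Rows], Rows]
    analyze: Callable[[Rows], list[str]]
    visualize: Callable[[Rows], list[str]]
    report: Callable[[Rows, list[str]], dict[str, str]]

    def run(self, rows: Rows) -> AnalysisResult:
        cleaned = self.clean(rows)                 # cleaning stage
        insights = self.analyze(cleaned)           # insights stage
        charts = self.visualize(cleaned)           # visualization stage
        artifacts = self.report(cleaned, insights) # reporting stage
        return AnalysisResult(cleaned, insights, charts, artifacts)


# Stand-in stages, for illustration only.
def drop_empty_rows(rows: Rows) -> Rows:
    return [r for r in rows if any(v not in (None, "") for v in r.values())]


def summarize(rows: Rows) -> list[str]:
    return [f"{len(rows)} rows after cleaning"]


def chart_titles(rows: Rows) -> list[str]:
    columns = sorted({key for row in rows for key in row})
    return [f"Distribution of {name}" for name in columns]


def render_markdown(rows: Rows, insights: list[str]) -> dict[str, str]:
    return {"report.md": "\n".join(f"- {line}" for line in insights)}


agent = SketchAgent(drop_empty_rows, summarize, chart_titles, render_markdown)
result = agent.run([{"sales": 10}, {"sales": None}])
```

Passing the stages in as callables keeps each one testable on its own, which fits the modular layout described above.
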
Project 4 - AI Data Analyst Agent/
    app.py
    requirements.txt
    pyproject.toml
    README.md
    data/
        sample_sales.csv
    outputs/
    src/
        ai_data_analyst/
            __init__.py
            cleaning.py
            insights.py
            io_utils.py
            pipeline.py
            reporting.py
            types.py
            visualization.py
            cli.py
    tests/
        test_cleaning.py
        test_insights.py
        test_pipeline.py
Install dependencies:

    pip install -r requirements.txt

Run the Streamlit app:

    py -3.12 -m streamlit run app.py

Run the CLI.

PowerShell:

    $env:PYTHONPATH="src"; py -3.12 -m ai_data_analyst.cli --input data/sample_sales.csv --output outputs

Linux/macOS (bash):

    export PYTHONPATH=src
    python3 -m ai_data_analyst.cli --input data/sample_sales.csv --output outputs

Each run generates timestamped files in outputs/:
- `*_cleaned.csv`
- `*_insights.json`
- `*_report.md`
- `*_report.html`
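
As a rough illustration of how timestamped artifact names with these four suffixes might be built, consider the sketch below. The helper names `artifact_paths` and `write_insights`, the use of the input file stem as a prefix, and the `YYYYMMDD_HHMMSS` timestamp format are assumptions; only the four suffixes come from this README.

```python
import json
from datetime import datetime
from pathlib import Path

# Suffixes listed in this README; the dictionary keys are hypothetical.
SUFFIXES = {
    "cleaned_csv": "_cleaned.csv",
    "insights_json": "_insights.json",
    "report_md": "_report.md",
    "report_html": "_report.html",
}


def artifact_paths(output_dir: Path, stem: str, now: datetime) -> dict[str, Path]:
    """Build one path per artifact, sharing a timestamped prefix (assumed format)."""
    prefix = f"{stem}_{now:%Y%m%d_%H%M%S}"
    return {key: output_dir / f"{prefix}{suffix}" for key, suffix in SUFFIXES.items()}


def write_insights(path: Path, insights: dict) -> None:
    """Persist insights as indented JSON, creating the output directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(insights, indent=2), encoding="utf-8")


paths = artifact_paths(Path("outputs"), "sample_sales", datetime(2024, 1, 2, 3, 4, 5))
```

Taking the timestamp as an argument, rather than reading the clock inside the function, keeps the naming logic easy to test.
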
- Supported input formats: `.csv`, `.xlsx`, `.xls`
- The system is fully local and free to run.
- Recommendation text is rule-based and deterministic (no paid LLM required).
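
To make "rule-based and deterministic" concrete, here is a minimal sketch of how a trend measurement could be mapped to fixed recommendation text. The function names, the half-versus-half comparison, the 10% threshold, and the wording of the messages are all assumptions for illustration; the project's actual rules may differ.

```python
from statistics import mean


def percent_change(values: list[float]) -> float:
    """Percent change from the mean of the first half to the mean of the second half."""
    if len(values) < 2:
        raise ValueError("need at least two values to measure a trend")
    mid = len(values) // 2
    first, second = mean(values[:mid]), mean(values[mid:])
    if first == 0:
        raise ValueError("first-half mean is zero; percent change is undefined")
    return (second - first) / abs(first) * 100.0


def recommend(metric: str, values: list[float], threshold_pct: float = 10.0) -> str:
    """Map the measured change to one of three fixed templates (hypothetical rules)."""
    change = percent_change(values)
    if change >= threshold_pct:
        return f"{metric} rose {change:.1f}%: consider investing in what drove the increase."
    if change <= -threshold_pct:
        return f"{metric} fell {abs(change):.1f}%: investigate the cause of the decline."
    return f"{metric} is stable ({change:+.1f}%): no action suggested."


message = recommend("revenue", [100, 110, 130, 140])
```

Because the output depends only on the input values and fixed thresholds, the same dataset always yields the same text, with no model call involved.
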
This project is released under the MIT License. See LICENSE.