A production-ready Next.js application for experimenting with LLM parameters (temperature, top_p) and comparing response quality using custom metric algorithms.
- Multi-Parameter Generation: Generate multiple LLM responses with different parameter combinations
- Quality Metrics: Three custom algorithms measuring:
  - Coherence: Sentence flow and logical connectivity (Jaccard similarity + transition words)
  - Completeness: Content coverage and depth (keyword matching + length appropriateness)
  - Structural: Formatting and organization quality (paragraphs, sentence variety, punctuation)
- Comparison Dashboard: Side-by-side comparison with color-coded scores
- Data Visualization: Interactive charts using Recharts (bar charts, radar charts)
- Data Persistence: Vercel Postgres database for storing experiments and responses
- Export Functionality: Download experiments as JSON or CSV
- Professional UI: Built with shadcn/ui components and Tailwind CSS, with a responsive design
- Framework: Next.js 16 (App Router)
- Language: TypeScript
- Styling: Tailwind CSS v4
- UI Components: shadcn/ui
- State Management: TanStack Query (React Query)
- Database: Vercel Postgres (Neon)
- LLM Integration: OpenAI API (gpt-4o-mini)
- Charts: Recharts
- Tables: TanStack Table
- Node.js 18+
- npm or yarn
- OpenAI API key
- Vercel account (for Postgres)
- Clone the repository:
    git clone https://github.com/anjola-adeuyi/llm-lab.git
    cd llm-lab
- Install dependencies:
    npm install
- Set up environment variables. Create a .env.local file in the root directory:
    OPENAI_API_KEY=your_openai_api_key_here
    POSTGRES_URL=your_postgres_url_here
    POSTGRES_PRISMA_URL=your_postgres_prisma_url_here
    POSTGRES_URL_NON_POOLING=your_postgres_url_non_pooling_here
- Set up the database. Run the migration SQL script in your Vercel Postgres database:
    # Connect to your Vercel Postgres database and run:
    psql < scripts/migrate.sql
  Or manually execute the SQL from scripts/migrate.sql in your database console.
- Run the development server:
    npm run dev
- Open http://localhost:3000 in your browser.
llm-lab/
├── app/
│ ├── api/ # API routes
│ │ ├── generate/ # POST: Generate responses
│ │ ├── experiments/ # GET/POST: CRUD operations
│ │ └── export/ # GET: Export as JSON/CSV
│ ├── experiments/ # Experiment pages
│ │ ├── page.tsx # List all experiments
│ │ └── [id]/page.tsx # Single experiment view
│ ├── layout.tsx # Root layout with QueryClientProvider
│ ├── page.tsx # Home: Experiment creator
│ └── globals.css # Global styles
├── components/
│ ├── ui/ # shadcn/ui base components
│ ├── experiment-form.tsx # Parameter input form
│ ├── response-card.tsx # Individual response display
│ ├── comparison-table.tsx # TanStack Table comparison
│ ├── metrics-chart.tsx # Recharts visualization
│ └── export-button.tsx # Download trigger
├── lib/
│ ├── types.ts # TypeScript interfaces
│ ├── llm-service.ts # OpenAI integration
│ ├── metrics-calculator.ts # Quality algorithms
│ ├── storage-service.ts # Database operations
│ └── utils.ts # Utility functions
└── scripts/
└── migrate.sql # Database schema
- Navigate to the home page
- Enter your prompt in the text area
- Specify temperature values (comma-separated, e.g., "0.1, 0.5, 0.9")
- Specify top_p values (comma-separated, e.g., "0.5, 0.9, 1.0")
- Click "Generate Responses"
The system will:
- Generate all parameter combinations
- Call OpenAI API in parallel for each combination
- Calculate quality metrics for each response
- Store everything in the database
- Redirect you to the comparison dashboard
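The first step above, expanding the comma-separated inputs into every temperature/top_p pair, can be sketched as follows. This is a minimal illustration, not the project's actual code: the names `ParamCombo`, `parseValues`, and `buildCombinations` are hypothetical.

```typescript
// Hypothetical shape for one parameter combination; the project's real types may differ.
interface ParamCombo {
  temperature: number;
  topP: number;
}

// Parse a comma-separated string such as "0.1, 0.5, 0.9" into numbers,
// dropping empty and non-numeric entries.
function parseValues(input: string): number[] {
  const values: number[] = [];
  for (const part of input.split(",")) {
    const trimmed = part.trim();
    if (trimmed.length === 0) continue;
    const value = Number(trimmed);
    if (Number.isFinite(value)) values.push(value);
  }
  return values;
}

// Build the Cartesian product of the temperature and top_p values.
function buildCombinations(temperatures: number[], topPs: number[]): ParamCombo[] {
  const combos: ParamCombo[] = [];
  for (const temperature of temperatures) {
    for (const topP of topPs) {
      combos.push({ temperature, topP });
    }
  }
  return combos;
}

// Three temperatures and three top_p values give nine combinations.
const combos = buildCombinations(
  parseValues("0.1, 0.5, 0.9"),
  parseValues("0.5, 0.9, 1.0")
);
```

Because the number of requests grows as the product of the two lists, a real form would likely want to cap the list lengths before calling the API.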
- List View: Navigate to /experiments to see all experiments
- Detail View: Click on any experiment to see:
- Response cards with color-coded scores
- Interactive comparison table (sortable)
- Metrics visualization charts
- Export options (JSON/CSV)
Measures logical flow and sentence connectivity:
- Calculates Jaccard similarity between consecutive sentences
- Rewards use of transition words
- Higher scores indicate better logical flow
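A minimal sketch of how such a coherence score could be computed is shown below. The transition-word list, the 0.7/0.3 weighting, the 0 to 1 scale, and all function names are assumptions made for illustration; the actual logic in lib/metrics-calculator.ts may differ.

```typescript
// Assumed sample of transition words; the real list is not given in this README.
const TRANSITION_WORDS = ["however", "therefore", "moreover", "furthermore", "consequently", "additionally"];

// Split text into sentences on terminal punctuation.
function splitSentences(text: string): string[] {
  return text.split(/[.!?]+/).map((s) => s.trim()).filter((s) => s.length > 0);
}

// Lowercase a sentence and collect its distinct words.
function tokenize(sentence: string): Set<string> {
  const words: string[] = sentence.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  return new Set(words);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, defined as 0 when both sets are empty.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let intersection = 0;
  a.forEach((word) => {
    if (b.has(word)) intersection++;
  });
  return intersection / (a.size + b.size - intersection);
}

// Blend average consecutive-sentence similarity with the share of sentences
// that contain a transition word. The weights are illustrative assumptions.
function coherenceScore(text: string): number {
  const sentences = splitSentences(text);
  if (sentences.length === 0) return 0;
  let similaritySum = 0;
  for (let i = 1; i < sentences.length; i++) {
    similaritySum += jaccard(tokenize(sentences[i - 1]), tokenize(sentences[i]));
  }
  const avgSimilarity = sentences.length > 1 ? similaritySum / (sentences.length - 1) : 0;
  const withTransition = sentences.filter((s) => {
    const words = tokenize(s);
    return TRANSITION_WORDS.some((t) => words.has(t));
  }).length;
  return 0.7 * avgSimilarity + 0.3 * (withTransition / sentences.length);
}
```

Jaccard similarity on raw words rewards lexical overlap only, so two sentences that continue the same idea with different vocabulary score low; that is a known limit of this kind of heuristic.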
Measures content depth and coverage:
- Extracts keywords from prompt
- Checks coverage in response
- Evaluates response length appropriateness
- Higher scores indicate better prompt coverage
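The keyword-coverage and length checks above could look roughly like this. The stop-word list, the minimum keyword length, the 100-word target, and the 0.7/0.3 weights are all assumptions chosen for the sketch, as are the function names.

```typescript
// Assumed stop words; a real implementation would likely use a longer list.
const STOP_WORDS = new Set(["the", "and", "for", "with", "that", "this", "from", "what", "about", "into"]);

// Lowercase the text and return its alphanumeric words.
function toWords(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Keywords here are distinct prompt words longer than three characters
// that are not stop words.
function extractKeywords(prompt: string): string[] {
  const unique = new Set(toWords(prompt).filter((w) => w.length > 3 && !STOP_WORDS.has(w)));
  return Array.from(unique);
}

// Fraction of prompt keywords that appear in the response.
// A prompt with no keywords counts as fully covered (an arbitrary choice).
function keywordCoverage(prompt: string, response: string): number {
  const keywords = extractKeywords(prompt);
  if (keywords.length === 0) return 1;
  const responseWords = new Set(toWords(response));
  const matched = keywords.filter((k) => responseWords.has(k)).length;
  return matched / keywords.length;
}

// Length appropriateness: grows linearly up to an assumed target word count.
function lengthScore(response: string, targetWords: number = 100): number {
  return Math.min(toWords(response).length / targetWords, 1);
}

// Combine coverage and length into a 0 to 1 score with illustrative weights.
function completenessScore(prompt: string, response: string): number {
  return 0.7 * keywordCoverage(prompt, response) + 0.3 * lengthScore(response);
}
```

Exact word matching misses inflected forms ("learn" versus "learning"), so a stemmer or prefix match is a natural refinement if coverage looks too strict.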
Measures formatting and organization:
- Paragraph structure (line breaks)
- Sentence variety (length distribution)
- Punctuation quality
- Markdown formatting usage
- Higher scores indicate better organization
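One way to turn the four signals above into a single number is sketched below. Every threshold, the equal weighting, and the function names are assumptions for illustration only and are not taken from the project's source.

```typescript
// Split text into sentences on terminal punctuation.
function sentencesOf(text: string): string[] {
  return text.split(/[.!?]+/).map((s) => s.trim()).filter((s) => s.length > 0);
}

// Paragraph structure: full credit for two or more blank-line-separated paragraphs.
function paragraphScore(text: string): number {
  const paragraphs = text.split(/\n\s*\n/).filter((p) => p.trim().length > 0);
  return paragraphs.length >= 2 ? 1 : 0.5;
}

// Sentence variety: coefficient of variation of sentence lengths, capped at 1.
function varietyScore(text: string): number {
  const lengths = sentencesOf(text).map((s) => s.split(/\s+/).length);
  if (lengths.length < 2) return 0;
  const mean = lengths.reduce((sum, n) => sum + n, 0) / lengths.length;
  const variance = lengths.reduce((sum, n) => sum + (n - mean) * (n - mean), 0) / lengths.length;
  return Math.min(Math.sqrt(variance) / mean, 1);
}

// Punctuation quality, reduced here to "does the text end with terminal punctuation".
function punctuationScore(text: string): number {
  return /[.!?]$/.test(text.trim()) ? 1 : 0;
}

// Markdown usage: headings, list items, bold text, or inline code.
function usesMarkdown(text: string): boolean {
  return /^(#{1,6}\s|[-*]\s|\d+\.\s)/m.test(text) || /\*\*.+?\*\*|`.+?`/.test(text);
}

// Equal-weight average of the four signals, giving a value between 0 and 1.
function structuralScore(text: string): number {
  const markdown = usesMarkdown(text) ? 1 : 0;
  return (paragraphScore(text) + varietyScore(text) + punctuationScore(text) + markdown) / 4;
}
```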
Weighted average: Coherence (40%) + Completeness (35%) + Structural (25%)
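The weighted average can be written directly from the percentages above. The `MetricScores` interface and `overallScore` name are hypothetical, and the sketch assumes all three metrics share one scale (0 to 1 here).

```typescript
// Hypothetical container for the three metric scores, all on the same scale.
interface MetricScores {
  coherence: number;
  completeness: number;
  structural: number;
}

// Weights as stated in this README: 40% / 35% / 25%.
const WEIGHTS = { coherence: 0.4, completeness: 0.35, structural: 0.25 };

// Overall score as the weighted sum of the three metrics.
function overallScore(scores: MetricScores): number {
  return (
    WEIGHTS.coherence * scores.coherence +
    WEIGHTS.completeness * scores.completeness +
    WEIGHTS.structural * scores.structural
  );
}
```

For example, scores of 0.8, 0.6, and 0.4 combine to 0.32 + 0.21 + 0.10 = 0.63.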
- POST /api/generate - Generate responses with parameter combinations
- GET /api/experiments - List all experiments
- GET /api/experiments/[id] - Get single experiment with responses
- GET /api/export/[id]?format=json|csv - Export experiment data
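As a small client-side illustration, the export endpoint's URL can be built with the standard `URL` class. The helper names, the base URL, and the assumption that the response body can be read as text are all hypothetical; only the path and the `format` query parameter come from the endpoint list above.

```typescript
type ExportFormat = "json" | "csv";

// Build the export URL for one experiment.
function buildExportUrl(baseUrl: string, experimentId: string, format: ExportFormat): string {
  const url = new URL(`/api/export/${encodeURIComponent(experimentId)}`, baseUrl);
  url.searchParams.set("format", format);
  return url.toString();
}

// Fetch the export as text; callers can parse JSON or save CSV as needed.
async function downloadExport(baseUrl: string, experimentId: string, format: ExportFormat): Promise<string> {
  const response = await fetch(buildExportUrl(baseUrl, experimentId, format));
  if (!response.ok) {
    throw new Error(`Export failed with status ${response.status}`);
  }
  return response.text();
}
```

With the development server running, `downloadExport("http://localhost:3000", id, "csv")` would request the CSV export for the experiment with that id.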
- Push your code to GitHub
- Import the repository in Vercel
- Configure environment variables in Vercel dashboard
- Deploy
The application will automatically:
- Build using Next.js
- Connect to Vercel Postgres
- Use environment variables from Vercel
npm run lint
npm run build
npm start
- Monorepo Next.js: Single deployment unit, SSR benefits, shared types
- Vercel Postgres: Free tier, zero config, SQL familiarity
- Server Components: Leverage Next.js 16 performance
- Parallel API Calls: Use Promise.allSettled for concurrent LLM requests
- Custom Metrics: Production-grade algorithms, not just word counts
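The parallel-calls decision above can be sketched as follows. `Promise.allSettled` waits for every request and reports each outcome separately, so one failed combination does not discard the others. The `GenerateFn` callback stands in for the real OpenAI call, and all type and function names here are hypothetical.

```typescript
interface ParamCombo {
  temperature: number;
  topP: number;
}

// Stand-in for the function that actually calls the LLM.
type GenerateFn = (prompt: string, combo: ParamCombo) => Promise<string>;

type GenerationOutcome =
  | { status: "ok"; combo: ParamCombo; text: string }
  | { status: "failed"; combo: ParamCombo; error: string };

// Fire all requests concurrently and keep per-combination success or failure.
async function generateAll(
  prompt: string,
  combos: ParamCombo[],
  generate: GenerateFn
): Promise<GenerationOutcome[]> {
  const settled = await Promise.allSettled(combos.map((combo) => generate(prompt, combo)));
  return settled.map((result, index): GenerationOutcome => {
    const combo = combos[index];
    if (result.status === "fulfilled") {
      return { status: "ok", combo, text: result.value };
    }
    return { status: "failed", combo, error: String(result.reason) };
  });
}
```

Compared with `Promise.all`, which rejects as soon as any single call fails, this keeps the successful responses available for scoring and storage even when the API rejects some parameter values.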
MIT
LLM Labs was built by Anjola Adeuyi to demonstrate production-ready AI application development.