21 Commits

60ee2ecf74 add prompts (Aaron Po, 2026-04-03 15:53:04 -04:00)
e4e16a5084 fix: address critical correctness, reliability, and design issues in pipeline (Aaron Po)
CORRECTNESS FIXES:
- json_loader: Add RollbackTransaction() and call it on exception instead of
  CommitTransaction(). Prevents partial data corruption on parse/disk errors.
- wikipedia_service: Fix invalid MediaWiki API parameter explaintext=true ->
  explaintext=1. Now returns plain text instead of HTML markup in contexts.
- helpers: Fix ParseTwoLineResponse filter to only remove known thinking tags
  (<think>, <reasoning>, <reflect>) instead of any <...> pattern. Prevents
  silently removing legitimate output like <username>content</username>.

RELIABILITY & DESIGN IMPROVEMENTS:
- load/main: Make n_ctx (context window size) configurable via --n-ctx flag
  (default 2048, range 1-32768) to support larger models like Qwen3-14B.
- generate_brewery: Prevent retry prompt growth by extracting location context
  into constant and using compact retry format (error + schema + location only).
  Avoids token truncation on final retry attempts.
- database: Fix data representativeness by changing QueryCities from
  ORDER BY name (alphabetic bias) to ORDER BY RANDOM() for unbiased sampling.
  Convert all SQLITE_STATIC to SQLITE_TRANSIENT to prevent use-after-free risks.

POLISH:
- infer: Advance sampling seed between generation calls to improve diversity
  across brewery and user generation.
- data_downloader: Remove unnecessary commit hash truncation; use full hash.
- json_loader: Fix misleading log message from "RapidJSON" to "Boost.JSON".
2026-04-03 11:58:00 -04:00
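The rollback fix in this commit is worth illustrating. The sketch below is a hypothetical stand-in — an in-memory `Database` rather than the project's actual SQLite-backed json_loader — showing why the exception path must roll back instead of commit:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the pipeline's database handle; the real
// json_loader talks to SQLite, but the commit/rollback logic is the same.
struct Database {
    std::vector<std::string> committed;
    std::vector<std::string> pending;

    void BeginTransaction() { pending.clear(); }
    void Insert(const std::string& row) { pending.push_back(row); }
    void CommitTransaction() {
        committed.insert(committed.end(), pending.begin(), pending.end());
        pending.clear();
    }
    void RollbackTransaction() { pending.clear(); }
};

// Load a batch of rows, committing only if every row parses. Any exception
// rolls the whole batch back, so no partial data is ever persisted.
bool LoadBatch(Database& db, const std::vector<std::string>& rows) {
    db.BeginTransaction();
    try {
        for (const auto& row : rows) {
            if (row.empty()) throw std::runtime_error("parse error");
            db.Insert(row);
        }
        db.CommitTransaction();
        return true;
    } catch (const std::exception&) {
        db.RollbackTransaction();  // old behavior committed here, leaving partial data
        return false;
    }
}
```

With the old behavior (committing in the catch block), a batch that failed midway would have persisted its earlier rows despite the parse error.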
8d306bf691 Update documentation for llama (Aaron Po, 2026-04-02 23:24:06 -04:00)
077f6ab4ae edit prompt (Aaron Po, 2026-04-02 22:56:18 -04:00)
534403734a Refactor BiergartenDataGenerator and LlamaGenerator (Aaron Po, 2026-04-02 22:46:00 -04:00)
3af053f0eb format codebase (Aaron Po, 2026-04-02 21:46:46 -04:00)
ba165d8aa7 Separate llama generator class src file into method files (Aaron Po, 2026-04-02 21:37:46 -04:00)
eb9a2767b4 Refactor web client interface and related components (Aaron Po, 2026-04-02 18:55:58 -04:00)
29ea47fdb6 update cli arg handling (Aaron Po, 2026-04-02 18:41:25 -04:00)
52e2333304 Reorganize directory structure (Aaron Po, 2026-04-02 18:27:01 -04:00)
a1f0ca5b20 Refactor DataDownloader and CURLWebClient: update constructor and modify FileExists method signature (Aaron Po, 2026-04-02 18:06:40 -04:00)
2ea8aa52b4 update readme and add clangformat and clang tidy (Aaron Po, 2026-04-02 17:12:22 -04:00)
98083ab40c Pipeline: add CURL/WebClient & Wikipedia service (Aaron Po)
- Introduce a pluggable web client interface and a concrete CURL implementation: adds IWebClient, CURLWebClient, and CurlGlobalState (headers + curl_web_client.cpp). DataDownloader now accepts an IWebClient and delegates downloads.
- Add WikipediaService for cached Wikipedia summary lookups.
- Refactor SqliteDatabase to return full City records and update consumers accordingly.
- Improve JsonLoader to use batched transactions during streaming parses.
- Enhance LlamaGenerator with sampling options, increased token limits, JSON extraction/validation, and other parsing helpers.
- Modernize CMake: set policy/version, add project_options, simplify FetchContent usage (spdlog), require Boost components (program_options/json), list pipeline sources explicitly, and tweak post-build/memcheck targets.
- Update README to match implementation changes and new CLI/config conventions.
2026-04-02 16:29:16 -04:00
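The pluggable web client this commit introduces can be pictured roughly as follows. Only the names `IWebClient`, `CURLWebClient`, and `DataDownloader` come from the commit message; the `Get` method, `FakeWebClient`, and constructor shape are assumptions for illustration:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical shape of the pluggable web client interface.
class IWebClient {
public:
    virtual ~IWebClient() = default;
    virtual std::string Get(const std::string& url) = 0;
};

// A fake client lets DataDownloader-style consumers be tested without curl;
// the real CURLWebClient would implement the same interface over libcurl.
class FakeWebClient : public IWebClient {
public:
    std::string Get(const std::string& url) override {
        return "response for " + url;
    }
};

// Downloader that delegates all network access to the injected client.
class DataDownloader {
public:
    explicit DataDownloader(std::unique_ptr<IWebClient> client)
        : client_(std::move(client)) {}

    std::string Download(const std::string& url) { return client_->Get(url); }

private:
    std::unique_ptr<IWebClient> client_;
};
```

The interface is the seam: tests inject `FakeWebClient`, production code injects the CURL-backed implementation.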
ac136f7179 Enhance brewery generation: add country name parameter and improve prompt handling (Aaron Po, 2026-04-02 01:04:41 -04:00)
280c9c61bd Implement Llama-based brewery and user data generation; remove mock generator and related files (Aaron Po, 2026-04-01 23:29:16 -04:00)
248a51b35f cleanup (Aaron Po, 2026-04-01 21:35:02 -04:00)
35aa7bc0df Begin work on biergarten data generator pipeline (Aaron Po, 2026-04-01 21:18:45 -04:00)
581863d69b Website updates: add new app scaffold, archive legacy site, and refresh docs/tooling (#173) (Aaron Po, 2026-03-15 22:56:14 -04:00)
9238036042 Add resend confirmation email feature (#166) (Aaron Po, 2026-03-07 23:03:31 -05:00)
431e11e052 Add WEBSITE_BASE_URL environment variable and update email confirmation link (#165) (Aaron Po, 2026-03-07 20:11:50 -05:00)
f1194d3da8 Feature: Add token validation, basic confirmation workflow (#164) (Aaron Po, 2026-03-06 23:23:43 -05:00)
473 changed files with 26100 additions and 10003 deletions

.gitignore (14 additions, vendored)

```diff
@@ -15,6 +15,14 @@
 # production
 /build
+# project-specific build artifacts
+/src/Website/build/
+/src/Website/storybook-static/
+/src/Website/.react-router/
+/src/Website/playwright-report/
+/src/Website/test-results/
+/test-results/
 # misc
 .DS_Store
 *.pem
@@ -42,6 +50,9 @@ next-env.d.ts
 # vscode
 .vscode
+.idea/
+*.swp
+*.swo
 /cloudinary-images
@@ -487,3 +498,6 @@ FodyWeavers.xsd
 .env.dev
 .env.test
 .env.prod
+*storybook.log
+storybook-static
```

(File diff suppressed because it is too large.)

README.md (285 changed lines)

````diff
@@ -1,261 +1,142 @@
 # The Biergarten App
-A social platform for craft beer enthusiasts to discover breweries, share reviews, and
-connect with fellow beer lovers.
+The Biergarten App is a multi-project monorepo with a .NET backend and an active React
+Router frontend in `src/Website`. The current website focuses on account flows, theme
+switching, shared UI components, Storybook coverage, and integration with the API.
-**Documentation**
+## Documentation
-- [Getting Started](docs/getting-started.md) - Setup and installation
-- [Architecture](docs/architecture.md) - System design and patterns
-- [Database](docs/database.md) - Schema and stored procedures
-- [Docker Guide](docs/docker.md) - Container deployment
-- [Testing](docs/testing.md) - Test strategy and commands
-- [Environment Variables](docs/environment-variables.md) - Configuration reference
+- [Getting Started](docs/getting-started.md) - Local setup for backend and active website
+- [Architecture](docs/architecture.md) - Current backend and frontend architecture
+- [Docker Guide](docs/docker.md) - Container-based backend development and testing
+- [Testing](docs/testing.md) - Backend and frontend test commands
+- [Environment Variables](docs/environment-variables.md) - Active configuration reference
 - [Token Validation](docs/token-validation.md) - JWT validation architecture
+- [Legacy Website Archive](docs/archive/legacy-website-v1.md) - Archived notes for the old Next.js frontend
-**Diagrams**
+## Diagrams
-- [Architecture](docs/diagrams/pdf/architecture.pdf) - Layered architecture
-- [Deployment](docs/diagrams/pdf/deployment.pdf) - Docker topology
-- [Authentication Flow](docs/diagrams/pdf/authentication-flow.pdf) - Auth sequence
-- [Database Schema](docs/diagrams/pdf/database-schema.pdf) - Entity relationships
+- [Architecture](docs/diagrams-out/architecture.svg) - Layered architecture
+- [Deployment](docs/diagrams-out/deployment.svg) - Docker topology
+- [Authentication Flow](docs/diagrams-out/authentication-flow.svg) - Auth sequence
+- [Database Schema](docs/diagrams-out/database-schema.svg) - Entity relationships
-## Project Status
+## Current Status
-**Active Development** - Transitioning from full-stack Next.js to multi-project monorepo
+Active areas in the repository:
-- Core authentication and user management APIs
-- Database schema with migrations and seeding
-- Layered architecture (Domain, Service, Infrastructure, Repository, API)
-- Comprehensive test suite (unit + integration)
-- Frontend integration with .NET API (in progress)
-- Migration from Next.js serverless functions
+- .NET 10 backend with layered architecture and SQL Server
+- React Router 7 website in `src/Website`
+- Shared Biergarten theme system with a theme guide route
+- Storybook stories and browser-based checks for shared UI
+- Auth demo flows for home, login, register, dashboard, logout, and confirmation
+- Toast-based feedback for auth outcomes
----
+Legacy area retained for reference:
+- `src/Website-v1` contains the archived Next.js frontend and is no longer the active website
 ## Tech Stack
-**Backend**: .NET 10, ASP.NET Core, SQL Server 2022, DbUp **Frontend**: Next.js 14+,
-TypeScript, TailwindCSS **Testing**: xUnit, Reqnroll (BDD), FluentAssertions, Moq
-**Infrastructure**: Docker, Docker Compose **Security**: Argon2id password hashing, JWT
-(HS256)
----
+- **Backend**: .NET 10, ASP.NET Core, SQL Server 2022, DbUp
+- **Frontend**: React 19, React Router 7, Vite 7, Tailwind CSS 4, DaisyUI 5
+- **UI Documentation**: Storybook 10, Vitest browser mode, Playwright
+- **Testing**: xUnit, Reqnroll (BDD), FluentAssertions, Moq
+- **Infrastructure**: Docker, Docker Compose
+- **Security**: Argon2id password hashing, JWT access/refresh/confirmation tokens
 ## Quick Start
 ### Prerequisites
 - [.NET SDK 10+](https://dotnet.microsoft.com/download)
 - [Docker Desktop](https://www.docker.com/products/docker-desktop)
 - [Node.js 18+](https://nodejs.org/) (for frontend)
-### Start Development Environment
+### Backend
 ```bash
-# Clone repository
-git clone https://github.com/aaronpo97/the-biergarten-app
-cd the-biergarten-app
 # Configure environment
 cp .env.example .env.dev
 # Start all services
 docker compose -f docker-compose.dev.yaml up -d
 # View logs
 docker compose -f docker-compose.dev.yaml logs -f
 ```
-**Access**:
+Backend access:
-- API: http://localhost:8080/swagger
-- Health: http://localhost:8080/health
+- API Swagger: http://localhost:8080/swagger
+- Health Check: http://localhost:8080/health
-### Run Tests
+### Frontend
 ```bash
-docker compose -f docker-compose.test.yaml up --abort-on-container-exit
+cd src/Website
+npm install
+API_BASE_URL=http://localhost:8080 SESSION_SECRET=dev-secret npm run dev
 ```
-Results are in `./test-results/`
+Optional frontend tools:
----
+```bash
+cd src/Website
+npm run storybook
+npm run test:storybook
+npm run test:storybook:playwright
+```
 ## Repository Structure
+```text
+src/Core/        Backend projects (.NET)
+src/Website/     Active React Router frontend
+src/Website-v1/  Archived legacy Next.js frontend
+docs/            Active project documentation
+docs/archive/    Archived legacy documentation
+```
-```
-src/Core/ # Backend (.NET)
-├── API/
-│   ├── API.Core/ # ASP.NET Core Web API
-│   └── API.Specs/ # Integration tests (Reqnroll)
-├── Database/
-│   ├── Database.Migrations/ # DbUp migrations
-│   └── Database.Seed/ # Data seeding
-├── Domain.Entities/ # Domain models
-├── Infrastructure/ # Cross-cutting concerns
-│   ├── Infrastructure.Jwt/
-│   ├── Infrastructure.PasswordHashing/
-│   ├── Infrastructure.Email/
-│   ├── Infrastructure.Repository/
-│   └── Infrastructure.Repository.Tests/
-└── Service/ # Business logic
-    ├── Service.Auth/
-    ├── Service.Auth.Tests/
-    └── Service.UserManagement/
-Website/ # Frontend (Next.js)
-docs/ # Documentation
-docs/diagrams/ # PlantUML diagrams
-```
----
 ## Key Features
-### Implemented
+Implemented today:
-- User registration and authentication
-- JWT token-based auth
-- Argon2id password hashing
-- SQL Server with stored procedures
-- Database migrations (DbUp)
-- Docker containerization
-- Comprehensive test suite
-- Swagger/OpenAPI documentation
-- Health checks
+- User registration and login against the API
+- JWT-based auth with access, refresh, and confirmation flows
+- SQL Server migrations and seed projects
+- Shared form components and auth screens
+- Theme switching with Lager, Stout, Cassis, and Weizen variants
+- Storybook documentation and automated story interaction tests
+- Toast feedback for auth-related outcomes
-### Planned
+Planned next:
-- [ ] Brewery discovery and management
-- [ ] Beer reviews and ratings
-- [ ] Social following/followers
-- [ ] Geospatial brewery search
-- [ ] Image upload (Cloudinary)
-- [ ] Email notifications
-- [ ] OAuth integration
----
-## Architecture Highlights
-### Layered Architecture
-```
-API Layer (Controllers)
-Service Layer (Business Logic)
-Infrastructure Layer (Repositories, JWT, Email)
-Domain Layer (Entities)
-Database (SQL Server + Stored Procedures)
-```
-### SQL-First Approach
-- All queries via stored procedures
-- No ORM (no Entity Framework)
-- Version-controlled schema
-### Security
-- **Password Hashing**: Argon2id (64MB memory, 4 iterations)
-- **JWT Tokens**: HS256 with configurable expiration
-- **Credential Rotation**: Built-in password change support
-See [Architecture Guide](docs/architecture.md) for details.
----
+- Brewery discovery and management
+- Beer reviews and ratings
+- Social follow relationships
+- Geospatial brewery experiences
+- Additional frontend routes beyond the auth demo
 ## Testing
-The project includes three test suites:
+Backend suites:
-| Suite                  | Type        | Framework      | Purpose                |
-| ---------------------- | ----------- | -------------- | ---------------------- |
-| **API.Specs**          | Integration | Reqnroll (BDD) | End-to-end API testing |
-| **Repository.Tests**   | Unit        | xUnit          | Data access layer      |
-| **Service.Auth.Tests** | Unit        | xUnit + Moq    | Business logic         |
+- `API.Specs` - integration tests
+- `Infrastructure.Repository.Tests` - repository unit tests
+- `Service.Auth.Tests` - service unit tests
-**Run All Tests**:
+Frontend suites:
+- Storybook interaction tests via Vitest
+- Storybook browser regression checks via Playwright
+Run all backend tests with Docker:
 ```bash
 docker compose -f docker-compose.test.yaml up --abort-on-container-exit
 ```
-**Run Individual Test Suite**:
 ```bash
 cd src/Core
 dotnet test API/API.Specs/API.Specs.csproj
 dotnet test Infrastructure/Infrastructure.Repository.Tests/Infrastructure.Repository.Tests.csproj
 dotnet test Service/Service.Auth.Tests/Service.Auth.Tests.csproj
 ```
-See [Testing Guide](docs/testing.md) for more information.
----
-## Docker Environments
-The project uses three Docker Compose configurations:
-| File                         | Purpose       | Features                                          |
-| ---------------------------- | ------------- | ------------------------------------------------- |
-| **docker-compose.dev.yaml**  | Development   | Persistent data, hot reload, Swagger UI           |
-| **docker-compose.test.yaml** | CI/CD Testing | Isolated DB, auto-exit, test results export       |
-| **docker-compose.prod.yaml** | Production    | Optimized builds, health checks, restart policies |
-**Common Commands**:
-```bash
-# Development
-docker compose -f docker-compose.dev.yaml up -d
-docker compose -f docker-compose.dev.yaml logs -f api.core
-docker compose -f docker-compose.dev.yaml down -v
-# Testing
-docker compose -f docker-compose.test.yaml up --abort-on-container-exit
-docker compose -f docker-compose.test.yaml down -v
-# Build
-docker compose -f docker-compose.dev.yaml build
-docker compose -f docker-compose.dev.yaml build --no-cache
-```
-See [Docker Guide](docs/docker.md) for troubleshooting and advanced usage.
----
+See [Testing](docs/testing.md) for the full command list.
 ## Configuration
-### Required Environment Variables
+Common active variables:
-**Backend** (`.env.dev`):
+- Backend: `DB_SERVER`, `DB_NAME`, `DB_USER`, `DB_PASSWORD`, `ACCESS_TOKEN_SECRET`, `REFRESH_TOKEN_SECRET`, `CONFIRMATION_TOKEN_SECRET`
+- Frontend: `API_BASE_URL`, `SESSION_SECRET`, `NODE_ENV`
-```bash
-DB_SERVER=sqlserver,1433
-DB_NAME=Biergarten
-DB_USER=sa
-DB_PASSWORD=YourStrong!Passw0rd
-JWT_SECRET=<min-32-chars>
-```
-**Frontend** (`.env.local`):
-```bash
-BASE_URL=http://localhost:3000
-NODE_ENV=development
-CONFIRMATION_TOKEN_SECRET=<generated>
-RESET_PASSWORD_TOKEN_SECRET=<generated>
-SESSION_SECRET=<generated>
-# + External services (Cloudinary, Mapbox, SparkPost)
-```
-See [Environment Variables Guide](docs/environment-variables.md) for complete reference.
----
+See [Environment Variables](docs/environment-variables.md) for details.
 ## Contributing
````

(compose file; name not captured)

```diff
@@ -94,6 +94,7 @@ services:
       ACCESS_TOKEN_SECRET: "${ACCESS_TOKEN_SECRET}"
       REFRESH_TOKEN_SECRET: "${REFRESH_TOKEN_SECRET}"
       CONFIRMATION_TOKEN_SECRET: "${CONFIRMATION_TOKEN_SECRET}"
+      WEBSITE_BASE_URL: "${WEBSITE_BASE_URL}"
     restart: unless-stopped
     networks:
       - devnet
```

(compose file; name not captured)

```diff
@@ -69,6 +69,7 @@ services:
      ACCESS_TOKEN_SECRET: "${ACCESS_TOKEN_SECRET}"
      REFRESH_TOKEN_SECRET: "${REFRESH_TOKEN_SECRET}"
      CONFIRMATION_TOKEN_SECRET: "${CONFIRMATION_TOKEN_SECRET}"
+     WEBSITE_BASE_URL: "${WEBSITE_BASE_URL}"
    restart: unless-stopped
    networks:
      - prodnet
```

(compose file; name not captured)

```diff
@@ -88,6 +88,7 @@ services:
      ACCESS_TOKEN_SECRET: "${ACCESS_TOKEN_SECRET}"
      REFRESH_TOKEN_SECRET: "${REFRESH_TOKEN_SECRET}"
      CONFIRMATION_TOKEN_SECRET: "${CONFIRMATION_TOKEN_SECRET}"
+     WEBSITE_BASE_URL: "${WEBSITE_BASE_URL}"
    volumes:
      - ./test-results:/app/test-results
    restart: "no"
```

docs/architecture.md

````diff
@@ -1,28 +1,27 @@
 # Architecture
-This document describes the architecture patterns and design decisions for The Biergarten
-App.
+This document describes the active architecture of The Biergarten App.
 ## High-Level Overview
-The Biergarten App follows a **multi-project monorepo** architecture with clear separation
-between backend and frontend:
+The Biergarten App is a monorepo with a clear split between the backend and the active
+website:
-- **Backend**: .NET 10 Web API with SQL Server
-- **Frontend**: Next.js with TypeScript
-- **Architecture Style**: Layered architecture with SQL-first approach
+- **Backend**: .NET 10 Web API with SQL Server and a layered architecture
+- **Frontend**: React 19 + React Router 7 website in `src/Website`
+- **Architecture Style**: Layered backend plus server-rendered React frontend
+The legacy Next.js frontend has been retained in `src/Website-v1` for reference only and is
+documented in [archive/legacy-website-v1.md](archive/legacy-website-v1.md).
 ## Diagrams
 For visual representations, see:
-- [architecture.pdf](diagrams/pdf/architecture.pdf) - Layered architecture diagram
-- [deployment.pdf](diagrams/pdf/deployment.pdf) - Docker deployment diagram
-- [authentication-flow.pdf](diagrams/pdf/authentication-flow.pdf) - Authentication
-workflow
-- [database-schema.pdf](diagrams/pdf/database-schema.pdf) - Database relationships
-Generate diagrams with: `make diagrams`
+- [architecture.svg](diagrams-out/architecture.svg) - Layered architecture diagram
+- [deployment.svg](diagrams-out/deployment.svg) - Docker deployment diagram
+- [authentication-flow.svg](diagrams-out/authentication-flow.svg) - Authentication workflow
+- [database-schema.svg](diagrams-out/database-schema.svg) - Database relationships
 ## Backend Architecture
@@ -217,39 +216,49 @@ public interface IAuthRepository
 ## Frontend Architecture
-### Next.js Application Structure
+### Active Website (`src/Website`)
-```
-Website/src/
-├── components/ # React components
-├── pages/ # Next.js routes
-├── contexts/ # React context providers
-├── hooks/ # Custom React hooks
-├── controllers/ # Business logic layer
-├── services/ # API communication
-├── requests/ # API request builders
-├── validation/ # Form validation schemas
-├── config/ # Configuration & env vars
-└── prisma/ # Database schema (current)
-```
+The current website is a React Router 7 application with server-side rendering enabled.
+```text
+src/Website/
+├── app/
+│   ├── components/     Shared UI such as Navbar, FormField, SubmitButton, ToastProvider
+│   ├── lib/            Auth helpers, schemas, and theme metadata
+│   ├── routes/         Route modules for home, login, register, dashboard, confirm, theme
+│   ├── root.tsx        App shell and global providers
+│   └── app.css         Theme tokens and global styling
+├── .storybook/         Storybook config and preview setup
+├── stories/            Storybook stories for shared UI and themes
+├── tests/playwright/   Storybook Playwright coverage
+└── package.json        Frontend scripts and dependencies
+```
-### Migration Strategy
+### Frontend Responsibilities
-The frontend is **transitioning** from a standalone architecture to integrate with the
-.NET API:
+- Render the auth demo and theme guide routes
+- Manage cookie-backed website session state
+- Call the .NET API for login, registration, token refresh, and confirmation
+- Provide shared UI building blocks for forms, navigation, themes, and toasts
+- Supply Storybook documentation and browser-based component verification
-**Current State**:
+### Theme System
-- Uses Prisma ORM with Postgres (Neon)
-- Has its own server-side API routes
-- Direct database access from Next.js
+The active website uses semantic DaisyUI theme tokens backed by four Biergarten themes:
-**Target State**:
+- Biergarten Lager
+- Biergarten Stout
+- Biergarten Cassis
+- Biergarten Weizen
-- Pure client-side Next.js app
-- All data via .NET API
-- No server-side database access
-- JWT-based authentication
+All component styling should prefer semantic tokens such as `primary`, `success`,
+`surface`, and `highlight` instead of hard-coded color values.
+### Legacy Frontend
+The previous Next.js frontend has been archived at `src/Website-v1`. Active product and
+engineering documentation should point to `src/Website`, while legacy notes live in
+[archive/legacy-website-v1.md](archive/legacy-website-v1.md).
 ## Security Architecture
@@ -385,7 +394,7 @@ dependencies
 ```yaml
 healthcheck:
-  test: ['CMD-SHELL', 'sqlcmd health check']
+  test: ["CMD-SHELL", "sqlcmd health check"]
   interval: 10s
   retries: 12
   start_period: 30s
````

docs/archive/legacy-website-v1.md (new file)

```diff
@@ -0,0 +1,56 @@
+# Legacy Website Archive (`src/Website-v1`)
+This archive captures high-level notes about the previous Biergarten frontend so active
+project documentation can focus on the current website in `src/Website`.
+## Status
+- `src/Website-v1` is retained for historical reference only
+- It is not the active frontend used by current setup, docs, or testing guidance
+- New product and engineering work should target `src/Website`
+## Legacy Stack Summary
+The archived frontend used a different application model from the current website:
+- Next.js 14
+- React 18
+- Prisma
+- Postgres / Neon-hosted database workflows
+- Next.js API routes and server-side controllers
+- Additional third-party integrations such as Cloudinary, Mapbox, and SparkPost
+## Why It Was Archived
+The active website moved to a React Router-based frontend that talks directly to the .NET
+API. As part of that shift, the main docs were updated to describe:
+- `src/Website` as the active frontend
+- React Router route modules and server rendering
+- Storybook-based component documentation and tests
+- Current frontend runtime variables: `API_BASE_URL`, `SESSION_SECRET`, and `NODE_ENV`
+## Legacy Documentation Topics Moved Out of Active Docs
+The following categories were removed from active documentation and intentionally archived:
+- Next.js application structure guidance
+- Prisma and Postgres frontend setup
+- Legacy frontend environment variables
+- External service setup that only applied to `src/Website-v1`
+- Old frontend local setup instructions
+## When To Use This Archive
+Use this file only if you need to:
+- inspect the historical frontend implementation
+- compare old flows against the current website
+- migrate or recover legacy logic from `src/Website-v1`
+For all active work, use:
+- [Getting Started](../getting-started.md)
+- [Architecture](../architecture.md)
+- [Environment Variables](../environment-variables.md)
+- [Testing](../testing.md)
```

View File

@@ -1,14 +1,15 @@
# Environment Variables
Complete documentation for all environment variables used in The Biergarten App.
This document covers the active environment variables used by the current Biergarten
stack.
## Overview
The application uses environment variables for configuration across:
The application uses environment variables for:
- **.NET API Backend** - Database connections, JWT secrets
- **Next.js Frontend** - External services, authentication
- **Docker Containers** - Runtime configuration
- **.NET API backend** - database connections, token secrets, runtime settings
- **React Router website** - API base URL and session signing
- **Docker containers** - environment-specific orchestration
## Configuration Patterns
@@ -16,10 +17,10 @@ The application uses environment variables for configuration across:
Direct environment variable access via `Environment.GetEnvironmentVariable()`.
### Frontend (Next.js)
### Frontend (`src/Website`)
Centralized configuration module at `src/Website/src/config/env/index.ts` with Zod
validation.
The active website reads runtime values from the server environment for its auth and API
integration.
### Docker
@@ -71,6 +72,9 @@ REFRESH_TOKEN_SECRET=<generated-secret> # Signs long-lived refresh t
# Confirmation token secret (30-minute tokens)
CONFIRMATION_TOKEN_SECRET=<generated-secret> # Signs email confirmation tokens
# Website base URL (used in confirmation emails)
WEBSITE_BASE_URL=https://thebiergarten.app # Base URL for the website
```
**Security Requirements**:
@@ -125,91 +129,38 @@ ASPNETCORE_URLS=http://0.0.0.0:8080 # Binding address and port
DOTNET_RUNNING_IN_CONTAINER=true # Flag for container execution
```
## Frontend Variables (Next.js)
## Frontend Variables (`src/Website`)
Create `.env.local` in the `Website/` directory.
### Base Configuration
The active website does not use the old Next.js/Prisma environment model. Its core runtime
variables are:
```bash
BASE_URL=http://localhost:3000 # Application base URL
NODE_ENV=development # Environment: development, production, test
API_BASE_URL=http://localhost:8080 # Base URL for the .NET API
SESSION_SECRET=<generated-secret> # Cookie session signing secret
NODE_ENV=development # Standard Node runtime mode
```
### Authentication & Sessions
### Frontend Variable Details
```bash
# Token signing secrets (use openssl rand -base64 127)
CONFIRMATION_TOKEN_SECRET=<generated-secret> # Email confirmation tokens
RESET_PASSWORD_TOKEN_SECRET=<generated-secret> # Password reset tokens
SESSION_SECRET=<generated-secret> # Session cookie signing
#### `API_BASE_URL`
# Session configuration
SESSION_TOKEN_NAME=biergarten # Cookie name (optional)
SESSION_MAX_AGE=604800 # Cookie max age in seconds (optional, default: 1 week)
```
- **Required**: Yes for local development
- **Default in code**: `http://localhost:8080`
- **Used by**: `src/Website/app/lib/auth.server.ts`
- **Purpose**: Routes website auth actions to the .NET API
**Security Requirements**:
#### `SESSION_SECRET`
- All secrets should be 127+ characters
- Generate using cryptographically secure random functions
- Never reuse secrets across environments
- Rotate secrets periodically in production
- **Required**: Strongly recommended in all environments
- **Default in local code path**: `dev-secret-change-me`
- **Used by**: React Router cookie session storage in `auth.server.ts`
- **Purpose**: Signs and validates the website session cookie
### Database (Current - Prisma/Postgres)
#### `NODE_ENV`
**Note**: Frontend currently uses Neon Postgres. Will migrate to .NET API.
```bash
POSTGRES_PRISMA_URL=postgresql://user:pass@host/db?pgbouncer=true # Pooled connection
POSTGRES_URL_NON_POOLING=postgresql://user:pass@host/db # Direct connection (migrations)
SHADOW_DATABASE_URL=postgresql://user:pass@host/shadow_db # Prisma shadow DB (optional)
```
### External Services
#### Cloudinary (Image Hosting)
```bash
NEXT_PUBLIC_CLOUDINARY_CLOUD_NAME=your-cloud-name # Public, client-accessible
CLOUDINARY_KEY=your-api-key # Server-side API key
CLOUDINARY_SECRET=your-api-secret # Server-side secret
```
**Setup Steps**:
1. Sign up at [cloudinary.com](https://cloudinary.com)
2. Navigate to Dashboard
3. Copy Cloud Name, API Key, and API Secret
**Note**: `NEXT_PUBLIC_` prefix makes variable accessible in client-side code.
#### Mapbox (Maps & Geocoding)
```bash
MAPBOX_ACCESS_TOKEN=pk.your-public-token
```
**Setup Steps**:
1. Create account at [mapbox.com](https://mapbox.com)
2. Navigate to Account → Tokens
3. Create new token with public scopes
4. Copy access token
#### SparkPost (Email Service)
```bash
SPARKPOST_API_KEY=your-api-key
SPARKPOST_SENDER_ADDRESS=noreply@yourdomain.com
```
**Setup Steps**:
1. Sign up at [sparkpost.com](https://sparkpost.com)
2. Verify sending domain or use sandbox
3. Create API key with "Send via SMTP" permission
4. Configure sender address (must match verified domain)
- **Required**: No
- **Typical values**: `development`, `production`, `test`
- **Purpose**: Controls secure cookie behavior and runtime mode
### Admin Account (Seeding)
@@ -255,68 +206,39 @@ cp .env.example .env.dev
# Edit .env.dev with your values
```
## Legacy Frontend Variables
Variables for the archived Next.js frontend (`src/Website-v1`) have been removed from this
active reference. See [archive/legacy-website-v1.md](archive/legacy-website-v1.md) if you
need the legacy Prisma, Cloudinary, Mapbox, or SparkPost notes.
**Docker Compose Mapping**:
- `docker-compose.dev.yaml``.env.dev`
- `docker-compose.test.yaml``.env.test`
- `docker-compose.prod.yaml``.env.prod`
### Frontend (Website Directory)
```
.env.local # Local development (gitignored)
.env.production # Production (gitignored)
```
**Setup**:
```bash
cd Website
touch .env.local
# Add frontend variables
```
## Variable Reference Table
| Variable | Backend | Frontend | Docker | Required | Notes |
| ----------------------------------- | :-----: | :------: | :----: | :------: | ------------------------- |
| **Database** |
| ----------------------------- | :-----: | :------: | :----: | :------: | -------------------------- |
| `DB_SERVER` | ✓ | | ✓ | Yes\* | SQL Server address |
| `DB_NAME` | ✓ | | ✓ | Yes\* | Database name |
| `DB_USER` | ✓ | | ✓ | Yes\* | SQL username |
| `DB_PASSWORD` | ✓ | | ✓ | Yes\* | SQL password |
| `DB_CONNECTION_STRING` | ✓ | | | Yes\* | Alternative to components |
| `DB_TRUST_SERVER_CERTIFICATE` | ✓ | | ✓ | No | Defaults to True |
| `SA_PASSWORD` | | | ✓ | Yes | SQL Server container |
| **Authentication (Backend - JWT)** |
| `ACCESS_TOKEN_SECRET` | ✓ | | ✓ | Yes | Access token secret |
| `REFRESH_TOKEN_SECRET` | ✓ | | | Yes | Refresh token secret |
| `CONFIRMATION_TOKEN_SECRET` | | | | Yes | Confirmation token secret |
| **Authentication (Frontend)** |
| `CONFIRMATION_TOKEN_SECRET` | | ✓ | | Yes | Email confirmation |
| `RESET_PASSWORD_TOKEN_SECRET` | | | | Yes | Password reset |
| `SESSION_SECRET` | | ✓ | | Yes | Session signing |
| `SESSION_TOKEN_NAME` | | ✓ | | No | Default: "biergarten" |
| `SESSION_MAX_AGE` | | ✓ | | No | Default: 604800 |
| **Base Configuration** |
| `BASE_URL` | | ✓ | | Yes | App base URL |
| `NODE_ENV` | | ✓ | | Yes | Node environment |
| `DB_TRUST_SERVER_CERTIFICATE` | ✓ | | ✓ | No | Defaults to `True` |
| `ACCESS_TOKEN_SECRET` | | | ✓ | Yes | Access token signing |
| `REFRESH_TOKEN_SECRET` | ✓ | | ✓ | Yes | Refresh token signing |
| `CONFIRMATION_TOKEN_SECRET` | ✓ | | ✓ | Yes | Confirmation token signing |
| `WEBSITE_BASE_URL` | ✓ | | | Yes | Website URL for emails |
| `API_BASE_URL` | | | | Yes | Website-to-API base URL |
| `SESSION_SECRET` | | ✓ | | Yes | Website session signing |
| `NODE_ENV` | | ✓ | | No | Runtime mode |
| `CLEAR_DATABASE` | | | | No | Dev/test reset flag |
| `ASPNETCORE_ENVIRONMENT` | ✓ | | ✓ | Yes | ASP.NET environment |
| `ASPNETCORE_URLS` | ✓ | | ✓ | Yes | API binding address |
| **Database (Frontend - Current)** |
| `POSTGRES_PRISMA_URL` | | ✓ | | Yes | Pooled connection |
| `POSTGRES_URL_NON_POOLING` | | ✓ | | Yes | Direct connection |
| `SHADOW_DATABASE_URL` | | ✓ | | No | Prisma shadow DB |
| **External Services** |
| `NEXT_PUBLIC_CLOUDINARY_CLOUD_NAME` | | ✓ | | Yes | Public, client-side |
| `CLOUDINARY_KEY` | | ✓ | | Yes | Server-side |
| `CLOUDINARY_SECRET` | | ✓ | | Yes | Server-side |
| `MAPBOX_ACCESS_TOKEN` | | ✓ | | Yes | Maps/geocoding |
| `SPARKPOST_API_KEY` | | ✓ | | Yes | Email service |
| `SPARKPOST_SENDER_ADDRESS` | | ✓ | | Yes | From address |
| **Other** |
| `ADMIN_PASSWORD` | | ✓ | | No | Seeding only |
| `CLEAR_DATABASE` | ✓ | | ✓ | No | Dev/test only |
| `SA_PASSWORD` | | | ✓ | Yes | SQL Server container |
| `ACCEPT_EULA` | | | ✓ | Yes | SQL Server EULA |
| `MSSQL_PID` | | | ✓ | No | SQL Server edition |
| `DOTNET_RUNNING_IN_CONTAINER` | ✓ | | ✓ | No | Container flag |
@@ -336,13 +258,12 @@ Variables are validated at startup:
### Frontend Validation
Zod schemas validate variables at runtime:
The active website relies on runtime defaults for local development and the surrounding
server environment in deployed environments.
- Type checking (string, number, URL, etc.)
- Format validation (email, URL patterns)
- Required vs optional enforcement
**Location**: `src/Website/src/config/env/index.ts`
- `API_BASE_URL` defaults to `http://localhost:8080`
- `SESSION_SECRET` falls back to a development-only local secret
- `NODE_ENV` controls secure cookie behavior
## Example Configuration Files
```bash
DB_PASSWORD=Dev_Password_123!
ACCESS_TOKEN_SECRET=<generated-with-openssl>
REFRESH_TOKEN_SECRET=<generated-with-openssl>
CONFIRMATION_TOKEN_SECRET=<generated-with-openssl>
WEBSITE_BASE_URL=http://localhost:3000
# Migration
CLEAR_DATABASE=true
ACCEPT_EULA=Y
MSSQL_PID=Express
```
### Frontend local runtime example
```bash
# Base
BASE_URL=http://localhost:3000
NODE_ENV=development
# Authentication
API_BASE_URL=http://localhost:8080
SESSION_SECRET=<generated-with-openssl>
# Database (current Prisma setup)
POSTGRES_PRISMA_URL=postgresql://user:pass@db.neon.tech/biergarten?pgbouncer=true
POSTGRES_URL_NON_POOLING=postgresql://user:pass@db.neon.tech/biergarten
# External Services
NEXT_PUBLIC_CLOUDINARY_CLOUD_NAME=my-cloud
CLOUDINARY_KEY=123456789012345
CLOUDINARY_SECRET=abcdefghijklmnopqrstuvwxyz
MAPBOX_ACCESS_TOKEN=pk.eyJ...
SPARKPOST_API_KEY=abc123...
SPARKPOST_SENDER_ADDRESS=noreply@biergarten.app
# Admin (for seeding)
ADMIN_PASSWORD=Admin_Dev_Password_123!
```

# Getting Started
This guide covers local setup for the current Biergarten stack: the .NET backend in
`src/Core` and the active React Router frontend in `src/Website`.
## Prerequisites
Before you begin, ensure you have the following installed:
- **.NET SDK 10+** - [Download](https://dotnet.microsoft.com/download)
- **Node.js 18+** - [Download](https://nodejs.org/)
- **Docker Desktop** - [Download](https://www.docker.com/products/docker-desktop) or an equivalent Docker Engine setup
- **Java 8+** - Optional; only needed to regenerate PlantUML diagrams
## Recommended Path: Docker for Backend, Node for Frontend
### 1. Clone the Repository
```bash
git clone <repository-url>
cd the-biergarten-app
```
### 2. Configure Backend Environment Variables

Copy the example environment file:
```bash
cp .env.example .env.dev
```
At minimum, ensure `.env.dev` includes valid database and token values:
```bash
# Database (component-based for Docker)
DB_SERVER=sqlserver,1433
DB_NAME=Biergarten
DB_USER=sa
DB_PASSWORD=YourStrong!Passw0rd
# JWT Authentication
ACCESS_TOKEN_SECRET=<generated>
REFRESH_TOKEN_SECRET=<generated>
CONFIRMATION_TOKEN_SECRET=<generated>
WEBSITE_BASE_URL=http://localhost:3000
```
See [Environment Variables](environment-variables.md) for the full list.
### 3. Start the Backend Stack
```bash
docker compose -f docker-compose.dev.yaml up -d
```
This starts SQL Server, migrations, seeding, and the API.
Available endpoints:
- API Swagger: http://localhost:8080/swagger
- Health Check: http://localhost:8080/health
### 4. Start the Active Frontend
```bash
cd src/Website
npm install
API_BASE_URL=http://localhost:8080 SESSION_SECRET=dev-secret-change-me npm run dev
```
The website will be available at the local address printed by the React Router dev server.
Required frontend runtime variables for local work:
- `API_BASE_URL` - Base URL for the .NET API
- `SESSION_SECRET` - Cookie session signing secret for the website server
### 5. Optional: Run Storybook
```bash
cd src/Website
npm run storybook
```
Storybook runs at http://localhost:6006 by default.
## Useful Commands
### Backend
```bash
docker compose -f docker-compose.dev.yaml logs -f     # follow service logs
docker compose -f docker-compose.dev.yaml down        # stop the stack
docker compose -f docker-compose.dev.yaml down -v     # stop and remove volumes for a fresh start
```
### Frontend
```bash
cd src/Website
npm run lint
npm run typecheck
npm run format:check
npm run test:storybook
npm run test:storybook:playwright
```
## Manual Backend Setup
If you do not want to use Docker, you can run the backend locally against a local or
cloud-hosted SQL Server instance.
### 1. Set Environment Variables
```bash
export DB_CONNECTION_STRING="Server=localhost,1433;Database=Biergarten;User Id=sa;Password=YourStrong!Passw0rd;TrustServerCertificate=True;"
export ACCESS_TOKEN_SECRET="<generated>"
export REFRESH_TOKEN_SECRET="<generated>"
export CONFIRMATION_TOKEN_SECRET="<generated>"
export WEBSITE_BASE_URL="http://localhost:3000"
```
### 2. Run Migrations and Seed
```bash
cd src/Core
dotnet run --project Database/Database.Migrations/Database.Migrations.csproj
dotnet run --project Database/Database.Seed/Database.Seed.csproj
```
### 3. Start the API
```bash
dotnet run --project API/API.Core/API.Core.csproj
```
The API will be available at http://localhost:5000 (or the port specified in
launchSettings.json).
## Legacy Frontend Note
The previous Next.js frontend now lives in `src/Website-v1` and is not the active website.
Legacy setup details have been moved to [docs/archive/legacy-website-v1.md](archive/legacy-website-v1.md).
## Next Steps
- Review [Architecture](architecture.md)
- Run backend and frontend checks from [Testing](testing.md)
- Use [Docker Guide](docker.md) for container troubleshooting


This document describes the testing strategy and how to run tests for The Biergarten App.
## Overview
The project uses a multi-layered testing approach across backend and frontend:
- **API.Specs** - BDD integration tests using Reqnroll (Gherkin)
- **Infrastructure.Repository.Tests** - Unit tests for data access layer
- **Service.Auth.Tests** - Unit tests for authentication business logic
- **Storybook Vitest project** - Browser-based interaction tests for shared website stories
- **Storybook Playwright suite** - Browser checks against Storybook-rendered components
## Running Tests with Docker (Recommended)
- No database required (uses Moq for mocking)
### Frontend Storybook Tests
```bash
cd src/Website
npm install
npm run test:storybook
```
**Purpose**:
- Verifies shared stories such as form fields, submit buttons, navbar states, toasts, and the theme gallery
- Runs in browser mode via Vitest and Storybook integration
### Frontend Playwright Storybook Tests
```bash
cd src/Website
npm install
npm run test:storybook:playwright
```
**Requirements**:
- Storybook dependencies installed
- Playwright browser dependencies installed
- The command will start or reuse the Storybook server defined in `playwright.storybook.config.ts`
## Test Coverage
### Current Coverage
- Register service with validation
- Business logic for authentication flow
**Frontend UI Coverage**:
- Shared submit button states
- Form field happy path and error presentation
- Navbar guest, authenticated, and mobile behavior
- Theme gallery rendering across Biergarten themes
- Toast interactions and themed notification display
### Planned Coverage
- [ ] Email verification workflow
- [ ] Beer post operations
- [ ] User follow/unfollow
- [ ] Image upload service
- [ ] Frontend route integration coverage beyond Storybook stories
## Testing Frameworks & Tools
Exit codes:
- `0` - All tests passed
- Non-zero - Test failures occurred
Frontend UI checks should also be included in CI for the active website workspace:
```bash
cd src/Website
npm ci
npm run test:storybook
npm run test:storybook:playwright
```
## Troubleshooting
### Tests Failing Due to Database Connection

**`pipeline/.clang-format`** (new file):

```yaml
---
BasedOnStyle: Google
ColumnLimit: 80
IndentWidth: 3
...
```

**`pipeline/.clang-tidy`** (new file):

```yaml
---
Checks: >
  -*,
  bugprone-*,
  clang-analyzer-*,
  cppcoreguidelines-*,
  google-*,
  modernize-*,
  performance-*,
  readability-*,
  -cppcoreguidelines-avoid-magic-numbers,
  -cppcoreguidelines-owning-memory,
  -readability-magic-numbers,
  -google-readability-todo
HeaderFilterRegex: "^(src|includes)/.*"
FormatStyle: file
...
```

**`pipeline/.gitignore`** (new file):

```
dist
build
data
```

**`pipeline/CMakeLists.txt`** (new file):
cmake_minimum_required(VERSION 3.20)
project(biergarten-pipeline VERSION 0.1.0 LANGUAGES CXX)
# Allows older dependencies to configure on newer CMake.
set(CMAKE_POLICY_VERSION_MINIMUM 3.5)
# Policies
cmake_policy(SET CMP0167 NEW) # FindBoost improvements
# Global Settings
set(CMAKE_CXX_STANDARD 23)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
option(ENABLE_CLANG_TIDY "Enable clang-tidy static analysis for project targets" ON)
option(ENABLE_CLANG_FORMAT_TARGETS "Enable clang-format helper targets" ON)
if(ENABLE_CLANG_TIDY)
find_program(CLANG_TIDY_EXE NAMES clang-tidy)
if(CLANG_TIDY_EXE)
set(BIERGARTEN_CLANG_TIDY_COMMAND
"${CLANG_TIDY_EXE};--config-file=${CMAKE_CURRENT_SOURCE_DIR}/.clang-tidy")
message(STATUS "clang-tidy enabled: ${CLANG_TIDY_EXE}")
else()
message(STATUS "clang-tidy not found; static analysis is disabled")
endif()
endif()
# -----------------------------------------------------------------------------
# Compiler Options & Warnings (Interface Library)
# -----------------------------------------------------------------------------
add_library(project_options INTERFACE)
target_compile_options(project_options INTERFACE
$<$<CXX_COMPILER_ID:GNU,Clang>:
-Wall -Wextra -Wpedantic -Wshadow -Wconversion -Wsign-conversion -Wunused
>
$<$<CXX_COMPILER_ID:MSVC>:
/W4 /WX /permissive-
>
)
# -----------------------------------------------------------------------------
# Dependencies
# -----------------------------------------------------------------------------
find_package(CURL REQUIRED)
find_package(SQLite3 REQUIRED)
find_package(Boost 1.75 REQUIRED COMPONENTS program_options json)
include(FetchContent)
# spdlog (Logging)
FetchContent_Declare(
spdlog
GIT_REPOSITORY https://github.com/gabime/spdlog.git
GIT_TAG v1.11.0
)
FetchContent_MakeAvailable(spdlog)
# llama.cpp (LLM Inference)
set(LLAMA_BUILD_TESTS OFF CACHE BOOL "" FORCE)
set(LLAMA_BUILD_EXAMPLES OFF CACHE BOOL "" FORCE)
set(LLAMA_BUILD_SERVER OFF CACHE BOOL "" FORCE)
FetchContent_Declare(
llama_cpp
GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
GIT_TAG b8611
)
FetchContent_MakeAvailable(llama_cpp)
if(TARGET llama)
target_compile_options(llama PRIVATE
$<$<CXX_COMPILER_ID:AppleClang>:-include algorithm>
)
endif()
# -----------------------------------------------------------------------------
# Main Executable
# -----------------------------------------------------------------------------
set(PIPELINE_SOURCES
src/biergarten_data_generator.cpp
src/web_client/curl_web_client.cpp
src/data_generation/data_downloader.cpp
src/database/database.cpp
src/json_handling/json_loader.cpp
src/data_generation/llama/destructor.cpp
src/data_generation/llama/set_sampling_options.cpp
src/data_generation/llama/load.cpp
src/data_generation/llama/infer.cpp
src/data_generation/llama/generate_brewery.cpp
src/data_generation/llama/generate_user.cpp
src/data_generation/llama/helpers.cpp
src/data_generation/llama/load_brewery_prompt.cpp
src/data_generation/mock/data.cpp
src/data_generation/mock/deterministic_hash.cpp
src/data_generation/mock/load.cpp
src/data_generation/mock/generate_brewery.cpp
src/data_generation/mock/generate_user.cpp
src/json_handling/stream_parser.cpp
src/wikipedia/wikipedia_service.cpp
src/main.cpp
)
add_executable(biergarten-pipeline ${PIPELINE_SOURCES})
if(BIERGARTEN_CLANG_TIDY_COMMAND)
set_target_properties(biergarten-pipeline PROPERTIES
CXX_CLANG_TIDY "${BIERGARTEN_CLANG_TIDY_COMMAND}"
)
endif()
target_include_directories(biergarten-pipeline
PRIVATE
${CMAKE_CURRENT_SOURCE_DIR}/includes
${llama_cpp_SOURCE_DIR}/include
)
target_link_libraries(biergarten-pipeline
PRIVATE
project_options
CURL::libcurl
SQLite::SQLite3
spdlog::spdlog
llama
Boost::program_options
Boost::json
)
if(ENABLE_CLANG_FORMAT_TARGETS)
find_program(CLANG_FORMAT_EXE NAMES clang-format)
if(CLANG_FORMAT_EXE)
file(GLOB_RECURSE FORMAT_SOURCES CONFIGURE_DEPENDS
${CMAKE_CURRENT_SOURCE_DIR}/src/**/*.cpp
${CMAKE_CURRENT_SOURCE_DIR}/src/**/*.cc
${CMAKE_CURRENT_SOURCE_DIR}/includes/**/*.h
${CMAKE_CURRENT_SOURCE_DIR}/includes/**/*.hpp
)
add_custom_target(format
COMMAND ${CLANG_FORMAT_EXE} -style=file -i ${FORMAT_SOURCES}
COMMENT "Formatting source files with clang-format (Google style)"
VERBATIM
)
add_custom_target(format-check
COMMAND ${CLANG_FORMAT_EXE} -style=file --dry-run --Werror ${FORMAT_SOURCES}
COMMENT "Checking source formatting with clang-format (Google style)"
VERBATIM
)
else()
message(STATUS "clang-format not found; format targets are disabled")
endif()
endif()
# -----------------------------------------------------------------------------
# Post-Build Steps & Utilities
# -----------------------------------------------------------------------------
add_custom_command(TARGET biergarten-pipeline POST_BUILD
COMMAND ${CMAKE_COMMAND} -E make_directory ${CMAKE_CURRENT_SOURCE_DIR}/output
COMMENT "Ensuring output directory exists"
)
find_program(VALGRIND valgrind)
if(VALGRIND)
add_custom_target(memcheck
COMMAND ${VALGRIND} --leak-check=full --error-exitcode=1 $<TARGET_FILE:biergarten-pipeline> --help
DEPENDS biergarten-pipeline
COMMENT "Running Valgrind memory check"
)
endif()

**`pipeline/README.md`** (new file):
# Biergarten Pipeline
A high-performance C++23 data pipeline for fetching, parsing, and storing geographic data (countries, states, cities) with brewery metadata generation capabilities. The system supports both mock and LLM-based (llama.cpp) generation modes.
## Overview
The pipeline orchestrates **four key stages**:
1. **Download** - Fetches `countries+states+cities.json` from a pinned GitHub commit with optional local filesystem caching
2. **Parse** - Streams JSON using Boost.JSON's `basic_parser` to extract country/state/city records without loading the entire file into memory
3. **Store** - Inserts records into a file-based SQLite database with all operations performed sequentially in a single thread
4. **Generate** - Produces brewery metadata or user profiles through either the deterministic mock generator or LLM inference via llama.cpp
## System Architecture
### Data Sources and Formats
- **Hierarchical Structure**: Countries array → states per country → cities per state
- **Data Fields**:
- `id` (integer)
- `name` (string)
- `iso2` / `iso3` (ISO country/state codes)
- `latitude` / `longitude` (geographic coordinates)
- **Source**: [dr5hn/countries-states-cities-database](https://github.com/dr5hn/countries-states-cities-database) on GitHub
- **Output**: Structured SQLite file-based database (`biergarten-pipeline.db`) + structured logging via spdlog
### Concurrency Model
The pipeline currently operates **single-threaded** with sequential stage execution:
1. **Download Phase**: Main thread blocks while downloading the source JSON file (if not in cache)
2. **Parse & Store Phase**: Main thread performs streaming JSON parse with immediate SQLite inserts
**Thread Safety**: While single-threaded, the `SqliteDatabase` component is **mutex-protected** using `std::mutex` (`dbMutex`) for all database operations. This design enables safe future parallelization without code modifications.
## Core Components
| Component | Purpose | Thread Safety | Dependencies |
| ----------------------------- | ----------------------------------------------------------------------------------------------- | -------------------------------------------- | --------------------------------------------- |
| **BiergartenDataGenerator** | Orchestrates pipeline execution; manages lifecycle of downloader, parser, and generator | Single-threaded coordinator | ApplicationOptions, WebClient, SqliteDatabase |
| **DataDownloader** | HTTP fetch with curl; optional filesystem cache; ETag support and retries | Blocking I/O; safe for startup | IWebClient, filesystem |
| **StreamingJsonParser** | Extends `boost::json::basic_parser`; emits country/state/city via callbacks; tracks parse depth | Single-threaded parse; callbacks thread-safe | Boost.JSON |
| **JsonLoader** | Wraps parser; dispatches callbacks for country/state/city; manages WorkQueue lifecycle | Produces to WorkQueue; safe callbacks | StreamingJsonParser, SqliteDatabase |
| **SqliteDatabase** | Manages schema initialization; insert/query methods for geographic data | Mutex-guarded all operations | SQLite3 |
| **IDataGenerator** (Abstract) | Interface for brewery/user metadata generation | Stateless virtual methods | N/A |
| **LlamaGenerator** | LLM-based generation via llama.cpp; configurable sampling (temperature, top-p, seed) | Manages llama_model* and llama_context* | llama.cpp, BreweryResult, UserResult |
| **MockGenerator** | Deterministic mock generation using seeded randomization | Stateless; thread-safe | N/A |
| **CURLWebClient** | HTTP client adapter; URL encoding; file downloads | cURL library bindings | libcurl |
| **WikipediaService** | Wikipedia summary lookups for city enrichment | Blocking I/O; safe for startup | IWebClient |
## Database Schema
SQLite file-based database with **three core tables** and **indexes for fast lookups**:
### Countries
```sql
CREATE TABLE countries (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
iso2 TEXT,
iso3 TEXT
);
CREATE INDEX idx_countries_iso2 ON countries(iso2);
```
### States
```sql
CREATE TABLE states (
id INTEGER PRIMARY KEY,
country_id INTEGER NOT NULL,
name TEXT NOT NULL,
iso2 TEXT,
FOREIGN KEY (country_id) REFERENCES countries(id)
);
CREATE INDEX idx_states_country ON states(country_id);
```
### Cities
```sql
CREATE TABLE cities (
id INTEGER PRIMARY KEY,
state_id INTEGER NOT NULL,
country_id INTEGER NOT NULL,
name TEXT NOT NULL,
latitude REAL,
longitude REAL,
FOREIGN KEY (state_id) REFERENCES states(id),
FOREIGN KEY (country_id) REFERENCES countries(id)
);
CREATE INDEX idx_cities_state ON cities(state_id);
CREATE INDEX idx_cities_country ON cities(country_id);
```
## Architecture Diagram
```plantuml
@startuml biergarten-pipeline
!theme plain
skinparam monochrome true
skinparam classBackgroundColor #FFFFFF
skinparam classBorderColor #000000
package "Application Layer" {
class BiergartenDataGenerator {
- options: ApplicationOptions
- webClient: IWebClient
- database: SqliteDatabase
- generator: IDataGenerator
--
+ Run() : int
}
}
package "Data Acquisition" {
class DataDownloader {
- webClient: IWebClient
--
+ Download(url: string, filePath: string)
+ DownloadWithCache(url: string, cachePath: string)
}
interface IWebClient {
+ DownloadToFile(url: string, filePath: string)
+ Get(url: string) : string
+ UrlEncode(value: string) : string
}
class CURLWebClient {
- globalState: CurlGlobalState
--
+ DownloadToFile(url: string, filePath: string)
+ Get(url: string) : string
+ UrlEncode(value: string) : string
}
}
package "JSON Processing" {
class StreamingJsonParser {
- depth: int
--
+ on_object_begin()
+ on_object_end()
+ on_array_begin()
+ on_array_end()
+ on_key(str: string)
+ on_string(str: string)
+ on_number(value: int)
}
class JsonLoader {
--
+ LoadWorldCities(jsonPath: string, db: SqliteDatabase)
}
}
package "Data Storage" {
class SqliteDatabase {
- db: sqlite3*
- dbMutex: std::mutex
--
+ Initialize(dbPath: string)
+ InsertCountry(id: int, name: string, iso2: string, iso3: string)
+ InsertState(id: int, countryId: int, name: string, iso2: string)
+ InsertCity(id: int, stateId: int, countryId: int, name: string, lat: double, lon: double)
+ QueryCountries(limit: int) : vector<Country>
+ QueryStates(limit: int) : vector<State>
+ QueryCities() : vector<City>
+ BeginTransaction()
+ CommitTransaction()
# InitializeSchema()
}
struct Country {
id: int
name: string
iso2: string
iso3: string
}
struct State {
id: int
name: string
iso2: string
countryId: int
}
struct City {
id: int
name: string
countryId: int
}
}
package "Data Generation" {
interface IDataGenerator {
+ load(modelPath: string)
+ generateBrewery(cityName: string, countryName: string, regionContext: string) : BreweryResult
+ generateUser(locale: string) : UserResult
}
class LlamaGenerator {
- model: llama_model*
- context: llama_context*
- sampling_temperature: float
- sampling_top_p: float
- sampling_seed: uint32_t
--
+ load(modelPath: string)
+ generateBrewery(...) : BreweryResult
+ generateUser(locale: string) : UserResult
+ setSamplingOptions(temperature: float, topP: float, seed: int)
# infer(prompt: string) : string
}
class MockGenerator {
--
+ load(modelPath: string)
+ generateBrewery(...) : BreweryResult
+ generateUser(locale: string) : UserResult
}
struct BreweryResult {
name: string
description: string
}
struct UserResult {
username: string
bio: string
}
}
package "Enrichment (Planned)" {
class WikipediaService {
- webClient: IWebClient
--
+ SearchCity(cityName: string, countryName: string) : string
}
}
' Relationships
BiergartenDataGenerator --> DataDownloader
BiergartenDataGenerator --> JsonLoader
BiergartenDataGenerator --> SqliteDatabase
BiergartenDataGenerator --> IDataGenerator
DataDownloader --> IWebClient
CURLWebClient ..|> IWebClient
JsonLoader --> StreamingJsonParser
JsonLoader --> SqliteDatabase
LlamaGenerator ..|> IDataGenerator
MockGenerator ..|> IDataGenerator
SqliteDatabase --> Country
SqliteDatabase --> State
SqliteDatabase --> City
LlamaGenerator --> BreweryResult
LlamaGenerator --> UserResult
MockGenerator --> BreweryResult
MockGenerator --> UserResult
WikipediaService --> IWebClient
@enduml
```
## Configuration and Extensibility
### Command-Line Arguments
Boost.Program_options provides named CLI arguments. Running without arguments displays usage instructions.
```bash
./biergarten-pipeline [options]
```
**Requirement**: Exactly one of `--mocked` or `--model` must be specified.
| Argument | Short | Type | Purpose |
| --------------- | ----- | ------ | --------------------------------------------------------------- |
| `--mocked` | - | flag | Use mocked generator for brewery/user data |
| `--model` | `-m` | string | Path to LLM model file (gguf); mutually exclusive with --mocked |
| `--cache-dir` | `-c` | path | Directory for cached JSON (default: `/tmp`) |
| `--temperature` | - | float | LLM sampling temperature 0.0-1.0 (default: `0.8`) |
| `--top-p` | - | float | Nucleus sampling parameter 0.0-1.0 (default: `0.92`) |
| `--seed` | - | int | Random seed: -1 for random (default: `-1`) |
| `--help` | `-h` | flag | Show help message |
**Note**: The data source is always pinned to commit `c5eb7772` (stable 2026-03-28) and cannot be changed.
**Note**: When `--mocked` is used, any sampling parameters (`--temperature`, `--top-p`, `--seed`) are ignored with a warning.
### Usage Examples
```bash
# Mocked generator (deterministic, no LLM required)
./biergarten-pipeline --mocked
# With LLM model
./biergarten-pipeline --model ./models/llama.gguf --cache-dir /var/cache
# Mocked with extra parameters provided (will be ignored with warning)
./biergarten-pipeline --mocked --temperature 0.5 --top-p 0.8 --seed 42
# Show help
./biergarten-pipeline --help
```
## Building and Running
### Prerequisites
- **C++23 compiler** (g++, clang, MSVC)
- **CMake** 3.20+
- **curl** (for HTTP downloads)
- **sqlite3** (database backend)
- **Boost** 1.75+ (requires Boost.JSON and Boost.Program_options)
- **spdlog** v1.11.0 (fetched via CMake FetchContent)
- **llama.cpp** (fetched via CMake FetchContent for LLM inference)
### Build
```bash
mkdir -p build
cd build
cmake ..
cmake --build . --target biergarten-pipeline -- -j
```
### Run
```bash
./build/biergarten-pipeline
```
**Output**:
- Console logs with structured spdlog output
- Cached JSON file: `/tmp/countries+states+cities.json`
- SQLite database: `biergarten-pipeline.db` (in output directory)
## Code Quality and Static Analysis
### Formatting
This project uses **clang-format** with the **Google C++ style guide**:
```bash
# Apply formatting to all source files
cmake --build build --target format
# Check formatting without modifications
cmake --build build --target format-check
```
### Static Analysis
This project uses **clang-tidy** with the bugprone, clang-analyzer, cppcoreguidelines, google, modernize, performance, and readability rule sets (see `.clang-tidy`). Static analysis runs automatically during compilation if `clang-tidy` is available.
## Code Implementation Summary
### Key Achievements
- **Full pipeline implementation** - Download → Parse → Store → Generate
- **Streaming JSON parser** - Memory-efficient processing via Boost.JSON callbacks
- **Thread-safe SQLite wrapper** - Mutex-protected database for future parallelization
- **Flexible data generation** - Abstract IDataGenerator interface supporting both mock and LLM modes
- **Comprehensive CLI** - Boost.Program_options with sensible defaults
- **Production-grade logging** - spdlog integration for structured output
- **Build quality** - CMake with clang-format/clang-tidy integration
### Architecture Patterns
- **Interface-based design**: `IWebClient`, `IDataGenerator` abstract base classes enable substitution and testing
- **Dependency injection**: Components receive dependencies via constructors (BiergartenDataGenerator)
- **RAII principle**: SQLite connections and resources managed via destructors
- **Callback-driven parsing**: Boost.JSON parser emits events to processing callbacks
- **Transaction-scoped inserts**: BeginTransaction/CommitTransaction for batch performance
### External Dependencies
| Dependency | Version | Purpose | Type |
| ---------- | ------- | ---------------------------------- | ------- |
| Boost | 1.75+ | JSON parsing, CLI argument parsing | Library |
| SQLite3 | - | Persistent data storage | System |
| libcurl | - | HTTP downloads | System |
| spdlog | v1.11.0 | Structured logging | Fetched |
| llama.cpp | b8611 | LLM inference engine | Fetched |
clang-tidy runs automatically on the `biergarten-pipeline` target when available. You can disable it at configure time:

```bash
cmake -DENABLE_CLANG_TIDY=OFF ..
```

You can also disable the format helper targets:

```bash
cmake -DENABLE_CLANG_FORMAT_TARGETS=OFF ..
```

#ifndef BIERGARTEN_PIPELINE_BIERGARTEN_DATA_GENERATOR_H_
#define BIERGARTEN_PIPELINE_BIERGARTEN_DATA_GENERATOR_H_
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>
#include "data_generation/data_generator.h"
#include "database/database.h"
#include "web_client/web_client.h"
#include "wikipedia/wikipedia_service.h"
/**
* @brief Program options for the Biergarten pipeline application.
*/
struct ApplicationOptions {
/// @brief Path to the LLM model file (gguf format); mutually exclusive with
/// use_mocked.
std::string model_path;
/// @brief Use mocked generator instead of LLM; mutually exclusive with
/// model_path.
bool use_mocked = false;
/// @brief Directory for cached JSON and database files.
std::string cache_dir;
/// @brief LLM sampling temperature (0.0 to 1.0, higher = more random).
float temperature = 0.8f;
/// @brief LLM nucleus sampling top-p parameter (0.0 to 1.0, higher = more
/// random).
float top_p = 0.92f;
/// @brief Context window size (tokens) for LLM inference. Higher values
/// support longer prompts but use more memory.
uint32_t n_ctx = 2048;
/// @brief Random seed for sampling (-1 for random, otherwise non-negative).
int seed = -1;
/// @brief Git commit hash for database consistency (always pinned to
/// c5eb7772).
std::string commit = "c5eb7772";
};
/**
* @brief Main data generator class for the Biergarten pipeline.
*
* This class encapsulates the core logic for generating brewery data.
* It handles database initialization, data loading/downloading, and brewery
* generation.
*/
class BiergartenDataGenerator {
public:
/**
* @brief Construct a BiergartenDataGenerator with injected dependencies.
*
* @param options Application configuration options.
* @param web_client HTTP client for downloading data.
* @param database SQLite database instance.
*/
BiergartenDataGenerator(const ApplicationOptions& options,
std::shared_ptr<WebClient> web_client,
SqliteDatabase& database);
/**
* @brief Run the data generation pipeline.
*
* Performs the following steps:
* 1. Initialize database
* 2. Download geographic data if needed
* 3. Initialize the generator (LLM or Mock)
* 4. Generate brewery data for sample cities
*
* @return 0 on success, 1 on failure.
*/
int Run();
private:
/// @brief Immutable application options.
const ApplicationOptions options_;
/// @brief Shared HTTP client dependency.
std::shared_ptr<WebClient> webClient_;
/// @brief Database dependency.
SqliteDatabase& database_;
/**
* @brief Enriched city data with Wikipedia context.
*/
struct EnrichedCity {
int city_id;
std::string city_name;
std::string country_name;
std::string region_context;
};
/**
* @brief Initialize the data generator based on options.
*
* Creates either a MockGenerator (if no model path) or LlamaGenerator.
*
* @return A unique_ptr to the initialized generator.
*/
std::unique_ptr<DataGenerator> InitializeGenerator();
/**
* @brief Download and load geographic data if not cached.
*/
void LoadGeographicData();
/**
* @brief Query cities from database and build country name map.
*
* @return Vector of (City, country_name) pairs capped at 30 entries.
*/
std::vector<std::pair<City, std::string>> QueryCitiesWithCountries();
/**
* @brief Enrich cities with Wikipedia summaries.
*
* @param cities Vector of (City, country_name) pairs.
* @return Vector of enriched city data with context.
*/
std::vector<EnrichedCity> EnrichWithWikipedia(
const std::vector<std::pair<City, std::string>>& cities);
/**
* @brief Generate breweries for enriched cities.
*
* @param generator The data generator instance.
* @param cities Vector of enriched city data.
*/
void GenerateBreweries(DataGenerator& generator,
const std::vector<EnrichedCity>& cities);
/**
* @brief Log the generated brewery results.
*/
void LogResults() const;
/**
* @brief Helper struct to store generated brewery data.
*/
struct GeneratedBrewery {
int city_id;
std::string city_name;
BreweryResult brewery;
};
/// @brief Stores generated brewery data.
std::vector<GeneratedBrewery> generatedBreweries_;
};
#endif // BIERGARTEN_PIPELINE_BIERGARTEN_DATA_GENERATOR_H_


@@ -0,0 +1,31 @@
#ifndef BIERGARTEN_PIPELINE_DATA_GENERATION_DATA_DOWNLOADER_H_
#define BIERGARTEN_PIPELINE_DATA_GENERATION_DATA_DOWNLOADER_H_
#include <memory>
#include <stdexcept>
#include <string>
#include "web_client/web_client.h"
/// @brief Downloads and caches source geography JSON payloads.
class DataDownloader {
public:
/// @brief Initializes global curl state used by this downloader.
explicit DataDownloader(std::shared_ptr<WebClient> web_client);
/// @brief Cleans up global curl state.
~DataDownloader();
/// @brief Returns a local JSON path, downloading it when cache is missing.
std::string DownloadCountriesDatabase(
const std::string& cache_path,
const std::string& commit =
"c5eb7772" // Stable commit: 2026-03-28 export
);
private:
static bool FileExists(const std::string& file_path);
std::shared_ptr<WebClient> web_client_;
};
#endif // BIERGARTEN_PIPELINE_DATA_GENERATION_DATA_DOWNLOADER_H_
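`DownloadCountriesDatabase` "returns a local JSON path, downloading it when cache is missing." A minimal sketch of that cache-or-fetch pattern, with a free function standing in for the class and a callable standing in for `WebClient::DownloadToFile` (both names here are hypothetical, not the pipeline's actual implementation):

```cpp
#include <filesystem>
#include <fstream>
#include <functional>
#include <string>

// Sketch: return the cached path if the file already exists; otherwise
// invoke `download` (stand-in for WebClient::DownloadToFile) to populate
// the cache, then return the same path.
std::string FetchWithCache(
    const std::string& cache_path, const std::string& url,
    const std::function<void(const std::string&, const std::string&)>&
        download) {
  if (std::filesystem::exists(cache_path)) {
    return cache_path;  // cache hit: no network access
  }
  download(url, cache_path);  // cache miss: fetch and persist
  return cache_path;
}
```

Because the cache check is a plain existence test, deleting the cached file is enough to force a re-download on the next run.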


@@ -0,0 +1,29 @@
#ifndef BIERGARTEN_PIPELINE_DATA_GENERATION_DATA_GENERATOR_H_
#define BIERGARTEN_PIPELINE_DATA_GENERATION_DATA_GENERATOR_H_
#include <string>
struct BreweryResult {
std::string name;
std::string description;
};
struct UserResult {
std::string username;
std::string bio;
};
class DataGenerator {
public:
virtual ~DataGenerator() = default;
virtual void Load(const std::string& model_path) = 0;
virtual BreweryResult GenerateBrewery(const std::string& city_name,
const std::string& country_name,
const std::string& region_context) = 0;
virtual UserResult GenerateUser(const std::string& locale) = 0;
};
#endif // BIERGARTEN_PIPELINE_DATA_GENERATION_DATA_GENERATOR_H_


@@ -0,0 +1,51 @@
#ifndef BIERGARTEN_PIPELINE_DATA_GENERATION_LLAMA_GENERATOR_H_
#define BIERGARTEN_PIPELINE_DATA_GENERATION_LLAMA_GENERATOR_H_
#include <cstdint>
#include <string>
#include "data_generation/data_generator.h"
struct llama_model;
struct llama_context;
class LlamaGenerator final : public DataGenerator {
public:
LlamaGenerator() = default;
~LlamaGenerator() override;
void SetSamplingOptions(float temperature, float top_p, int seed = -1);
void SetContextSize(uint32_t n_ctx);
void Load(const std::string& model_path) override;
BreweryResult GenerateBrewery(const std::string& city_name,
const std::string& country_name,
const std::string& region_context) override;
UserResult GenerateUser(const std::string& locale) override;
private:
std::string Infer(const std::string& prompt, int max_tokens = 10000);
// Overload that allows passing a system message separately so chat-capable
// models receive a proper system role instead of having the system text
// concatenated into the user prompt (helps avoid revealing internal
// reasoning or instructions in model output).
std::string Infer(const std::string& system_prompt,
const std::string& prompt, int max_tokens = 10000);
std::string InferFormatted(const std::string& formatted_prompt,
int max_tokens = 10000);
std::string LoadBrewerySystemPrompt(const std::string& prompt_file_path);
std::string GetFallbackBreweryPrompt();
llama_model* model_ = nullptr;
llama_context* context_ = nullptr;
float sampling_temperature_ = 0.8f;
float sampling_top_p_ = 0.92f;
uint32_t sampling_seed_ = 0xFFFFFFFFu;
uint32_t n_ctx_ = 8192;
std::string brewery_system_prompt_;
};
#endif // BIERGARTEN_PIPELINE_DATA_GENERATION_LLAMA_GENERATOR_H_


@@ -0,0 +1,32 @@
#ifndef BIERGARTEN_PIPELINE_DATA_GENERATION_LLAMA_GENERATOR_HELPERS_H_
#define BIERGARTEN_PIPELINE_DATA_GENERATION_LLAMA_GENERATOR_HELPERS_H_
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>
struct llama_model;
struct llama_vocab;
// Matches the llama_token type defined in llama.h.
typedef int32_t llama_token;
// Helper functions for LlamaGenerator methods
std::string PrepareRegionContextPublic(std::string_view region_context,
std::size_t max_chars = 700);
std::pair<std::string, std::string> ParseTwoLineResponsePublic(
const std::string& raw, const std::string& error_message);
std::string ToChatPromptPublic(const llama_model* model,
const std::string& user_prompt);
std::string ToChatPromptPublic(const llama_model* model,
const std::string& system_prompt,
const std::string& user_prompt);
void AppendTokenPiecePublic(const llama_vocab* vocab, llama_token token,
std::string& output);
std::string ValidateBreweryJsonPublic(const std::string& raw,
std::string& name_out,
std::string& description_out);
#endif // BIERGARTEN_PIPELINE_DATA_GENERATION_LLAMA_GENERATOR_HELPERS_H_
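Per the pipeline's fix history, `ParseTwoLineResponsePublic` strips only known thinking tags (`<think>`, `<reasoning>`, `<reflect>`) rather than any `<...>` pattern, so legitimate output like `<username>content</username>` survives. A self-contained sketch of that filter (the function name here is hypothetical):

```cpp
#include <array>
#include <string>

// Sketch: remove only known reasoning wrappers from model output, leaving
// all other angle-bracket text (e.g. "<username>") untouched.
std::string StripThinkingTags(std::string text) {
  static const std::array<std::string, 3> kTags = {"think", "reasoning",
                                                   "reflect"};
  for (const auto& tag : kTags) {
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    for (;;) {
      const auto start = text.find(open);
      if (start == std::string::npos) break;
      const auto end = text.find(close, start);
      if (end == std::string::npos) {
        text.erase(start);  // unterminated block: drop to end of string
        break;
      }
      text.erase(start, end - start + close.size());
    }
  }
  return text;
}
```

The closed allow-list is the key design choice: a blanket `<...>` regex would silently delete legitimate structured output.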


@@ -0,0 +1,28 @@
#ifndef BIERGARTEN_PIPELINE_DATA_GENERATION_MOCK_GENERATOR_H_
#define BIERGARTEN_PIPELINE_DATA_GENERATION_MOCK_GENERATOR_H_
#include <string>
#include <vector>
#include "data_generation/data_generator.h"
class MockGenerator final : public DataGenerator {
public:
void Load(const std::string& model_path) override;
BreweryResult GenerateBrewery(const std::string& city_name,
const std::string& country_name,
const std::string& region_context) override;
UserResult GenerateUser(const std::string& locale) override;
private:
static std::size_t DeterministicHash(const std::string& a,
const std::string& b);
static const std::vector<std::string> kBreweryAdjectives;
static const std::vector<std::string> kBreweryNouns;
static const std::vector<std::string> kBreweryDescriptions;
static const std::vector<std::string> kUsernames;
static const std::vector<std::string> kBios;
};
#endif // BIERGARTEN_PIPELINE_DATA_GENERATION_MOCK_GENERATOR_H_
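`MockGenerator` keys its canned output off `DeterministicHash(a, b)`, so the same city/country pair always yields the same brewery. A sketch of how such deterministic selection could work (the combining scheme and `PickDeterministic` helper are illustrative assumptions, not the class's actual implementation):

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Sketch: combine the two standard string hashes; the shift keeps
// (a, b) and (b, a) from colliding symmetrically.
std::size_t DeterministicHash(const std::string& a, const std::string& b) {
  return std::hash<std::string>{}(a) ^ (std::hash<std::string>{}(b) << 1);
}

// Pick a fixed entry from a static pool based on the input pair, so mock
// runs are reproducible without any RNG state.
std::string PickDeterministic(const std::vector<std::string>& pool,
                              const std::string& city,
                              const std::string& country) {
  return pool[DeterministicHash(city, country) % pool.size()];
}
```

Note that `std::hash` values are implementation-defined, so results are stable within one build but not across standard libraries.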


@@ -0,0 +1,87 @@
#ifndef BIERGARTEN_PIPELINE_DATABASE_DATABASE_H_
#define BIERGARTEN_PIPELINE_DATABASE_DATABASE_H_
#include <sqlite3.h>
#include <mutex>
#include <string>
#include <vector>
struct Country {
/// @brief Country identifier from the source dataset.
int id;
/// @brief Country display name.
std::string name;
/// @brief ISO 3166-1 alpha-2 code.
std::string iso2;
/// @brief ISO 3166-1 alpha-3 code.
std::string iso3;
};
struct State {
/// @brief State or province identifier from the source dataset.
int id;
/// @brief State or province display name.
std::string name;
/// @brief State or province short code.
std::string iso2;
/// @brief Parent country identifier.
int country_id;
};
struct City {
/// @brief City identifier from the source dataset.
int id;
/// @brief City display name.
std::string name;
/// @brief Parent country identifier.
int country_id;
};
/// @brief Thread-safe SQLite wrapper for pipeline writes and readbacks.
class SqliteDatabase {
private:
sqlite3* db_ = nullptr;
std::mutex db_mutex_;
void InitializeSchema();
public:
/// @brief Closes the SQLite connection if initialized.
~SqliteDatabase();
/// @brief Opens the SQLite database at db_path and creates schema objects.
void Initialize(const std::string& db_path = ":memory:");
/// @brief Starts a database transaction for batched writes.
void BeginTransaction();
/// @brief Commits the active database transaction.
void CommitTransaction();
/// @brief Rolls back the active database transaction.
void RollbackTransaction();
/// @brief Inserts a country row.
void InsertCountry(int id, const std::string& name, const std::string& iso2,
const std::string& iso3);
/// @brief Inserts a state row linked to a country.
void InsertState(int id, int country_id, const std::string& name,
const std::string& iso2);
/// @brief Inserts a city row linked to state and country.
void InsertCity(int id, int state_id, int country_id,
const std::string& name, double latitude, double longitude);
/// @brief Returns city records including parent country id.
std::vector<City> QueryCities();
/// @brief Returns countries with optional row limit.
std::vector<Country> QueryCountries(int limit = 0);
/// @brief Returns states with optional row limit.
std::vector<State> QueryStates(int limit = 0);
};
#endif // BIERGARTEN_PIPELINE_DATABASE_DATABASE_H_


@@ -0,0 +1,17 @@
#ifndef BIERGARTEN_PIPELINE_JSON_HANDLING_JSON_LOADER_H_
#define BIERGARTEN_PIPELINE_JSON_HANDLING_JSON_LOADER_H_
#include <string>
#include "database/database.h"
#include "json_handling/stream_parser.h"
/// @brief Loads world-city JSON data into SQLite through streaming parsing.
class JsonLoader {
public:
/// @brief Parses a JSON file and writes country/state/city rows into db.
static void LoadWorldCities(const std::string& json_path,
SqliteDatabase& db);
};
#endif // BIERGARTEN_PIPELINE_JSON_HANDLING_JSON_LOADER_H_


@@ -0,0 +1,52 @@
#ifndef BIERGARTEN_PIPELINE_JSON_HANDLING_STREAM_PARSER_H_
#define BIERGARTEN_PIPELINE_JSON_HANDLING_STREAM_PARSER_H_
#include <functional>
#include <string>
// Forward declaration instead of including database/database.h: Parse only
// takes SqliteDatabase by reference, so the full definition is not needed.
class SqliteDatabase;
/// @brief In-memory representation of one parsed city entry.
struct CityRecord {
int id;
int state_id;
int country_id;
std::string name;
double latitude;
double longitude;
};
/// @brief Streaming SAX parser that emits city records during traversal.
class StreamingJsonParser {
public:
/// @brief Parses file_path and invokes callbacks for city rows and progress.
static void Parse(const std::string& file_path, SqliteDatabase& db,
std::function<void(const CityRecord&)> on_city,
std::function<void(size_t, size_t)> on_progress = nullptr);
private:
/// @brief Mutable SAX handler state while traversing nested JSON arrays.
struct ParseState {
int current_country_id = 0;
int current_state_id = 0;
CityRecord current_city = {};
bool building_city = false;
std::string current_key;
int array_depth = 0;
int object_depth = 0;
bool in_countries_array = false;
bool in_states_array = false;
bool in_cities_array = false;
std::function<void(const CityRecord&)> on_city;
std::function<void(size_t, size_t)> on_progress;
size_t bytes_processed = 0;
};
};
#endif // BIERGARTEN_PIPELINE_JSON_HANDLING_STREAM_PARSER_H_


@@ -0,0 +1,30 @@
#ifndef BIERGARTEN_PIPELINE_WEB_CLIENT_CURL_WEB_CLIENT_H_
#define BIERGARTEN_PIPELINE_WEB_CLIENT_CURL_WEB_CLIENT_H_
#include <memory>
#include "web_client/web_client.h"
// RAII for curl_global_init/cleanup.
// An instance of this class should be created in main() before any curl
// operations and exist for the lifetime of the application.
class CurlGlobalState {
public:
CurlGlobalState();
~CurlGlobalState();
CurlGlobalState(const CurlGlobalState&) = delete;
CurlGlobalState& operator=(const CurlGlobalState&) = delete;
};
class CURLWebClient : public WebClient {
public:
CURLWebClient();
~CURLWebClient() override;
void DownloadToFile(const std::string& url,
const std::string& file_path) override;
std::string Get(const std::string& url) override;
std::string UrlEncode(const std::string& value) override;
};
#endif // BIERGARTEN_PIPELINE_WEB_CLIENT_CURL_WEB_CLIENT_H_


@@ -0,0 +1,22 @@
#ifndef BIERGARTEN_PIPELINE_WEB_CLIENT_WEB_CLIENT_H_
#define BIERGARTEN_PIPELINE_WEB_CLIENT_WEB_CLIENT_H_
#include <string>
class WebClient {
public:
virtual ~WebClient() = default;
// Downloads content from a URL to a file. Throws on error.
virtual void DownloadToFile(const std::string& url,
const std::string& file_path) = 0;
// Performs a GET request and returns the response body as a string. Throws
// on error.
virtual std::string Get(const std::string& url) = 0;
// URL-encodes a string.
virtual std::string UrlEncode(const std::string& value) = 0;
};
#endif // BIERGARTEN_PIPELINE_WEB_CLIENT_WEB_CLIENT_H_


@@ -0,0 +1,27 @@
#ifndef BIERGARTEN_PIPELINE_WIKIPEDIA_WIKIPEDIA_SERVICE_H_
#define BIERGARTEN_PIPELINE_WIKIPEDIA_WIKIPEDIA_SERVICE_H_
#include <memory>
#include <string>
#include <string_view>
#include <unordered_map>
#include "web_client/web_client.h"
/// @brief Provides cached Wikipedia summary lookups for city and country pairs.
class WikipediaService {
public:
/// @brief Creates a new Wikipedia service with the provided web client.
explicit WikipediaService(std::shared_ptr<WebClient> client);
/// @brief Returns the Wikipedia summary extract for city and country.
[[nodiscard]] std::string GetSummary(std::string_view city,
std::string_view country);
private:
std::string FetchExtract(std::string_view query);
std::shared_ptr<WebClient> client_;
std::unordered_map<std::string, std::string> cache_;
};
#endif // BIERGARTEN_PIPELINE_WIKIPEDIA_WIKIPEDIA_SERVICE_H_
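`WikipediaService` caches summary lookups in its `unordered_map`, so each city/country pair costs at most one network round trip. A sketch of that memoization shape, with a free function and an injected `fetch` callable standing in for the class and its MediaWiki API request (the key format and names here are assumptions):

```cpp
#include <functional>
#include <string>
#include <unordered_map>

// Sketch: serve repeat lookups from the in-memory cache; only a miss
// invokes `fetch`, which stands in for the MediaWiki extracts request.
std::string CachedSummary(
    std::unordered_map<std::string, std::string>& cache,
    const std::string& city, const std::string& country,
    const std::function<std::string(const std::string&)>& fetch) {
  const std::string key = city + "|" + country;
  auto it = cache.find(key);
  if (it != cache.end()) return it->second;  // cache hit
  const std::string summary = fetch(city + ", " + country);
  cache.emplace(key, summary);
  return summary;
}
```

The underlying API call must pass `explaintext=1` (not `explaintext=true`) to get plain text rather than HTML markup in the extract, per the pipeline's fix history.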


@@ -0,0 +1,425 @@
================================================================================
BREWERY DATA GENERATION - COMPREHENSIVE SYSTEM PROMPT
================================================================================
ROLE AND OBJECTIVE
You are an experienced brewmaster and owner of a local craft brewery. Your task
is to create a distinctive, authentic name and a detailed description for your
brewery that genuinely reflects your specific location, your brewing philosophy,
the local culture, and your connection to the community.
The brewery must feel real and grounded in its specific place—not generic or
interchangeable with breweries from other regions. Every detail should build
authenticity and distinctiveness.
================================================================================
FORBIDDEN PHRASES AND CLICHÉS
================================================================================
NEVER USE THESE OVERUSED CONSTRUCTIONS (even in modified form):
- "Love letter to" / "tribute to" / "ode to"
- "Rolling hills" / "picturesque landscape" / "scenic beauty"
- "Every sip tells a story" / "every pint tells a story" / "transporting you"
- "Come for X, stay for Y" formula (Come for beer, stay for...)
- "Rich history/traditions" / "storied past" / "storied brewing tradition"
- "Passion" as a generic descriptor ("crafted with passion", "our passion")
- "Woven into the fabric" / "echoes of" / "steeped in"
- "Ancient roots" / "timeless traditions" / "time-honored heritage"
- Opening ONLY with landscape/geography (no standalone "Nestled...", "Where...")
- "Where tradition meets innovation"
- "Celebrating the spirit of [place]"
- "Raised on the values of" / "rooted in the values of"
- "Taste of [place]" / "essence of [place]"
- "From our family to yours"
- "Brewing excellence" / "committed to excellence"
- "Bringing people together" (without showing HOW)
- "Honoring local heritage" (without specifics)
================================================================================
SEVEN OPENING APPROACHES - ROTATE BETWEEN THESE
================================================================================
1. BEER STYLE ORIGIN ANGLE
Start by identifying a specific beer style historically made in or
influenced by the region. Explain why THIS place inspired that style.
Example Foundation: "Belgian Trappist ales developed from monastic traditions
in the Ardennes; our brewery continues that contemplative approach..."
2. BREWING CHALLENGE / ADVANTAGE ANGLE
Begin with a specific environmental or geographic challenge that shapes
the brewery's approach. Water hardness, altitude, climate, ingredient scarcity.
Example Foundation: "High-altitude fermentation requires patience; at 1,500m,
our lagers need 8 weeks to develop the crisp finish..."
3. FOUNDING STORY / PERSONAL MOTIVATION
Open with why the founder started THIS brewery HERE. Personal history,
escape from corporate work, multi-generational family legacy, career change.
Example Foundation: "After 20 years in finance, I returned to my hometown to
revive my grandfather's closed brewery using his original recipe notes..."
4. SPECIFIC LOCAL INGREDIENT / RESOURCE
Lead with a unique input source: special water, rare hops grown locally,
grain from a specific mill, honey from local apiaries, barrel aging with
local wood.
Example Foundation: "The cold springs below Sniffels Peak provide water so soft
it inspired our signature pale lager..."
5. CONTRADICTION / UNEXPECTED ANGLE
Start with a surprising fact about the place that defies stereotype.
Example Foundation: "Nobody expects beer culture in a Muslim-majority city,
yet our secular neighborhood has deep roots in 1920s beer halls..."
6. LOCAL EVENT / CULTURAL MOMENT
Begin with a specific historical moment, festival, cultural practice, or
seasonal tradition in the place.
Example Foundation: "Every October, the hop harvest brings itinerant workers
and tradition. Our brewery grew from a harvest celebration in 2008..."
7. TANGIBLE PHYSICAL DETAIL
Open by describing a concrete architectural or geographic feature: building
age, material, location relative to notable structures, layout, history of
the space.
Example Foundation: "This 1887 mill house once crushed grain; the original
water wheel still runs below our fermentation room..."
================================================================================
SPECIFICITY AND CONCRETENESS REQUIREMENTS
================================================================================
DO NOT GENERALIZE. Every brewery description must include:
✓ At least ONE concrete proper noun or specific reference:
- Actual local landmarks (mountain name, river name, street, neighborhood)
- Specific business partner or supplier name (if real to the region)
- Named local cultural event or historical period
- Specific beer style(s) with regional significance
- Actual geographic feature (e.g., "the volcanic ash in our soil")
✓ Mention specific beer styles relevant to the region's culture:
- German Bavaria: Dunkelweizen, Märzen, Kellerbier, Helles
- Belgian/Flemish: Lambic, Trappist, Strong Dark Ale
- British Isles: Brown Ale, Real Ale, Bitter, Cask Ale
- Czech: Pilsner, Bohemian Lager
- IPA/Hoppy: American regions, UK (origin)
- New Zealand/Australia: Hop-forward, experimental
- Japanese: Clean lagers, sake influence
- Mexican: Lager-centric, sometimes citrus
✓ Name concrete brewing challenges or advantages:
Examples: water minerality, altitude, temperature swings, grain varieties,
humidity, wild yeasts in the region, traditional equipment preserved in place
✓ Use sensory language SPECIFIC to the place:
NOT: "beautiful views" → "the copper beech trees turn rust-colored by
September"
NOT: "charming" → "the original tile floor from 1924 still mosaic-patterns
the taproom"
NOT: "authentic" → "the water chiller uses the original 1950s ammonia system"
✓ Avoid describing multiple regions with the same adjectives:
Don't say every brewery is "cozy" or "vibrant" or "historic"—be specific
about WHAT makes this one different from others in different regions.
================================================================================
STRUCTURAL PATTERNS - MIX THESE UP
================================================================================
NOT every description should follow: legacy → current brewing → call to action
TEMPLATE ROTATION (these are EXAMPLES, not formulas):
TEMPLATE A: [Region origin] → [specific challenge] → [how we adapted] → [result]
"The Saône River flooded predictably each spring. Medieval brewers learned
to schedule production around it. We use the same seasonal rhythm..."
TEMPLATE B: [Ingredient story] → [technique developed because of it] → [distinctive result]
"Our barley terraces face southwest; the afternoon sun dries the crop weeks
before northern valleys. This inspired our crisp, mineral-forward pale ale..."
TEMPLATE C: [Personal/family history (without generic framing)] → [specific challenge overcome] → [philosophy]
"My mother was a chemist studying water quality; she noticed the local supply
had unusual pH. Rather than fight it, we formulated our entire range around
it. The sulfate content sharpens our bitters..."
TEMPLATE D: [Describe the physical space in detail] → [how space enables brewing style] → [sensory experience]
"The brewhouse occupies a converted 1960s chemical factory. The stainless steel
vats still bear faded original markings. The building's thermal mass keeps
fermentation stable without modern refrigeration..."
TEMPLATE E: [Unexpected contradiction] → [explanation] → [brewing philosophy]
"In a region famous for wine, we're a beer-only operation. We embrace that
outsider status and brew adventurously, avoiding the 'respect tradition'
pressure wine makes locals feel..."
TEMPLATE F: [Community role, specific] → [what that demands] → [brewing expression]
"We're the only gathering space in the village that stays open after 10pm.
That responsibility means brewing beers that pair with conversation, not
provocation. Sessionable, food-friendly, endlessly drinkable..."
TEMPLATE G: [Backward chronology] → [how practices persist] → [what's evolved]
"Our great-grandfather hand-packed bottles in 1952. We still own his bench.
Even though we use machines now, the pace he set—careful, thoughtful—shapes
every decision. Nothing about us is fast..."
SOMETIMES skip the narrative entirely and just describe:
"We brew four core beers—a dry lager, a copper ale, a wheat beer, and a hop-
forward pale. The range itself tells our story: accessible, varied,
unpretentious. No flagship. No hero beer. Balance."
================================================================================
REGIONAL AUTHENTICITY GUIDELINES
================================================================================
GERMAN / ALPINE / CENTRAL EUROPEAN
- Discuss water hardness and mineral content
- Reference specific beer laws (Reinheitsgebot, Bavarian purity traditions)
- Name specific styles: Kellerbier, Märzen, Dunkelweizen, Helles, Alt, Zwickel
- Mention lager fermentation dominance and cool-cave advantages
- Consider beer hall culture, tradition of communal spaces
- Discuss barrel aging if applicable
- Reference precision/engineering in brewing approach
- Don't romanticize; emphasis can be on technique and consistency
MEDITERRANEAN / SOUTHERN EUROPEAN
- Reference local wine culture (compare or contrast with brewing)
- Mention grape varieties if relevant (some regions have wine-brewery overlap)
- Discuss sun exposure, heat challenges during fermentation
- Ingredient sourcing: local herbs, citrus, wheat quality
- May emphasize Mediterranean sociability and gathering spaces
- Consider how northern European brewing tradition transplanted here
- Water source and quality specific to region
- Seasonal agricultural connections (harvest timing, etc.)
ANGLO-SAXON / BRITISH ISLES / SCANDINAVIAN
- Real ale, cask conditioning, hand-pulled pints
- IPA heritage (England specifically for British breweries; American breweries have a different innovation story)
- Hops: specific varietal heritage (Fuggle, Golding, Cascade, etc.)
- Pub culture and community gathering
- Ales: top-fermented, warmer fermentation temperatures
- May emphasize working-class history or rural traditions
- Cider/mead/fermented heritage alongside beer
NEW WORLD (US, AUSTRALIA, NZ, SOUTH AFRICA)
- Emphasize experimentation and lack of brewing "rules"
- Ingredient sourcing: local grain growers, foraged hops, local suppliers
- May reference mining heritage, recent settlement, diverse immigration
- Craft beer boom influence: how does this brewery differentiate?
- Often: bold flavors, high ABVs, creative adjuncts
- Can emphasize anti-tradition or deliberate rule-breaking
- Emphasis on farmer partnerships and local food scenes
SMALL VILLAGES / RURAL AREAS
- Brewery likely serves as actual gathering place—explain HOW
- Ingredient sourcing highly local (grain from X farm, water from Y spring)
- May be family operation or multi-generation story
- Role in community identity and events
- Accessibility and lack of pretension
- Seasonal rhythm and agricultural calendar influence
- Risk: Don't make it overly quaint or "simpler times" nostalgic
URBAN / NEIGHBORHOOD-BASED
- Distinctive neighborhood identity (don't just say "vibrant")
- Specific business community or residential character
- Street-level visibility and casual drop-in culture
- May emphasize diversity, immigrant heritage, gentrification navigation
- Smaller brewing scale in dense area (space constraints)
- Walking-distance customer base instead of destination draw
- May have stronger food pairing focus (food truck culture, restaurant neighbors)
WINE REGIONS (Italy, France, Spain, Germany's Mosel, etc.)
- Show awareness of wine's prestige locally
- Explain why brewing exists here despite wine dominance
- Does brewery respect wine or deliberately provide alternative?
- Ingredient differences: water quality suited to beer, not wine
- Brewing approach: precise, clean—influenced by wine mentality
- May emphasize beer's sociability vs. wine's formality
- Historical context: beer predates or coexists with wine tradition
BEER-HERITAGE HOTSPOTS (Belgium, Germany, UK, Czech Republic)
- Can't ignore the weight of history without acknowledging it
- Do you innovate within tradition or break from it? Say which.
- Specific pride in one style over others (Lambic specialist, Trappist-inspired, etc.)
- May emphasize family legacy or generational knowledge
- Regional identity VERY strong—brewery reflects this unapologetically
- Risk: Avoid claiming to "honor" or "continue" without specifics
================================================================================
TONE VARIATIONS - NOT ALL BREWERIES ARE SOULFUL
================================================================================
These descriptions should NOT all sound romantic, quaint, or emotionally
passionate. These are alternative tones:
IRREVERENT / HUMOROUS
"We're brewing beer because wine required too much prayer. Less spirituality,
more hops. Our ales are big, unpolished, and perfect after a day's work."
MATTER-OF-FACT / ENGINEERING-FOCUSED
"Brewing is chemistry. We source ingredient components, control variables,
and optimize for reproducibility. If that sounds clinical, good—consistency
is our craft."
PROUDLY UNPRETENTIOUS / WORKING-CLASS
"This isn't farm-to-table aspirational nonsense. It's a neighborhood beer.
$4 pints. No reservations. No sipping notes. Tastes good, fills the glass,
keeps you coming back."
MINIMALIST / DIRECT
"We brew three beers. They're good. Come drink one."
BUSINESS-FOCUSED / PRACTICAL
"Starting a brewery in 2015 meant finding a niche. We're the only nano-
brewery serving the airport district. Our rapid turnover and distribution
focus differentiate us from weekend hobbyists."
CONFRONTATIONAL / REBELLIOUS
"Craft beer got boring. Expensive IPAs and flavor-chasing. We're brewing
wheat beers and forgotten styles because fashion is temporary; good beer is timeless."
MIX these tones across your descriptions. Some breweries should sound romantic
and place-proud. Others should sound irreverent or practical.
================================================================================
NARRATIVE CLICHÉS TO ABSOLUTELY AVOID
================================================================================
1. THE "HIDDEN GEM" FRAMING
Don't use discovery language: "hidden," "lesser-known," "off the beaten path,"
"tucked away." Implies marketing speak, not authenticity.
2. OVERT NOSTALGIA / "SIMPLER TIMES"
Don't appeal to vague sense that past was better: "yearning for," "those
days," "how things used to be." Lazy and off-putting.
3. EMPTY "GATHERING PLACE" CLAIMS
Don't just assert "we bring people together." Show HOW: local workers' lunch
spot? Trivia night tradition? Live music venue? Political meeting ground?
4. "SPECIAL" WITHOUT EVIDENCE
Don't declare location is "special" or "unique." SHOW what makes it distinct
through specific details, not assertion.
5. "WE BELIEVE IN" AS PLACEHOLDER
Every brewery claims to "believe in" quality, community, craft, sustainability.
These are empty. What specific belief drives THIS brewery's choices?
6. "ESCAPE / RETREAT" FRAMING
Don't suggest beer allows people to escape reality, retreat from the world,
or "get away." Implies you don't trust the place itself to be compelling.
7. SUPERLATIVE CLAIMS
Don't use: "finest," "best," "most authentic," "truly legendary." Let details
prove these implied claims instead.
8. PASSIVE VOICE ABOUT YOUR OWN BREWERY
Avoid: "beloved by locals," "known for its," "celebrated for." Active voice:
what does the brewery actively DO?
================================================================================
LENGTH AND CONTENT REQUIREMENTS
================================================================================
TARGET LENGTH: 120-180 words
- Long enough to establish place and brewing philosophy
- Short enough to avoid meandering or repetition
- Specific enough that brewery feels real and unreplicable
REQUIRED ELEMENTS (at least ONE each):
✓ Concrete location reference (proper noun, landmark, geographic feature)
✓ One specific brewing detail (challenge, advantage, technique, ingredient)
✓ Sensory language specific to the place (NOT generic adjectives)
✓ Distinct tone/voice (don't all sound the same quiet reverence)
OPTIONAL ELEMENTS:
- Name 1-2 specific beer styles or beer names
- Personal/family story (if it illuminates why brewery exists here)
- Ingredient sourcing or supply chain detail
- Community role (with evidence, not assertion)
- Regional historical context (brief, specific)
WORD ECONOMY:
- Don't waste words on "we believe in quality" or "committed to excellence"
- Don't use filler adjectives: "authentic," "genuine," "real," "true," "local"
(these should be IMPLIED by specific details)
- Every sentence should add information, flavor, or distinctive detail
================================================================================
SENSORY LANGUAGE GUIDELINES
================================================================================
AVOID THESE GENERIC SENSORY WORDS (they're lazy placeholders):
- "Beautiful," "picturesque," "gorgeous," "stunning"
- "Warm," "cozy," "inviting" (without context)
- "Vibrant," "lively," "energetic" (without examples)
- "Charming," "quaint," "rustic" (without specifics)
USE INSTEAD: Specific, concrete sensory details
- Colors: "copper beech," "rust-stained brick," "frost-blue shutters"
- Textures: "the grain of wooden barrel hoops," "hand-smoothed stone," "grime-darkened windows"
- Sounds: "the hiss of the hand-pump," "coin-drop in the old register," "church bells on Sunday"
- Smells: "yeast-heavy floor," "wet limestone," "Hallertau hop resin"
- Tastes: (in the beer) "mineral-sharp," "sulfate clarity," "heather honey notes"
EXAMPLE SENSORY COMPARISON:
AVOID: "Our brewery captures the essence of the region's rustic charm."
USE: "The five-meter stone walls keep fermentation at 12°C without refrigeration.
On warm days, water drips from moss-covered blocks—the original cooling
system that hasn't changed in 150 years."
================================================================================
DIVERSITY ACROSS DATASET - WHAT NOT TO REPEAT
================================================================================
Since you're generating many breweries, ensure variety by:
□ Alternating tone (soulful → irreverent → matter-of-fact → working-class, etc.)
□ Varying opening approach (don't use beer-style origin twice in a row)
□ Different geographic contexts (don't make all small villages sound the same)
□ Distinct brewery sizes/models (nano-brewery, family operation, investor-backed, etc.)
□ Various types of "draw" (neighborhood destination vs. local-only vs. tourist
attraction vs. untouched community staple)
□ Diverse relationship to beer history/tradition (embrace it, subvert it, ignore it)
□ Different community roles (political space, athlete hangout, food destination,
working person's bar, experimentation lab, etc.)
If you notice yourself using the same phrasing twice within three breweries,
STOP and take a completely different approach for the next one.
================================================================================
QUALITY CHECKLIST
================================================================================
Before submitting your brewery description, verify:
□ Zero clichés from the FORBIDDEN list appear anywhere
□ At least one specific proper noun or concrete reference included
□ No more than two generic adjectives in the entire description
□ The brewery is genuinely unreplicable (wouldn't work in a different location)
□ Tone matches a SPECIFIC angle (not generic reverence)
□ Opening sentence is distinctive and unexpected
□ No sentence says the same thing twice in different words
□ At least one detail is surprising or specific to this place
□ The description would make sense ONLY for this location/region
□ "Passion," "tradition," "community" either don't appear or appear with
specific context/evidence
================================================================================
OUTPUT FORMAT
================================================================================
Return ONLY a valid JSON object with exactly two keys:
{
  "name": "Brewery Name Here",
  "description": "Full description text here..."
}
Requirements:
- name: 2-5 words, distinctive, memorable
- description: 120-180 words, follows all guidelines above
- Valid JSON (escaped quotes, no line breaks in strings)
- No markdown, no backticks, no code formatting
- No preamble before the JSON
- No trailing text after the JSON
- No explanations or commentary
================================================================================

@@ -0,0 +1,169 @@
================================================================================
BREWERY DATA GENERATION SYSTEM PROMPT
================================================================================
ROLE AND OBJECTIVE
You are an experienced brewmaster creating authentic brewery descriptions that
feel real and grounded in specific places. Every detail should prove the brewery
could only exist in this location. Write as a brewmaster would—focused on concrete
details, not marketing copy.
================================================================================
FORBIDDEN PHRASES AND CLICHÉS
================================================================================
NEVER USE THESE (even in modified form):
- "Love letter to" / "tribute to" / "ode to" / "rolling hills" / "picturesque"
- "Every sip tells a story" / "Come for X, stay for Y" / "Where tradition meets innovation"
- "Rich history" / "ancient roots" / "timeless traditions" / "time-honored heritage"
- "Passion" (standalone descriptor) / "brewing excellence" / "commitment to quality"
- "Authentic" / "genuine" / "real" / "true" (SHOW these, don't state them)
- "Bringing people together" (without HOW) / "community gathering place" (without proof)
- "Hidden gem" / "secret" / "lesser-known" / "beloved by locals"
- Generic adjectives: "beautiful," "gorgeous," "lovely," "cozy," "charming," "vibrant"
- Vague temporal claims: "simpler times," "the good old days," "escape from the modern world"
- Passive or hedged reputation claims: "is known for," "has become famous for," "has earned a reputation"
================================================================================
OPENING APPROACHES (Choose ONE per brewery)
================================================================================
1. BEER STYLE ORIGIN: Start with a specific historical beer style from this
region, explain why this place created it, show how your brewery continues it.
Key: Name specific style → why this region made it → how you continue it
2. BREWING CHALLENGE: Begin with a specific environmental constraint (altitude,
water hardness, temperature, endemic yeasts). Explain the technical consequence
and what decision you made because of it.
Key: Name constraint → technical consequence → your response → distinctive result
3. FOUNDING STORY: Why did the founder return/move HERE? What did they discover?
What specific brewing decision followed? Include a concrete artifact (logs, equipment).
Key: Real motivation → specific discovery → brewing decision that stemmed from it
4. LOCAL INGREDIENT: What unique resource defines your brewery? Why is it unique?
What brewing constraint or opportunity does it create?
Key: Specific ingredient/resource → why unique → brewing choices it enables
5. CONTRADICTION: What is the region famous for? Why does your brewery do the
opposite? Make the contradiction a strength, not an apology.
Key: Regional identity → why you diverge → what you do instead → why it works
6. CULTURAL MOMENT: What specific seasonal tradition or event shapes your brewery?
How do you connect to it? What brewing decisions follow?
Key: Specific tradition/event → your brewery's relationship → brewing decisions
7. PHYSICAL SPACE: Describe a specific architectural feature with date/material.
How does it create technical advantage? What sensory details matter? Why keep
constraints instead of modernizing?
Key: Specific feature → technical consequence → sensory details → why you keep it
================================================================================
SPECIFICITY REQUIREMENTS
================================================================================
Every brewery description MUST include (minimums given per category below):
1. CONCRETE PROPER NOUNS (at least 2)
- Named geographic features: "Saône River," "Monte Guzzo," "Hallertau region"
- Named landmarks: "St. Augustine Cathedral," "the old train station," "Harbor Point"
- Named varieties: "Saaz hops," "Maris Otter barley," "wild Lambic culture"
- Named local suppliers: "[Farmer name]'s wheat," "limestone quarry at Kinderheim"
- Named historical periods: "post-WWII reconstruction," "the 1952 flood"
2. BREWING-SPECIFIC DETAILS (at least 1-2)
- Water chemistry: "58 ppm calcium, 45 ppm sulfate" or temperature/pH specifics
- Altitude/climate constraints: "1,500m elevation means fermentation at 2-3°C lower"
- Temperature swings: "winters reach -20°C, summers hit 35°C; requires separate strategies"
- Endemic challenges: "Brettanomyces naturally present; exposed wort gets infected within hours"
- Equipment constraints: "original wooden tun from 1954 still seals better than stainless steel"
- Ingredient limitations: "fresh hops available only August-September; plan year around that"
3. SENSORY DETAILS SPECIFIC TO THIS PLACE (at least 1)
NOT generic: "beautiful, charming, cozy"
Instead: "copper beech trees turn rust-colored by September, visible from fermentation windows"
Instead: "boot-scrape grooves worn by coal miners still visible in original tile floor"
Instead: "fermentation produces ethanol vapor visible in morning frost every September"
Instead: "3-meter stone walls keep fermentation at 13°C naturally; sitting under stone feels colder"
PROOF TEST: Could this brewery description fit in Chile? Germany? Japan?
- If YES: add more place-specific details
- If NO: you're on track. Identity should be inseparable from location.
================================================================================
TONE VARIATIONS
================================================================================
Rotate tones consciously. Examples:
IRREVERENT: "We're brewing beer because wine required ritual and prayer. Less
spirituality, more hops. Our ales are big, unpolished. Named our Brown Ale
'Medieval Constipation' because the grain gives texture."
MATTER-OF-FACT: "Brewing is applied chemistry. We measure water mineral content
to the ppm, fermentation temperature to 0.5°C. Our Märzen has the same gravity,
ABV, and color every single batch. Precision is our craft."
WORKING-CLASS PROUD: "This isn't farm-to-table aspirational nonsense. It's a
neighborhood beer. Four dollars a pint. No reservations, no tasting notes.
Workers need somewhere to go."
MINIMALIST: "We brew three beers. They're good. That's it."
NOSTALGIC-GROUNDED: "My grandfather brewed in his basement. When he died in
1995, I found his brewing logs in 2015. I copied his exact recipes. Now the
fermentation smells like his basement."
================================================================================
LENGTH & CONTENT REQUIREMENTS
================================================================================
TARGET LENGTH: 150-250 words
REQUIRED ELEMENTS:
- At least 2-3 concrete proper nouns (named locations, suppliers, historical moments)
- At least 1-2 brewing-specific details (water chemistry, altitude, equipment constraints)
- At least 1 sensory detail specific to this place (visible, olfactory, tactile, or temporal)
- Consistent tone throughout (irreverent, matter-of-fact, working-class, nostalgic, etc.)
- One distinctive detail that proves the brewery could ONLY exist in this location
OPTIONAL ELEMENTS:
- Specific beer names (not just styles)
- Names of key people (if central to story)
- Explicit community role (with evidence)
- Actual sales/production details (if relevant)
DO NOT INCLUDE:
- Generic adjectives without evidence: "authentic," "genuine," "soulful," "passionate"
- Vague community claims without HOW: "gathering place," "beloved," "where people come together"
- Marketing language: "award-winning," "nationally recognized," "craft quality"
- Fillers: "and more," "creating memories," "for all to enjoy"
- Predictions: "we're working on," "coming soon," "we plan to"
================================================================================
OUTPUT FORMAT
================================================================================
Return ONLY a valid JSON object with exactly two keys:
{
  "name": "Brewery Name Here",
  "description": "Full description text here..."
}
Requirements:
- name: 2-5 words, distinctive, memorable
- description: 150-250 words, follows all guidelines
- Valid JSON (properly escaped quotes, no line breaks)
- No markdown, backticks, or code formatting
- No preamble or trailing text after JSON
Example:
{
  "name": "Sniffels Peak Brewing",
  "description": "The soft spring water beneath Sniffels Peak..."
}
================================================================================

@@ -0,0 +1,158 @@
#include "biergarten_data_generator.h"
#include <spdlog/spdlog.h>
#include <algorithm>
#include <filesystem>
#include <unordered_map>
#include "data_generation/data_downloader.h"
#include "data_generation/llama_generator.h"
#include "data_generation/mock_generator.h"
#include "json_handling/json_loader.h"
#include "wikipedia/wikipedia_service.h"
BiergartenDataGenerator::BiergartenDataGenerator(
const ApplicationOptions& options, std::shared_ptr<WebClient> web_client,
SqliteDatabase& database)
: options_(options), webClient_(web_client), database_(database) {}
std::unique_ptr<DataGenerator> BiergartenDataGenerator::InitializeGenerator() {
spdlog::info("Initializing brewery generator...");
std::unique_ptr<DataGenerator> generator;
if (options_.model_path.empty()) {
generator = std::make_unique<MockGenerator>();
spdlog::info("[Generator] Using MockGenerator (no model path provided)");
} else {
auto llama_generator = std::make_unique<LlamaGenerator>();
llama_generator->SetSamplingOptions(options_.temperature, options_.top_p,
options_.seed);
llama_generator->SetContextSize(options_.n_ctx);
spdlog::info(
"[Generator] Using LlamaGenerator: {} (temperature={}, top-p={}, "
"n_ctx={}, seed={})",
options_.model_path, options_.temperature, options_.top_p,
options_.n_ctx, options_.seed);
generator = std::move(llama_generator);
}
generator->Load(options_.model_path);
return generator;
}
void BiergartenDataGenerator::LoadGeographicData() {
std::string json_path = options_.cache_dir + "/countries+states+cities.json";
std::string db_path = options_.cache_dir + "/biergarten-pipeline.db";
bool has_json_cache = std::filesystem::exists(json_path);
bool has_db_cache = std::filesystem::exists(db_path);
spdlog::info("Initializing SQLite database at {}...", db_path);
database_.Initialize(db_path);
if (has_db_cache && has_json_cache) {
spdlog::info("[Pipeline] Cache hit: skipping download and parse");
} else {
spdlog::info("\n[Pipeline] Downloading geographic data from GitHub...");
DataDownloader downloader(webClient_);
downloader.DownloadCountriesDatabase(json_path, options_.commit);
JsonLoader::LoadWorldCities(json_path, database_);
}
}
std::vector<std::pair<City, std::string>>
BiergartenDataGenerator::QueryCitiesWithCountries() {
spdlog::info("\n=== GEOGRAPHIC DATA OVERVIEW ===");
auto cities = database_.QueryCities();
// Build a quick map of country id -> name for per-city lookups.
auto all_countries = database_.QueryCountries(0);
std::unordered_map<int, std::string> country_map;
for (const auto& c : all_countries) {
country_map[c.id] = c.name;
}
spdlog::info("\nTotal records loaded:");
spdlog::info(" Countries: {}", database_.QueryCountries(0).size());
spdlog::info(" States: {}", database_.QueryStates(0).size());
spdlog::info(" Cities: {}", cities.size());
// Cap at 30 entries.
const size_t sample_count = std::min(size_t(30), cities.size());
std::vector<std::pair<City, std::string>> result;
for (size_t i = 0; i < sample_count; i++) {
const auto& city = cities[i];
std::string country_name;
const auto country_it = country_map.find(city.country_id);
if (country_it != country_map.end()) {
country_name = country_it->second;
}
result.push_back({city, country_name});
}
return result;
}
std::vector<BiergartenDataGenerator::EnrichedCity>
BiergartenDataGenerator::EnrichWithWikipedia(
const std::vector<std::pair<City, std::string>>& cities) {
WikipediaService wikipedia_service(webClient_);
std::vector<EnrichedCity> enriched;
for (const auto& [city, country_name] : cities) {
const std::string region_context =
wikipedia_service.GetSummary(city.name, country_name);
spdlog::debug("[Pipeline] Region context for {}: {}", city.name,
region_context);
enriched.push_back({city.id, city.name, country_name, region_context});
}
return enriched;
}
void BiergartenDataGenerator::GenerateBreweries(
DataGenerator& generator, const std::vector<EnrichedCity>& cities) {
spdlog::info("\n=== SAMPLE BREWERY GENERATION ===");
generatedBreweries_.clear();
for (const auto& enriched_city : cities) {
auto brewery = generator.GenerateBrewery(enriched_city.city_name,
enriched_city.country_name,
enriched_city.region_context);
generatedBreweries_.push_back(
{enriched_city.city_id, enriched_city.city_name, brewery});
}
}
void BiergartenDataGenerator::LogResults() const {
spdlog::info("\n=== GENERATED DATA DUMP ===");
for (size_t i = 0; i < generatedBreweries_.size(); i++) {
const auto& entry = generatedBreweries_[i];
spdlog::info("{}. city_id={} city=\"{}\"", i + 1, entry.city_id,
entry.city_name);
spdlog::info(" brewery_name=\"{}\"", entry.brewery.name);
spdlog::info(" brewery_description=\"{}\"", entry.brewery.description);
}
}
int BiergartenDataGenerator::Run() {
try {
LoadGeographicData();
auto generator = InitializeGenerator();
auto cities = QueryCitiesWithCountries();
auto enriched = EnrichWithWikipedia(cities);
GenerateBreweries(*generator, enriched);
LogResults();
spdlog::info("\nOK: Pipeline completed successfully");
return 0;
} catch (const std::exception& e) {
spdlog::error("ERROR: Pipeline failed: {}", e.what());
return 1;
}
}

@@ -0,0 +1,44 @@
#include "data_generation/data_downloader.h"
#include <spdlog/spdlog.h>
#include <filesystem>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include "web_client/web_client.h"
DataDownloader::DataDownloader(std::shared_ptr<WebClient> web_client)
: web_client_(std::move(web_client)) {}
DataDownloader::~DataDownloader() = default;
bool DataDownloader::FileExists(const std::string& file_path) {
return std::filesystem::exists(file_path);
}
std::string DataDownloader::DownloadCountriesDatabase(
const std::string& cache_path, const std::string& commit) {
if (FileExists(cache_path)) {
spdlog::info("[DataDownloader] Cache hit: {}", cache_path);
return cache_path;
}
std::string url =
"https://raw.githubusercontent.com/dr5hn/"
"countries-states-cities-database/" +
commit + "/json/countries+states+cities.json";
spdlog::info("[DataDownloader] Downloading: {}", url);
web_client_->DownloadToFile(url, cache_path);
std::ifstream file_check(cache_path, std::ios::binary | std::ios::ate);
if (!file_check) {
throw std::runtime_error(
"[DataDownloader] Downloaded file is missing or unreadable: " + cache_path);
}
const std::streamsize size = file_check.tellg();
file_check.close();
spdlog::info("[DataDownloader] OK: Download complete: {} ({:.2f} MB)",
cache_path, (size / (1024.0 * 1024.0)));
return cache_path;
}

@@ -0,0 +1,31 @@
/**
* Destructor Module
* Ensures proper cleanup of llama.cpp resources (context and model) when the
* generator is destroyed, preventing memory leaks and resource exhaustion.
*/
#include "data_generation/llama_generator.h"
#include "llama.h"
LlamaGenerator::~LlamaGenerator() {
/**
* Free the inference context (contains KV cache and computation state)
*/
if (context_ != nullptr) {
llama_free(context_);
context_ = nullptr;
}
/**
* Free the loaded model (contains weights and vocabulary)
*/
if (model_ != nullptr) {
llama_model_free(model_);
model_ = nullptr;
}
/**
* Clean up the backend (GPU/CPU acceleration resources)
*/
llama_backend_free();
}

@@ -0,0 +1,107 @@
/**
* Brewery Data Generation Module
* Uses the LLM to generate realistic brewery names and descriptions for a given
* location. Implements retry logic with validation and error correction to
* ensure valid JSON output conforming to the expected schema.
*/
#include <spdlog/spdlog.h>
#include <stdexcept>
#include <string>
#include "data_generation/llama_generator.h"
#include "data_generation/llama_generator_helpers.h"
BreweryResult LlamaGenerator::GenerateBrewery(
const std::string& city_name, const std::string& country_name,
const std::string& region_context) {
/**
* Preprocess and truncate region context to manageable size
*/
const std::string safe_region_context =
PrepareRegionContextPublic(region_context);
/**
* Load brewery system prompt from file
* Falls back to minimal inline prompt if file not found
* Default path: prompts/brewery_system_prompt_expanded.txt
*/
const std::string system_prompt =
LoadBrewerySystemPrompt("prompts/brewery_system_prompt_expanded.txt");
/**
* User prompt: provides geographic context to guide generation towards
* culturally appropriate and locally-inspired brewery attributes
*/
std::string prompt =
"Write a brewery name and place-specific long description for a craft "
"brewery in " +
city_name +
(country_name.empty() ? std::string("")
: std::string(", ") + country_name) +
(safe_region_context.empty()
? std::string(".")
: std::string(". Regional context: ") + safe_region_context);
/**
* Store location context for retry prompts (without repeating full context)
*/
const std::string retry_location =
"Location: " + city_name +
(country_name.empty() ? std::string("")
: std::string(", ") + country_name);
/**
* RETRY LOOP with validation and error correction
* Attempts to generate valid brewery data up to 3 times, with feedback-based
* refinement
*/
const int max_attempts = 3;
std::string raw;
std::string last_error;
// Limit output length to keep it concise and focused
constexpr int max_tokens = 1052;
for (int attempt = 0; attempt < max_attempts; ++attempt) {
// Generate brewery data from LLM
raw = Infer(system_prompt, prompt, max_tokens);
spdlog::debug("LlamaGenerator: raw output (attempt {}): {}", attempt + 1,
raw);
// Validate output: parse JSON and check required fields
std::string name;
std::string description;
const std::string validation_error =
ValidateBreweryJsonPublic(raw, name, description);
if (validation_error.empty()) {
// Success: return parsed brewery data
return {std::move(name), std::move(description)};
}
// Validation failed: log error and prepare corrective feedback
last_error = validation_error;
spdlog::warn("LlamaGenerator: malformed brewery JSON (attempt {}): {}",
attempt + 1, validation_error);
// Update prompt with error details to guide LLM toward correct output.
// For retries, use a compact prompt format to avoid exceeding token
// limits.
prompt =
"Your previous response was invalid. Error: " + validation_error +
"\nReturn ONLY valid JSON with this exact schema: "
"{\"name\": \"string\", \"description\": \"string\"}."
"\nDo not include markdown, comments, or extra keys."
"\n\n" +
retry_location;
}
// All retry attempts exhausted: log failure and throw exception
spdlog::error(
"LlamaGenerator: malformed brewery response after {} attempts: "
"{}",
max_attempts, last_error.empty() ? raw : last_error);
throw std::runtime_error("LlamaGenerator: malformed brewery response");
}

@@ -0,0 +1,102 @@
/**
* User Profile Generation Module
* Uses the LLM to generate realistic user profiles (username and bio) for craft
* beer enthusiasts. Implements retry logic to handle parsing failures and
* ensures output adheres to strict format constraints (two lines, specific
* character limits).
*/
#include <spdlog/spdlog.h>
#include <algorithm>
#include <stdexcept>
#include <string>
#include "data_generation/llama_generator.h"
#include "data_generation/llama_generator_helpers.h"
UserResult LlamaGenerator::GenerateUser(const std::string& locale) {
/**
* System prompt: specifies exact output format to minimize parsing errors
* Constraints: 2-line output, username format, bio length bounds
*/
const std::string system_prompt =
"You generate plausible social media profiles for craft beer "
"enthusiasts. "
"Respond with exactly two lines: "
"the first line is a username (lowercase, no spaces, 8-20 characters), "
"the second line is a one-sentence bio (20-40 words). "
"The profile should feel consistent with the locale. "
"No preamble, no labels.";
/**
* User prompt: locale parameter guides cultural appropriateness of generated
* profiles
*/
std::string prompt =
"Generate a craft beer enthusiast profile. Locale: " + locale;
/**
* RETRY LOOP with format validation
* Attempts up to 3 times to generate valid user profile with correct format
*/
const int max_attempts = 3;
std::string raw;
for (int attempt = 0; attempt < max_attempts; ++attempt) {
/**
* Generate user profile (max 128 tokens - should fit 2 lines easily)
*/
raw = Infer(system_prompt, prompt, 128);
spdlog::debug("LlamaGenerator (user): raw output (attempt {}): {}",
attempt + 1, raw);
try {
/**
* Parse two-line response: first line = username, second line = bio
*/
auto [username, bio] = ParseTwoLineResponsePublic(
raw, "LlamaGenerator: malformed user response");
/**
* Remove any whitespace from username (usernames shouldn't have
* spaces)
*/
username.erase(
std::remove_if(username.begin(), username.end(),
[](unsigned char ch) { return std::isspace(ch); }),
username.end());
/**
* Validate both fields are non-empty after processing
*/
if (username.empty() || bio.empty()) {
throw std::runtime_error("LlamaGenerator: malformed user response");
}
/**
* Truncate bio if exceeds reasonable length for bio field
*/
if (bio.size() > 200) bio = bio.substr(0, 200);
/**
* Success: return parsed user profile
*/
return {username, bio};
} catch (const std::exception& e) {
/**
* Parsing failed: log and continue to next attempt
*/
spdlog::warn(
"LlamaGenerator: malformed user response (attempt {}): {}",
attempt + 1, e.what());
}
}
/**
* All retry attempts exhausted: log failure and throw exception
*/
spdlog::error(
"LlamaGenerator: malformed user response after {} attempts: {}",
max_attempts, raw);
throw std::runtime_error("LlamaGenerator: malformed user response");
}

@@ -0,0 +1,441 @@
/**
* Helper Functions Module
* Provides utility functions for text processing, parsing, and chat template
* formatting. Functions handle whitespace normalization, response parsing, and
* conversion of prompts to proper chat format using the model's built-in
* template.
*/
#include <algorithm>
#include <array>
#include <boost/json.hpp>
#include <cctype>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>
#include "data_generation/llama_generator.h"
#include "llama.h"
namespace {
/**
* String trimming: removes leading and trailing whitespace
*/
std::string Trim(std::string value) {
auto not_space = [](unsigned char ch) { return !std::isspace(ch); };
value.erase(value.begin(),
std::find_if(value.begin(), value.end(), not_space));
value.erase(std::find_if(value.rbegin(), value.rend(), not_space).base(),
value.end());
return value;
}
/**
* Normalize whitespace: collapses multiple spaces/tabs/newlines into single
* spaces
*/
std::string CondenseWhitespace(std::string text) {
std::string out;
out.reserve(text.size());
bool in_whitespace = false;
for (unsigned char ch : text) {
if (std::isspace(ch)) {
if (!in_whitespace) {
out.push_back(' ');
in_whitespace = true;
}
continue;
}
in_whitespace = false;
out.push_back(static_cast<char>(ch));
}
return Trim(std::move(out));
}
/**
* Truncate region context to fit within max length while preserving word
* boundaries
*/
std::string PrepareRegionContext(std::string_view region_context,
std::size_t max_chars) {
std::string normalized = CondenseWhitespace(std::string(region_context));
if (normalized.size() <= max_chars) {
return normalized;
}
normalized.resize(max_chars);
const std::size_t last_space = normalized.find_last_of(' ');
if (last_space != std::string::npos && last_space > max_chars / 2) {
normalized.resize(last_space);
}
normalized += "...";
return normalized;
}
/**
* Remove common bullet points, numbers, and field labels added by LLM in output
*/
std::string StripCommonPrefix(std::string line) {
line = Trim(std::move(line));
if (!line.empty() && (line[0] == '-' || line[0] == '*')) {
line = Trim(line.substr(1));
} else {
std::size_t i = 0;
while (i < line.size() &&
std::isdigit(static_cast<unsigned char>(line[i]))) {
++i;
}
if (i > 0 && i < line.size() && (line[i] == '.' || line[i] == ')')) {
line = Trim(line.substr(i + 1));
}
}
auto strip_label = [&line](const std::string& label) {
if (line.size() >= label.size()) {
bool matches = true;
for (std::size_t i = 0; i < label.size(); ++i) {
if (std::tolower(static_cast<unsigned char>(line[i])) !=
std::tolower(static_cast<unsigned char>(label[i]))) {
matches = false;
break;
}
}
if (matches) {
line = Trim(line.substr(label.size()));
}
}
};
strip_label("name:");
strip_label("brewery name:");
strip_label("description:");
strip_label("username:");
strip_label("bio:");
return Trim(std::move(line));
}
/**
* Parse two-line response from LLM: normalize line endings, strip formatting,
* filter spurious output, and combine remaining lines if needed
*/
std::pair<std::string, std::string> ParseTwoLineResponse(
const std::string& raw, const std::string& error_message) {
std::string normalized = raw;
std::replace(normalized.begin(), normalized.end(), '\r', '\n');
std::vector<std::string> lines;
std::stringstream stream(normalized);
std::string line;
while (std::getline(stream, line)) {
line = StripCommonPrefix(std::move(line));
if (!line.empty()) lines.push_back(std::move(line));
}
std::vector<std::string> filtered;
for (auto& l : lines) {
std::string low = l;
std::transform(low.begin(), low.end(), low.begin(), [](unsigned char c) {
return static_cast<char>(std::tolower(c));
});
// Filter known thinking tags like <think>...</think>, but be conservative
// to avoid removing legitimate output. Only filter specific known
// patterns.
if (!l.empty() && l.front() == '<' && low.back() == '>') {
// Only filter if it's a known thinking tag: <think>, <reasoning>, etc.
if (low.find("think") != std::string::npos ||
low.find("reasoning") != std::string::npos ||
low.find("reflect") != std::string::npos) {
continue;
}
}
if (low.rfind("okay,", 0) == 0 || low.rfind("hmm", 0) == 0) continue;
filtered.push_back(std::move(l));
}
if (filtered.size() < 2) throw std::runtime_error(error_message);
std::string first = Trim(filtered.front());
std::string second;
for (size_t i = 1; i < filtered.size(); ++i) {
if (!second.empty()) second += ' ';
second += filtered[i];
}
second = Trim(std::move(second));
if (first.empty() || second.empty()) throw std::runtime_error(error_message);
return {first, second};
}
/**
* Apply model's chat template to user-only prompt, formatting it for the model
*/
std::string ToChatPrompt(const llama_model* model,
const std::string& user_prompt) {
const char* tmpl = llama_model_chat_template(model, nullptr);
if (tmpl == nullptr) {
return user_prompt;
}
const llama_chat_message message{"user", user_prompt.c_str()};
std::vector<char> buffer(
std::max<std::size_t>(1024, user_prompt.size() * 4));
int32_t required =
llama_chat_apply_template(tmpl, &message, 1, true, buffer.data(),
static_cast<int32_t>(buffer.size()));
if (required < 0) {
throw std::runtime_error("LlamaGenerator: failed to apply chat template");
}
if (required >= static_cast<int32_t>(buffer.size())) {
buffer.resize(static_cast<std::size_t>(required) + 1);
required =
llama_chat_apply_template(tmpl, &message, 1, true, buffer.data(),
static_cast<int32_t>(buffer.size()));
if (required < 0) {
throw std::runtime_error(
"LlamaGenerator: failed to apply chat template");
}
}
return std::string(buffer.data(), static_cast<std::size_t>(required));
}
/**
* Apply model's chat template to system+user prompt pair, formatting for the
* model
*/
std::string ToChatPrompt(const llama_model* model,
const std::string& system_prompt,
const std::string& user_prompt) {
const char* tmpl = llama_model_chat_template(model, nullptr);
if (tmpl == nullptr) {
return system_prompt + "\n\n" + user_prompt;
}
const llama_chat_message messages[2] = {{"system", system_prompt.c_str()},
{"user", user_prompt.c_str()}};
std::vector<char> buffer(std::max<std::size_t>(
1024, (system_prompt.size() + user_prompt.size()) * 4));
int32_t required =
llama_chat_apply_template(tmpl, messages, 2, true, buffer.data(),
static_cast<int32_t>(buffer.size()));
if (required < 0) {
throw std::runtime_error("LlamaGenerator: failed to apply chat template");
}
if (required >= static_cast<int32_t>(buffer.size())) {
buffer.resize(static_cast<std::size_t>(required) + 1);
required =
llama_chat_apply_template(tmpl, messages, 2, true, buffer.data(),
static_cast<int32_t>(buffer.size()));
if (required < 0) {
throw std::runtime_error(
"LlamaGenerator: failed to apply chat template");
}
}
return std::string(buffer.data(), static_cast<std::size_t>(required));
}
void AppendTokenPiece(const llama_vocab* vocab, llama_token token,
std::string& output) {
std::array<char, 256> buffer{};
int32_t bytes =
llama_token_to_piece(vocab, token, buffer.data(),
static_cast<int32_t>(buffer.size()), 0, true);
if (bytes < 0) {
std::vector<char> dynamic_buffer(static_cast<std::size_t>(-bytes));
bytes = llama_token_to_piece(vocab, token, dynamic_buffer.data(),
static_cast<int32_t>(dynamic_buffer.size()),
0, true);
if (bytes < 0) {
throw std::runtime_error(
"LlamaGenerator: failed to decode sampled token piece");
}
output.append(dynamic_buffer.data(), static_cast<std::size_t>(bytes));
return;
}
output.append(buffer.data(), static_cast<std::size_t>(bytes));
}
bool ExtractFirstJsonObject(const std::string& text, std::string& json_out) {
std::size_t start = std::string::npos;
int depth = 0;
bool in_string = false;
bool escaped = false;
for (std::size_t i = 0; i < text.size(); ++i) {
const char ch = text[i];
if (in_string) {
if (escaped) {
escaped = false;
} else if (ch == '\\') {
escaped = true;
} else if (ch == '"') {
in_string = false;
}
continue;
}
if (ch == '"') {
in_string = true;
continue;
}
if (ch == '{') {
if (depth == 0) {
start = i;
}
++depth;
continue;
}
if (ch == '}') {
if (depth == 0) {
continue;
}
--depth;
if (depth == 0 && start != std::string::npos) {
json_out = text.substr(start, i - start + 1);
return true;
}
}
}
return false;
}
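For illustration, a condensed copy of the scanner above, showing that braces inside string literals are ignored and that chatter before the first object is skipped:

```cpp
#include <string>

// Condensed version of ExtractFirstJsonObject: find the first balanced
// top-level {...} span, ignoring braces inside string literals.
bool ExtractFirstJsonObject(const std::string& text, std::string& json_out) {
  std::size_t start = std::string::npos;
  int depth = 0;
  bool in_string = false;
  bool escaped = false;
  for (std::size_t i = 0; i < text.size(); ++i) {
    const char ch = text[i];
    if (in_string) {
      if (escaped) escaped = false;
      else if (ch == '\\') escaped = true;
      else if (ch == '"') in_string = false;
      continue;
    }
    if (ch == '"') { in_string = true; continue; }
    if (ch == '{') {
      if (depth == 0) start = i;
      ++depth;
      continue;
    }
    if (ch == '}') {
      if (depth == 0) continue;  // stray close brace before any open
      if (--depth == 0 && start != std::string::npos) {
        json_out = text.substr(start, i - start + 1);
        return true;
      }
    }
  }
  return false;
}
```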
std::string ValidateBreweryJson(const std::string& raw, std::string& name_out,
std::string& description_out) {
auto validate_object = [&](const boost::json::value& jv,
std::string& error_out) -> bool {
if (!jv.is_object()) {
error_out = "JSON root must be an object";
return false;
}
const auto& obj = jv.get_object();
if (!obj.contains("name") || !obj.at("name").is_string()) {
error_out = "JSON field 'name' is missing or not a string";
return false;
}
if (!obj.contains("description") || !obj.at("description").is_string()) {
error_out = "JSON field 'description' is missing or not a string";
return false;
}
name_out = Trim(std::string(obj.at("name").as_string().c_str()));
description_out =
Trim(std::string(obj.at("description").as_string().c_str()));
if (name_out.empty()) {
error_out = "JSON field 'name' must not be empty";
return false;
}
if (description_out.empty()) {
error_out = "JSON field 'description' must not be empty";
return false;
}
std::string name_lower = name_out;
std::string description_lower = description_out;
std::transform(
name_lower.begin(), name_lower.end(), name_lower.begin(),
[](unsigned char c) { return static_cast<char>(std::tolower(c)); });
std::transform(description_lower.begin(), description_lower.end(),
description_lower.begin(), [](unsigned char c) {
return static_cast<char>(std::tolower(c));
});
if (name_lower == "string" || description_lower == "string") {
error_out = "JSON appears to be a schema placeholder, not content";
return false;
}
error_out.clear();
return true;
};
boost::system::error_code ec;
boost::json::value jv = boost::json::parse(raw, ec);
std::string validation_error;
if (ec) {
std::string extracted;
if (!ExtractFirstJsonObject(raw, extracted)) {
return "JSON parse error: " + ec.message();
}
ec.clear();
jv = boost::json::parse(extracted, ec);
if (ec) {
return "JSON parse error: " + ec.message();
}
if (!validate_object(jv, validation_error)) {
return validation_error;
}
return {};
}
if (!validate_object(jv, validation_error)) {
return validation_error;
}
return {};
}
} // namespace
// Public wrapper definitions exposing the anonymous-namespace helpers above
// to other translation units
// NOTE: these are definitions, not forward declarations — thin public
// wrappers exposing the anonymous-namespace helpers to other translation
// units.
std::string PrepareRegionContextPublic(std::string_view region_context,
std::size_t max_chars) {
return PrepareRegionContext(region_context, max_chars);
}
std::pair<std::string, std::string> ParseTwoLineResponsePublic(
const std::string& raw, const std::string& error_message) {
return ParseTwoLineResponse(raw, error_message);
}
std::string ToChatPromptPublic(const llama_model* model,
const std::string& user_prompt) {
return ToChatPrompt(model, user_prompt);
}
std::string ToChatPromptPublic(const llama_model* model,
const std::string& system_prompt,
const std::string& user_prompt) {
return ToChatPrompt(model, system_prompt, user_prompt);
}
void AppendTokenPiecePublic(const llama_vocab* vocab, llama_token token,
std::string& output) {
AppendTokenPiece(vocab, token, output);
}
std::string ValidateBreweryJsonPublic(const std::string& raw,
std::string& name_out,
std::string& description_out) {
return ValidateBreweryJson(raw, name_out, description_out);
}

View File

@@ -0,0 +1,196 @@
/**
* Text Generation / Inference Module
* Core module that performs LLM inference: converts text prompts into tokens,
* runs the neural network forward pass, samples the next token, and converts
* output tokens back to text. Supports both simple and system+user prompts.
*/
#include <spdlog/spdlog.h>
#include <algorithm>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>
#include "data_generation/llama_generator.h"
#include "data_generation/llama_generator_helpers.h"
#include "llama.h"
std::string LlamaGenerator::Infer(const std::string& prompt, int max_tokens) {
return InferFormatted(ToChatPromptPublic(model_, prompt), max_tokens);
}
std::string LlamaGenerator::Infer(const std::string& system_prompt,
const std::string& prompt, int max_tokens) {
return InferFormatted(ToChatPromptPublic(model_, system_prompt, prompt),
max_tokens);
}
std::string LlamaGenerator::InferFormatted(const std::string& formatted_prompt,
int max_tokens) {
/**
* Validate that model and context are loaded
*/
if (model_ == nullptr || context_ == nullptr)
throw std::runtime_error("LlamaGenerator: model not loaded");
/**
* Get vocabulary for tokenization and token-to-text conversion
*/
const llama_vocab* vocab = llama_model_get_vocab(model_);
if (vocab == nullptr)
throw std::runtime_error("LlamaGenerator: vocab unavailable");
/**
* Clear KV cache to ensure clean inference state (no residual context)
*/
llama_memory_clear(llama_get_memory(context_), true);
/**
* TOKENIZATION PHASE
* Convert text prompt into token IDs (integers) that the model understands
*/
std::vector<llama_token> prompt_tokens(formatted_prompt.size() + 8);
int32_t token_count = llama_tokenize(
vocab, formatted_prompt.c_str(),
static_cast<int32_t>(formatted_prompt.size()), prompt_tokens.data(),
static_cast<int32_t>(prompt_tokens.size()), true, true);
/**
* If buffer too small, negative return indicates required size
*/
if (token_count < 0) {
prompt_tokens.resize(static_cast<std::size_t>(-token_count));
token_count = llama_tokenize(
vocab, formatted_prompt.c_str(),
static_cast<int32_t>(formatted_prompt.size()), prompt_tokens.data(),
static_cast<int32_t>(prompt_tokens.size()), true, true);
}
if (token_count < 0)
throw std::runtime_error("LlamaGenerator: prompt tokenization failed");
/**
* CONTEXT SIZE VALIDATION
* Validate and compute effective token budgets based on context window
* constraints
*/
const int32_t n_ctx = static_cast<int32_t>(llama_n_ctx(context_));
const int32_t n_batch = static_cast<int32_t>(llama_n_batch(context_));
if (n_ctx <= 1 || n_batch <= 0)
throw std::runtime_error("LlamaGenerator: invalid context or batch size");
/**
* Clamp generation limit to available context window, reserve space for
* output
*/
const int32_t effective_max_tokens =
std::max(1, std::min(max_tokens, n_ctx - 1));
/**
* Prompt can use remaining context after reserving space for generation
*/
int32_t prompt_budget = std::min(n_batch, n_ctx - effective_max_tokens);
prompt_budget = std::max<int32_t>(1, prompt_budget);
/**
* Truncate prompt if necessary to fit within constraints
*/
prompt_tokens.resize(static_cast<std::size_t>(token_count));
if (token_count > prompt_budget) {
spdlog::warn(
"LlamaGenerator: prompt too long ({} tokens), truncating to {} "
"tokens to fit n_batch/n_ctx limits",
token_count, prompt_budget);
prompt_tokens.resize(static_cast<std::size_t>(prompt_budget));
token_count = prompt_budget;
}
/**
* PROMPT PROCESSING PHASE
* Create a batch containing all prompt tokens and feed through the model
* This computes internal representations and fills the KV cache
*/
const llama_batch prompt_batch = llama_batch_get_one(
prompt_tokens.data(), static_cast<int32_t>(prompt_tokens.size()));
if (llama_decode(context_, prompt_batch) != 0)
throw std::runtime_error("LlamaGenerator: prompt decode failed");
/**
* SAMPLER CONFIGURATION PHASE
* Set up the probabilistic token selection pipeline (sampler chain)
* Samplers are applied in sequence: temperature -> top-p -> distribution
*/
llama_sampler_chain_params sampler_params =
llama_sampler_chain_default_params();
using SamplerPtr =
std::unique_ptr<llama_sampler, decltype(&llama_sampler_free)>;
SamplerPtr sampler(llama_sampler_chain_init(sampler_params),
&llama_sampler_free);
if (!sampler)
throw std::runtime_error("LlamaGenerator: failed to initialize sampler");
/**
* Temperature: scales logits before softmax (controls randomness)
*/
llama_sampler_chain_add(sampler.get(),
llama_sampler_init_temp(sampling_temperature_));
/**
* Top-P: nucleus sampling - filters to most likely tokens summing to top_p
* probability
*/
llama_sampler_chain_add(sampler.get(),
llama_sampler_init_top_p(sampling_top_p_, 1));
/**
* Distribution sampler: selects actual token using configured seed for
* reproducibility
*/
llama_sampler_chain_add(sampler.get(),
llama_sampler_init_dist(sampling_seed_));
/**
* TOKEN GENERATION LOOP
* Iteratively generate tokens one at a time until max_tokens or
* end-of-sequence
*/
std::vector<llama_token> generated_tokens;
generated_tokens.reserve(static_cast<std::size_t>(effective_max_tokens));
for (int i = 0; i < effective_max_tokens; ++i) {
/**
* Sample next token using configured sampler chain and model logits
* Index -1 means use the last output position from previous batch
*/
const llama_token next =
llama_sampler_sample(sampler.get(), context_, -1);
/**
* Stop if model predicts end-of-generation token (EOS/EOT)
*/
if (llama_vocab_is_eog(vocab, next)) break;
generated_tokens.push_back(next);
/**
* Feed the sampled token back into model for next iteration
* (autoregressive)
*/
llama_token token = next;
const llama_batch one_token_batch = llama_batch_get_one(&token, 1);
if (llama_decode(context_, one_token_batch) != 0)
throw std::runtime_error(
"LlamaGenerator: decode failed during generation");
}
/**
* DETOKENIZATION PHASE
* Convert generated token IDs back to text using vocabulary
*/
std::string output;
for (const llama_token token : generated_tokens)
AppendTokenPiecePublic(vocab, token, output);
/**
* Advance seed for next generation to improve output diversity
*/
sampling_seed_ = (sampling_seed_ == 0xFFFFFFFFu) ? 0 : sampling_seed_ + 1;
return output;
}
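The final seed update in isolation — a trivial sketch, noting that the explicit check only documents intent, since unsigned arithmetic would wrap `0xFFFFFFFF` to `0` anyway:

```cpp
#include <cstdint>

// Advance a 32-bit sampling seed, wrapping explicitly at the maximum value.
uint32_t NextSeed(uint32_t seed) {
  return (seed == 0xFFFFFFFFu) ? 0u : seed + 1u;
}
```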

View File

@@ -0,0 +1,56 @@
/**
* Model Loading Module
* This module handles loading a pre-trained LLM model from disk and
* initializing the llama.cpp context for inference. It performs one-time setup
* required before any inference operations can be performed.
*/
#include <spdlog/spdlog.h>
#include <stdexcept>
#include <string>
#include "data_generation/llama_generator.h"
#include "llama.h"
void LlamaGenerator::Load(const std::string& model_path) {
/**
* Validate input and clean up any previously loaded model/context
*/
if (model_path.empty())
throw std::runtime_error("LlamaGenerator: model path must not be empty");
if (context_ != nullptr) {
llama_free(context_);
context_ = nullptr;
}
if (model_ != nullptr) {
llama_model_free(model_);
model_ = nullptr;
}
/**
* Initialize the llama backend (one-time setup for GPU/CPU acceleration)
*/
llama_backend_init();
llama_model_params model_params = llama_model_default_params();
model_ = llama_model_load_from_file(model_path.c_str(), model_params);
if (model_ == nullptr) {
throw std::runtime_error(
"LlamaGenerator: failed to load model from path: " + model_path);
}
llama_context_params context_params = llama_context_default_params();
context_params.n_ctx = n_ctx_;
context_params.n_batch = n_ctx_; // Set batch size equal to context window
context_ = llama_init_from_model(model_, context_params);
if (context_ == nullptr) {
llama_model_free(model_);
model_ = nullptr;
throw std::runtime_error("LlamaGenerator: failed to create context");
}
spdlog::info("[LlamaGenerator] Loaded model: {}", model_path);
}

View File

@@ -0,0 +1,74 @@
#include <spdlog/spdlog.h>
#include <fstream>
#include <string>
#include <vector>
#include "data_generation/llama_generator.h"
std::string LlamaGenerator::LoadBrewerySystemPrompt(
const std::string& prompt_file_path) {
// Return cached version if already loaded
if (!brewery_system_prompt_.empty()) {
return brewery_system_prompt_;
}
// Try multiple path locations
std::vector<std::string> paths_to_try = {
prompt_file_path, // As provided
"../" + prompt_file_path, // One level up
"../../" + prompt_file_path, // Two levels up
};
for (const auto& path : paths_to_try) {
std::ifstream prompt_file(path);
if (prompt_file.is_open()) {
std::string prompt((std::istreambuf_iterator<char>(prompt_file)),
std::istreambuf_iterator<char>());
prompt_file.close();
if (!prompt.empty()) {
spdlog::info(
"LlamaGenerator: Loaded brewery system prompt from '{}' ({} chars)",
path, prompt.length());
brewery_system_prompt_ = prompt;
return brewery_system_prompt_;
}
}
}
spdlog::warn(
"LlamaGenerator: Could not open brewery system prompt file at any of the "
"expected locations. Using fallback inline prompt.");
return GetFallbackBreweryPrompt();
}
// Fallback: minimal inline prompt if file fails to load
std::string LlamaGenerator::GetFallbackBreweryPrompt() {
return "You are an experienced brewmaster and owner of a local craft brewery. "
"Create a distinctive, authentic name and detailed description that "
"genuinely reflects your specific location, brewing philosophy, local "
"culture, and community connection. The brewery must feel real and "
"grounded—not generic or interchangeable.\n\n"
"AVOID REPETITIVE PHRASES - Never use:\n"
"Love letter to, tribute to, rolling hills, picturesque, every sip "
"tells a story, Come for X stay for Y, rich history, passion, woven "
"into, ancient roots, timeless, where tradition meets innovation\n\n"
"OPENING APPROACHES - Choose ONE:\n"
"1. Start with specific beer style and its regional origins\n"
"2. Begin with specific brewing challenge (water, altitude, climate)\n"
"3. Open with founding story or personal motivation\n"
"4. Lead with specific local ingredient or resource\n"
"5. Start with unexpected angle or contradiction\n"
"6. Open with local event, tradition, or cultural moment\n"
"7. Begin with tangible architectural or geographic detail\n\n"
"BE SPECIFIC - Include:\n"
"- At least ONE concrete proper noun (landmark, river, neighborhood)\n"
"- Specific beer styles relevant to the REGION'S culture\n"
"- Concrete brewing challenges or advantages\n"
"- Sensory details SPECIFIC to place—not generic adjectives\n\n"
"LENGTH: 150-250 words. TONE: Can be soulful, irreverent, "
"matter-of-fact, unpretentious, or minimalist.\n\n"
"Output ONLY a raw JSON object with keys name and description. "
"No markdown, backticks, preamble, or trailing text.";
}

View File

@@ -0,0 +1,65 @@
/**
* Sampling Configuration Module
* Configures the hyperparameters that control probabilistic token selection
* during text generation. These settings affect the randomness, diversity, and
* quality of generated output.
*/
#include <stdexcept>
#include "data_generation/llama_generator.h"
#include "llama.h"
void LlamaGenerator::SetSamplingOptions(float temperature, float top_p,
int seed) {
/**
* Validate temperature: controls randomness in output distribution
* 0.0 = deterministic (always pick highest probability token)
* Higher values = more random/diverse output
*/
if (temperature < 0.0f) {
throw std::runtime_error(
"LlamaGenerator: sampling temperature must be >= 0");
}
/**
* Validate top-p (nucleus sampling): only sample from top cumulative
* probability e.g., top-p=0.9 means sample from tokens that make up 90% of
* probability mass
*/
if (!(top_p > 0.0f && top_p <= 1.0f)) {
throw std::runtime_error(
"LlamaGenerator: sampling top-p must be in (0, 1]");
}
/**
* Validate seed: for reproducible results (-1 uses random seed)
*/
if (seed < -1) {
throw std::runtime_error(
"LlamaGenerator: seed must be >= 0, or -1 for random");
}
/**
* Store sampling parameters for use during token generation
*/
sampling_temperature_ = temperature;
sampling_top_p_ = top_p;
sampling_seed_ = (seed < 0) ? static_cast<uint32_t>(LLAMA_DEFAULT_SEED)
: static_cast<uint32_t>(seed);
}
void LlamaGenerator::SetContextSize(uint32_t n_ctx) {
/**
* Validate context size: must be positive and reasonable for the model
*/
if (n_ctx == 0 || n_ctx > 32768) {
throw std::runtime_error(
"LlamaGenerator: context size must be in range [1, 32768]");
}
/**
* Store context size for use during model loading
*/
n_ctx_ = n_ctx;
}

View File

@@ -0,0 +1,65 @@
#include <string>
#include <vector>
#include "data_generation/mock_generator.h"
const std::vector<std::string> MockGenerator::kBreweryAdjectives = {
"Craft", "Heritage", "Local", "Artisan", "Pioneer", "Golden",
"Modern", "Classic", "Summit", "Northern", "Riverstone", "Barrel",
"Hinterland", "Harbor", "Wild", "Granite", "Copper", "Maple"};
const std::vector<std::string> MockGenerator::kBreweryNouns = {
"Brewing Co.", "Brewery", "Bier Haus", "Taproom", "Works",
"House", "Fermentery", "Ale Co.", "Cellars", "Collective",
"Project", "Foundry", "Malthouse", "Public House", "Co-op",
"Lab", "Beer Hall", "Guild"};
const std::vector<std::string> MockGenerator::kBreweryDescriptions = {
"Handcrafted pale ales and seasonal IPAs with local ingredients.",
"Traditional lagers and experimental sours in small batches.",
"Award-winning stouts and wildly hoppy blonde ales.",
    "Craft brewery specializing in Belgian-style tripels and dark porters.",
"Modern brewery blending tradition with bold experimental flavors.",
"Neighborhood-focused taproom pouring crisp pilsners and citrusy pale "
"ales.",
"Small-batch brewery known for barrel-aged releases and smoky lagers.",
"Independent brewhouse pairing farmhouse ales with rotating food pop-ups.",
"Community brewpub making balanced bitters, saisons, and hazy IPAs.",
"Experimental nanobrewery exploring local yeast and regional grains.",
"Family-run brewery producing smooth amber ales and robust porters.",
"Urban brewery crafting clean lagers and bright, fruit-forward sours.",
"Riverfront brewhouse featuring oak-matured ales and seasonal blends.",
"Modern taproom focused on sessionable lagers and classic pub styles.",
"Brewery rooted in tradition with a lineup of malty reds and crisp lagers.",
"Creative brewery offering rotating collaborations and limited draft-only "
"pours.",
"Locally inspired brewery serving approachable ales with bold hop "
"character.",
"Destination taproom known for balanced IPAs and cocoa-rich stouts."};
const std::vector<std::string> MockGenerator::kUsernames = {
"hopseeker", "malttrail", "yeastwhisper", "lagerlane",
"barrelbound", "foamfinder", "taphunter", "graingeist",
"brewscout", "aleatlas", "caskcompass", "hopsandmaps",
"mashpilot", "pintnomad", "fermentfriend", "stoutsignal",
"sessionwander", "kettlekeeper"};
const std::vector<std::string> MockGenerator::kBios = {
"Always chasing balanced IPAs and crisp lagers across local taprooms.",
"Weekend brewery explorer with a soft spot for dark, roasty stouts.",
"Documenting tiny brewpubs, fresh pours, and unforgettable beer gardens.",
"Fan of farmhouse ales, food pairings, and long tasting flights.",
"Collecting favorite pilsners one city at a time.",
"Hops-first drinker who still saves room for classic malt-forward styles.",
"Finding hidden tap lists and sharing the best seasonal releases.",
"Brewery road-tripper focused on local ingredients and clean fermentation.",
"Always comparing house lagers and ranking patio pint vibes.",
"Curious about yeast strains, barrel programs, and cellar experiments.",
"Believes every neighborhood deserves a great community taproom.",
"Looking for session beers that taste great from first sip to last.",
"Belgian ale enthusiast who never skips a new saison.",
"Hazy IPA critic with deep respect for a perfectly clear pilsner.",
"Visits breweries for the stories, stays for the flagship pours.",
"Craft beer fan mapping tasting notes and favorite brew routes.",
"Always ready to trade recommendations for underrated local breweries.",
"Keeping a running list of must-try collab releases and tap takeovers."};

View File

@@ -0,0 +1,12 @@
#include <string>
#include "data_generation/mock_generator.h"
std::size_t MockGenerator::DeterministicHash(const std::string& a,
const std::string& b) {
std::size_t seed = std::hash<std::string>{}(a);
const std::size_t mixed = std::hash<std::string>{}(b);
seed ^= mixed + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2);
seed = (seed << 13) | (seed >> ((sizeof(std::size_t) * 8) - 13));
return seed;
}
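The mixer above combines a Boost-style `hash_combine` step (golden-ratio constant plus shifts) with a 13-bit left rotation. A standalone copy, useful for confirming the result is stable across calls (exact values depend on the platform's `std::hash`):

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Mirror of MockGenerator::DeterministicHash: hash_combine-style mixing of
// two string hashes, followed by a 13-bit left rotation.
std::size_t CombineHashes(const std::string& a, const std::string& b) {
  std::size_t seed = std::hash<std::string>{}(a);
  const std::size_t mixed = std::hash<std::string>{}(b);
  seed ^= mixed + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2);
  seed = (seed << 13) | (seed >> ((sizeof(std::size_t) * 8) - 13));
  return seed;
}
```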

View File

@@ -0,0 +1,21 @@
#include <functional>
#include <string>
#include "data_generation/mock_generator.h"
BreweryResult MockGenerator::GenerateBrewery(
const std::string& city_name, const std::string& country_name,
const std::string& region_context) {
const std::string location_key =
country_name.empty() ? city_name : city_name + "," + country_name;
const std::size_t hash =
region_context.empty() ? std::hash<std::string>{}(location_key)
: DeterministicHash(location_key, region_context);
BreweryResult result;
result.name = kBreweryAdjectives[hash % kBreweryAdjectives.size()] + " " +
kBreweryNouns[(hash / 7) % kBreweryNouns.size()];
result.description =
kBreweryDescriptions[(hash / 13) % kBreweryDescriptions.size()];
return result;
}

View File

@@ -0,0 +1,13 @@
#include <functional>
#include <string>
#include "data_generation/mock_generator.h"
UserResult MockGenerator::GenerateUser(const std::string& locale) {
const std::size_t hash = std::hash<std::string>{}(locale);
UserResult result;
result.username = kUsernames[hash % kUsernames.size()];
result.bio = kBios[(hash / 11) % kBios.size()];
return result;
}

View File

@@ -0,0 +1,9 @@
#include <spdlog/spdlog.h>
#include <string>
#include "data_generation/mock_generator.h"
void MockGenerator::Load(const std::string& /*modelPath*/) {
spdlog::info("[MockGenerator] No model needed");
}

View File

@@ -0,0 +1,264 @@
#include "database/database.h"
#include <spdlog/spdlog.h>
#include <stdexcept>
void SqliteDatabase::InitializeSchema() {
std::lock_guard<std::mutex> lock(db_mutex_);
const char* schema = R"(
CREATE TABLE IF NOT EXISTS countries (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
iso2 TEXT,
iso3 TEXT
);
CREATE TABLE IF NOT EXISTS states (
id INTEGER PRIMARY KEY,
country_id INTEGER NOT NULL,
name TEXT NOT NULL,
iso2 TEXT,
FOREIGN KEY(country_id) REFERENCES countries(id)
);
CREATE TABLE IF NOT EXISTS cities (
id INTEGER PRIMARY KEY,
state_id INTEGER NOT NULL,
country_id INTEGER NOT NULL,
name TEXT NOT NULL,
latitude REAL,
longitude REAL,
FOREIGN KEY(state_id) REFERENCES states(id),
FOREIGN KEY(country_id) REFERENCES countries(id)
);
)";
char* errMsg = nullptr;
int rc = sqlite3_exec(db_, schema, nullptr, nullptr, &errMsg);
if (rc != SQLITE_OK) {
std::string error = errMsg ? std::string(errMsg) : "Unknown error";
sqlite3_free(errMsg);
throw std::runtime_error("Failed to create schema: " + error);
}
}
SqliteDatabase::~SqliteDatabase() {
if (db_) {
sqlite3_close(db_);
}
}
void SqliteDatabase::Initialize(const std::string& db_path) {
  int rc = sqlite3_open(db_path.c_str(), &db_);
  if (rc != SQLITE_OK) {
    // sqlite3_open allocates a handle even on failure; release it.
    sqlite3_close(db_);
    db_ = nullptr;
    throw std::runtime_error("Failed to open SQLite database: " + db_path);
  }
  spdlog::info("OK: SQLite database opened: {}", db_path);
  InitializeSchema();
}
void SqliteDatabase::BeginTransaction() {
std::lock_guard<std::mutex> lock(db_mutex_);
char* err = nullptr;
if (sqlite3_exec(db_, "BEGIN TRANSACTION", nullptr, nullptr, &err) !=
SQLITE_OK) {
std::string msg = err ? err : "unknown";
sqlite3_free(err);
throw std::runtime_error("BeginTransaction failed: " + msg);
}
}
void SqliteDatabase::CommitTransaction() {
std::lock_guard<std::mutex> lock(db_mutex_);
char* err = nullptr;
if (sqlite3_exec(db_, "COMMIT", nullptr, nullptr, &err) != SQLITE_OK) {
std::string msg = err ? err : "unknown";
sqlite3_free(err);
throw std::runtime_error("CommitTransaction failed: " + msg);
}
}
void SqliteDatabase::RollbackTransaction() {
std::lock_guard<std::mutex> lock(db_mutex_);
char* err = nullptr;
if (sqlite3_exec(db_, "ROLLBACK", nullptr, nullptr, &err) != SQLITE_OK) {
std::string msg = err ? err : "unknown";
sqlite3_free(err);
throw std::runtime_error("RollbackTransaction failed: " + msg);
}
}
void SqliteDatabase::InsertCountry(int id, const std::string& name,
const std::string& iso2,
const std::string& iso3) {
std::lock_guard<std::mutex> lock(db_mutex_);
const char* query = R"(
INSERT OR IGNORE INTO countries (id, name, iso2, iso3)
VALUES (?, ?, ?, ?)
)";
sqlite3_stmt* stmt;
int rc = sqlite3_prepare_v2(db_, query, -1, &stmt, nullptr);
if (rc != SQLITE_OK)
throw std::runtime_error("Failed to prepare country insert");
sqlite3_bind_int(stmt, 1, id);
sqlite3_bind_text(stmt, 2, name.c_str(), -1, SQLITE_TRANSIENT);
sqlite3_bind_text(stmt, 3, iso2.c_str(), -1, SQLITE_TRANSIENT);
sqlite3_bind_text(stmt, 4, iso3.c_str(), -1, SQLITE_TRANSIENT);
  if (sqlite3_step(stmt) != SQLITE_DONE) {
    sqlite3_finalize(stmt);  // release the statement before throwing
    throw std::runtime_error("Failed to insert country");
  }
  sqlite3_finalize(stmt);
}
void SqliteDatabase::InsertState(int id, int country_id,
const std::string& name,
const std::string& iso2) {
std::lock_guard<std::mutex> lock(db_mutex_);
const char* query = R"(
INSERT OR IGNORE INTO states (id, country_id, name, iso2)
VALUES (?, ?, ?, ?)
)";
sqlite3_stmt* stmt;
int rc = sqlite3_prepare_v2(db_, query, -1, &stmt, nullptr);
if (rc != SQLITE_OK)
throw std::runtime_error("Failed to prepare state insert");
sqlite3_bind_int(stmt, 1, id);
sqlite3_bind_int(stmt, 2, country_id);
sqlite3_bind_text(stmt, 3, name.c_str(), -1, SQLITE_TRANSIENT);
sqlite3_bind_text(stmt, 4, iso2.c_str(), -1, SQLITE_TRANSIENT);
  if (sqlite3_step(stmt) != SQLITE_DONE) {
    sqlite3_finalize(stmt);  // release the statement before throwing
    throw std::runtime_error("Failed to insert state");
  }
  sqlite3_finalize(stmt);
}
void SqliteDatabase::InsertCity(int id, int state_id, int country_id,
const std::string& name, double latitude,
double longitude) {
std::lock_guard<std::mutex> lock(db_mutex_);
const char* query = R"(
INSERT OR IGNORE INTO cities (id, state_id, country_id, name, latitude, longitude)
VALUES (?, ?, ?, ?, ?, ?)
)";
sqlite3_stmt* stmt;
int rc = sqlite3_prepare_v2(db_, query, -1, &stmt, nullptr);
if (rc != SQLITE_OK)
throw std::runtime_error("Failed to prepare city insert");
sqlite3_bind_int(stmt, 1, id);
sqlite3_bind_int(stmt, 2, state_id);
sqlite3_bind_int(stmt, 3, country_id);
sqlite3_bind_text(stmt, 4, name.c_str(), -1, SQLITE_TRANSIENT);
sqlite3_bind_double(stmt, 5, latitude);
sqlite3_bind_double(stmt, 6, longitude);
  if (sqlite3_step(stmt) != SQLITE_DONE) {
    sqlite3_finalize(stmt);  // release the statement before throwing
    throw std::runtime_error("Failed to insert city");
  }
  sqlite3_finalize(stmt);
}
std::vector<City> SqliteDatabase::QueryCities() {
std::lock_guard<std::mutex> lock(db_mutex_);
std::vector<City> cities;
sqlite3_stmt* stmt = nullptr;
const char* query =
"SELECT id, name, country_id FROM cities ORDER BY RANDOM()";
int rc = sqlite3_prepare_v2(db_, query, -1, &stmt, nullptr);
if (rc != SQLITE_OK) {
throw std::runtime_error("Failed to prepare query");
}
while (sqlite3_step(stmt) == SQLITE_ROW) {
int id = sqlite3_column_int(stmt, 0);
const char* name =
reinterpret_cast<const char*>(sqlite3_column_text(stmt, 1));
int country_id = sqlite3_column_int(stmt, 2);
cities.push_back({id, name ? std::string(name) : "", country_id});
}
sqlite3_finalize(stmt);
return cities;
}
std::vector<Country> SqliteDatabase::QueryCountries(int limit) {
std::lock_guard<std::mutex> lock(db_mutex_);
std::vector<Country> countries;
sqlite3_stmt* stmt = nullptr;
std::string query =
"SELECT id, name, iso2, iso3 FROM countries ORDER BY name";
if (limit > 0) {
query += " LIMIT " + std::to_string(limit);
}
int rc = sqlite3_prepare_v2(db_, query.c_str(), -1, &stmt, nullptr);
if (rc != SQLITE_OK) {
throw std::runtime_error("Failed to prepare countries query");
}
while (sqlite3_step(stmt) == SQLITE_ROW) {
int id = sqlite3_column_int(stmt, 0);
const char* name =
reinterpret_cast<const char*>(sqlite3_column_text(stmt, 1));
const char* iso2 =
reinterpret_cast<const char*>(sqlite3_column_text(stmt, 2));
const char* iso3 =
reinterpret_cast<const char*>(sqlite3_column_text(stmt, 3));
countries.push_back({id, name ? std::string(name) : "",
iso2 ? std::string(iso2) : "",
iso3 ? std::string(iso3) : ""});
}
sqlite3_finalize(stmt);
return countries;
}
std::vector<State> SqliteDatabase::QueryStates(int limit) {
std::lock_guard<std::mutex> lock(db_mutex_);
std::vector<State> states;
sqlite3_stmt* stmt = nullptr;
std::string query =
"SELECT id, name, iso2, country_id FROM states ORDER BY name";
if (limit > 0) {
query += " LIMIT " + std::to_string(limit);
}
int rc = sqlite3_prepare_v2(db_, query.c_str(), -1, &stmt, nullptr);
if (rc != SQLITE_OK) {
throw std::runtime_error("Failed to prepare states query");
}
while (sqlite3_step(stmt) == SQLITE_ROW) {
int id = sqlite3_column_int(stmt, 0);
const char* name =
reinterpret_cast<const char*>(sqlite3_column_text(stmt, 1));
const char* iso2 =
reinterpret_cast<const char*>(sqlite3_column_text(stmt, 2));
int country_id = sqlite3_column_int(stmt, 3);
states.push_back({id, name ? std::string(name) : "",
iso2 ? std::string(iso2) : "", country_id});
}
sqlite3_finalize(stmt);
return states;
}
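Each statement in this file is finalized manually, so a throw between `sqlite3_prepare_v2` and `sqlite3_finalize` would leak it. One common remedy — sketched here with fake stand-ins rather than the real SQLite types — is an RAII guard whose deleter finalizes on every exit path:

```cpp
#include <memory>

// Hypothetical stand-ins for sqlite3_stmt / sqlite3_finalize so the pattern
// can be demonstrated without linking against SQLite.
struct FakeStmt {};
int g_finalized = 0;
void FakeFinalize(FakeStmt* stmt) {
  delete stmt;
  ++g_finalized;
}

// RAII guard: the deleter runs on every exit path, including exceptions,
// so a throw between prepare and finalize can no longer leak the statement.
using StmtPtr = std::unique_ptr<FakeStmt, void (*)(FakeStmt*)>;

int RunAndCountFinalized() {
  g_finalized = 0;
  {
    StmtPtr stmt(new FakeStmt, &FakeFinalize);
    // ... bind parameters and step rows here ...
  }  // stmt finalized here, even if the body above threw
  return g_finalized;
}
```

With real SQLite the deleter would wrap `sqlite3_finalize` (which returns `int`) in a small lambda or named function.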

View File

@@ -0,0 +1,67 @@
#include "json_handling/json_loader.h"
#include <spdlog/spdlog.h>
#include <chrono>
#include "json_handling/stream_parser.h"
void JsonLoader::LoadWorldCities(const std::string& json_path,
SqliteDatabase& db) {
constexpr size_t kBatchSize = 10000;
auto startTime = std::chrono::high_resolution_clock::now();
spdlog::info("\nLoading {} (streaming Boost.JSON SAX)...", json_path);
db.BeginTransaction();
bool transactionOpen = true;
size_t citiesProcessed = 0;
try {
StreamingJsonParser::Parse(
json_path, db,
[&](const CityRecord& record) {
db.InsertCity(record.id, record.state_id, record.country_id,
record.name, record.latitude, record.longitude);
++citiesProcessed;
if (citiesProcessed % kBatchSize == 0) {
db.CommitTransaction();
db.BeginTransaction();
}
},
[&](size_t current, size_t /*total*/) {
if (current % kBatchSize == 0 && current > 0) {
spdlog::info(" [Progress] Parsed {} cities...", current);
}
});
spdlog::info(" OK: Parsed all cities from JSON");
if (transactionOpen) {
db.CommitTransaction();
transactionOpen = false;
}
} catch (...) {
if (transactionOpen) {
db.RollbackTransaction();
transactionOpen = false;
}
throw;
}
auto endTime = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(
endTime - startTime);
spdlog::info("\n=== World City Data Loading Summary ===\n");
spdlog::info("Cities inserted: {}", citiesProcessed);
spdlog::info("Elapsed time: {} ms", duration.count());
long long throughput =
(citiesProcessed > 0 && duration.count() > 0)
? (1000LL * static_cast<long long>(citiesProcessed)) /
static_cast<long long>(duration.count())
: 0LL;
spdlog::info("Throughput: {} cities/sec", throughput);
spdlog::info("=======================================\n");
}
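The batched-commit loop keeps any single transaction to at most `kBatchSize` inserts while avoiding a per-row commit. A stub-only simulation of how many commits a load performs (one per full batch mid-stream, plus the final commit after parsing completes):

```cpp
#include <cstddef>

// Simulate the commit cadence of LoadWorldCities: a commit (and transaction
// reopen) after every full batch, plus one final commit at the end.
std::size_t CountCommits(std::size_t records, std::size_t batch_size) {
  std::size_t commits = 0;
  std::size_t processed = 0;
  for (std::size_t i = 0; i < records; ++i) {
    ++processed;
    if (processed % batch_size == 0) ++commits;  // mid-stream commit
  }
  ++commits;  // final commit after the parse completes
  return commits;
}
```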

View File

@@ -0,0 +1,289 @@
#include "json_handling/stream_parser.h"
#include <spdlog/spdlog.h>
#include <boost/json.hpp>
#include <boost/json/basic_parser_impl.hpp>
#include <cstdio>
#include <stdexcept>
#include "database/database.h"
class CityRecordHandler {
friend class boost::json::basic_parser<CityRecordHandler>;
public:
static constexpr std::size_t max_array_size = static_cast<std::size_t>(-1);
static constexpr std::size_t max_object_size = static_cast<std::size_t>(-1);
static constexpr std::size_t max_string_size = static_cast<std::size_t>(-1);
static constexpr std::size_t max_key_size = static_cast<std::size_t>(-1);
struct ParseContext {
SqliteDatabase* db = nullptr;
std::function<void(const CityRecord&)> on_city;
std::function<void(size_t, size_t)> on_progress;
size_t cities_emitted = 0;
size_t total_file_size = 0;
int countries_inserted = 0;
int states_inserted = 0;
};
explicit CityRecordHandler(ParseContext& ctx) : context(ctx) {}
private:
ParseContext& context;
int depth = 0;
bool in_countries_array = false;
bool in_country_object = false;
bool in_states_array = false;
bool in_state_object = false;
bool in_cities_array = false;
bool building_city = false;
int current_country_id = 0;
int current_state_id = 0;
CityRecord current_city = {};
std::string current_key;
std::string current_key_val;
std::string current_string_val;
std::string country_info[3];
std::string state_info[2];
// Boost.JSON SAX Hooks
bool on_document_begin(boost::system::error_code&) { return true; }
bool on_document_end(boost::system::error_code&) { return true; }
bool on_array_begin(boost::system::error_code&) {
depth++;
if (depth == 1) {
in_countries_array = true;
} else if (depth == 3 && current_key == "states") {
in_states_array = true;
} else if (depth == 5 && current_key == "cities") {
in_cities_array = true;
}
return true;
}
bool on_array_end(std::size_t, boost::system::error_code&) {
if (depth == 1) {
in_countries_array = false;
} else if (depth == 3) {
in_states_array = false;
} else if (depth == 5) {
in_cities_array = false;
}
depth--;
return true;
}
bool on_object_begin(boost::system::error_code&) {
depth++;
if (depth == 2 && in_countries_array) {
in_country_object = true;
current_country_id = 0;
country_info[0].clear();
country_info[1].clear();
country_info[2].clear();
} else if (depth == 4 && in_states_array) {
in_state_object = true;
current_state_id = 0;
state_info[0].clear();
state_info[1].clear();
} else if (depth == 6 && in_cities_array) {
building_city = true;
current_city = {};
}
return true;
}
bool on_object_end(std::size_t, boost::system::error_code&) {
if (depth == 6 && building_city) {
if (current_city.id > 0 && current_state_id > 0 &&
current_country_id > 0) {
current_city.state_id = current_state_id;
current_city.country_id = current_country_id;
try {
context.on_city(current_city);
context.cities_emitted++;
if (context.on_progress && context.cities_emitted % 10000 == 0) {
context.on_progress(context.cities_emitted,
context.total_file_size);
}
} catch (const std::exception& e) {
spdlog::warn("City record handling failed: {}", e.what());
}
}
building_city = false;
} else if (depth == 4 && in_state_object) {
if (current_state_id > 0 && current_country_id > 0) {
try {
context.db->InsertState(current_state_id, current_country_id,
state_info[0], state_info[1]);
context.states_inserted++;
} catch (const std::exception& e) {
spdlog::warn("State insert failed: {}", e.what());
}
}
in_state_object = false;
} else if (depth == 2 && in_country_object) {
if (current_country_id > 0) {
try {
context.db->InsertCountry(current_country_id, country_info[0],
country_info[1], country_info[2]);
context.countries_inserted++;
} catch (const std::exception& e) {
spdlog::warn("Country insert failed: {}", e.what());
}
}
in_country_object = false;
}
depth--;
return true;
}
bool on_key_part(boost::json::string_view s, std::size_t,
boost::system::error_code&) {
current_key_val.append(s.data(), s.size());
return true;
}
bool on_key(boost::json::string_view s, std::size_t,
boost::system::error_code&) {
current_key_val.append(s.data(), s.size());
current_key = current_key_val;
current_key_val.clear();
return true;
}
bool on_string_part(boost::json::string_view s, std::size_t,
boost::system::error_code&) {
current_string_val.append(s.data(), s.size());
return true;
}
bool on_string(boost::json::string_view s, std::size_t,
boost::system::error_code&) {
current_string_val.append(s.data(), s.size());
if (building_city && current_key == "name") {
current_city.name = current_string_val;
} else if (in_state_object && current_key == "name") {
state_info[0] = current_string_val;
} else if (in_state_object && current_key == "iso2") {
state_info[1] = current_string_val;
} else if (in_country_object && current_key == "name") {
country_info[0] = current_string_val;
} else if (in_country_object && current_key == "iso2") {
country_info[1] = current_string_val;
} else if (in_country_object && current_key == "iso3") {
country_info[2] = current_string_val;
}
current_string_val.clear();
return true;
}
bool on_number_part(boost::json::string_view, boost::system::error_code&) {
return true;
}
bool on_int64(int64_t i, boost::json::string_view,
boost::system::error_code&) {
if (building_city && current_key == "id") {
current_city.id = static_cast<int>(i);
} else if (in_state_object && current_key == "id") {
current_state_id = static_cast<int>(i);
} else if (in_country_object && current_key == "id") {
current_country_id = static_cast<int>(i);
}
return true;
}
bool on_uint64(uint64_t u, boost::json::string_view,
boost::system::error_code& ec) {
return on_int64(static_cast<int64_t>(u), "", ec);
}
bool on_double(double d, boost::json::string_view,
boost::system::error_code&) {
if (building_city) {
if (current_key == "latitude") {
current_city.latitude = d;
} else if (current_key == "longitude") {
current_city.longitude = d;
}
}
return true;
}
bool on_bool(bool, boost::system::error_code&) { return true; }
bool on_null(boost::system::error_code&) { return true; }
bool on_comment_part(boost::json::string_view, boost::system::error_code&) {
return true;
}
bool on_comment(boost::json::string_view, boost::system::error_code&) {
return true;
}
};
void StreamingJsonParser::Parse(
const std::string& file_path, SqliteDatabase& db,
std::function<void(const CityRecord&)> on_city,
std::function<void(size_t, size_t)> on_progress) {
spdlog::info(" Streaming parse of {} (Boost.JSON)...", file_path);
FILE* file = std::fopen(file_path.c_str(), "rb");
if (!file) {
throw std::runtime_error("Failed to open JSON file: " + file_path);
}
size_t total_size = 0;
if (std::fseek(file, 0, SEEK_END) == 0) {
long file_size = std::ftell(file);
if (file_size > 0) {
total_size = static_cast<size_t>(file_size);
}
std::rewind(file);
}
CityRecordHandler::ParseContext ctx{&db, on_city, on_progress, 0, total_size,
0, 0};
boost::json::basic_parser<CityRecordHandler> parser(
boost::json::parse_options{}, ctx);
char buf[65536];
size_t bytes_read;
boost::system::error_code ec;
while ((bytes_read = std::fread(buf, 1, sizeof(buf), file)) > 0) {
char const* p = buf;
std::size_t remain = bytes_read;
while (remain > 0) {
std::size_t consumed = parser.write_some(true, p, remain, ec);
if (ec) {
std::fclose(file);
throw std::runtime_error("JSON parse error: " + ec.message());
}
p += consumed;
remain -= consumed;
}
}
if (std::ferror(file)) {
std::fclose(file);
throw std::runtime_error("I/O error while reading JSON file: " + file_path);
}
parser.write_some(false, nullptr, 0, ec); // Signal EOF
std::fclose(file);
if (ec) {
throw std::runtime_error("JSON parse error at EOF: " + ec.message());
}
spdlog::info(" OK: Parsed {} countries, {} states, {} cities",
ctx.countries_inserted, ctx.states_inserted,
ctx.cities_emitted);
}

pipeline/src/main.cpp Normal file

@@ -0,0 +1,134 @@
#include <spdlog/spdlog.h>
#include <boost/program_options.hpp>
#include <iostream>
#include <memory>
#include "biergarten_data_generator.h"
#include "database/database.h"
#include "web_client/curl_web_client.h"
namespace po = boost::program_options;
/**
* @brief Parse command-line arguments into ApplicationOptions.
*
* @param argc Command-line argument count.
* @param argv Command-line arguments.
* @param options Output ApplicationOptions struct.
* @return true if parsing succeeded and should proceed, false otherwise.
*/
bool ParseArguments(int argc, char** argv, ApplicationOptions& options) {
// If no arguments provided, display usage and exit
if (argc == 1) {
std::cout << "Biergarten Pipeline - Geographic Data Pipeline with "
"Brewery Generation\n\n";
std::cout << "Usage: biergarten-pipeline [options]\n\n";
std::cout << "Options:\n";
std::cout << " --mocked Use mocked generator for "
"brewery/user data\n";
std::cout << " --model, -m PATH Path to LLM model file (gguf) for "
"generation\n";
std::cout << " --cache-dir, -c DIR Directory for cached JSON (default: "
"/tmp)\n";
std::cout << " --temperature TEMP LLM sampling temperature 0.0-1.0 "
"(default: 0.8)\n";
std::cout << " --top-p VALUE Nucleus sampling parameter 0.0-1.0 "
"(default: 0.92)\n";
std::cout << " --n-ctx SIZE Context window size in tokens "
"(default: 8192)\n";
std::cout << " --seed SEED Random seed: -1 for random "
"(default: -1)\n";
std::cout << " --help, -h Show this help message\n\n";
std::cout << "Note: --mocked and --model are mutually exclusive. Exactly "
"one must be provided.\n";
std::cout << "Data source is always pinned to commit c5eb7772 (stable "
"2026-03-28).\n";
return false;
}
po::options_description desc("Pipeline Options");
desc.add_options()("help,h", "Produce help message")(
"mocked", po::bool_switch(),
"Use mocked generator for brewery/user data")(
"model,m", po::value<std::string>()->default_value(""),
"Path to LLM model (gguf)")(
"cache-dir,c", po::value<std::string>()->default_value("/tmp"),
"Directory for cached JSON")(
"temperature", po::value<float>()->default_value(0.8f),
"Sampling temperature (higher = more random)")(
"top-p", po::value<float>()->default_value(0.92f),
"Nucleus sampling top-p in (0,1] (higher = more random)")(
"n-ctx", po::value<uint32_t>()->default_value(8192),
"Context window size in tokens (1-32768)")(
"seed", po::value<int>()->default_value(-1),
"Sampler seed: -1 for random, otherwise non-negative integer");
po::variables_map vm;
po::store(po::parse_command_line(argc, argv, desc), vm);
po::notify(vm);
if (vm.count("help")) {
std::cout << desc << "\n";
return false;
}
// Check for mutually exclusive --mocked and --model flags
bool use_mocked = vm["mocked"].as<bool>();
std::string model_path = vm["model"].as<std::string>();
if (use_mocked && !model_path.empty()) {
spdlog::error("ERROR: --mocked and --model are mutually exclusive");
return false;
}
if (!use_mocked && model_path.empty()) {
spdlog::error("ERROR: Either --mocked or --model must be specified");
return false;
}
// Warn if sampling parameters are provided with --mocked
if (use_mocked) {
bool hasTemperature = !vm["temperature"].defaulted();
bool hasTopP = !vm["top-p"].defaulted();
bool hasSeed = !vm["seed"].defaulted();
if (hasTemperature || hasTopP || hasSeed) {
spdlog::warn(
"WARNING: Sampling parameters (--temperature, --top-p, --seed) "
"are ignored when using --mocked");
}
}
options.use_mocked = use_mocked;
options.model_path = model_path;
options.cache_dir = vm["cache-dir"].as<std::string>();
options.temperature = vm["temperature"].as<float>();
options.top_p = vm["top-p"].as<float>();
options.n_ctx = vm["n-ctx"].as<uint32_t>();
options.seed = vm["seed"].as<int>();
// commit is always pinned to c5eb7772
return true;
}
int main(int argc, char* argv[]) {
try {
const CurlGlobalState curl_state;
ApplicationOptions options;
if (!ParseArguments(argc, argv, options)) {
return 0;
}
auto webClient = std::make_shared<CURLWebClient>();
SqliteDatabase database;
BiergartenDataGenerator generator(options, webClient, database);
return generator.Run();
} catch (const std::exception& e) {
spdlog::error("ERROR: Application failed: {}", e.what());
return 1;
}
}


@@ -0,0 +1,141 @@
#include "web_client/curl_web_client.h"
#include <curl/curl.h>
#include <cstdio>
#include <fstream>
#include <memory>
#include <sstream>
#include <stdexcept>
CurlGlobalState::CurlGlobalState() {
if (curl_global_init(CURL_GLOBAL_DEFAULT) != CURLE_OK) {
throw std::runtime_error(
"[CURLWebClient] Failed to initialize libcurl globally");
}
}
CurlGlobalState::~CurlGlobalState() { curl_global_cleanup(); }
namespace {
// curl write callback that appends response data into a std::string
size_t WriteCallbackString(void* contents, size_t size, size_t nmemb,
void* userp) {
size_t realsize = size * nmemb;
auto* s = static_cast<std::string*>(userp);
s->append(static_cast<char*>(contents), realsize);
return realsize;
}
// curl write callback that writes to a file stream
size_t WriteCallbackFile(void* contents, size_t size, size_t nmemb,
void* userp) {
size_t realsize = size * nmemb;
auto* outFile = static_cast<std::ofstream*>(userp);
outFile->write(static_cast<char*>(contents), realsize);
return realsize;
}
// RAII wrapper for CURL handle using unique_ptr
using CurlHandle = std::unique_ptr<CURL, decltype(&curl_easy_cleanup)>;
CurlHandle create_handle() {
CURL* handle = curl_easy_init();
if (!handle) {
throw std::runtime_error(
"[CURLWebClient] Failed to initialize libcurl handle");
}
return CurlHandle(handle, &curl_easy_cleanup);
}
void set_common_get_options(CURL* curl, const std::string& url,
long connect_timeout, long total_timeout) {
curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
curl_easy_setopt(curl, CURLOPT_USERAGENT, "biergarten-pipeline/0.1.0");
curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
curl_easy_setopt(curl, CURLOPT_MAXREDIRS, 5L);
curl_easy_setopt(curl, CURLOPT_CONNECTTIMEOUT, connect_timeout);
curl_easy_setopt(curl, CURLOPT_TIMEOUT, total_timeout);
curl_easy_setopt(curl, CURLOPT_ACCEPT_ENCODING, "gzip");
}
} // namespace
CURLWebClient::CURLWebClient() = default;
CURLWebClient::~CURLWebClient() = default;
void CURLWebClient::DownloadToFile(const std::string& url,
const std::string& file_path) {
auto curl = create_handle();
std::ofstream outFile(file_path, std::ios::binary);
if (!outFile.is_open()) {
throw std::runtime_error(
"[CURLWebClient] Cannot open file for writing: " + file_path);
}
set_common_get_options(curl.get(), url, 30L, 300L);
curl_easy_setopt(curl.get(), CURLOPT_WRITEFUNCTION, WriteCallbackFile);
curl_easy_setopt(curl.get(), CURLOPT_WRITEDATA,
static_cast<void*>(&outFile));
CURLcode res = curl_easy_perform(curl.get());
outFile.close();
if (res != CURLE_OK) {
std::remove(file_path.c_str());
std::string error = std::string("[CURLWebClient] Download failed: ") +
curl_easy_strerror(res);
throw std::runtime_error(error);
}
long httpCode = 0;
curl_easy_getinfo(curl.get(), CURLINFO_RESPONSE_CODE, &httpCode);
if (httpCode != 200) {
std::remove(file_path.c_str());
std::stringstream ss;
ss << "[CURLWebClient] HTTP error " << httpCode << " for URL " << url;
throw std::runtime_error(ss.str());
}
}
std::string CURLWebClient::Get(const std::string& url) {
auto curl = create_handle();
std::string response_string;
set_common_get_options(curl.get(), url, 10L, 20L);
curl_easy_setopt(curl.get(), CURLOPT_WRITEFUNCTION, WriteCallbackString);
curl_easy_setopt(curl.get(), CURLOPT_WRITEDATA, &response_string);
CURLcode res = curl_easy_perform(curl.get());
if (res != CURLE_OK) {
std::string error =
std::string("[CURLWebClient] GET failed: ") + curl_easy_strerror(res);
throw std::runtime_error(error);
}
long httpCode = 0;
curl_easy_getinfo(curl.get(), CURLINFO_RESPONSE_CODE, &httpCode);
if (httpCode != 200) {
std::stringstream ss;
ss << "[CURLWebClient] HTTP error " << httpCode << " for URL " << url;
throw std::runtime_error(ss.str());
}
return response_string;
}
std::string CURLWebClient::UrlEncode(const std::string& value) {
// curl_easy_escape ignores its handle argument (documented since libcurl
// 7.82.0), so passing nullptr is acceptable here.
char* output = curl_easy_escape(nullptr, value.c_str(), 0);
if (output) {
std::string result(output);
curl_free(output);
return result;
}
throw std::runtime_error("[CURLWebClient] curl_easy_escape failed");
}


@@ -0,0 +1,89 @@
#include "wikipedia/wikipedia_service.h"
#include <spdlog/spdlog.h>
#include <boost/json.hpp>
WikipediaService::WikipediaService(std::shared_ptr<WebClient> client)
: client_(std::move(client)) {}
std::string WikipediaService::FetchExtract(std::string_view query) {
const std::string encoded = client_->UrlEncode(std::string(query));
const std::string url =
"https://en.wikipedia.org/w/api.php?action=query&titles=" + encoded +
"&prop=extracts&explaintext=1&format=json";
const std::string body = client_->Get(url);
boost::system::error_code ec;
boost::json::value doc = boost::json::parse(body, ec);
if (!ec && doc.is_object()) {
try {
auto& pages = doc.at("query").at("pages").get_object();
if (!pages.empty()) {
auto& page = pages.begin()->value().get_object();
if (page.contains("extract") && page.at("extract").is_string()) {
std::string extract(page.at("extract").as_string().c_str());
spdlog::debug("WikipediaService fetched {} chars for '{}'",
extract.size(), query);
return extract;
}
}
} catch (const std::exception& e) {
spdlog::warn(
"WikipediaService: failed to parse response structure for '{}': "
"{}",
query, e.what());
return {};
}
} else if (ec) {
spdlog::warn("WikipediaService: JSON parse error for '{}': {}", query,
ec.message());
}
return {};
}
std::string WikipediaService::GetSummary(std::string_view city,
std::string_view country) {
const std::string key = std::string(city) + "|" + std::string(country);
const auto cacheIt = cache_.find(key);
if (cacheIt != cache_.end()) {
return cacheIt->second;
}
std::string result;
if (!client_) {
cache_.emplace(key, result);
return result;
}
std::string regionQuery(city);
if (!country.empty()) {
regionQuery += ", ";
regionQuery += country;
}
const std::string beerQuery =
country.empty() ? std::string("beer")
: "beer in " + std::string(country);
try {
const std::string regionExtract = FetchExtract(regionQuery);
const std::string beerExtract = FetchExtract(beerQuery);
if (!regionExtract.empty()) {
result += regionExtract;
}
if (!beerExtract.empty()) {
if (!result.empty()) result += "\n\n";
result += beerExtract;
}
} catch (const std::runtime_error& e) {
spdlog::debug("WikipediaService lookup failed for '{}': {}", regionQuery,
e.what());
}
cache_.emplace(key, result);
return result;
}


@@ -1,5 +1,7 @@
using System.Security.Claims;
using System.Text.Encodings.Web;
using System.Text.Json;
using API.Core.Contracts.Common;
using Infrastructure.Jwt;
using Microsoft.AspNetCore.Authentication;
using Microsoft.Extensions.Options;
@@ -16,12 +18,17 @@ public class JwtAuthenticationHandler(
{
protected override async Task<AuthenticateResult> HandleAuthenticateAsync()
{
// Get the JWT secret from configuration
var secret =
configuration["Jwt:SecretKey"]
?? throw new InvalidOperationException(
"JWT SecretKey is not configured"
);
// Use the same access-token secret source as TokenService to avoid mismatched validation.
var secret = Environment.GetEnvironmentVariable("ACCESS_TOKEN_SECRET");
if (string.IsNullOrWhiteSpace(secret))
{
secret = configuration["Jwt:SecretKey"];
}
if (string.IsNullOrWhiteSpace(secret))
{
return AuthenticateResult.Fail("JWT secret is not configured");
}
// Check if Authorization header exists
if (
@@ -65,6 +72,15 @@ public class JwtAuthenticationHandler(
);
}
}
protected override async Task HandleChallengeAsync(AuthenticationProperties properties)
{
Response.ContentType = "application/json";
Response.StatusCode = 401;
var response = new ResponseBody { Message = "Unauthorized: Invalid or missing authentication token" };
await Response.WriteAsJsonAsync(response);
}
}
public class JwtAuthenticationOptions : AuthenticationSchemeOptions { }


@@ -1,6 +1,7 @@
using API.Core.Contracts.Auth;
using API.Core.Contracts.Common;
using Domain.Entities;
using Microsoft.AspNetCore.Authorization;
using Microsoft.AspNetCore.Mvc;
using Service.Auth;
@@ -8,6 +9,7 @@ namespace API.Core.Controllers
{
[ApiController]
[Route("api/[controller]")]
[Authorize(AuthenticationSchemes = "JWT")]
public class AuthController(
IRegisterService registerService,
ILoginService loginService,
@@ -15,6 +17,7 @@ namespace API.Core.Controllers
ITokenService tokenService
) : ControllerBase
{
[AllowAnonymous]
[HttpPost("register")]
public async Task<ActionResult<UserAccount>> Register(
[FromBody] RegisterRequest req
@@ -47,6 +50,7 @@ namespace API.Core.Controllers
return Created("/", response);
}
[AllowAnonymous]
[HttpPost("login")]
public async Task<ActionResult> Login([FromBody] LoginRequest req)
{
@@ -82,6 +86,7 @@ namespace API.Core.Controllers
);
}
[AllowAnonymous]
[HttpPost("refresh")]
public async Task<ActionResult> Refresh(
[FromBody] RefreshTokenRequest req


@@ -19,7 +19,7 @@ Feature: Protected Endpoint Access Token Validation
Given the API is running
When I submit a request to a protected endpoint with an invalid access token
Then the response has HTTP status 401
And the response JSON should have "message" containing "Invalid"
And the response JSON should have "message" containing "Unauthorized"
Scenario: Protected endpoint rejects expired access token
Given the API is running
@@ -27,14 +27,14 @@ Feature: Protected Endpoint Access Token Validation
And I am logged in with an immediately-expiring access token
When I submit a request to a protected endpoint with the expired token
Then the response has HTTP status 401
And the response JSON should have "message" containing "expired"
And the response JSON should have "message" containing "Unauthorized"
Scenario: Protected endpoint rejects token signed with wrong secret
Given the API is running
And I have an access token signed with the wrong secret
When I submit a request to a protected endpoint with the tampered token
Then the response has HTTP status 401
And the response JSON should have "message" containing "Invalid"
And the response JSON should have "message" containing "Unauthorized"
Scenario: Protected endpoint rejects refresh token as access token
Given the API is running


@@ -2,47 +2,58 @@ Feature: User Account Confirmation
As a newly registered user
I want to confirm my email address via a validation token
So that my account is fully activated
Scenario: Successful confirmation with valid token
Given the API is running
And I have registered a new account
And I have a valid confirmation token for my account
And I have a valid access token for my account
When I submit a confirmation request with the valid token
Then the response has HTTP status 200
And the response JSON should have "message" containing "confirmed"
And the response JSON should have "message" containing "is confirmed"
Scenario: Re-confirming an already verified account remains successful
Given the API is running
And I have registered a new account
And I have a valid confirmation token for my account
And I have a valid access token for my account
When I submit a confirmation request with the valid token
And I submit the same confirmation request again
Then the response has HTTP status 200
And the response JSON should have "message" containing "is confirmed"
@Ignore
Scenario: Confirmation fails with invalid token
Given the API is running
And I have registered a new account
And I have a valid access token for my account
When I submit a confirmation request with an invalid token
Then the response has HTTP status 401
And the response JSON should have "message" containing "Invalid"
And the response JSON should have "message" containing "Invalid token"
@Ignore
Scenario: Confirmation fails with expired token
Given the API is running
And I have registered a new account
And I have an expired confirmation token for my account
And I have a valid access token for my account
When I submit a confirmation request with the expired token
Then the response has HTTP status 401
And the response JSON should have "message" containing "expired"
And the response JSON should have "message" containing "Invalid token"
@Ignore
Scenario: Confirmation fails with tampered token (wrong secret)
Given the API is running
And I have registered a new account
And I have a confirmation token signed with the wrong secret
And I have a valid access token for my account
When I submit a confirmation request with the tampered token
Then the response has HTTP status 401
And the response JSON should have "message" containing "Invalid"
And the response JSON should have "message" containing "Invalid token"
@Ignore
Scenario: Confirmation fails when token is missing
Given the API is running
And I have registered a new account
And I have a valid access token for my account
When I submit a confirmation request with a missing token
Then the response has HTTP status 400
@Ignore
Scenario: Confirmation endpoint only accepts POST requests
Given the API is running
And I have a valid confirmation token
@@ -51,6 +62,15 @@ Feature: User Account Confirmation
Scenario: Confirmation fails with malformed token
Given the API is running
And I have registered a new account
And I have a valid access token for my account
When I submit a confirmation request with a malformed token
Then the response has HTTP status 401
And the response JSON should have "message" containing "Invalid"
And the response JSON should have "message" containing "Invalid token"
Scenario: Confirmation fails without an access token
Given the API is running
And I have registered a new account
And I have a valid confirmation token for my account
When I submit a confirmation request with the valid token without an access token
Then the response has HTTP status 401


@@ -0,0 +1,36 @@
Feature: Resend Confirmation Email
As a user who did not receive the confirmation email
I want to request a resend of the confirmation email
So that I can obtain a working confirmation link while preventing abuse
Scenario: Legitimate resend for an unconfirmed user
Given the API is running
And I have registered a new account
And I have a valid access token for my account
When I submit a resend confirmation request for my account
Then the response has HTTP status 200
And the response JSON should have "message" containing "confirmation email has been resent"
Scenario: Resend is a no-op for an already confirmed user
Given the API is running
And I have registered a new account
And I have a valid confirmation token for my account
And I have a valid access token for my account
And I have confirmed my account
When I submit a resend confirmation request for my account
Then the response has HTTP status 200
And the response JSON should have "message" containing "confirmation email has been resent"
Scenario: Resend is a no-op for a non-existent user
Given the API is running
And I have registered a new account
And I have a valid access token for my account
When I submit a resend confirmation request for a non-existent user
Then the response has HTTP status 200
And the response JSON should have "message" containing "confirmation email has been resent"
Scenario: Resend requires authentication
Given the API is running
And I have registered a new account
When I submit a resend confirmation request without an access token
Then the response has HTTP status 401


@@ -3,7 +3,6 @@ Feature: Token Refresh
I want to refresh my access token using my refresh token
So that I can maintain my session without logging in again
@Ignore
Scenario: Successful token refresh with valid refresh token
Given the API is running
And I have an existing account
@@ -14,7 +13,6 @@ Feature: Token Refresh
And the response JSON should have a new access token
And the response JSON should have a new refresh token
@Ignore
Scenario: Token refresh fails with invalid refresh token
Given the API is running
When I submit a refresh token request with an invalid refresh token
@@ -27,7 +25,7 @@ Feature: Token Refresh
And I am logged in with an immediately-expiring refresh token
When I submit a refresh token request with the expired refresh token
Then the response has HTTP status 401
And the response JSON should have "message" containing "expired"
And the response JSON should have "message" containing "Invalid token"
Scenario: Token refresh fails when refresh token is missing
Given the API is running


@@ -7,6 +7,8 @@ public class MockEmailService : IEmailService
{
public List<RegistrationEmail> SentRegistrationEmails { get; } = new();
public List<ResendConfirmationEmail> SentResendConfirmationEmails { get; } = new();
public Task SendRegistrationEmailAsync(
UserAccount createdUser,
string confirmationToken
@@ -24,9 +26,27 @@ public class MockEmailService : IEmailService
return Task.CompletedTask;
}
public Task SendResendConfirmationEmailAsync(
UserAccount user,
string confirmationToken
)
{
SentResendConfirmationEmails.Add(
new ResendConfirmationEmail
{
UserAccount = user,
ConfirmationToken = confirmationToken,
SentAt = DateTime.UtcNow,
}
);
return Task.CompletedTask;
}
public void Clear()
{
SentRegistrationEmails.Clear();
SentResendConfirmationEmails.Clear();
}
public class RegistrationEmail
@@ -35,4 +55,11 @@ public class MockEmailService : IEmailService
public string ConfirmationToken { get; init; } = string.Empty;
public DateTime SentAt { get; init; }
}
public class ResendConfirmationEmail
{
public UserAccount UserAccount { get; init; } = null!;
public string ConfirmationToken { get; init; } = string.Empty;
public DateTime SentAt { get; init; }
}
}


@@ -1,6 +1,7 @@
using System.Text.Json;
using API.Specs;
using FluentAssertions;
using Infrastructure.Jwt;
using Reqnroll;
namespace API.Specs.Steps;
@@ -13,6 +14,10 @@ public class AuthSteps(ScenarioContext scenario)
private const string ResponseKey = "response";
private const string ResponseBodyKey = "responseBody";
private const string TestUserKey = "testUser";
private const string RegisteredUserIdKey = "registeredUserId";
private const string RegisteredUsernameKey = "registeredUsername";
private const string PreviousAccessTokenKey = "previousAccessToken";
private const string PreviousRefreshTokenKey = "previousRefreshToken";
private HttpClient GetClient()
{
@@ -34,6 +39,66 @@ public class AuthSteps(ScenarioContext scenario)
return client;
}
private static string GetRequiredEnvVar(string name)
{
return Environment.GetEnvironmentVariable(name)
?? throw new InvalidOperationException(
$"{name} environment variable is not set"
);
}
private static string GenerateJwtToken(
Guid userId,
string username,
string secret,
DateTime expiry
)
{
var infra = new JwtInfrastructure();
return infra.GenerateJwt(userId, username, expiry, secret);
}
private static Guid ParseRegisteredUserId(JsonElement root)
{
return root
.GetProperty("payload")
.GetProperty("userAccountId")
.GetGuid();
}
private static string ParseRegisteredUsername(JsonElement root)
{
return root
.GetProperty("payload")
.GetProperty("username")
.GetString()
?? throw new InvalidOperationException(
"username missing from registration payload"
);
}
private static string ParseTokenFromPayload(
JsonElement payload,
string camelCaseName,
string pascalCaseName
)
{
if (
payload.TryGetProperty(camelCaseName, out var tokenElem)
|| payload.TryGetProperty(pascalCaseName, out tokenElem)
)
{
return tokenElem.GetString()
?? throw new InvalidOperationException(
$"{camelCaseName} is null"
);
}
throw new InvalidOperationException(
$"Could not find token field '{camelCaseName}' in payload"
);
}
[Given("I have an existing account")]
public void GivenIHaveAnExistingAccount()
{
@@ -229,6 +294,18 @@ public class AuthSteps(ScenarioContext scenario)
dateOfBirth = DateTime.UtcNow.AddYears(-18).ToString("yyyy-MM-dd");
}
// Keep default registration fixture values unique across repeated runs.
if (email == "newuser@example.com")
{
var suffix = Guid.NewGuid().ToString("N")[..8];
email = $"newuser-{suffix}@example.com";
if (username == "newuser")
{
username = $"newuser-{suffix}";
}
}
var password = row["Password"];
var registrationData = new
@@ -289,12 +366,13 @@ public class AuthSteps(ScenarioContext scenario)
public async Task GivenIHaveRegisteredANewAccount()
{
var client = GetClient();
var suffix = Guid.NewGuid().ToString("N")[..8];
var registrationData = new
{
username = "newuser",
username = $"newuser-{suffix}",
firstName = "New",
lastName = "User",
email = "newuser@example.com",
email = $"newuser-{suffix}@example.com",
dateOfBirth = "1990-01-01",
password = "Password1!",
};
@@ -316,6 +394,11 @@ public class AuthSteps(ScenarioContext scenario)
var responseBody = await response.Content.ReadAsStringAsync();
scenario[ResponseKey] = response;
scenario[ResponseBodyKey] = responseBody;
using var doc = JsonDocument.Parse(responseBody);
var root = doc.RootElement;
scenario[RegisteredUserIdKey] = ParseRegisteredUserId(root);
scenario[RegisteredUsernameKey] = ParseRegisteredUsername(root);
}
[Given("I am logged in")]
@@ -374,11 +457,109 @@ public class AuthSteps(ScenarioContext scenario)
await GivenIAmLoggedIn();
}
[Given("I have a valid access token for my account")]
public void GivenIHaveAValidAccessTokenForMyAccount()
{
var userId = scenario.TryGetValue<Guid>(RegisteredUserIdKey, out var id)
? id
: throw new InvalidOperationException(
"registered user ID not found in scenario"
);
var username = scenario.TryGetValue<string>(
RegisteredUsernameKey,
out var user
)
? user
: throw new InvalidOperationException(
"registered username not found in scenario"
);
var secret = GetRequiredEnvVar("ACCESS_TOKEN_SECRET");
scenario["accessToken"] = GenerateJwtToken(
userId,
username,
secret,
DateTime.UtcNow.AddMinutes(60)
);
}
[Given("I have a valid confirmation token for my account")]
public void GivenIHaveAValidConfirmationTokenForMyAccount()
{
var userId = scenario.TryGetValue<Guid>(RegisteredUserIdKey, out var id)
? id
: throw new InvalidOperationException(
"registered user ID not found in scenario"
);
var username = scenario.TryGetValue<string>(
RegisteredUsernameKey,
out var user
)
? user
: throw new InvalidOperationException(
"registered username not found in scenario"
);
var secret = GetRequiredEnvVar("CONFIRMATION_TOKEN_SECRET");
scenario["confirmationToken"] = GenerateJwtToken(
userId,
username,
secret,
DateTime.UtcNow.AddMinutes(5)
);
}
[Given("I have an expired confirmation token for my account")]
public void GivenIHaveAnExpiredConfirmationTokenForMyAccount()
{
var userId = scenario.TryGetValue<Guid>(RegisteredUserIdKey, out var id)
? id
: throw new InvalidOperationException(
"registered user ID not found in scenario"
);
var username = scenario.TryGetValue<string>(
RegisteredUsernameKey,
out var user
)
? user
: throw new InvalidOperationException(
"registered username not found in scenario"
);
var secret = GetRequiredEnvVar("CONFIRMATION_TOKEN_SECRET");
scenario["confirmationToken"] = GenerateJwtToken(
userId,
username,
secret,
DateTime.UtcNow.AddMinutes(-5)
);
}
[Given("I have a confirmation token signed with the wrong secret")]
public void GivenIHaveAConfirmationTokenSignedWithTheWrongSecret()
{
var userId = scenario.TryGetValue<Guid>(RegisteredUserIdKey, out var id)
? id
: throw new InvalidOperationException(
"registered user ID not found in scenario"
);
var username = scenario.TryGetValue<string>(
RegisteredUsernameKey,
out var user
)
? user
: throw new InvalidOperationException(
"registered username not found in scenario"
);
const string wrongSecret =
"wrong-confirmation-secret-that-is-very-long-1234567890";
scenario["confirmationToken"] = GenerateJwtToken(
userId,
username,
wrongSecret,
DateTime.UtcNow.AddMinutes(5)
);
}
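The four token-builder steps above all delegate to a `GenerateJwtToken` helper defined elsewhere in the class. As a hedged sketch only, assuming the helper signs an HS256 token via `System.IdentityModel.Tokens.Jwt` (the claim names and overall shape are assumptions, not taken from this diff):

```csharp
using System.IdentityModel.Tokens.Jwt;
using System.Security.Claims;
using System.Text;
using Microsoft.IdentityModel.Tokens;

// Hypothetical shape of the GenerateJwtToken helper assumed by the steps above.
private static string GenerateJwtToken(
    Guid userId,
    string username,
    string secret,
    DateTime expiresAt
)
{
    var key = new SymmetricSecurityKey(Encoding.UTF8.GetBytes(secret));
    var credentials = new SigningCredentials(key, SecurityAlgorithms.HmacSha256);
    var token = new JwtSecurityToken(
        claims: new[]
        {
            new Claim(JwtRegisteredClaimNames.Sub, userId.ToString()),
            new Claim("username", username),
        },
        expires: expiresAt,
        signingCredentials: credentials
    );
    return new JwtSecurityTokenHandler().WriteToken(token);
}
```

Signing with a deliberately wrong or expired parameterization, as the steps do, exercises the server's validation paths without touching the real token issuer.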
[When(
@@ -400,7 +581,9 @@ public class AuthSteps(ScenarioContext scenario)
};
var response = await client.SendAsync(requestMessage);
var responseBody = await response.Content.ReadAsStringAsync();
scenario[ResponseKey] = response;
scenario[ResponseBodyKey] = responseBody;
}
[When(
@@ -418,7 +601,9 @@ public class AuthSteps(ScenarioContext scenario)
};
var response = await client.SendAsync(requestMessage);
var responseBody = await response.Content.ReadAsStringAsync();
scenario[ResponseKey] = response;
scenario[ResponseBodyKey] = responseBody;
}
[When("I submit a confirmation request with the valid token")]
@@ -428,19 +613,40 @@ public class AuthSteps(ScenarioContext scenario)
var token = scenario.TryGetValue<string>("confirmationToken", out var t)
? t
: "valid-token";
var accessToken = scenario.TryGetValue<string>("accessToken", out var at)
? at
: string.Empty;
var requestMessage = new HttpRequestMessage(
HttpMethod.Post,
$"/api/auth/confirm?token={Uri.EscapeDataString(token)}"
);
if (!string.IsNullOrEmpty(accessToken))
requestMessage.Headers.Add("Authorization", $"Bearer {accessToken}");
var response = await client.SendAsync(requestMessage);
var responseBody = await response.Content.ReadAsStringAsync();
scenario[ResponseKey] = response;
scenario[ResponseBodyKey] = responseBody;
}
[When("I submit the same confirmation request again")]
public async Task WhenISubmitTheSameConfirmationRequestAgain()
{
var client = GetClient();
var token = scenario.TryGetValue<string>("confirmationToken", out var t)
? t
: "valid-token";
var accessToken = scenario.TryGetValue<string>("accessToken", out var at)
? at
: string.Empty;
var requestMessage = new HttpRequestMessage(
HttpMethod.Post,
$"/api/auth/confirm?token={Uri.EscapeDataString(token)}"
);
if (!string.IsNullOrEmpty(accessToken))
requestMessage.Headers.Add("Authorization", $"Bearer {accessToken}");
var response = await client.SendAsync(requestMessage);
var responseBody = await response.Content.ReadAsStringAsync();
@@ -452,13 +658,45 @@ public class AuthSteps(ScenarioContext scenario)
public async Task WhenISubmitAConfirmationRequestWithAMalformedToken()
{
var client = GetClient();
const string token = "malformed-token-not-jwt";
var accessToken = scenario.TryGetValue<string>("accessToken", out var at)
? at
: string.Empty;
var requestMessage = new HttpRequestMessage(
HttpMethod.Post,
$"/api/auth/confirm?token={Uri.EscapeDataString(token)}"
);
if (!string.IsNullOrEmpty(accessToken))
requestMessage.Headers.Add("Authorization", $"Bearer {accessToken}");
var response = await client.SendAsync(requestMessage);
var responseBody = await response.Content.ReadAsStringAsync();
scenario[ResponseKey] = response;
scenario[ResponseBodyKey] = responseBody;
}
[When("I submit a refresh token request with a valid refresh token")]
public async Task WhenISubmitARefreshTokenRequestWithTheValidRefreshToken()
{
var client = GetClient();
if (scenario.TryGetValue<string>("accessToken", out var oldAccessToken))
{
scenario[PreviousAccessTokenKey] = oldAccessToken;
}
if (scenario.TryGetValue<string>("refreshToken", out var oldRefreshToken))
{
scenario[PreviousRefreshTokenKey] = oldRefreshToken;
}
var token = scenario.TryGetValue<string>("refreshToken", out var t)
? t
: "valid-refresh-token";
var body = JsonSerializer.Serialize(new { refreshToken = token });
var requestMessage = new HttpRequestMessage(
HttpMethod.Post,
"/api/auth/refresh"
)
{
Content = new StringContent(
@@ -474,14 +712,13 @@ public class AuthSteps(ScenarioContext scenario)
scenario[ResponseBodyKey] = responseBody;
}
[When("I submit a refresh token request with an invalid refresh token")]
public async Task WhenISubmitARefreshTokenRequestWithAnInvalidRefreshToken()
{
var client = GetClient();
var body = JsonSerializer.Serialize(
new { refreshToken = "invalid-refresh-token" }
);
var requestMessage = new HttpRequestMessage(
HttpMethod.Post,
@@ -569,7 +806,9 @@ public class AuthSteps(ScenarioContext scenario)
};
var response = await client.SendAsync(requestMessage);
var responseBody = await response.Content.ReadAsStringAsync();
scenario[ResponseKey] = response;
scenario[ResponseBodyKey] = responseBody;
}
// Protected Endpoint Steps
@@ -583,14 +822,17 @@ public class AuthSteps(ScenarioContext scenario)
);
var response = await client.SendAsync(requestMessage);
var responseBody = await response.Content.ReadAsStringAsync();
scenario[ResponseKey] = response;
scenario[ResponseBodyKey] = responseBody;
}
[Given("I am logged in with an immediately-expiring access token")]
public Task GivenIAmLoggedInWithAnImmediatelyExpiringAccessToken()
{
// Simulate an expired access token for auth rejection behavior.
scenario["accessToken"] = "expired-access-token";
return Task.CompletedTask;
}
[Given("I have an access token signed with the wrong secret")]
@@ -618,7 +860,9 @@ public class AuthSteps(ScenarioContext scenario)
};
var response = await client.SendAsync(requestMessage);
var responseBody = await response.Content.ReadAsStringAsync();
scenario[ResponseKey] = response;
scenario[ResponseBodyKey] = responseBody;
}
[When("I submit a request to a protected endpoint with the tampered token")]
@@ -638,7 +882,9 @@ public class AuthSteps(ScenarioContext scenario)
};
var response = await client.SendAsync(requestMessage);
var responseBody = await response.Content.ReadAsStringAsync();
scenario[ResponseKey] = response;
scenario[ResponseBodyKey] = responseBody;
}
[When(
@@ -660,7 +906,9 @@ public class AuthSteps(ScenarioContext scenario)
};
var response = await client.SendAsync(requestMessage);
var responseBody = await response.Content.ReadAsStringAsync();
scenario[ResponseKey] = response;
scenario[ResponseBodyKey] = responseBody;
}
[Given("I have a valid confirmation token")]
@@ -669,6 +917,91 @@ public class AuthSteps(ScenarioContext scenario)
scenario["confirmationToken"] = "valid-confirmation-token";
}
[When("I submit a confirmation request with the expired token")]
public async Task WhenISubmitAConfirmationRequestWithTheExpiredToken()
{
var client = GetClient();
var token = scenario.TryGetValue<string>("confirmationToken", out var t)
? t
: "expired-confirmation-token";
var accessToken = scenario.TryGetValue<string>("accessToken", out var at)
? at
: string.Empty;
var requestMessage = new HttpRequestMessage(
HttpMethod.Post,
$"/api/auth/confirm?token={Uri.EscapeDataString(token)}"
);
if (!string.IsNullOrEmpty(accessToken))
requestMessage.Headers.Add("Authorization", $"Bearer {accessToken}");
var response = await client.SendAsync(requestMessage);
var responseBody = await response.Content.ReadAsStringAsync();
scenario[ResponseKey] = response;
scenario[ResponseBodyKey] = responseBody;
}
[When("I submit a confirmation request with the tampered token")]
public async Task WhenISubmitAConfirmationRequestWithTheTamperedToken()
{
var client = GetClient();
var token = scenario.TryGetValue<string>("confirmationToken", out var t)
? t
: "tampered-confirmation-token";
var accessToken = scenario.TryGetValue<string>("accessToken", out var at)
? at
: string.Empty;
var requestMessage = new HttpRequestMessage(
HttpMethod.Post,
$"/api/auth/confirm?token={Uri.EscapeDataString(token)}"
);
if (!string.IsNullOrEmpty(accessToken))
requestMessage.Headers.Add("Authorization", $"Bearer {accessToken}");
var response = await client.SendAsync(requestMessage);
var responseBody = await response.Content.ReadAsStringAsync();
scenario[ResponseKey] = response;
scenario[ResponseBodyKey] = responseBody;
}
[When("I submit a confirmation request with a missing token")]
public async Task WhenISubmitAConfirmationRequestWithAMissingToken()
{
var client = GetClient();
var accessToken = scenario.TryGetValue<string>("accessToken", out var at)
? at
: string.Empty;
var requestMessage = new HttpRequestMessage(HttpMethod.Post, "/api/auth/confirm");
if (!string.IsNullOrEmpty(accessToken))
requestMessage.Headers.Add("Authorization", $"Bearer {accessToken}");
var response = await client.SendAsync(requestMessage);
var responseBody = await response.Content.ReadAsStringAsync();
scenario[ResponseKey] = response;
scenario[ResponseBodyKey] = responseBody;
}
[When("I submit a confirmation request using an invalid HTTP method")]
public async Task WhenISubmitAConfirmationRequestUsingAnInvalidHttpMethod()
{
var client = GetClient();
var token = scenario.TryGetValue<string>("confirmationToken", out var t)
? t
: "valid-confirmation-token";
var requestMessage = new HttpRequestMessage(
HttpMethod.Get,
$"/api/auth/confirm?token={Uri.EscapeDataString(token)}"
);
var response = await client.SendAsync(requestMessage);
var responseBody = await response.Content.ReadAsStringAsync();
scenario[ResponseKey] = response;
scenario[ResponseBodyKey] = responseBody;
}
[When(
"I submit a request to a protected endpoint with my confirmation token instead of access token"
)]
@@ -688,6 +1021,194 @@ public class AuthSteps(ScenarioContext scenario)
};
var response = await client.SendAsync(requestMessage);
var responseBody = await response.Content.ReadAsStringAsync();
scenario[ResponseKey] = response;
scenario[ResponseBodyKey] = responseBody;
}
[When("I submit a confirmation request with an invalid token")]
public async Task WhenISubmitAConfirmationRequestWithAnInvalidToken()
{
var client = GetClient();
const string token = "invalid-confirmation-token";
var accessToken = scenario.TryGetValue<string>("accessToken", out var at)
? at
: string.Empty;
var requestMessage = new HttpRequestMessage(
HttpMethod.Post,
$"/api/auth/confirm?token={Uri.EscapeDataString(token)}"
);
if (!string.IsNullOrEmpty(accessToken))
requestMessage.Headers.Add("Authorization", $"Bearer {accessToken}");
var response = await client.SendAsync(requestMessage);
var responseBody = await response.Content.ReadAsStringAsync();
scenario[ResponseKey] = response;
scenario[ResponseBodyKey] = responseBody;
}
[When("I submit a confirmation request with the valid token without an access token")]
public async Task WhenISubmitAConfirmationRequestWithTheValidTokenWithoutAnAccessToken()
{
var client = GetClient();
var token = scenario.TryGetValue<string>("confirmationToken", out var t)
? t
: "valid-token";
var requestMessage = new HttpRequestMessage(
HttpMethod.Post,
$"/api/auth/confirm?token={Uri.EscapeDataString(token)}"
);
var response = await client.SendAsync(requestMessage);
var responseBody = await response.Content.ReadAsStringAsync();
scenario[ResponseKey] = response;
scenario[ResponseBodyKey] = responseBody;
}
[Then("the response JSON should have a new access token")]
public void ThenTheResponseJsonShouldHaveANewAccessToken()
{
scenario
.TryGetValue<string>(ResponseBodyKey, out var responseBody)
.Should()
.BeTrue();
using var doc = JsonDocument.Parse(responseBody!);
var payload = doc.RootElement.GetProperty("payload");
var accessToken = ParseTokenFromPayload(
payload,
"accessToken",
"AccessToken"
);
accessToken.Should().NotBeNullOrWhiteSpace();
if (
scenario.TryGetValue<string>(
PreviousAccessTokenKey,
out var previousAccessToken
)
)
{
accessToken.Should().NotBe(previousAccessToken);
}
}
[Then("the response JSON should have a new refresh token")]
public void ThenTheResponseJsonShouldHaveANewRefreshToken()
{
scenario
.TryGetValue<string>(ResponseBodyKey, out var responseBody)
.Should()
.BeTrue();
using var doc = JsonDocument.Parse(responseBody!);
var payload = doc.RootElement.GetProperty("payload");
var refreshToken = ParseTokenFromPayload(
payload,
"refreshToken",
"RefreshToken"
);
refreshToken.Should().NotBeNullOrWhiteSpace();
if (
scenario.TryGetValue<string>(
PreviousRefreshTokenKey,
out var previousRefreshToken
)
)
{
refreshToken.Should().NotBe(previousRefreshToken);
}
}
[Given("I have confirmed my account")]
public async Task GivenIHaveConfirmedMyAccount()
{
var client = GetClient();
var token = scenario.TryGetValue<string>("confirmationToken", out var t)
? t
: throw new InvalidOperationException("confirmation token not found");
var accessToken = scenario.TryGetValue<string>("accessToken", out var at)
? at
: string.Empty;
var requestMessage = new HttpRequestMessage(
HttpMethod.Post,
$"/api/auth/confirm?token={Uri.EscapeDataString(token)}"
);
if (!string.IsNullOrEmpty(accessToken))
requestMessage.Headers.Add("Authorization", $"Bearer {accessToken}");
var response = await client.SendAsync(requestMessage);
response.EnsureSuccessStatusCode();
}
[When("I submit a resend confirmation request for my account")]
public async Task WhenISubmitAResendConfirmationRequestForMyAccount()
{
var client = GetClient();
var userId = scenario.TryGetValue<Guid>(RegisteredUserIdKey, out var id)
? id
: throw new InvalidOperationException("registered user ID not found");
var accessToken = scenario.TryGetValue<string>("accessToken", out var at)
? at
: string.Empty;
var requestMessage = new HttpRequestMessage(
HttpMethod.Post,
$"/api/auth/confirm/resend?userId={userId}"
);
if (!string.IsNullOrEmpty(accessToken))
requestMessage.Headers.Add("Authorization", $"Bearer {accessToken}");
var response = await client.SendAsync(requestMessage);
var responseBody = await response.Content.ReadAsStringAsync();
scenario[ResponseKey] = response;
scenario[ResponseBodyKey] = responseBody;
}
[When("I submit a resend confirmation request for a non-existent user")]
public async Task WhenISubmitAResendConfirmationRequestForANonExistentUser()
{
var client = GetClient();
var fakeUserId = Guid.NewGuid();
var accessToken = scenario.TryGetValue<string>("accessToken", out var at)
? at
: string.Empty;
var requestMessage = new HttpRequestMessage(
HttpMethod.Post,
$"/api/auth/confirm/resend?userId={fakeUserId}"
);
if (!string.IsNullOrEmpty(accessToken))
requestMessage.Headers.Add("Authorization", $"Bearer {accessToken}");
var response = await client.SendAsync(requestMessage);
var responseBody = await response.Content.ReadAsStringAsync();
scenario[ResponseKey] = response;
scenario[ResponseBodyKey] = responseBody;
}
[When("I submit a resend confirmation request without an access token")]
public async Task WhenISubmitAResendConfirmationRequestWithoutAnAccessToken()
{
var client = GetClient();
var userId = scenario.TryGetValue<Guid>(RegisteredUserIdKey, out var id)
? id
: Guid.NewGuid();
var requestMessage = new HttpRequestMessage(
HttpMethod.Post,
$"/api/auth/confirm/resend?userId={userId}"
);
var response = await client.SendAsync(requestMessage);
var responseBody = await response.Content.ReadAsStringAsync();
scenario[ResponseKey] = response;
scenario[ResponseBodyKey] = responseBody;
}
}


@@ -0,0 +1,117 @@
@using Infrastructure.Email.Templates.Components
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml" xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="x-apple-disable-message-reformatting">
<title>Resend Confirmation - The Biergarten App</title>
<!--[if mso]>
<style>
* { font-family: Arial, sans-serif !important; }
table { border-collapse: collapse; }
</style>
<![endif]-->
<!--[if !mso]><!-->
<style>
* {
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Helvetica, Arial, sans-serif;
}
</style>
<!--<![endif]-->
</head>
<body style="margin:0; padding:0; background-color:#f4f4f4; width:100%;">
<table role="presentation" border="0" cellpadding="0" cellspacing="0" width="100%" style="background-color:#f4f4f4;">
<tr>
<td align="center" style="padding:40px 10px;">
<!--[if mso]>
<table border="0" cellpadding="0" cellspacing="0" width="600" style="width:600px;">
<tr><td>
<![endif]-->
<table role="presentation" border="0" cellpadding="0" cellspacing="0" width="100%"
style="max-width:600px; background:#ffffff; border-radius:8px; box-shadow:0 2px 8px rgba(0,0,0,.08);">
<Header />
<tr>
<td style="padding:40px 40px 16px 40px; text-align:center;">
<h1 style="margin:0; color:#333333; font-size:26px; font-weight:700;">
New Confirmation Link
</h1>
</td>
</tr>
<tr>
<td style="padding:0 40px 20px 40px; text-align:center;">
<p style="margin:0; color:#666666; font-size:16px; line-height:24px;">
Hi <strong style="color:#333333;">@Username</strong>, you requested another email confirmation
link.
Use the button below to verify your account.
</p>
</td>
</tr>
<tr>
<td style="padding:8px 40px;">
<table role="presentation" border="0" cellpadding="0" cellspacing="0" width="100%">
<tr>
<td align="center">
<!--[if mso]>
<v:roundrect xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w="urn:schemas-microsoft-com:office:word"
href="@ConfirmationLink" style="height:50px;v-text-anchor:middle;width:260px;"
arcsize="10%" stroke="f" fillcolor="#f59e0b">
<w:anchorlock/>
<center style="color:#ffffff;font-family:Arial,sans-serif;font-size:16px;font-weight:700;">
Confirm Email Again
</center>
</v:roundrect>
<![endif]-->
<!--[if !mso]><!-->
<a href="@ConfirmationLink" target="_blank" rel="noopener noreferrer"
style="display:inline-block; padding:16px 40px; background:#d97706; color:#ffffff; text-decoration:none; border-radius:6px; font-size:16px; font-weight:700;">
Confirm Email Again
</a>
<!--<![endif]-->
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td style="padding:20px 40px 8px 40px; text-align:center;">
<p style="margin:0; color:#999999; font-size:13px; line-height:20px;">
This replacement link expires in 24 hours.
</p>
</td>
</tr>
<tr>
<td style="padding:0 40px 28px 40px; text-align:center;">
<p style="margin:0; color:#999999; font-size:13px; line-height:20px;">
If you did not request this, you can safely ignore this email.
</p>
</td>
</tr>
<EmailFooter FooterText="Cheers, The Biergarten App Team" />
</table>
<!--[if mso]></td></tr></table><![endif]-->
</td>
</tr>
</table>
</body>
</html>
@code {
[Parameter]
public string Username { get; set; } = string.Empty;
[Parameter]
public string ConfirmationLink { get; set; } = string.Empty;
}


@@ -30,6 +30,23 @@ public class EmailTemplateProvider(
return await RenderComponentAsync<UserRegistration>(parameters);
}
/// <summary>
/// Renders the ResendConfirmation template with the specified parameters.
/// </summary>
public async Task<string> RenderResendConfirmationEmailAsync(
string username,
string confirmationLink
)
{
var parameters = new Dictionary<string, object?>
{
{ nameof(ResendConfirmation.Username), username },
{ nameof(ResendConfirmation.ConfirmationLink), confirmationLink },
};
return await RenderComponentAsync<ResendConfirmation>(parameters);
}
/// <summary>
/// Generic method to render any Razor component to HTML.
/// </summary>


@@ -15,4 +15,15 @@ public interface IEmailTemplateProvider
string username,
string confirmationLink
);
/// <summary>
/// Renders the ResendConfirmation template with the specified parameters.
/// </summary>
/// <param name="username">The username to include in the email</param>
/// <param name="confirmationLink">The new confirmation link</param>
/// <returns>The rendered HTML string</returns>
Task<string> RenderResendConfirmationEmailAsync(
string username,
string confirmationLink
);
}


@@ -2,6 +2,7 @@ using System.Data;
using System.Data.Common;
using Domain.Entities;
using Infrastructure.Repository.Sql;
using Microsoft.Data.SqlClient;
namespace Infrastructure.Repository.Auth;
@@ -132,6 +133,12 @@ public class AuthRepository(ISqlConnectionFactory connectionFactory)
return null;
}
// Idempotency: if already verified, treat as successful confirmation.
if (await IsUserVerifiedAsync(userAccountId))
{
return user;
}
await using var connection = await CreateConnection();
await using var command = connection.CreateCommand();
command.CommandText = "USP_CreateUserVerification";
@@ -139,12 +146,39 @@ public class AuthRepository(ISqlConnectionFactory connectionFactory)
AddParameter(command, "@UserAccountID_", userAccountId);
try
{
await command.ExecuteNonQueryAsync();
}
catch (SqlException ex) when (IsDuplicateVerificationViolation(ex))
{
// A concurrent request verified this user first. Keep behavior idempotent.
}
// Fetch and return the updated user
return await GetUserByIdAsync(userAccountId);
}
public async Task<bool> IsUserVerifiedAsync(Guid userAccountId)
{
await using var connection = await CreateConnection();
await using var command = connection.CreateCommand();
command.CommandText =
"SELECT TOP 1 1 FROM dbo.UserVerification WHERE UserAccountID = @UserAccountID";
command.CommandType = CommandType.Text;
AddParameter(command, "@UserAccountID", userAccountId);
var result = await command.ExecuteScalarAsync();
return result != null && result != DBNull.Value;
}
private static bool IsDuplicateVerificationViolation(SqlException ex)
{
// 2601/2627 are duplicate key violations in SQL Server.
return ex.Number == 2601 || ex.Number == 2627;
}
/// <summary>
/// Maps a data reader row to a UserAccount entity.


@@ -75,4 +75,11 @@ public interface IAuthRepository
/// <param name="userAccountId">ID of the user account</param>
/// <returns>UserAccount if found, null otherwise</returns>
Task<Domain.Entities.UserAccount?> GetUserByIdAsync(Guid userAccountId);
/// <summary>
/// Checks whether a user account has been verified.
/// </summary>
/// <param name="userAccountId">ID of the user account</param>
/// <returns>True if the user has a verification record, false otherwise</returns>
Task<bool> IsUserVerifiedAsync(Guid userAccountId);
}


@@ -5,6 +5,7 @@ using Domain.Exceptions;
using FluentAssertions;
using Infrastructure.Repository.Auth;
using Moq;
using Service.Emails;
namespace Service.Auth.Tests;
@@ -12,16 +13,19 @@ public class ConfirmationServiceTest
{
private readonly Mock<IAuthRepository> _authRepositoryMock;
private readonly Mock<ITokenService> _tokenServiceMock;
private readonly Mock<IEmailService> _emailServiceMock;
private readonly ConfirmationService _confirmationService;
public ConfirmationServiceTest()
{
_authRepositoryMock = new Mock<IAuthRepository>();
_tokenServiceMock = new Mock<ITokenService>();
_emailServiceMock = new Mock<IEmailService>();
_confirmationService = new ConfirmationService(
_authRepositoryMock.Object,
_tokenServiceMock.Object,
_emailServiceMock.Object
);
}


@@ -0,0 +1,53 @@
using Domain.Exceptions;
using Infrastructure.Repository.Auth;
using Service.Emails;
namespace Service.Auth;
public class ConfirmationService(
IAuthRepository authRepository,
ITokenService tokenService,
IEmailService emailService
) : IConfirmationService
{
public async Task<ConfirmationServiceReturn> ConfirmUserAsync(
string confirmationToken
)
{
var validatedToken = await tokenService.ValidateConfirmationTokenAsync(
confirmationToken
);
var user = await authRepository.ConfirmUserAccountAsync(
validatedToken.UserId
);
if (user == null)
{
throw new UnauthorizedException("User account not found");
}
return new ConfirmationServiceReturn(
DateTime.UtcNow,
user.UserAccountId
);
}
public async Task ResendConfirmationEmailAsync(Guid userId)
{
var user = await authRepository.GetUserByIdAsync(userId);
if (user == null)
{
return; // Silent return to prevent user enumeration
}
if (await authRepository.IsUserVerifiedAsync(userId))
{
return; // Already confirmed, no-op
}
var confirmationToken = tokenService.GenerateConfirmationToken(user);
await emailService.SendResendConfirmationEmailAsync(user, confirmationToken);
}
}
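The silent-return branches in `ResendConfirmationEmailAsync` are straightforward to pin down with the mocks already declared in `ConfirmationServiceTest` above. A sketch of such a test, assuming xUnit as the test framework (an assumption; the framework is not visible in this diff):

```csharp
[Fact]
public async Task ResendConfirmationEmailAsync_UnknownUser_SendsNothing()
{
    // Arrange: the repository reports no such user.
    var userId = Guid.NewGuid();
    _authRepositoryMock
        .Setup(r => r.GetUserByIdAsync(userId))
        .ReturnsAsync((UserAccount?)null);

    // Act: the call should return silently (anti-enumeration behavior).
    await _confirmationService.ResendConfirmationEmailAsync(userId);

    // Assert: no confirmation email was dispatched.
    _emailServiceMock.Verify(
        e => e.SendResendConfirmationEmailAsync(
            It.IsAny<UserAccount>(),
            It.IsAny<string>()
        ),
        Times.Never
    );
}
```

The same pattern, with `IsUserVerifiedAsync` stubbed to return true, would cover the already-confirmed no-op branch.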


@@ -1,4 +1,4 @@
using Domain.Exceptions;
using Infrastructure.Repository.Auth;
namespace Service.Auth;
@@ -8,16 +8,6 @@ public record ConfirmationServiceReturn(DateTime ConfirmedAt, Guid UserId);
public interface IConfirmationService
{
Task<ConfirmationServiceReturn> ConfirmUserAsync(string confirmationToken);
Task ResendConfirmationEmailAsync(Guid userId);
}


@@ -10,6 +10,11 @@ public interface IEmailService
UserAccount createdUser,
string confirmationToken
);
public Task SendResendConfirmationEmailAsync(
UserAccount user,
string confirmationToken
);
}
public class EmailService(
@@ -17,13 +22,17 @@ public class EmailService(
IEmailTemplateProvider emailTemplateProvider
) : IEmailService
{
private static readonly string WebsiteBaseUrl =
Environment.GetEnvironmentVariable("WEBSITE_BASE_URL")
?? throw new InvalidOperationException("WEBSITE_BASE_URL environment variable is not set");
public async Task SendRegistrationEmailAsync(
UserAccount createdUser,
string confirmationToken
)
{
var confirmationLink =
$"{WebsiteBaseUrl}/users/confirm?token={confirmationToken}";
var emailHtml =
await emailTemplateProvider.RenderUserRegisteredEmailAsync(
@@ -38,4 +47,26 @@ public class EmailService(
isHtml: true
);
}
public async Task SendResendConfirmationEmailAsync(
UserAccount user,
string confirmationToken
)
{
var confirmationLink =
$"{WebsiteBaseUrl}/users/confirm?token={confirmationToken}";
var emailHtml =
await emailTemplateProvider.RenderResendConfirmationEmailAsync(
user.FirstName,
confirmationLink
);
await emailProvider.SendAsync(
user.Email,
"Confirm Your Email - The Biergarten App",
emailHtml,
isHtml: true
);
}
}
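One caveat with the new `WebsiteBaseUrl` field: because it is initialized in a static field initializer, a missing `WEBSITE_BASE_URL` surfaces as a `TypeInitializationException` wrapping the intended `InvalidOperationException` the first time `EmailService` is touched. A sketch of the instance-field alternative, assuming the first constructor parameter is an `IEmailProvider` (its type is truncated in the diff above, so that name is an assumption):

```csharp
public class EmailService(
    IEmailProvider emailProvider, // assumed type; truncated in the diff above
    IEmailTemplateProvider emailTemplateProvider
) : IEmailService
{
    // Resolved per instance, so a missing variable throws the readable
    // InvalidOperationException directly rather than a TypeInitializationException.
    private readonly string _websiteBaseUrl =
        Environment.GetEnvironmentVariable("WEBSITE_BASE_URL")
        ?? throw new InvalidOperationException(
            "WEBSITE_BASE_URL environment variable is not set"
        );

    // ... send methods unchanged, reading _websiteBaseUrl ...
}
```

Either form fails fast; the instance field just keeps the failure message intact and makes the dependency visible per object.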

src/Website-v1/package-lock.json (generated, 10881 lines)

File diff suppressed because it is too large


@@ -0,0 +1,98 @@
{
"name": "biergarten",
"version": "0.1.0",
"private": true,
"scripts": {
"dev": "next dev",
"build": "next build",
"prestart": "npm run build",
"start": "next start",
"lint": "next lint",
"clear-db": "npx ts-node ./src/prisma/seed/clear/index.ts",
"format": "npx prettier . --write; npx prisma format;",
"format-watch": "npx onchange \"**/*\" -- prettier --write --ignore-unknown {{changed}}",
"seed": "npx --max-old-space-size=4096 ts-node ./src/prisma/seed/index.ts"
},
"dependencies": {
"@hapi/iron": "^7.0.1",
"@headlessui/react": "^1.7.15",
"@headlessui/tailwindcss": "^0.2.0",
"@hookform/resolvers": "^3.3.1",
"@mapbox/mapbox-sdk": "^0.15.2",
"@mapbox/search-js-core": "^1.0.0-beta.17",
"@mapbox/search-js-react": "^1.0.0-beta.17",
"@next/bundle-analyzer": "^14.0.3",
"@prisma/client": "^5.7.0",
"@react-email/components": "^0.0.11",
"@react-email/render": "^0.0.9",
"@react-email/tailwind": "^0.0.12",
"@vercel/analytics": "^1.1.0",
"argon2": "^0.31.1",
"classnames": "^2.5.1",
"cloudinary": "^1.41.0",
"cookie": "^0.7.0",
"date-fns": "^2.30.0",
"dotenv": "^16.3.1",
"jsonwebtoken": "^9.0.1",
"lodash": "^4.17.21",
"mapbox-gl": "^3.4.0",
"multer": "^1.4.5-lts.1",
"next": "^14.2.22",
"next-cloudinary": "^5.10.0",
"next-connect": "^1.0.0-next.3",
"passport": "^0.6.0",
"passport-local": "^1.0.0",
"pino": "^10.0.0",
"react": "^18.2.0",
"react-daisyui": "^5.0.0",
"react-dom": "^18.2.0",
"react-email": "^1.9.5",
"react-hook-form": "^7.45.2",
"react-hot-toast": "^2.4.1",
"react-icons": "^4.10.1",
"react-intersection-observer": "^9.5.2",
"react-map-gl": "^7.1.7",
"react-responsive-carousel": "^3.2.23",
"swr": "^2.2.0",
"theme-change": "^2.5.0",
"zod": "^3.21.4"
},
"devDependencies": {
"@faker-js/faker": "^8.3.1",
"@types/cookie": "^0.5.1",
"@types/express": "^4.17.21",
"@types/jsonwebtoken": "^9.0.2",
"@types/lodash": "^4.14.195",
"@types/mapbox__mapbox-sdk": "^0.13.4",
"@types/multer": "^1.4.7",
"@types/node": "^20.4.2",
"@types/passport-local": "^1.0.35",
"@types/react": "^18.2.15",
"@types/react-dom": "^18.2.7",
"@vercel/fetch": "^7.0.0",
"autoprefixer": "^10.4.14",
"daisyui": "^4.7.2",
"dotenv-cli": "^7.2.1",
"eslint": "^8.51.0",
"eslint-config-airbnb-base": "15.0.0",
"eslint-config-airbnb-typescript": "17.1.0",
"eslint-config-next": "^13.5.4",
"eslint-config-prettier": "^9.0.0",
"eslint-plugin-react": "^7.33.2",
"generate-password": "^1.7.1",
"onchange": "^7.1.0",
"postcss": "^8.4.26",
"prettier": "^3.0.0",
"prettier-plugin-jsdoc": "^1.0.2",
"prettier-plugin-tailwindcss": "^0.5.7",
"prisma": "^5.7.0",
"tailwindcss": "^3.4.1",
"tailwindcss-animated": "^1.0.1",
"ts-node": "^10.9.1",
"typescript": "^5.3.2"
},
"prisma": {
"schema": "./src/prisma/schema.prisma",
"seed": "npm run seed"
}
}


@@ -0,0 +1,6 @@
module.exports = {
plugins: {
tailwindcss: {},
autoprefixer: {},
},
};

Some files were not shown because too many files have changed in this diff.