Compare commits
20 Commits
f1194d3da8...pipeline
| SHA1 |
|---|
| 60ee2ecf74 |
| e4e16a5084 |
| 8d306bf691 |
| 077f6ab4ae |
| 534403734a |
| 3af053f0eb |
| ba165d8aa7 |
| eb9a2767b4 |
| 29ea47fdb6 |
| 52e2333304 |
| a1f0ca5b20 |
| 2ea8aa52b4 |
| 98083ab40c |
| ac136f7179 |
| 280c9c61bd |
| 248a51b35f |
| 35aa7bc0df |
| 581863d69b |
| 9238036042 |
| 431e11e052 |
14
.gitignore
vendored
@@ -15,6 +15,14 @@
# production
/build

# project-specific build artifacts
/src/Website/build/
/src/Website/storybook-static/
/src/Website/.react-router/
/src/Website/playwright-report/
/src/Website/test-results/
/test-results/

# misc
.DS_Store
*.pem
@@ -42,6 +50,9 @@ next-env.d.ts

# vscode
.vscode
.idea/
*.swp
*.swo

/cloudinary-images

@@ -487,3 +498,6 @@ FodyWeavers.xsd
.env.dev
.env.test
.env.prod

*storybook.log
storybook-static
959
LICENSE.md
285
README.md
@@ -1,261 +1,142 @@
|
||||
# The Biergarten App
|
||||
|
||||
A social platform for craft beer enthusiasts to discover breweries, share reviews, and
|
||||
connect with fellow beer lovers.
|
||||
The Biergarten App is a multi-project monorepo with a .NET backend and an active React
|
||||
Router frontend in `src/Website`. The current website focuses on account flows, theme
|
||||
switching, shared UI components, Storybook coverage, and integration with the API.
|
||||
|
||||
**Documentation**
|
||||
## Documentation
|
||||
|
||||
- [Getting Started](docs/getting-started.md) - Setup and installation
|
||||
- [Architecture](docs/architecture.md) - System design and patterns
|
||||
- [Database](docs/database.md) - Schema and stored procedures
|
||||
- [Docker Guide](docs/docker.md) - Container deployment
|
||||
- [Testing](docs/testing.md) - Test strategy and commands
|
||||
- [Environment Variables](docs/environment-variables.md) - Configuration reference
|
||||
- [Getting Started](docs/getting-started.md) - Local setup for backend and active website
|
||||
- [Architecture](docs/architecture.md) - Current backend and frontend architecture
|
||||
- [Docker Guide](docs/docker.md) - Container-based backend development and testing
|
||||
- [Testing](docs/testing.md) - Backend and frontend test commands
|
||||
- [Environment Variables](docs/environment-variables.md) - Active configuration reference
|
||||
- [Token Validation](docs/token-validation.md) - JWT validation architecture
|
||||
- [Legacy Website Archive](docs/archive/legacy-website-v1.md) - Archived notes for the old Next.js frontend
|
||||
|
||||
**Diagrams**
|
||||
## Diagrams
|
||||
|
||||
- [Architecture](docs/diagrams/pdf/architecture.pdf) - Layered architecture
|
||||
- [Deployment](docs/diagrams/pdf/deployment.pdf) - Docker topology
|
||||
- [Authentication Flow](docs/diagrams/pdf/authentication-flow.pdf) - Auth sequence
|
||||
- [Database Schema](docs/diagrams/pdf/database-schema.pdf) - Entity relationships
|
||||
- [Architecture](docs/diagrams-out/architecture.svg) - Layered architecture
|
||||
- [Deployment](docs/diagrams-out/deployment.svg) - Docker topology
|
||||
- [Authentication Flow](docs/diagrams-out/authentication-flow.svg) - Auth sequence
|
||||
- [Database Schema](docs/diagrams-out/database-schema.svg) - Entity relationships
|
||||
|
||||
## Project Status
|
||||
## Current Status
|
||||
|
||||
**Active Development** - Transitioning from full-stack Next.js to multi-project monorepo
|
||||
Active areas in the repository:
|
||||
|
||||
- Core authentication and user management APIs
|
||||
- Database schema with migrations and seeding
|
||||
- Layered architecture (Domain, Service, Infrastructure, Repository, API)
|
||||
- Comprehensive test suite (unit + integration)
|
||||
- Frontend integration with .NET API (in progress)
|
||||
- Migration from Next.js serverless functions
|
||||
- .NET 10 backend with layered architecture and SQL Server
|
||||
- React Router 7 website in `src/Website`
|
||||
- Shared Biergarten theme system with a theme guide route
|
||||
- Storybook stories and browser-based checks for shared UI
|
||||
- Auth demo flows for home, login, register, dashboard, logout, and confirmation
|
||||
- Toast-based feedback for auth outcomes
|
||||
|
||||
---
|
||||
Legacy area retained for reference:
|
||||
|
||||
- `src/Website-v1` contains the archived Next.js frontend and is no longer the active website
|
||||
|
||||
## Tech Stack
|
||||
|
||||
**Backend**: .NET 10, ASP.NET Core, SQL Server 2022, DbUp **Frontend**: Next.js 14+,
|
||||
TypeScript, TailwindCSS **Testing**: xUnit, Reqnroll (BDD), FluentAssertions, Moq
|
||||
**Infrastructure**: Docker, Docker Compose **Security**: Argon2id password hashing, JWT
|
||||
(HS256)
|
||||
|
||||
---
|
||||
- **Backend**: .NET 10, ASP.NET Core, SQL Server 2022, DbUp
|
||||
- **Frontend**: React 19, React Router 7, Vite 7, Tailwind CSS 4, DaisyUI 5
|
||||
- **UI Documentation**: Storybook 10, Vitest browser mode, Playwright
|
||||
- **Testing**: xUnit, Reqnroll (BDD), FluentAssertions, Moq
|
||||
- **Infrastructure**: Docker, Docker Compose
|
||||
- **Security**: Argon2id password hashing, JWT access/refresh/confirmation tokens
|
||||
|
||||
## Quick Start
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- [.NET SDK 10+](https://dotnet.microsoft.com/download)
|
||||
- [Docker Desktop](https://www.docker.com/products/docker-desktop)
|
||||
- [Node.js 18+](https://nodejs.org/) (for frontend)
|
||||
|
||||
### Start Development Environment
|
||||
### Backend
|
||||
|
||||
```bash
# Clone repository
git clone https://github.com/aaronpo97/the-biergarten-app
cd the-biergarten-app

# Configure environment
cp .env.example .env.dev

# Start all services
docker compose -f docker-compose.dev.yaml up -d

# View logs
docker compose -f docker-compose.dev.yaml logs -f
```
|
||||
|
||||
**Access**:
|
||||
Backend access:
|
||||
|
||||
- API: http://localhost:8080/swagger
|
||||
- Health: http://localhost:8080/health
|
||||
- API Swagger: http://localhost:8080/swagger
|
||||
- Health Check: http://localhost:8080/health
|
||||
|
||||
### Run Tests
|
||||
### Frontend
|
||||
|
||||
```bash
|
||||
docker compose -f docker-compose.test.yaml up --abort-on-container-exit
|
||||
cd src/Website
|
||||
npm install
|
||||
API_BASE_URL=http://localhost:8080 SESSION_SECRET=dev-secret npm run dev
|
||||
```
|
||||
|
||||
Results are in `./test-results/`
|
||||
Optional frontend tools:
|
||||
|
||||
---
|
||||
```bash
cd src/Website
npm run storybook
npm run test:storybook
npm run test:storybook:playwright
```
|
||||
|
||||
## Repository Structure
|
||||
|
||||
```text
src/Core/          Backend projects (.NET)
src/Website/       Active React Router frontend
src/Website-v1/    Archived legacy Next.js frontend
docs/              Active project documentation
docs/archive/      Archived legacy documentation
```
|
||||
src/Core/                          # Backend (.NET)
├── API/
│   ├── API.Core/                  # ASP.NET Core Web API
│   └── API.Specs/                 # Integration tests (Reqnroll)
├── Database/
│   ├── Database.Migrations/       # DbUp migrations
│   └── Database.Seed/             # Data seeding
├── Domain.Entities/               # Domain models
├── Infrastructure/                # Cross-cutting concerns
│   ├── Infrastructure.Jwt/
│   ├── Infrastructure.PasswordHashing/
│   ├── Infrastructure.Email/
│   ├── Infrastructure.Repository/
│   └── Infrastructure.Repository.Tests/
└── Service/                       # Business logic
    ├── Service.Auth/
    ├── Service.Auth.Tests/
    └── Service.UserManagement/

Website/                           # Frontend (Next.js)
docs/                              # Documentation
docs/diagrams/                     # PlantUML diagrams
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Key Features
|
||||
|
||||
### Implemented
|
||||
Implemented today:
|
||||
|
||||
- User registration and authentication
|
||||
- JWT token-based auth
|
||||
- Argon2id password hashing
|
||||
- SQL Server with stored procedures
|
||||
- Database migrations (DbUp)
|
||||
- Docker containerization
|
||||
- Comprehensive test suite
|
||||
- Swagger/OpenAPI documentation
|
||||
- Health checks
|
||||
- User registration and login against the API
|
||||
- JWT-based auth with access, refresh, and confirmation flows
|
||||
- SQL Server migrations and seed projects
|
||||
- Shared form components and auth screens
|
||||
- Theme switching with Lager, Stout, Cassis, and Weizen variants
|
||||
- Storybook documentation and automated story interaction tests
|
||||
- Toast feedback for auth-related outcomes
|
||||
|
||||
### Planned
|
||||
Planned next:
|
||||
|
||||
- [ ] Brewery discovery and management
|
||||
- [ ] Beer reviews and ratings
|
||||
- [ ] Social following/followers
|
||||
- [ ] Geospatial brewery search
|
||||
- [ ] Image upload (Cloudinary)
|
||||
- [ ] Email notifications
|
||||
- [ ] OAuth integration
|
||||
|
||||
---
|
||||
|
||||
## Architecture Highlights
|
||||
|
||||
### Layered Architecture
|
||||
|
||||
```
API Layer (Controllers)
        │
Service Layer (Business Logic)
        │
Infrastructure Layer (Repositories, JWT, Email)
        │
Domain Layer (Entities)
        │
Database (SQL Server + Stored Procedures)
```
|
||||
|
||||
### SQL-First Approach
|
||||
|
||||
- All queries via stored procedures
|
||||
- No ORM (no Entity Framework)
|
||||
- Version-controlled schema
|
||||
|
||||
### Security
|
||||
|
||||
- **Password Hashing**: Argon2id (64MB memory, 4 iterations)
|
||||
- **JWT Tokens**: HS256 with configurable expiration
|
||||
- **Credential Rotation**: Built-in password change support
|
||||
|
||||
See [Architecture Guide](docs/architecture.md) for details.
|
||||
|
||||
---
|
||||
- Brewery discovery and management
|
||||
- Beer reviews and ratings
|
||||
- Social follow relationships
|
||||
- Geospatial brewery experiences
|
||||
- Additional frontend routes beyond the auth demo
|
||||
|
||||
## Testing
|
||||
|
||||
The project includes three test suites:
|
||||
Backend suites:
|
||||
|
||||
| Suite                  | Type        | Framework      | Purpose                |
| ---------------------- | ----------- | -------------- | ---------------------- |
| **API.Specs**          | Integration | Reqnroll (BDD) | End-to-end API testing |
| **Repository.Tests**   | Unit        | xUnit          | Data access layer      |
| **Service.Auth.Tests** | Unit        | xUnit + Moq    | Business logic         |
|
||||
- `API.Specs` - integration tests
|
||||
- `Infrastructure.Repository.Tests` - repository unit tests
|
||||
- `Service.Auth.Tests` - service unit tests
|
||||
|
||||
**Run All Tests**:
|
||||
Frontend suites:
|
||||
|
||||
- Storybook interaction tests via Vitest
|
||||
- Storybook browser regression checks via Playwright
|
||||
|
||||
Run all backend tests with Docker:
|
||||
|
||||
```bash
docker compose -f docker-compose.test.yaml up --abort-on-container-exit
```
|
||||
|
||||
**Run Individual Test Suite**:
|
||||
|
||||
```bash
cd src/Core
dotnet test API/API.Specs/API.Specs.csproj
dotnet test Infrastructure/Infrastructure.Repository.Tests/Infrastructure.Repository.Tests.csproj
dotnet test Service/Service.Auth.Tests/Service.Auth.Tests.csproj
```
|
||||
|
||||
See [Testing Guide](docs/testing.md) for more information.
|
||||
|
||||
---
|
||||
|
||||
## Docker Environments
|
||||
|
||||
The project uses three Docker Compose configurations:
|
||||
|
||||
| File                         | Purpose       | Features                                          |
| ---------------------------- | ------------- | ------------------------------------------------- |
| **docker-compose.dev.yaml**  | Development   | Persistent data, hot reload, Swagger UI           |
| **docker-compose.test.yaml** | CI/CD Testing | Isolated DB, auto-exit, test results export       |
| **docker-compose.prod.yaml** | Production    | Optimized builds, health checks, restart policies |
|
||||
|
||||
**Common Commands**:
|
||||
|
||||
```bash
# Development
docker compose -f docker-compose.dev.yaml up -d
docker compose -f docker-compose.dev.yaml logs -f api.core
docker compose -f docker-compose.dev.yaml down -v

# Testing
docker compose -f docker-compose.test.yaml up --abort-on-container-exit
docker compose -f docker-compose.test.yaml down -v

# Build
docker compose -f docker-compose.dev.yaml build
docker compose -f docker-compose.dev.yaml build --no-cache
```
|
||||
|
||||
See [Docker Guide](docs/docker.md) for troubleshooting and advanced usage.
|
||||
|
||||
---
|
||||
See [Testing](docs/testing.md) for the full command list.
|
||||
|
||||
## Configuration
|
||||
|
||||
### Required Environment Variables
|
||||
Common active variables:
|
||||
|
||||
**Backend** (`.env.dev`):
|
||||
- Backend: `DB_SERVER`, `DB_NAME`, `DB_USER`, `DB_PASSWORD`, `ACCESS_TOKEN_SECRET`, `REFRESH_TOKEN_SECRET`, `CONFIRMATION_TOKEN_SECRET`
|
||||
- Frontend: `API_BASE_URL`, `SESSION_SECRET`, `NODE_ENV`
|
||||
|
||||
```bash
DB_SERVER=sqlserver,1433
DB_NAME=Biergarten
DB_USER=sa
DB_PASSWORD=YourStrong!Passw0rd
JWT_SECRET=<min-32-chars>
```
|
||||
|
||||
**Frontend** (`.env.local`):
|
||||
|
||||
```bash
BASE_URL=http://localhost:3000
NODE_ENV=development
CONFIRMATION_TOKEN_SECRET=<generated>
RESET_PASSWORD_TOKEN_SECRET=<generated>
SESSION_SECRET=<generated>
# + External services (Cloudinary, Mapbox, SparkPost)
```
|
||||
|
||||
See [Environment Variables Guide](docs/environment-variables.md) for complete reference.
|
||||
|
||||
---
|
||||
See [Environment Variables](docs/environment-variables.md) for details.
|
||||
|
||||
## Contributing
|
||||
|
||||
|
||||
@@ -94,6 +94,7 @@ services:
      ACCESS_TOKEN_SECRET: "${ACCESS_TOKEN_SECRET}"
      REFRESH_TOKEN_SECRET: "${REFRESH_TOKEN_SECRET}"
      CONFIRMATION_TOKEN_SECRET: "${CONFIRMATION_TOKEN_SECRET}"
      WEBSITE_BASE_URL: "${WEBSITE_BASE_URL}"
    restart: unless-stopped
    networks:
      - devnet

@@ -69,6 +69,7 @@ services:
      ACCESS_TOKEN_SECRET: "${ACCESS_TOKEN_SECRET}"
      REFRESH_TOKEN_SECRET: "${REFRESH_TOKEN_SECRET}"
      CONFIRMATION_TOKEN_SECRET: "${CONFIRMATION_TOKEN_SECRET}"
      WEBSITE_BASE_URL: "${WEBSITE_BASE_URL}"
    restart: unless-stopped
    networks:
      - prodnet

@@ -88,6 +88,7 @@ services:
      ACCESS_TOKEN_SECRET: "${ACCESS_TOKEN_SECRET}"
      REFRESH_TOKEN_SECRET: "${REFRESH_TOKEN_SECRET}"
      CONFIRMATION_TOKEN_SECRET: "${CONFIRMATION_TOKEN_SECRET}"
      WEBSITE_BASE_URL: "${WEBSITE_BASE_URL}"
    volumes:
      - ./test-results:/app/test-results
    restart: "no"

@@ -1,28 +1,27 @@
|
||||
# Architecture
|
||||
|
||||
This document describes the architecture patterns and design decisions for The Biergarten
|
||||
App.
|
||||
This document describes the active architecture of The Biergarten App.
|
||||
|
||||
## High-Level Overview
|
||||
|
||||
The Biergarten App follows a **multi-project monorepo** architecture with clear separation
|
||||
between backend and frontend:
|
||||
The Biergarten App is a monorepo with a clear split between the backend and the active
|
||||
website:
|
||||
|
||||
- **Backend**: .NET 10 Web API with SQL Server
|
||||
- **Frontend**: Next.js with TypeScript
|
||||
- **Architecture Style**: Layered architecture with SQL-first approach
|
||||
- **Backend**: .NET 10 Web API with SQL Server and a layered architecture
|
||||
- **Frontend**: React 19 + React Router 7 website in `src/Website`
|
||||
- **Architecture Style**: Layered backend plus server-rendered React frontend
|
||||
|
||||
The legacy Next.js frontend has been retained in `src/Website-v1` for reference only and is
|
||||
documented in [archive/legacy-website-v1.md](archive/legacy-website-v1.md).
|
||||
|
||||
## Diagrams
|
||||
|
||||
For visual representations, see:
|
||||
|
||||
- [architecture.pdf](diagrams/pdf/architecture.pdf) - Layered architecture diagram
|
||||
- [deployment.pdf](diagrams/pdf/deployment.pdf) - Docker deployment diagram
|
||||
- [authentication-flow.pdf](diagrams/pdf/authentication-flow.pdf) - Authentication
|
||||
workflow
|
||||
- [database-schema.pdf](diagrams/pdf/database-schema.pdf) - Database relationships
|
||||
|
||||
Generate diagrams with: `make diagrams`
|
||||
- [architecture.svg](diagrams-out/architecture.svg) - Layered architecture diagram
|
||||
- [deployment.svg](diagrams-out/deployment.svg) - Docker deployment diagram
|
||||
- [authentication-flow.svg](diagrams-out/authentication-flow.svg) - Authentication workflow
|
||||
- [database-schema.svg](diagrams-out/database-schema.svg) - Database relationships
|
||||
|
||||
## Backend Architecture
|
||||
|
||||
@@ -217,39 +216,49 @@ public interface IAuthRepository
|
||||
|
||||
## Frontend Architecture
|
||||
|
||||
### Next.js Application Structure
|
||||
### Active Website (`src/Website`)
|
||||
|
||||
```
|
||||
Website/src/
|
||||
├── components/ # React components
|
||||
├── pages/ # Next.js routes
|
||||
├── contexts/ # React context providers
|
||||
├── hooks/ # Custom React hooks
|
||||
├── controllers/ # Business logic layer
|
||||
├── services/ # API communication
|
||||
├── requests/ # API request builders
|
||||
├── validation/ # Form validation schemas
|
||||
├── config/ # Configuration & env vars
|
||||
└── prisma/ # Database schema (current)
|
||||
The current website is a React Router 7 application with server-side rendering enabled.
|
||||
|
||||
```text
src/Website/
├── app/
│   ├── components/       Shared UI such as Navbar, FormField, SubmitButton, ToastProvider
│   ├── lib/              Auth helpers, schemas, and theme metadata
│   ├── routes/           Route modules for home, login, register, dashboard, confirm, theme
│   ├── root.tsx          App shell and global providers
│   └── app.css           Theme tokens and global styling
├── .storybook/           Storybook config and preview setup
├── stories/              Storybook stories for shared UI and themes
├── tests/playwright/     Storybook Playwright coverage
└── package.json          Frontend scripts and dependencies
```
|
||||
|
||||
### Migration Strategy
|
||||
### Frontend Responsibilities
|
||||
|
||||
The frontend is **transitioning** from a standalone architecture to integrate with the
|
||||
.NET API:
|
||||
- Render the auth demo and theme guide routes
|
||||
- Manage cookie-backed website session state
|
||||
- Call the .NET API for login, registration, token refresh, and confirmation
|
||||
- Provide shared UI building blocks for forms, navigation, themes, and toasts
|
||||
- Supply Storybook documentation and browser-based component verification
|
||||
|
||||
**Current State**:
|
||||
### Theme System
|
||||
|
||||
- Uses Prisma ORM with Postgres (Neon)
|
||||
- Has its own server-side API routes
|
||||
- Direct database access from Next.js
|
||||
The active website uses semantic DaisyUI theme tokens backed by four Biergarten themes:
|
||||
|
||||
**Target State**:
|
||||
- Biergarten Lager
|
||||
- Biergarten Stout
|
||||
- Biergarten Cassis
|
||||
- Biergarten Weizen
|
||||
|
||||
- Pure client-side Next.js app
|
||||
- All data via .NET API
|
||||
- No server-side database access
|
||||
- JWT-based authentication
|
||||
All component styling should prefer semantic tokens such as `primary`, `success`,
|
||||
`surface`, and `highlight` instead of hard-coded color values.
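
As a rough illustration of the theme guidance above, the four Biergarten themes could be described by a small metadata list with a safe fallback. This is only a sketch: the `id` strings, `THEMES`, `DEFAULT_THEME`, `isThemeId`, and `resolveTheme` are hypothetical names chosen for the example and are not taken from `src/Website/app/lib`.

```typescript
// Hypothetical theme metadata. The ids are illustrative assumptions,
// not the values actually used by the website.
const THEMES = [
  { id: "biergarten-lager", label: "Biergarten Lager" },
  { id: "biergarten-stout", label: "Biergarten Stout" },
  { id: "biergarten-cassis", label: "Biergarten Cassis" },
  { id: "biergarten-weizen", label: "Biergarten Weizen" },
] as const;

type ThemeId = (typeof THEMES)[number]["id"];

// Assumed default; the real default theme is not stated in the docs.
const DEFAULT_THEME: ThemeId = "biergarten-lager";

// Type guard: true only for one of the four known theme ids.
function isThemeId(value: unknown): value is ThemeId {
  return typeof value === "string" && THEMES.some((t) => t.id === value);
}

// Fall back to the default when the requested theme is missing or unknown.
function resolveTheme(requested: unknown): ThemeId {
  return isThemeId(requested) ? requested : DEFAULT_THEME;
}

// Semantic utility classes follow whichever theme is active,
// whereas a hard-coded hex color would not.
const buttonClass = "bg-primary text-primary-content";
```

A component styled with `buttonClass` would pick up the colors of whichever theme `resolveTheme` returns, which is the reason the docs prefer semantic tokens over literal color values.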
|
||||
|
||||
### Legacy Frontend
|
||||
|
||||
The previous Next.js frontend has been archived at `src/Website-v1`. Active product and
|
||||
engineering documentation should point to `src/Website`, while legacy notes live in
|
||||
[archive/legacy-website-v1.md](archive/legacy-website-v1.md).
|
||||
|
||||
## Security Architecture
|
||||
|
||||
@@ -385,7 +394,7 @@ dependencies
|
||||
|
||||
```yaml
|
||||
healthcheck:
|
||||
test: ['CMD-SHELL', 'sqlcmd health check']
|
||||
test: ["CMD-SHELL", "sqlcmd health check"]
|
||||
interval: 10s
|
||||
retries: 12
|
||||
start_period: 30s
|
||||
|
||||
56
docs/archive/legacy-website-v1.md
Normal file
@@ -0,0 +1,56 @@
# Legacy Website Archive (`src/Website-v1`)

This archive captures high-level notes about the previous Biergarten frontend so active
project documentation can focus on the current website in `src/Website`.

## Status

- `src/Website-v1` is retained for historical reference only
- It is not the active frontend used by current setup, docs, or testing guidance
- New product and engineering work should target `src/Website`

## Legacy Stack Summary

The archived frontend used a different application model from the current website:

- Next.js 14
- React 18
- Prisma
- Postgres / Neon-hosted database workflows
- Next.js API routes and server-side controllers
- Additional third-party integrations such as Cloudinary, Mapbox, and SparkPost

## Why It Was Archived

The active website moved to a React Router-based frontend that talks directly to the .NET
API. As part of that shift, the main docs were updated to describe:

- `src/Website` as the active frontend
- React Router route modules and server rendering
- Storybook-based component documentation and tests
- Current frontend runtime variables: `API_BASE_URL`, `SESSION_SECRET`, and `NODE_ENV`

## Legacy Documentation Topics Moved Out of Active Docs

The following categories were removed from active documentation and intentionally archived:

- Next.js application structure guidance
- Prisma and Postgres frontend setup
- Legacy frontend environment variables
- External service setup that only applied to `src/Website-v1`
- Old frontend local setup instructions

## When To Use This Archive

Use this file only if you need to:

- inspect the historical frontend implementation
- compare old flows against the current website
- migrate or recover legacy logic from `src/Website-v1`

For all active work, use:

- [Getting Started](../getting-started.md)
- [Architecture](../architecture.md)
- [Environment Variables](../environment-variables.md)
- [Testing](../testing.md)
@@ -1,14 +1,15 @@
|
||||
# Environment Variables
|
||||
|
||||
Complete documentation for all environment variables used in The Biergarten App.
|
||||
This document covers the active environment variables used by the current Biergarten
|
||||
stack.
|
||||
|
||||
## Overview
|
||||
|
||||
The application uses environment variables for configuration across:
|
||||
The application uses environment variables for:
|
||||
|
||||
- **.NET API Backend** - Database connections, JWT secrets
|
||||
- **Next.js Frontend** - External services, authentication
|
||||
- **Docker Containers** - Runtime configuration
|
||||
- **.NET API backend** - database connections, token secrets, runtime settings
|
||||
- **React Router website** - API base URL and session signing
|
||||
- **Docker containers** - environment-specific orchestration
|
||||
|
||||
## Configuration Patterns
|
||||
|
||||
@@ -16,10 +17,10 @@ The application uses environment variables for configuration across:
|
||||
|
||||
Direct environment variable access via `Environment.GetEnvironmentVariable()`.
|
||||
|
||||
### Frontend (Next.js)
|
||||
### Frontend (`src/Website`)
|
||||
|
||||
Centralized configuration module at `src/Website/src/config/env/index.ts` with Zod
|
||||
validation.
|
||||
The active website reads runtime values from the server environment for its auth and API
|
||||
integration.
|
||||
|
||||
### Docker
|
||||
|
||||
@@ -71,6 +72,9 @@ REFRESH_TOKEN_SECRET=<generated-secret> # Signs long-lived refresh t
|
||||
|
||||
# Confirmation token secret (30-minute tokens)
|
||||
CONFIRMATION_TOKEN_SECRET=<generated-secret> # Signs email confirmation tokens
|
||||
|
||||
# Website base URL (used in confirmation emails)
|
||||
WEBSITE_BASE_URL=https://thebiergarten.app # Base URL for the website
|
||||
```
|
||||
|
||||
**Security Requirements**:
|
||||
@@ -125,91 +129,38 @@ ASPNETCORE_URLS=http://0.0.0.0:8080 # Binding address and port
|
||||
DOTNET_RUNNING_IN_CONTAINER=true # Flag for container execution
|
||||
```
|
||||
|
||||
## Frontend Variables (Next.js)
|
||||
## Frontend Variables (`src/Website`)
|
||||
|
||||
Create `.env.local` in the `Website/` directory.
|
||||
|
||||
### Base Configuration
|
||||
The active website does not use the old Next.js/Prisma environment model. Its core runtime
|
||||
variables are:
|
||||
|
||||
```bash
|
||||
BASE_URL=http://localhost:3000 # Application base URL
|
||||
NODE_ENV=development # Environment: development, production, test
|
||||
API_BASE_URL=http://localhost:8080 # Base URL for the .NET API
|
||||
SESSION_SECRET=<generated-secret> # Cookie session signing secret
|
||||
NODE_ENV=development # Standard Node runtime mode
|
||||
```
|
||||
|
||||
### Authentication & Sessions
|
||||
### Frontend Variable Details
|
||||
|
||||
```bash
|
||||
# Token signing secrets (use openssl rand -base64 127)
|
||||
CONFIRMATION_TOKEN_SECRET=<generated-secret> # Email confirmation tokens
|
||||
RESET_PASSWORD_TOKEN_SECRET=<generated-secret> # Password reset tokens
|
||||
SESSION_SECRET=<generated-secret> # Session cookie signing
|
||||
#### `API_BASE_URL`
|
||||
|
||||
# Session configuration
|
||||
SESSION_TOKEN_NAME=biergarten # Cookie name (optional)
|
||||
SESSION_MAX_AGE=604800 # Cookie max age in seconds (optional, default: 1 week)
|
||||
```
|
||||
- **Required**: Yes for local development
|
||||
- **Default in code**: `http://localhost:8080`
|
||||
- **Used by**: `src/Website/app/lib/auth.server.ts`
|
||||
- **Purpose**: Routes website auth actions to the .NET API
|
||||
|
||||
**Security Requirements**:
|
||||
#### `SESSION_SECRET`
|
||||
|
||||
- All secrets should be 127+ characters
|
||||
- Generate using cryptographically secure random functions
|
||||
- Never reuse secrets across environments
|
||||
- Rotate secrets periodically in production
|
||||
- **Required**: Strongly recommended in all environments
|
||||
- **Default in local code path**: `dev-secret-change-me`
|
||||
- **Used by**: React Router cookie session storage in `auth.server.ts`
|
||||
- **Purpose**: Signs and validates the website session cookie
|
||||
|
||||
### Database (Current - Prisma/Postgres)
|
||||
#### `NODE_ENV`
|
||||
|
||||
**Note**: Frontend currently uses Neon Postgres. Will migrate to .NET API.
|
||||
|
||||
```bash
|
||||
POSTGRES_PRISMA_URL=postgresql://user:pass@host/db?pgbouncer=true # Pooled connection
|
||||
POSTGRES_URL_NON_POOLING=postgresql://user:pass@host/db # Direct connection (migrations)
|
||||
SHADOW_DATABASE_URL=postgresql://user:pass@host/shadow_db # Prisma shadow DB (optional)
|
||||
```
|
||||
|
||||
### External Services
|
||||
|
||||
#### Cloudinary (Image Hosting)
|
||||
|
||||
```bash
|
||||
NEXT_PUBLIC_CLOUDINARY_CLOUD_NAME=your-cloud-name # Public, client-accessible
|
||||
CLOUDINARY_KEY=your-api-key # Server-side API key
|
||||
CLOUDINARY_SECRET=your-api-secret # Server-side secret
|
||||
```
|
||||
|
||||
**Setup Steps**:
|
||||
|
||||
1. Sign up at [cloudinary.com](https://cloudinary.com)
|
||||
2. Navigate to Dashboard
|
||||
3. Copy Cloud Name, API Key, and API Secret
|
||||
|
||||
**Note**: `NEXT_PUBLIC_` prefix makes variable accessible in client-side code.
|
||||
|
||||
#### Mapbox (Maps & Geocoding)
|
||||
|
||||
```bash
|
||||
MAPBOX_ACCESS_TOKEN=pk.your-public-token
|
||||
```
|
||||
|
||||
**Setup Steps**:
|
||||
|
||||
1. Create account at [mapbox.com](https://mapbox.com)
|
||||
2. Navigate to Account → Tokens
|
||||
3. Create new token with public scopes
|
||||
4. Copy access token
|
||||
|
||||
#### SparkPost (Email Service)
|
||||
|
||||
```bash
|
||||
SPARKPOST_API_KEY=your-api-key
|
||||
SPARKPOST_SENDER_ADDRESS=noreply@yourdomain.com
|
||||
```
|
||||
|
||||
**Setup Steps**:
|
||||
|
||||
1. Sign up at [sparkpost.com](https://sparkpost.com)
|
||||
2. Verify sending domain or use sandbox
|
||||
3. Create API key with "Send via SMTP" permission
|
||||
4. Configure sender address (must match verified domain)
|
||||
- **Required**: No
|
||||
- **Typical values**: `development`, `production`, `test`
|
||||
- **Purpose**: Controls secure cookie behavior and runtime mode
|
||||
|
||||
### Admin Account (Seeding)
|
||||
|
||||
@@ -255,68 +206,39 @@ cp .env.example .env.dev
|
||||
# Edit .env.dev with your values
|
||||
```
|
||||
|
||||
## Legacy Frontend Variables
|
||||
|
||||
Variables for the archived Next.js frontend (`src/Website-v1`) have been removed from this
|
||||
active reference. See [archive/legacy-website-v1.md](archive/legacy-website-v1.md) if you
|
||||
need the legacy Prisma, Cloudinary, Mapbox, or SparkPost notes.
|
||||
|
||||
**Docker Compose Mapping**:
|
||||
|
||||
- `docker-compose.dev.yaml` → `.env.dev`
|
||||
- `docker-compose.test.yaml` → `.env.test`
|
||||
- `docker-compose.prod.yaml` → `.env.prod`
|
||||
|
||||
### Frontend (Website Directory)
|
||||
|
||||
```
|
||||
.env.local # Local development (gitignored)
|
||||
.env.production # Production (gitignored)
|
||||
```
|
||||
|
||||
**Setup**:
|
||||
|
||||
```bash
|
||||
cd Website
|
||||
touch .env.local
|
||||
# Add frontend variables
|
||||
```
|
||||
|
||||
## Variable Reference Table
|
||||
|
||||
| Variable                            | Backend | Frontend | Docker | Required | Notes                      |
| ----------------------------------- | :-----: | :------: | :----: | :------: | -------------------------- |
| **Database**                        |         |          |        |          |                            |
| `DB_SERVER`                         |    ✓    |          |   ✓    |  Yes\*   | SQL Server address         |
| `DB_NAME`                           |    ✓    |          |   ✓    |  Yes\*   | Database name              |
| `DB_USER`                           |    ✓    |          |   ✓    |  Yes\*   | SQL username               |
| `DB_PASSWORD`                       |    ✓    |          |   ✓    |  Yes\*   | SQL password               |
| `DB_CONNECTION_STRING`              |    ✓    |          |        |  Yes\*   | Alternative to components  |
| `DB_TRUST_SERVER_CERTIFICATE`       |    ✓    |          |   ✓    |    No    | Defaults to True           |
| `SA_PASSWORD`                       |         |          |   ✓    |   Yes    | SQL Server container       |
| **Authentication (Backend - JWT)**  |         |          |        |          |                            |
| `ACCESS_TOKEN_SECRET`               |    ✓    |          |   ✓    |   Yes    | Access token secret        |
| `REFRESH_TOKEN_SECRET`              |    ✓    |          |   ✓    |   Yes    | Refresh token secret       |
| `CONFIRMATION_TOKEN_SECRET`         |    ✓    |          |   ✓    |   Yes    | Confirmation token secret  |
| **Authentication (Frontend)**       |         |          |        |          |                            |
| `CONFIRMATION_TOKEN_SECRET`         |         |    ✓     |        |   Yes    | Email confirmation         |
| `RESET_PASSWORD_TOKEN_SECRET`       |         |    ✓     |        |   Yes    | Password reset             |
| `SESSION_SECRET`                    |         |    ✓     |        |   Yes    | Session signing            |
| `SESSION_TOKEN_NAME`                |         |    ✓     |        |    No    | Default: "biergarten"      |
| `SESSION_MAX_AGE`                   |         |    ✓     |        |    No    | Default: 604800            |
| **Base Configuration**              |         |          |        |          |                            |
| `BASE_URL`                          |         |    ✓     |        |   Yes    | App base URL               |
| `NODE_ENV`                          |         |    ✓     |        |   Yes    | Node environment           |
| `DB_TRUST_SERVER_CERTIFICATE`       |    ✓    |          |   ✓    |    No    | Defaults to `True`         |
| `ACCESS_TOKEN_SECRET`               |    ✓    |          |   ✓    |   Yes    | Access token signing       |
| `REFRESH_TOKEN_SECRET`              |    ✓    |          |   ✓    |   Yes    | Refresh token signing      |
| `CONFIRMATION_TOKEN_SECRET`         |    ✓    |          |   ✓    |   Yes    | Confirmation token signing |
| `WEBSITE_BASE_URL`                  |    ✓    |          |        |   Yes    | Website URL for emails     |
| `API_BASE_URL`                      |         |    ✓     |        |   Yes    | Website-to-API base URL    |
| `SESSION_SECRET`                    |         |    ✓     |        |   Yes    | Website session signing    |
| `NODE_ENV`                          |         |    ✓     |        |    No    | Runtime mode               |
| `CLEAR_DATABASE`                    |    ✓    |          |   ✓    |    No    | Dev/test reset flag        |
| `ASPNETCORE_ENVIRONMENT`            |    ✓    |          |   ✓    |   Yes    | ASP.NET environment        |
| `ASPNETCORE_URLS`                   |    ✓    |          |   ✓    |   Yes    | API binding address        |
| **Database (Frontend - Current)**   |         |          |        |          |                            |
| `POSTGRES_PRISMA_URL`               |         |    ✓     |        |   Yes    | Pooled connection          |
| `POSTGRES_URL_NON_POOLING`          |         |    ✓     |        |   Yes    | Direct connection          |
| `SHADOW_DATABASE_URL`               |         |    ✓     |        |    No    | Prisma shadow DB           |
| **External Services**               |         |          |        |          |                            |
| `NEXT_PUBLIC_CLOUDINARY_CLOUD_NAME` |         |    ✓     |        |   Yes    | Public, client-side        |
| `CLOUDINARY_KEY`                    |         |    ✓     |        |   Yes    | Server-side                |
| `CLOUDINARY_SECRET`                 |         |    ✓     |        |   Yes    | Server-side                |
| `MAPBOX_ACCESS_TOKEN`               |         |    ✓     |        |   Yes    | Maps/geocoding             |
| `SPARKPOST_API_KEY`                 |         |    ✓     |        |   Yes    | Email service              |
| `SPARKPOST_SENDER_ADDRESS`          |         |    ✓     |        |   Yes    | From address               |
| **Other**                           |         |          |        |          |                            |
| `ADMIN_PASSWORD`                    |         |    ✓     |        |    No    | Seeding only               |
| `CLEAR_DATABASE`                    |    ✓    |          |   ✓    |    No    | Dev/test only              |
| `SA_PASSWORD`                       |         |          |   ✓    |   Yes    | SQL Server container       |
| `ACCEPT_EULA`                       |         |          |   ✓    |   Yes    | SQL Server EULA            |
| `MSSQL_PID`                         |         |          |   ✓    |    No    | SQL Server edition         |
| `DOTNET_RUNNING_IN_CONTAINER`       |    ✓    |          |   ✓    |    No    | Container flag             |
@@ -336,13 +258,12 @@ Variables are validated at startup:
|
||||
|
||||
### Frontend Validation
|
||||
|
||||
Zod schemas validate variables at runtime:
|
||||
The active website relies on runtime defaults for local development and the surrounding
|
||||
server environment in deployed environments.
|
||||
|
||||
- Type checking (string, number, URL, etc.)
|
||||
- Format validation (email, URL patterns)
|
||||
- Required vs optional enforcement
|
||||
|
||||
**Location**: `src/Website/src/config/env/index.ts`
|
||||
- `API_BASE_URL` defaults to `http://localhost:8080`
|
||||
- `SESSION_SECRET` falls back to a development-only local secret
|
||||
- `NODE_ENV` controls secure cookie behavior
|
||||
|
||||
## Example Configuration Files
|
||||
|
||||
@@ -359,6 +280,7 @@ DB_PASSWORD=Dev_Password_123!
|
||||
ACCESS_TOKEN_SECRET=<generated-with-openssl>
|
||||
REFRESH_TOKEN_SECRET=<generated-with-openssl>
|
||||
CONFIRMATION_TOKEN_SECRET=<generated-with-openssl>
|
||||
WEBSITE_BASE_URL=http://localhost:3000
|
||||
|
||||
# Migration
|
||||
CLEAR_DATABASE=true
|
||||
@@ -373,28 +295,10 @@ ACCEPT_EULA=Y
|
||||
MSSQL_PID=Express
|
||||
```
|
||||
|
||||
### `.env.local` (Frontend)
|
||||
### Frontend local runtime example
|
||||
|
||||
```bash
|
||||
# Base
|
||||
BASE_URL=http://localhost:3000
|
||||
NODE_ENV=development
|
||||
|
||||
# Authentication
|
||||
API_BASE_URL=http://localhost:8080
|
||||
SESSION_SECRET=<generated-with-openssl>
|
||||
|
||||
# Database (current Prisma setup)
|
||||
POSTGRES_PRISMA_URL=postgresql://user:pass@db.neon.tech/biergarten?pgbouncer=true
|
||||
POSTGRES_URL_NON_POOLING=postgresql://user:pass@db.neon.tech/biergarten
|
||||
|
||||
# External Services
|
||||
NEXT_PUBLIC_CLOUDINARY_CLOUD_NAME=my-cloud
|
||||
CLOUDINARY_KEY=123456789012345
|
||||
CLOUDINARY_SECRET=abcdefghijklmnopqrstuvwxyz
|
||||
MAPBOX_ACCESS_TOKEN=pk.eyJ...
|
||||
SPARKPOST_API_KEY=abc123...
|
||||
SPARKPOST_SENDER_ADDRESS=noreply@biergarten.app
|
||||
|
||||
# Admin (for seeding)
|
||||
ADMIN_PASSWORD=Admin_Dev_Password_123!
|
||||
NODE_ENV=development
|
||||
```
|
||||
|
||||
@@ -1,19 +1,16 @@
|
||||
# Getting Started
|
||||
|
||||
This guide will help you set up and run The Biergarten App in your development
|
||||
environment.
|
||||
This guide covers local setup for the current Biergarten stack: the .NET backend in
|
||||
`src/Core` and the active React Router frontend in `src/Website`.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before you begin, ensure you have the following installed:
|
||||
- **.NET SDK 10+**
|
||||
- **Node.js 18+**
|
||||
- **Docker Desktop** or equivalent Docker Engine setup
|
||||
- **Java 8+** if you want to regenerate PlantUML diagrams
|
||||
|
||||
- **.NET SDK 10+** - [Download](https://dotnet.microsoft.com/download)
|
||||
- **Node.js 18+** - [Download](https://nodejs.org/)
|
||||
- **Docker Desktop** - [Download](https://www.docker.com/products/docker-desktop)
|
||||
(recommended)
|
||||
- **Java 8+** - Required for generating diagrams from PlantUML (optional)
|
||||
|
||||
## Quick Start with Docker (Recommended)
|
||||
## Recommended Path: Docker for Backend, Node for Frontend
|
||||
|
||||
### 1. Clone the Repository
|
||||
|
||||
@@ -22,174 +19,120 @@ git clone <repository-url>
|
||||
cd the-biergarten-app
|
||||
```
|
||||
|
||||
### 2. Configure Environment Variables
|
||||
|
||||
Copy the example environment file:
|
||||
### 2. Configure Backend Environment Variables
|
||||
|
||||
```bash
|
||||
cp .env.example .env.dev
|
||||
```
|
||||
|
||||
Edit `.env.dev` with your configuration:
|
||||
At minimum, ensure `.env.dev` includes valid database and token values:
|
||||
|
||||
```bash
|
||||
# Database (component-based for Docker)
|
||||
DB_SERVER=sqlserver,1433
|
||||
DB_NAME=Biergarten
|
||||
DB_USER=sa
|
||||
DB_PASSWORD=YourStrong!Passw0rd
|
||||
|
||||
# JWT Authentication
|
||||
JWT_SECRET=your-secret-key-minimum-32-characters-required
|
||||
ACCESS_TOKEN_SECRET=<generated>
|
||||
REFRESH_TOKEN_SECRET=<generated>
|
||||
CONFIRMATION_TOKEN_SECRET=<generated>
|
||||
WEBSITE_BASE_URL=http://localhost:3000
|
||||
```
|
||||
|
||||
> For a complete list of environment variables, see
|
||||
> [Environment Variables](environment-variables.md).
|
||||
See [Environment Variables](environment-variables.md) for the full list.
|
||||
|
||||
### 3. Start the Development Environment
|
||||
### 3. Start the Backend Stack
|
||||
|
||||
```bash
|
||||
docker compose -f docker-compose.dev.yaml up -d
|
||||
```
|
||||
|
||||
This command will:
|
||||
This starts SQL Server, migrations, seeding, and the API.
|
||||
|
||||
- Start SQL Server container
|
||||
- Run database migrations
|
||||
- Seed initial data
|
||||
- Start the API on http://localhost:8080
|
||||
Available endpoints:
|
||||
|
||||
### 4. Access the API
|
||||
- API Swagger: http://localhost:8080/swagger
|
||||
- Health Check: http://localhost:8080/health
|
||||
|
||||
- **Swagger UI**: http://localhost:8080/swagger
|
||||
- **Health Check**: http://localhost:8080/health
|
||||
|
||||
### 5. View Logs
|
||||
### 4. Start the Active Frontend
|
||||
|
||||
```bash
|
||||
# All services
|
||||
docker compose -f docker-compose.dev.yaml logs -f
|
||||
|
||||
# Specific service
|
||||
docker compose -f docker-compose.dev.yaml logs -f api.core
|
||||
cd src/Website
|
||||
npm install
|
||||
API_BASE_URL=http://localhost:8080 SESSION_SECRET=dev-secret-change-me npm run dev
|
||||
```
|
||||
|
||||
### 6. Stop the Environment
|
||||
The website will be available at the local address printed by React Router dev.
|
||||
|
||||
Required frontend runtime variables for local work:
|
||||
|
||||
- `API_BASE_URL` - Base URL for the .NET API
|
||||
- `SESSION_SECRET` - Cookie session signing secret for the website server
|
||||
|
||||
### 5. Optional: Run Storybook
|
||||
|
||||
```bash
|
||||
docker compose -f docker-compose.dev.yaml down
|
||||
cd src/Website
|
||||
npm run storybook
|
||||
```
|
||||
|
||||
# Remove volumes (fresh start)
|
||||
Storybook runs at http://localhost:6006 by default.
|
||||
|
||||
## Useful Commands
|
||||
|
||||
### Backend
|
||||
|
||||
```bash
|
||||
docker compose -f docker-compose.dev.yaml logs -f
|
||||
docker compose -f docker-compose.dev.yaml down
|
||||
docker compose -f docker-compose.dev.yaml down -v
|
||||
```
|
||||
|
||||
## Manual Setup (Without Docker)
|
||||
|
||||
If you prefer to run services locally without Docker:
|
||||
|
||||
### Backend Setup
|
||||
|
||||
#### 1. Start SQL Server
|
||||
|
||||
You can use a local SQL Server instance or a cloud-hosted one. Ensure it's accessible and
|
||||
you have the connection details.
|
||||
|
||||
#### 2. Set Environment Variables
|
||||
### Frontend
|
||||
|
||||
```bash
|
||||
# macOS/Linux
|
||||
export DB_CONNECTION_STRING="Server=localhost,1433;Database=Biergarten;User Id=sa;Password=YourStrong!Passw0rd;TrustServerCertificate=True;"
|
||||
export JWT_SECRET="your-secret-key-minimum-32-characters-required"
|
||||
|
||||
# Windows PowerShell
|
||||
$env:DB_CONNECTION_STRING="Server=localhost,1433;Database=Biergarten;User Id=sa;Password=YourStrong!Passw0rd;TrustServerCertificate=True;"
|
||||
$env:JWT_SECRET="your-secret-key-minimum-32-characters-required"
|
||||
cd src/Website
|
||||
npm run lint
|
||||
npm run typecheck
|
||||
npm run format:check
|
||||
npm run test:storybook
|
||||
npm run test:storybook:playwright
|
||||
```
|
||||
|
||||
#### 3. Run Database Migrations
|
||||
## Manual Backend Setup
|
||||
|
||||
If you do not want to use Docker, you can run the backend locally.
|
||||
|
||||
### 1. Set Environment Variables
|
||||
|
||||
```bash
|
||||
export DB_CONNECTION_STRING="Server=localhost,1433;Database=Biergarten;User Id=sa;Password=YourStrong!Passw0rd;TrustServerCertificate=True;"
|
||||
export ACCESS_TOKEN_SECRET="<generated>"
|
||||
export REFRESH_TOKEN_SECRET="<generated>"
|
||||
export CONFIRMATION_TOKEN_SECRET="<generated>"
|
||||
export WEBSITE_BASE_URL="http://localhost:3000"
|
||||
```
|
||||
|
||||
### 2. Run Migrations and Seed
|
||||
|
||||
```bash
|
||||
cd src/Core
|
||||
dotnet run --project Database/Database.Migrations/Database.Migrations.csproj
|
||||
```
|
||||
|
||||
#### 4. Seed the Database
|
||||
|
||||
```bash
|
||||
dotnet run --project Database/Database.Seed/Database.Seed.csproj
|
||||
```
|
||||
|
||||
#### 5. Start the API
|
||||
### 3. Start the API
|
||||
|
||||
```bash
|
||||
dotnet run --project API/API.Core/API.Core.csproj
|
||||
```
|
||||
|
||||
The API will be available at http://localhost:5000 (or the port specified in
|
||||
launchSettings.json).
|
||||
|
||||
### Frontend Setup
|
||||
|
||||
> **Note**: The frontend is currently transitioning from its standalone Prisma/Postgres
|
||||
> backend to the .NET API. Some features may still use the old backend.
|
||||
|
||||
#### 1. Navigate to Website Directory
|
||||
|
||||
```bash
|
||||
cd Website
|
||||
```
|
||||
|
||||
#### 2. Create Environment File
|
||||
|
||||
Create `.env.local` with frontend variables. See
|
||||
[Environment Variables - Frontend](environment-variables.md#frontend-variables) for the
|
||||
complete list.
|
||||
|
||||
```bash
|
||||
BASE_URL=http://localhost:3000
|
||||
NODE_ENV=development
|
||||
|
||||
# Generate secrets
|
||||
CONFIRMATION_TOKEN_SECRET=$(openssl rand -base64 127)
|
||||
RESET_PASSWORD_TOKEN_SECRET=$(openssl rand -base64 127)
|
||||
SESSION_SECRET=$(openssl rand -base64 127)
|
||||
|
||||
# External services (you'll need to register for these)
|
||||
NEXT_PUBLIC_CLOUDINARY_CLOUD_NAME=your-cloud-name
|
||||
CLOUDINARY_KEY=your-api-key
|
||||
CLOUDINARY_SECRET=your-api-secret
|
||||
NEXT_PUBLIC_MAPBOX_KEY=your-mapbox-token
|
||||
|
||||
# Database URL (current Prisma setup)
|
||||
DATABASE_URL=your-postgres-connection-string
|
||||
```
|
||||
|
||||
#### 3. Install Dependencies
|
||||
|
||||
```bash
|
||||
npm install
|
||||
```
|
||||
|
||||
#### 4. Run Prisma Migrations
|
||||
|
||||
```bash
|
||||
npx prisma generate
|
||||
npx prisma migrate dev
|
||||
```
|
||||
|
||||
#### 5. Start Development Server
|
||||
|
||||
```bash
|
||||
npm run dev
|
||||
```
|
||||
|
||||
The frontend will be available at http://localhost:3000.
|
||||
## Legacy Frontend Note
|
||||
|
||||
The previous Next.js frontend now lives in `src/Website-v1` and is not the active website.
|
||||
Legacy setup details have been moved to [docs/archive/legacy-website-v1.md](archive/legacy-website-v1.md).
|
||||
|
||||
## Next Steps
|
||||
|
||||
- **Test the API**: Visit http://localhost:8080/swagger and try the endpoints
|
||||
- **Run Tests**: See [Testing Guide](testing.md)
|
||||
- **Learn the Architecture**: Read [Architecture Overview](architecture.md)
|
||||
- **Understand Docker Setup**: See [Docker Guide](docker.md)
|
||||
- **Database Details**: Check [Database Schema](database.md)
|
||||
- Review [Architecture](architecture.md)
|
||||
- Run backend and frontend checks from [Testing](testing.md)
|
||||
- Use [Docker Guide](docker.md) for container troubleshooting
|
||||
|
||||
@@ -4,11 +4,13 @@ This document describes the testing strategy and how to run tests for The Bierga
|
||||
|
||||
## Overview
|
||||
|
||||
The project uses a multi-layered testing approach:
|
||||
The project uses a multi-layered testing approach across backend and frontend:
|
||||
|
||||
- **API.Specs** - BDD integration tests using Reqnroll (Gherkin)
|
||||
- **Infrastructure.Repository.Tests** - Unit tests for data access layer
|
||||
- **Service.Auth.Tests** - Unit tests for authentication business logic
|
||||
- **Storybook Vitest project** - Browser-based interaction tests for shared website stories
|
||||
- **Storybook Playwright suite** - Browser checks against Storybook-rendered components
|
||||
|
||||
## Running Tests with Docker (Recommended)
|
||||
|
||||
@@ -86,6 +88,33 @@ dotnet test Service/Service.Auth.Tests/Service.Auth.Tests.csproj
|
||||
|
||||
- No database required (uses Moq for mocking)
|
||||
|
||||
### Frontend Storybook Tests
|
||||
|
||||
```bash
|
||||
cd src/Website
|
||||
npm install
|
||||
npm run test:storybook
|
||||
```
|
||||
|
||||
**Purpose**:
|
||||
|
||||
- Verifies shared stories such as form fields, submit buttons, navbar states, toasts, and the theme gallery
|
||||
- Runs in browser mode via Vitest and Storybook integration
|
||||
|
||||
### Frontend Playwright Storybook Tests
|
||||
|
||||
```bash
|
||||
cd src/Website
|
||||
npm install
|
||||
npm run test:storybook:playwright
|
||||
```
|
||||
|
||||
**Requirements**:
|
||||
|
||||
- Storybook dependencies installed
|
||||
- Playwright browser dependencies installed
|
||||
- The command will start or reuse the Storybook server defined in `playwright.storybook.config.ts`
|
||||
|
||||
## Test Coverage
|
||||
|
||||
### Current Coverage
|
||||
@@ -112,6 +141,14 @@ dotnet test Service/Service.Auth.Tests/Service.Auth.Tests.csproj
|
||||
- Register service with validation
|
||||
- Business logic for authentication flow
|
||||
|
||||
**Frontend UI Coverage**:
|
||||
|
||||
- Shared submit button states
|
||||
- Form field happy path and error presentation
|
||||
- Navbar guest, authenticated, and mobile behavior
|
||||
- Theme gallery rendering across Biergarten themes
|
||||
- Toast interactions and themed notification display
|
||||
|
||||
### Planned Coverage
|
||||
|
||||
- [ ] Email verification workflow
|
||||
@@ -121,6 +158,7 @@ dotnet test Service/Service.Auth.Tests/Service.Auth.Tests.csproj
|
||||
- [ ] Beer post operations
|
||||
- [ ] User follow/unfollow
|
||||
- [ ] Image upload service
|
||||
- [ ] Frontend route integration coverage beyond Storybook stories
|
||||
|
||||
## Testing Frameworks & Tools
|
||||
|
||||
@@ -254,6 +292,15 @@ Exit codes:
|
||||
- `0` - All tests passed
|
||||
- Non-zero - Test failures occurred
|
||||
|
||||
Frontend UI checks should also be included in CI for the active website workspace:
|
||||
|
||||
```bash
|
||||
cd src/Website
|
||||
npm ci
|
||||
npm run test:storybook
|
||||
npm run test:storybook:playwright
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Tests Failing Due to Database Connection
|
||||
|
||||
5
pipeline/.clang-format
Normal file
@@ -0,0 +1,5 @@
---
BasedOnStyle: Google
ColumnLimit: 80
IndentWidth: 3
...
17
pipeline/.clang-tidy
Normal file
@@ -0,0 +1,17 @@
---
Checks: >
  -*,
  bugprone-*,
  clang-analyzer-*,
  cppcoreguidelines-*,
  google-*,
  modernize-*,
  performance-*,
  readability-*,
  -cppcoreguidelines-avoid-magic-numbers,
  -cppcoreguidelines-owning-memory,
  -readability-magic-numbers,
  -google-readability-todo
HeaderFilterRegex: "^(src|includes)/.*"
FormatStyle: file
...
3
pipeline/.gitignore
vendored
Normal file
@@ -0,0 +1,3 @@
dist
build
data
170
pipeline/CMakeLists.txt
Normal file
@@ -0,0 +1,170 @@
cmake_minimum_required(VERSION 3.20)
project(biergarten-pipeline VERSION 0.1.0 LANGUAGES CXX)

# Allows older dependencies to configure on newer CMake.
set(CMAKE_POLICY_VERSION_MINIMUM 3.5)

# Policies
# CMP0167 only exists in CMake 3.30+; guard it so 3.20-3.29 can still configure.
if(POLICY CMP0167)
  cmake_policy(SET CMP0167 NEW) # FindBoost improvements
endif()

# Global Settings
set(CMAKE_CXX_STANDARD 23)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

option(ENABLE_CLANG_TIDY "Enable clang-tidy static analysis for project targets" ON)
option(ENABLE_CLANG_FORMAT_TARGETS "Enable clang-format helper targets" ON)

if(ENABLE_CLANG_TIDY)
  find_program(CLANG_TIDY_EXE NAMES clang-tidy)
  if(CLANG_TIDY_EXE)
    set(BIERGARTEN_CLANG_TIDY_COMMAND
        "${CLANG_TIDY_EXE};--config-file=${CMAKE_CURRENT_SOURCE_DIR}/.clang-tidy")
    message(STATUS "clang-tidy enabled: ${CLANG_TIDY_EXE}")
  else()
    message(STATUS "clang-tidy not found; static analysis is disabled")
  endif()
endif()

# -----------------------------------------------------------------------------
# Compiler Options & Warnings (Interface Library)
# -----------------------------------------------------------------------------
add_library(project_options INTERFACE)
target_compile_options(project_options INTERFACE
  $<$<CXX_COMPILER_ID:GNU,Clang>:
    -Wall -Wextra -Wpedantic -Wshadow -Wconversion -Wsign-conversion -Wunused
  >
  $<$<CXX_COMPILER_ID:MSVC>:
    /W4 /WX /permissive-
  >
)

# -----------------------------------------------------------------------------
# Dependencies
# -----------------------------------------------------------------------------
find_package(CURL REQUIRED)
find_package(SQLite3 REQUIRED)
find_package(Boost 1.75 REQUIRED COMPONENTS program_options json)

include(FetchContent)

# spdlog (Logging)
FetchContent_Declare(
  spdlog
  GIT_REPOSITORY https://github.com/gabime/spdlog.git
  GIT_TAG v1.11.0
)
FetchContent_MakeAvailable(spdlog)

# llama.cpp (LLM Inference)
set(LLAMA_BUILD_TESTS OFF CACHE BOOL "" FORCE)
set(LLAMA_BUILD_EXAMPLES OFF CACHE BOOL "" FORCE)
set(LLAMA_BUILD_SERVER OFF CACHE BOOL "" FORCE)
FetchContent_Declare(
  llama_cpp
  GIT_REPOSITORY https://github.com/ggerganov/llama.cpp.git
  GIT_TAG b8611
)
FetchContent_MakeAvailable(llama_cpp)

if(TARGET llama)
  target_compile_options(llama PRIVATE
    $<$<CXX_COMPILER_ID:AppleClang>:-include algorithm>
  )
endif()

# -----------------------------------------------------------------------------
# Main Executable
# -----------------------------------------------------------------------------
set(PIPELINE_SOURCES
  src/biergarten_data_generator.cpp
  src/web_client/curl_web_client.cpp
  src/data_generation/data_downloader.cpp
  src/database/database.cpp
  src/json_handling/json_loader.cpp
  src/data_generation/llama/destructor.cpp
  src/data_generation/llama/set_sampling_options.cpp
  src/data_generation/llama/load.cpp
  src/data_generation/llama/infer.cpp
  src/data_generation/llama/generate_brewery.cpp
  src/data_generation/llama/generate_user.cpp
  src/data_generation/llama/helpers.cpp
  src/data_generation/llama/load_brewery_prompt.cpp
  src/data_generation/mock/data.cpp
  src/data_generation/mock/deterministic_hash.cpp
  src/data_generation/mock/load.cpp
  src/data_generation/mock/generate_brewery.cpp
  src/data_generation/mock/generate_user.cpp
  src/json_handling/stream_parser.cpp
  src/wikipedia/wikipedia_service.cpp
  src/main.cpp
)

add_executable(biergarten-pipeline ${PIPELINE_SOURCES})

if(BIERGARTEN_CLANG_TIDY_COMMAND)
  set_target_properties(biergarten-pipeline PROPERTIES
    CXX_CLANG_TIDY "${BIERGARTEN_CLANG_TIDY_COMMAND}"
  )
endif()

target_include_directories(biergarten-pipeline
  PRIVATE
    ${CMAKE_CURRENT_SOURCE_DIR}/includes
    ${llama_cpp_SOURCE_DIR}/include
)

target_link_libraries(biergarten-pipeline
  PRIVATE
    project_options
    CURL::libcurl
    SQLite::SQLite3
    spdlog::spdlog
    llama
    Boost::program_options
    Boost::json
)

if(ENABLE_CLANG_FORMAT_TARGETS)
  find_program(CLANG_FORMAT_EXE NAMES clang-format)
  if(CLANG_FORMAT_EXE)
    file(GLOB_RECURSE FORMAT_SOURCES CONFIGURE_DEPENDS
      ${CMAKE_CURRENT_SOURCE_DIR}/src/**/*.cpp
      ${CMAKE_CURRENT_SOURCE_DIR}/src/**/*.cc
      ${CMAKE_CURRENT_SOURCE_DIR}/includes/**/*.h
      ${CMAKE_CURRENT_SOURCE_DIR}/includes/**/*.hpp
    )

    add_custom_target(format
      COMMAND ${CLANG_FORMAT_EXE} -style=file -i ${FORMAT_SOURCES}
      COMMENT "Formatting source files with clang-format (Google style)"
      VERBATIM
    )

    add_custom_target(format-check
      COMMAND ${CLANG_FORMAT_EXE} -style=file --dry-run --Werror ${FORMAT_SOURCES}
      COMMENT "Checking source formatting with clang-format (Google style)"
      VERBATIM
    )
  else()
    message(STATUS "clang-format not found; format targets are disabled")
  endif()
endif()

# -----------------------------------------------------------------------------
# Post-Build Steps & Utilities
# -----------------------------------------------------------------------------
add_custom_command(TARGET biergarten-pipeline POST_BUILD
  COMMAND ${CMAKE_COMMAND} -E make_directory ${CMAKE_CURRENT_SOURCE_DIR}/output
  COMMENT "Ensuring output directory exists"
)

find_program(VALGRIND valgrind)
if(VALGRIND)
  add_custom_target(memcheck
    COMMAND ${VALGRIND} --leak-check=full --error-exitcode=1 $<TARGET_FILE:biergarten-pipeline> --help
    DEPENDS biergarten-pipeline
    COMMENT "Running Valgrind memory check"
  )
endif()
406
pipeline/README.md
Normal file
@@ -0,0 +1,406 @@
|
||||
# Biergarten Pipeline
|
||||
|
||||
A high-performance C++23 data pipeline for fetching, parsing, and storing geographic data (countries, states, cities) with brewery metadata generation capabilities. The system supports both mock and LLM-based (llama.cpp) generation modes.
|
||||
|
||||
## Overview
|
||||
|
||||
The pipeline orchestrates **four key stages**:
|
||||
|
||||
1. **Download** - Fetches `countries+states+cities.json` from a pinned GitHub commit with optional local filesystem caching
|
||||
2. **Parse** - Streams JSON using Boost.JSON's `basic_parser` to extract country/state/city records without loading the entire file into memory
|
||||
3. **Store** - Inserts records into a file-based SQLite database with all operations performed sequentially in a single thread
|
||||
4. **Generate** - Produces brewery metadata or user profiles using either the mock generator or LLM-based generation via llama.cpp
|
||||
|
||||
## System Architecture
|
||||
|
||||
### Data Sources and Formats
|
||||
|
||||
- **Hierarchical Structure**: Countries array → states per country → cities per state
|
||||
- **Data Fields**:
|
||||
- `id` (integer)
|
||||
- `name` (string)
|
||||
- `iso2` / `iso3` (ISO country/state codes)
|
||||
- `latitude` / `longitude` (geographic coordinates)
|
||||
- **Source**: [dr5hn/countries-states-cities-database](https://github.com/dr5hn/countries-states-cities-database) on GitHub
|
||||
- **Output**: Structured SQLite file-based database (`biergarten-pipeline.db`) + structured logging via spdlog
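
To illustrate how the countries → states → cities hierarchy can be recognized during a streaming parse, here is a minimal sketch that classifies each JSON object by its nesting depth. It is not the project's `StreamingJsonParser`: the class `DepthTrackerSketch` and the `RecordKind` enum are hypothetical, and the sketch assumes the only nested objects are the entries of the states and cities arrays. If the dataset nests other objects inside a country or state, a real parser would also need to track the enclosing key.

```cpp
#include <array>
#include <cstddef>

// Which kind of record a JSON object represents, judged by nesting depth.
enum class RecordKind { kCountry, kState, kCity, kOther };

class DepthTrackerSketch {
 public:
   // Call when the parser reports the start of an object ('{').
   // Returns the record kind implied by the new nesting depth.
   RecordKind OnObjectBegin() {
      ++object_depth_;
      const RecordKind kind = KindForDepth(object_depth_);
      if (kind != RecordKind::kOther) {
         ++counts_[static_cast<std::size_t>(kind)];
      }
      return kind;
   }

   // Call when the parser reports the end of an object ('}').
   void OnObjectEnd() {
      if (object_depth_ > 0) {
         --object_depth_;
      }
   }

   // Number of objects seen so far for the given kind.
   std::size_t Count(RecordKind kind) const {
      return kind == RecordKind::kOther
                 ? 0
                 : counts_[static_cast<std::size_t>(kind)];
   }

 private:
   // Depth 1 = country object, 2 = state object, 3 = city object.
   static RecordKind KindForDepth(int depth) {
      switch (depth) {
         case 1: return RecordKind::kCountry;
         case 2: return RecordKind::kState;
         case 3: return RecordKind::kCity;
         default: return RecordKind::kOther;
      }
   }

   int object_depth_ = 0;
   std::array<std::size_t, 3> counts_{};
};
```

Because only the current depth and three counters are kept, memory use stays constant regardless of file size, which is the point of parsing the file as a stream.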
|
||||
|
||||
### Concurrency Model
|
||||
|
||||
The pipeline currently operates **single-threaded** with sequential stage execution:
|
||||
|
||||
1. **Download Phase**: Main thread blocks while downloading the source JSON file (if not in cache)
|
||||
2. **Parse & Store Phase**: Main thread performs streaming JSON parse with immediate SQLite inserts
|
||||
|
||||
**Thread Safety**: Although the pipeline is single-threaded, the `SqliteDatabase` component guards every database operation with a `std::mutex` (`dbMutex`), so the database layer would not need new locking if the stages are parallelized later.
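
The locking pattern described above can be sketched with the standard library alone. This is an illustration of the pattern, not the project's `SqliteDatabase`: `GuardedStoreSketch` and `CountryRowSketch` are hypothetical names, and an in-memory vector stands in for the SQLite handle.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

struct CountryRowSketch {
   int id;
   std::string name;
};

// Every public method takes the same mutex before touching shared state,
// so callers on different threads cannot interleave partial operations.
class GuardedStoreSketch {
 public:
   void InsertCountry(int id, std::string name) {
      std::lock_guard<std::mutex> lock(db_mutex_);
      rows_.push_back({id, std::move(name)});
   }

   // Returns a copy of at most `limit` rows; copying under the lock means
   // the caller never holds references into the guarded container.
   std::vector<CountryRowSketch> QueryCountries(std::size_t limit) const {
      std::lock_guard<std::mutex> lock(db_mutex_);
      const std::size_t count = std::min(limit, rows_.size());
      return std::vector<CountryRowSketch>(
          rows_.begin(), rows_.begin() + static_cast<std::ptrdiff_t>(count));
   }

 private:
   mutable std::mutex db_mutex_;  // mutable so const queries can lock it
   std::vector<CountryRowSketch> rows_;
};
```

One design consequence worth noting: a single coarse mutex serializes all access, which is simple and safe but means parallel callers would still take turns at the database.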
|
||||
|
||||
## Core Components
|
||||
|
||||
| Component | Purpose | Thread Safety | Dependencies |
|
||||
| ----------------------------- | ----------------------------------------------------------------------------------------------- | -------------------------------------------- | --------------------------------------------- |
|
||||
| **BiergartenDataGenerator** | Orchestrates pipeline execution; manages lifecycle of downloader, parser, and generator | Single-threaded coordinator | ApplicationOptions, WebClient, SqliteDatabase |
|
||||
| **DataDownloader** | HTTP fetch with curl; optional filesystem cache; ETag support and retries | Blocking I/O; safe for startup | IWebClient, filesystem |
|
||||
| **StreamingJsonParser** | Extends `boost::json::basic_parser`; emits country/state/city via callbacks; tracks parse depth | Single-threaded parse; callbacks thread-safe | Boost.JSON |
|
||||
| **JsonLoader** | Wraps parser; dispatches callbacks for country/state/city; manages WorkQueue lifecycle | Produces to WorkQueue; safe callbacks | StreamingJsonParser, SqliteDatabase |
|
||||
| **SqliteDatabase**            | Manages schema initialization; insert/query methods for geographic data                         | All operations mutex-guarded                 | SQLite3                                       |
|
||||
| **IDataGenerator** (Abstract) | Interface for brewery/user metadata generation | Stateless virtual methods | N/A |
|
||||
| **LlamaGenerator** | LLM-based generation via llama.cpp; configurable sampling (temperature, top-p, seed) | Manages llama_model* and llama_context* | llama.cpp, BreweryResult, UserResult |
|
||||
| **MockGenerator** | Deterministic mock generation using seeded randomization | Stateless; thread-safe | N/A |
|
||||
| **CURLWebClient** | HTTP client adapter; URL encoding; file downloads | cURL library bindings | libcurl |
|
||||
| **WikipediaService** | (Planned) Wikipedia data lookups for enrichment | N/A | IWebClient |
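
As an illustration of what "deterministic mock generation" can look like, the sketch below derives a brewery name from a stable hash of the city and country, so the same input always yields the same output across runs and platforms. It is an assumption-laden example, not the contents of `deterministic_hash.cpp`: the choice of FNV-1a, the function names `Fnv1a64` and `MockBreweryNameSketch`, and the suffix list are all made up for the illustration.

```cpp
#include <array>
#include <cstdint>
#include <string>
#include <string_view>

// 64-bit FNV-1a: a small, well-known, platform-independent string hash.
// (std::hash is avoided because its values may differ between toolchains.)
constexpr std::uint64_t Fnv1a64(std::string_view text) {
   std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
   for (const char ch : text) {
      hash ^= static_cast<std::uint8_t>(ch);
      hash *= 1099511628211ULL;  // FNV prime
   }
   return hash;
}

// Picks a name suffix from a fixed list using the hash of "city|country".
std::string MockBreweryNameSketch(std::string_view city,
                                  std::string_view country) {
   static constexpr std::array<std::string_view, 4> kSuffixes = {
       "Brewing Co.", "Brauhaus", "Ale Works", "Taproom"};

   std::string key(city);
   key += '|';
   key += country;

   std::string name(city);
   name += ' ';
   name += kSuffixes[Fnv1a64(key) % kSuffixes.size()];
   return name;
}
```

Because the output depends only on the input strings, repeated pipeline runs in mock mode would produce identical data, which makes mock output suitable for snapshot-style tests.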
|
||||
|
||||
## Database Schema
|
||||
|
||||
SQLite file-based database with **three core tables** and **indexes for fast lookups**:
|
||||
|
||||
### Countries
|
||||
|
||||
```sql
CREATE TABLE countries (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    iso2 TEXT,
    iso3 TEXT
);
CREATE INDEX idx_countries_iso2 ON countries(iso2);
```
|
||||
|
||||
### States
|
||||
|
||||
```sql
CREATE TABLE states (
    id INTEGER PRIMARY KEY,
    country_id INTEGER NOT NULL,
    name TEXT NOT NULL,
    iso2 TEXT,
    FOREIGN KEY (country_id) REFERENCES countries(id)
);
CREATE INDEX idx_states_country ON states(country_id);
```
|
||||
|
||||
### Cities
|
||||
|
||||
```sql
CREATE TABLE cities (
    id INTEGER PRIMARY KEY,
    state_id INTEGER NOT NULL,
    country_id INTEGER NOT NULL,
    name TEXT NOT NULL,
    latitude REAL,
    longitude REAL,
    FOREIGN KEY (state_id) REFERENCES states(id),
    FOREIGN KEY (country_id) REFERENCES countries(id)
);
CREATE INDEX idx_cities_state ON cities(state_id);
CREATE INDEX idx_cities_country ON cities(country_id);
```
|
||||
|
||||
## Architecture Diagram
|
||||
|
||||
```plantuml
|
||||
@startuml biergarten-pipeline
|
||||
!theme plain
|
||||
skinparam monochrome true
|
||||
skinparam classBackgroundColor #FFFFFF
|
||||
skinparam classBorderColor #000000
|
||||
|
||||
package "Application Layer" {
|
||||
class BiergartenDataGenerator {
|
||||
- options: ApplicationOptions
|
||||
- webClient: IWebClient
|
||||
- database: SqliteDatabase
|
||||
- generator: IDataGenerator
|
||||
--
|
||||
+ Run() : int
|
||||
}
|
||||
}
|
||||
|
||||
package "Data Acquisition" {
|
||||
class DataDownloader {
|
||||
- webClient: IWebClient
|
||||
--
|
||||
+ Download(url: string, filePath: string)
|
||||
+ DownloadWithCache(url: string, cachePath: string)
|
||||
}
|
||||
|
||||
interface IWebClient {
|
||||
+ DownloadToFile(url: string, filePath: string)
|
||||
+ Get(url: string) : string
|
||||
+ UrlEncode(value: string) : string
|
||||
}
|
||||
|
||||
class CURLWebClient {
|
||||
- globalState: CurlGlobalState
|
||||
--
|
||||
+ DownloadToFile(url: string, filePath: string)
|
||||
+ Get(url: string) : string
|
||||
+ UrlEncode(value: string) : string
|
||||
}
|
||||
}
|
||||
|
||||
package "JSON Processing" {
|
||||
class StreamingJsonParser {
|
||||
- depth: int
|
||||
--
|
||||
+ on_object_begin()
|
||||
+ on_object_end()
|
||||
+ on_array_begin()
|
||||
+ on_array_end()
|
||||
+ on_key(str: string)
|
||||
+ on_string(str: string)
|
||||
+ on_number(value: int)
|
||||
}
|
||||
|
||||
class JsonLoader {
|
||||
--
|
||||
+ LoadWorldCities(jsonPath: string, db: SqliteDatabase)
|
||||
}
|
||||
}
|
||||
|
||||
package "Data Storage" {
|
||||
class SqliteDatabase {
|
||||
- db: sqlite3*
|
||||
- dbMutex: std::mutex
|
||||
--
|
||||
+ Initialize(dbPath: string)
|
||||
+ InsertCountry(id: int, name: string, iso2: string, iso3: string)
|
||||
+ InsertState(id: int, countryId: int, name: string, iso2: string)
|
||||
+ InsertCity(id: int, stateId: int, countryId: int, name: string, lat: double, lon: double)
|
||||
+ QueryCountries(limit: int) : vector<Country>
|
||||
+ QueryStates(limit: int) : vector<State>
|
||||
+ QueryCities() : vector<City>
|
||||
+ BeginTransaction()
|
||||
+ CommitTransaction()
|
||||
# InitializeSchema()
|
||||
}
|
||||
|
||||
struct Country {
|
||||
id: int
|
||||
name: string
|
||||
iso2: string
|
||||
iso3: string
|
||||
}
|
||||
|
||||
struct State {
|
||||
id: int
|
||||
name: string
|
||||
iso2: string
|
||||
countryId: int
|
||||
}
|
||||
|
||||
struct City {
|
||||
id: int
|
||||
name: string
|
||||
countryId: int
|
||||
}
|
||||
}
|
||||
|
||||
package "Data Generation" {
|
||||
interface IDataGenerator {
|
||||
+ load(modelPath: string)
|
||||
+ generateBrewery(cityName: string, countryName: string, regionContext: string) : BreweryResult
|
||||
+ generateUser(locale: string) : UserResult
|
||||
}
|
||||
|
||||
class LlamaGenerator {
|
||||
- model: llama_model*
|
||||
- context: llama_context*
|
||||
- sampling_temperature: float
|
||||
- sampling_top_p: float
|
||||
- sampling_seed: uint32_t
|
||||
--
|
||||
+ load(modelPath: string)
|
||||
+ generateBrewery(...) : BreweryResult
|
||||
+ generateUser(locale: string) : UserResult
|
||||
+ setSamplingOptions(temperature: float, topP: float, seed: int)
|
||||
# infer(prompt: string) : string
|
||||
}
|
||||
|
||||
class MockGenerator {
|
||||
--
|
||||
+ load(modelPath: string)
|
||||
+ generateBrewery(...) : BreweryResult
|
||||
+ generateUser(locale: string) : UserResult
|
||||
}
|
||||
|
||||
struct BreweryResult {
|
||||
name: string
|
||||
description: string
|
||||
}
|
||||
|
||||
struct UserResult {
|
||||
username: string
|
||||
bio: string
|
||||
}
|
||||
}
|
||||
|
||||
package "Enrichment (Planned)" {
|
||||
class WikipediaService {
|
||||
- webClient: IWebClient
|
||||
--
|
||||
+ SearchCity(cityName: string, countryName: string) : string
|
||||
}
|
||||
}
|
||||
|
||||
' Relationships
|
||||
BiergartenDataGenerator --> DataDownloader
|
||||
BiergartenDataGenerator --> JsonLoader
|
||||
BiergartenDataGenerator --> SqliteDatabase
|
||||
BiergartenDataGenerator --> IDataGenerator
|
||||
|
||||
DataDownloader --> IWebClient
|
||||
CURLWebClient ..|> IWebClient
|
||||
|
||||
JsonLoader --> StreamingJsonParser
|
||||
JsonLoader --> SqliteDatabase
|
||||
|
||||
LlamaGenerator ..|> IDataGenerator
|
||||
MockGenerator ..|> IDataGenerator
|
||||
|
||||
SqliteDatabase --> Country
|
||||
SqliteDatabase --> State
|
||||
SqliteDatabase --> City
|
||||
|
||||
LlamaGenerator --> BreweryResult
|
||||
LlamaGenerator --> UserResult
|
||||
MockGenerator --> BreweryResult
|
||||
MockGenerator --> UserResult
|
||||
|
||||
WikipediaService --> IWebClient
|
||||
|
||||
@enduml
|
||||
```
|
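
The generator relationship in the diagram can be sketched in C++ roughly as follows. This is a minimal illustration, not the project's implementation: the interface is trimmed to one method, and `SketchMockGenerator`, its word lists, and the FNV-1a style hash are assumptions chosen only to show how a mock can stay deterministic for a given city and country.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct BreweryResult {
  std::string name;
  std::string description;
};

// Trimmed-down generator interface (only one method shown).
class DataGenerator {
 public:
  virtual ~DataGenerator() = default;
  virtual BreweryResult GenerateBrewery(const std::string& city_name,
                                        const std::string& country_name,
                                        const std::string& region_context) = 0;
};

// Hypothetical mock: picks words by hashing the inputs, so the same
// city/country pair always produces the same brewery.
class SketchMockGenerator final : public DataGenerator {
 public:
  BreweryResult GenerateBrewery(const std::string& city_name,
                                const std::string& country_name,
                                const std::string& /*region_context*/) override {
    // Illustrative word lists; the real lists are not shown in this README.
    static const std::vector<std::string> kAdjectives = {"Golden", "Copper", "Stone"};
    static const std::vector<std::string> kNouns = {"Kettle", "Barrel", "Mill"};
    const std::uint64_t hash = DeterministicHash(city_name, country_name);
    BreweryResult result;
    result.name = kAdjectives[hash % kAdjectives.size()] + " " +
                  kNouns[(hash / kAdjectives.size()) % kNouns.size()] + " Brewing";
    result.description = "Mock brewery in " + city_name + ", " + country_name + ".";
    return result;
  }

 private:
  // FNV-1a style mixing over both strings (an assumption, chosen because it
  // gives the same value on every platform, unlike std::hash).
  static std::uint64_t DeterministicHash(const std::string& a, const std::string& b) {
    std::uint64_t hash = 0xcbf29ce484222325ull;
    auto mix = [&hash](const std::string& text) {
      for (unsigned char c : text) {
        hash ^= c;
        hash *= 0x100000001b3ull;
      }
    };
    mix(a);
    hash ^= 0xFFu;  // separator so ("ab", "c") and ("a", "bc") differ
    hash *= 0x100000001b3ull;
    mix(b);
    return hash;
  }
};
```

Because callers only see `DataGenerator&`, the pipeline code does not need to know whether it is talking to the mock or to an LLM-backed generator.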
||||
|
||||
## Configuration and Extensibility
|
||||
|
||||
### Command-Line Arguments
|
||||
|
||||
Boost.Program_options provides named CLI arguments. Running without arguments displays usage instructions.
|
||||
|
||||
```bash
|
||||
./biergarten-pipeline [options]
|
||||
```
|
||||
|
||||
**Requirement**: Exactly one of `--mocked` or `--model` must be specified.
|
||||
|
||||
| Argument | Short | Type | Purpose |
|
||||
| --------------- | ----- | ------ | --------------------------------------------------------------- |
|
||||
| `--mocked` | - | flag | Use mocked generator for brewery/user data |
|
||||
| `--model` | `-m` | string | Path to LLM model file (gguf); mutually exclusive with --mocked |
|
||||
| `--cache-dir` | `-c` | path | Directory for cached JSON (default: `/tmp`) |
|
||||
| `--temperature` | - | float | LLM sampling temperature 0.0-1.0 (default: `0.8`) |
|
||||
| `--top-p` | - | float | Nucleus sampling parameter 0.0-1.0 (default: `0.92`) |
|
||||
| `--seed` | - | int | Random seed: -1 for random (default: `-1`) |
|
||||
| `--help` | `-h` | flag | Show help message |
|
||||
|
||||
**Note**: The data source is always pinned to commit `c5eb7772` (stable 2026-03-28) and cannot be changed.
|
||||
|
||||
**Note**: When `--mocked` is used, any sampling parameters (`--temperature`, `--top-p`, `--seed`) are ignored with a warning.
|
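
The two rules above (exactly one of `--mocked` or `--model`, and sampling flags ignored with a warning in mocked mode) could be checked after parsing along these lines. This is a sketch under assumptions: `CliOptions`, `ValidationResult`, and `ValidateCliOptions` are hypothetical names, and the real pipeline may perform this check differently inside its Boost.Program_options handling.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical subset of the parsed command-line state.
struct CliOptions {
  bool use_mocked = false;
  std::string model_path;             // empty when --model was not given
  bool sampling_flags_given = false;  // any of --temperature, --top-p, --seed
};

struct ValidationResult {
  std::optional<std::string> error;   // set when the combination is rejected
  std::vector<std::string> warnings;  // non-fatal notices for the user
};

inline ValidationResult ValidateCliOptions(const CliOptions& opts) {
  ValidationResult result;
  const bool has_model = !opts.model_path.empty();
  // Both set or both unset violates the "exactly one" requirement.
  if (opts.use_mocked == has_model) {
    result.error = "Exactly one of --mocked or --model must be specified.";
    return result;
  }
  // Sampling parameters only affect the LLM path.
  if (opts.use_mocked && opts.sampling_flags_given) {
    result.warnings.push_back("Sampling parameters are ignored with --mocked.");
  }
  return result;
}
```

Keeping the check in a plain function, separate from the argument parser, makes it easy to unit test without constructing `argv` arrays.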
||||
|
||||
### Usage Examples
|
||||
|
||||
```bash
|
||||
# Mocked generator (deterministic, no LLM required)
|
||||
./biergarten-pipeline --mocked
|
||||
|
||||
# With LLM model
|
||||
./biergarten-pipeline --model ./models/llama.gguf --cache-dir /var/cache
|
||||
|
||||
# Mocked with extra parameters provided (will be ignored with warning)
|
||||
./biergarten-pipeline --mocked --temperature 0.5 --top-p 0.8 --seed 42
|
||||
|
||||
# Show help
|
||||
./biergarten-pipeline --help
|
||||
```
|
||||
|
||||
## Building and Running
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- **C++23 compiler** (g++, clang, MSVC)
|
||||
- **CMake** 3.20+
|
||||
- **curl** (for HTTP downloads)
|
||||
- **sqlite3** (database backend)
|
||||
- **Boost** 1.75+ (requires Boost.JSON and Boost.Program_options)
|
||||
- **spdlog** v1.11.0 (fetched via CMake FetchContent)
|
||||
- **llama.cpp** (fetched via CMake FetchContent for LLM inference)
|
||||
|
||||
### Build
|
||||
|
||||
```bash
|
||||
mkdir -p build
|
||||
cd build
|
||||
cmake ..
|
||||
cmake --build . --target biergarten-pipeline -- -j
|
||||
```
|
||||
|
||||
### Run
|
||||
|
||||
```bash
|
||||
./build/biergarten-pipeline --mocked
|
||||
```
|
||||
|
||||
**Output**:
|
||||
|
||||
- Console logs with structured spdlog output
|
||||
- Cached JSON file: `/tmp/countries+states+cities.json`
|
||||
- SQLite database: `biergarten-pipeline.db` (in output directory)
|
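
The cached JSON file listed above is only downloaded when it is missing. A minimal sketch of that cache-or-download step, assuming a hypothetical `EnsureCachedFile` helper and a `fetch` callback that stands in for the HTTP download (neither name comes from the project):

```cpp
#include <filesystem>
#include <fstream>
#include <functional>
#include <string>

// Returns the path of the cached file, calling `fetch` only when the file
// does not exist yet. `fetch` is a hypothetical stand-in for the download.
inline std::string EnsureCachedFile(
    const std::filesystem::path& cache_dir, const std::string& file_name,
    const std::function<void(const std::filesystem::path&)>& fetch) {
  const std::filesystem::path target = cache_dir / file_name;
  if (!std::filesystem::exists(target)) {
    // Make sure the cache directory exists before writing into it.
    std::filesystem::create_directories(cache_dir);
    fetch(target);
  }
  return target.string();
}
```

With `cache_dir` set to `/tmp` and `file_name` set to `countries+states+cities.json`, a second run would find the file and skip the download.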
||||
|
||||
## Code Quality and Static Analysis
|
||||
|
||||
### Formatting
|
||||
|
||||
This project uses **clang-format** with the **Google C++ style guide**:
|
||||
|
||||
```bash
|
||||
# Apply formatting to all source files
|
||||
cmake --build build --target format
|
||||
|
||||
# Check formatting without modifications
|
||||
cmake --build build --target format-check
|
||||
```
|
||||
|
||||
### Static Analysis
|
||||
|
||||
This project uses **clang-tidy** with Google, modernize, performance, and bug-prone checks configured in `.clang-tidy`.
|
||||
|
||||
Static analysis runs automatically during compilation if `clang-tidy` is available.
|
||||
|
||||
## Code Implementation Summary
|
||||
|
||||
### Key Achievements
|
||||
|
||||
✅ **Full pipeline implementation** - Download → Parse → Store → Generate
|
||||
✅ **Streaming JSON parser** - Memory-efficient processing via Boost.JSON callbacks
|
||||
✅ **Thread-safe SQLite wrapper** - Mutex-protected database for future parallelization
|
||||
✅ **Flexible data generation** - Abstract IDataGenerator interface supporting both mock and LLM modes
|
||||
✅ **Comprehensive CLI** - Boost.Program_options with sensible defaults
|
||||
✅ **Production-grade logging** - spdlog integration for structured output
|
||||
✅ **Build quality** - CMake with clang-format/clang-tidy integration
|
||||
|
||||
### Architecture Patterns
|
||||
|
||||
- **Interface-based design**: the `WebClient` and `DataGenerator` abstract base classes (shown as `IWebClient` and `IDataGenerator` in the class diagram) enable substitution and testing
|
||||
- **Dependency injection**: Components receive dependencies via constructors (BiergartenDataGenerator)
|
||||
- **RAII principle**: SQLite connections and resources managed via destructors
|
||||
- **Callback-driven parsing**: Boost.JSON parser emits events to processing callbacks
|
||||
- **Transaction-scoped inserts**: BeginTransaction/CommitTransaction for batch performance
|
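
The RAII and transaction-scoped patterns listed above combine naturally into a small guard object. The sketch below is illustrative only: `TransactionGuard` is a hypothetical helper, and `FakeDatabase` is a stand-in that records calls instead of talking to SQLite. It assumes the database type exposes `BeginTransaction`, `CommitTransaction`, and `RollbackTransaction`.

```cpp
#include <string>
#include <vector>

// Stand-in for the SQLite wrapper: records calls instead of touching a database.
struct FakeDatabase {
  std::vector<std::string> log;
  void BeginTransaction() { log.push_back("BEGIN"); }
  void CommitTransaction() { log.push_back("COMMIT"); }
  void RollbackTransaction() { log.push_back("ROLLBACK"); }
  void InsertCountry(int id, const std::string& name) {
    log.push_back("INSERT " + std::to_string(id) + " " + name);
  }
};

// Hypothetical guard: begins a transaction on construction and rolls it back
// on destruction unless Commit() was called first.
template <typename Database>
class TransactionGuard {
 public:
  explicit TransactionGuard(Database& db) : db_(db) { db_.BeginTransaction(); }
  ~TransactionGuard() {
    if (!committed_) db_.RollbackTransaction();
  }
  TransactionGuard(const TransactionGuard&) = delete;
  TransactionGuard& operator=(const TransactionGuard&) = delete;

  void Commit() {
    db_.CommitTransaction();
    committed_ = true;  // only set once the commit call has returned
  }

 private:
  Database& db_;
  bool committed_ = false;
};
```

If an insert throws partway through a batch, the guard's destructor issues the rollback, so callers do not need a separate cleanup path. A real rollback can itself fail, so a production version would need to make sure nothing escapes the destructor.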
||||
|
||||
### External Dependencies
|
||||
|
||||
| Dependency | Version | Purpose | Type |
|
||||
| ---------- | ------- | ---------------------------------- | ------- |
|
||||
| Boost | 1.75+ | JSON parsing, CLI argument parsing | Library |
|
||||
| SQLite3 | - | Persistent data storage | System |
|
||||
| libcurl | - | HTTP downloads | System |
|
||||
| spdlog | v1.11.0 | Structured logging | Fetched |
|
||||
| llama.cpp | b8611 | LLM inference engine | Fetched |
|
||||
|
||||
Use the `format-check` target to validate formatting without modifying files.
|
||||
|
||||
clang-tidy runs automatically on the biergarten-pipeline target when available. You can disable it at configure time:
|
||||
|
||||
`cmake -DENABLE_CLANG_TIDY=OFF ..`
|
||||
|
||||
You can also disable format helper targets:
|
||||
|
||||
`cmake -DENABLE_CLANG_FORMAT_TARGETS=OFF ..`
|
||||
157
pipeline/includes/biergarten_data_generator.h
Normal file
@@ -0,0 +1,157 @@
|
||||
#ifndef BIERGARTEN_PIPELINE_BIERGARTEN_DATA_GENERATOR_H_
|
||||
#define BIERGARTEN_PIPELINE_BIERGARTEN_DATA_GENERATOR_H_
|
||||
|
||||
#include <memory>
|
||||
#include <string>
|
||||
#include <unordered_map>
|
||||
#include <vector>
|
||||
|
||||
#include "data_generation/data_generator.h"
|
||||
#include "database/database.h"
|
||||
#include "web_client/web_client.h"
|
||||
#include "wikipedia/wikipedia_service.h"
|
||||
|
||||
/**
|
||||
* @brief Program options for the Biergarten pipeline application.
|
||||
*/
|
||||
struct ApplicationOptions {
|
||||
/// @brief Path to the LLM model file (gguf format); mutually exclusive with
|
||||
/// use_mocked.
|
||||
std::string model_path;
|
||||
|
||||
/// @brief Use mocked generator instead of LLM; mutually exclusive with
|
||||
/// model_path.
|
||||
bool use_mocked = false;
|
||||
|
||||
/// @brief Directory for cached JSON and database files.
|
||||
std::string cache_dir;
|
||||
|
||||
/// @brief LLM sampling temperature (0.0 to 1.0, higher = more random).
|
||||
float temperature = 0.8f;
|
||||
|
||||
/// @brief LLM nucleus sampling top-p parameter (0.0 to 1.0, higher = more
|
||||
/// random).
|
||||
float top_p = 0.92f;
|
||||
|
||||
/// @brief Context window size (tokens) for LLM inference. Higher values
|
||||
/// support longer prompts but use more memory.
|
||||
uint32_t n_ctx = 2048;
|
||||
|
||||
/// @brief Random seed for sampling (-1 for random, otherwise non-negative).
|
||||
int seed = -1;
|
||||
|
||||
/// @brief Git commit hash for database consistency (always pinned to
|
||||
/// c5eb7772).
|
||||
std::string commit = "c5eb7772";
|
||||
};
|
||||
|
||||
/**
|
||||
* @brief Main data generator class for the Biergarten pipeline.
|
||||
*
|
||||
* This class encapsulates the core logic for generating brewery data.
|
||||
* It handles database initialization, data loading/downloading, and brewery
|
||||
* generation.
|
||||
*/
|
||||
class BiergartenDataGenerator {
|
||||
public:
|
||||
/**
|
||||
* @brief Construct a BiergartenDataGenerator with injected dependencies.
|
||||
*
|
||||
* @param options Application configuration options.
|
||||
* @param web_client HTTP client for downloading data.
|
||||
* @param database SQLite database instance.
|
||||
*/
|
||||
BiergartenDataGenerator(const ApplicationOptions& options,
|
||||
std::shared_ptr<WebClient> web_client,
|
||||
SqliteDatabase& database);
|
||||
|
||||
/**
|
||||
* @brief Run the data generation pipeline.
|
||||
*
|
||||
* Performs the following steps:
|
||||
* 1. Initialize database
|
||||
* 2. Download geographic data if needed
|
||||
* 3. Initialize the generator (LLM or Mock)
|
||||
* 4. Generate brewery data for sample cities
|
||||
*
|
||||
* @return 0 on success, 1 on failure.
|
||||
*/
|
||||
int Run();
|
||||
|
||||
private:
|
||||
/// @brief Immutable application options.
|
||||
const ApplicationOptions options_;
|
||||
|
||||
/// @brief Shared HTTP client dependency.
|
||||
std::shared_ptr<WebClient> webClient_;
|
||||
|
||||
/// @brief Database dependency.
|
||||
SqliteDatabase& database_;
|
||||
|
||||
/**
|
||||
* @brief Enriched city data with Wikipedia context.
|
||||
*/
|
||||
struct EnrichedCity {
|
||||
int city_id;
|
||||
std::string city_name;
|
||||
std::string country_name;
|
||||
std::string region_context;
|
||||
};
|
||||
|
||||
/**
|
||||
* @brief Initialize the data generator based on options.
|
||||
*
|
||||
* Creates either a MockGenerator (if no model path) or LlamaGenerator.
|
||||
*
|
||||
* @return A unique_ptr to the initialized generator.
|
||||
*/
|
||||
std::unique_ptr<DataGenerator> InitializeGenerator();
|
||||
|
||||
/**
|
||||
* @brief Download and load geographic data if not cached.
|
||||
*/
|
||||
void LoadGeographicData();
|
||||
|
||||
/**
|
||||
* @brief Query cities from database and build country name map.
|
||||
*
|
||||
* @return Vector of (City, country_name) pairs capped at 30 entries.
|
||||
*/
|
||||
std::vector<std::pair<City, std::string>> QueryCitiesWithCountries();
|
||||
|
||||
/**
|
||||
* @brief Enrich cities with Wikipedia summaries.
|
||||
*
|
||||
* @param cities Vector of (City, country_name) pairs.
|
||||
* @return Vector of enriched city data with context.
|
||||
*/
|
||||
std::vector<EnrichedCity> EnrichWithWikipedia(
|
||||
const std::vector<std::pair<City, std::string>>& cities);
|
||||
|
||||
/**
|
||||
* @brief Generate breweries for enriched cities.
|
||||
*
|
||||
* @param generator The data generator instance.
|
||||
* @param cities Vector of enriched city data.
|
||||
*/
|
||||
void GenerateBreweries(DataGenerator& generator,
|
||||
const std::vector<EnrichedCity>& cities);
|
||||
|
||||
/**
|
||||
* @brief Log the generated brewery results.
|
||||
*/
|
||||
void LogResults() const;
|
||||
|
||||
/**
|
||||
* @brief Helper struct to store generated brewery data.
|
||||
*/
|
||||
struct GeneratedBrewery {
|
||||
int city_id;
|
||||
std::string city_name;
|
||||
BreweryResult brewery;
|
||||
};
|
||||
|
||||
/// @brief Stores generated brewery data.
|
||||
std::vector<GeneratedBrewery> generatedBreweries_;
|
||||
};
|
||||
#endif // BIERGARTEN_PIPELINE_BIERGARTEN_DATA_GENERATOR_H_
|
||||
31
pipeline/includes/data_generation/data_downloader.h
Normal file
@@ -0,0 +1,31 @@
|
||||
#ifndef BIERGARTEN_PIPELINE_DATA_GENERATION_DATA_DOWNLOADER_H_
|
||||
#define BIERGARTEN_PIPELINE_DATA_GENERATION_DATA_DOWNLOADER_H_
|
||||
|
||||
#include <memory>
|
||||
#include <stdexcept>
|
||||
#include <string>
|
||||
|
||||
#include "web_client/web_client.h"
|
||||
|
||||
/// @brief Downloads and caches source geography JSON payloads.
|
||||
class DataDownloader {
|
||||
public:
|
||||
/// @brief Initializes global curl state used by this downloader.
|
||||
explicit DataDownloader(std::shared_ptr<WebClient> web_client);
|
||||
|
||||
/// @brief Cleans up global curl state.
|
||||
~DataDownloader();
|
||||
|
||||
/// @brief Returns a local JSON path, downloading it when cache is missing.
|
||||
std::string DownloadCountriesDatabase(
|
||||
const std::string& cache_path,
|
||||
const std::string& commit =
|
||||
"c5eb7772" // Stable commit: 2026-03-28 export
|
||||
);
|
||||
|
||||
private:
|
||||
static bool FileExists(const std::string& file_path);
|
||||
std::shared_ptr<WebClient> web_client_;
|
||||
};
|
||||
|
||||
#endif // BIERGARTEN_PIPELINE_DATA_GENERATION_DATA_DOWNLOADER_H_
|
||||
29
pipeline/includes/data_generation/data_generator.h
Normal file
@@ -0,0 +1,29 @@
|
||||
#ifndef BIERGARTEN_PIPELINE_DATA_GENERATION_DATA_GENERATOR_H_
|
||||
#define BIERGARTEN_PIPELINE_DATA_GENERATION_DATA_GENERATOR_H_
|
||||
|
||||
#include <string>
|
||||
|
||||
struct BreweryResult {
|
||||
std::string name;
|
||||
std::string description;
|
||||
};
|
||||
|
||||
struct UserResult {
|
||||
std::string username;
|
||||
std::string bio;
|
||||
};
|
||||
|
||||
class DataGenerator {
|
||||
public:
|
||||
virtual ~DataGenerator() = default;
|
||||
|
||||
virtual void Load(const std::string& model_path) = 0;
|
||||
|
||||
virtual BreweryResult GenerateBrewery(const std::string& city_name,
|
||||
const std::string& country_name,
|
||||
const std::string& region_context) = 0;
|
||||
|
||||
virtual UserResult GenerateUser(const std::string& locale) = 0;
|
||||
};
|
||||
|
||||
#endif // BIERGARTEN_PIPELINE_DATA_GENERATION_DATA_GENERATOR_H_
|
||||
51
pipeline/includes/data_generation/llama_generator.h
Normal file
@@ -0,0 +1,51 @@
|
||||
#ifndef BIERGARTEN_PIPELINE_DATA_GENERATION_LLAMA_GENERATOR_H_
|
||||
#define BIERGARTEN_PIPELINE_DATA_GENERATION_LLAMA_GENERATOR_H_
|
||||
|
||||
#include <cstdint>
|
||||
#include <string>
|
||||
|
||||
#include "data_generation/data_generator.h"
|
||||
|
||||
struct llama_model;
|
||||
struct llama_context;
|
||||
|
||||
class LlamaGenerator final : public DataGenerator {
|
||||
public:
|
||||
LlamaGenerator() = default;
|
||||
~LlamaGenerator() override;
|
||||
|
||||
void SetSamplingOptions(float temperature, float top_p, int seed = -1);
|
||||
|
||||
void SetContextSize(uint32_t n_ctx);
|
||||
|
||||
void Load(const std::string& model_path) override;
|
||||
BreweryResult GenerateBrewery(const std::string& city_name,
|
||||
const std::string& country_name,
|
||||
const std::string& region_context) override;
|
||||
UserResult GenerateUser(const std::string& locale) override;
|
||||
|
||||
private:
|
||||
std::string Infer(const std::string& prompt, int max_tokens = 10000);
|
||||
// Overload that allows passing a system message separately so chat-capable
|
||||
// models receive a proper system role instead of having the system text
|
||||
// concatenated into the user prompt (helps avoid revealing internal
|
||||
// reasoning or instructions in model output).
|
||||
std::string Infer(const std::string& system_prompt,
|
||||
const std::string& prompt, int max_tokens = 10000);
|
||||
|
||||
std::string InferFormatted(const std::string& formatted_prompt,
|
||||
int max_tokens = 10000);
|
||||
|
||||
std::string LoadBrewerySystemPrompt(const std::string& prompt_file_path);
|
||||
std::string GetFallbackBreweryPrompt();
|
||||
|
||||
llama_model* model_ = nullptr;
|
||||
llama_context* context_ = nullptr;
|
||||
float sampling_temperature_ = 0.8f;
|
||||
float sampling_top_p_ = 0.92f;
|
||||
uint32_t sampling_seed_ = 0xFFFFFFFFu;
|
||||
uint32_t n_ctx_ = 8192;
|
||||
std::string brewery_system_prompt_;
|
||||
};
|
||||
|
||||
#endif // BIERGARTEN_PIPELINE_DATA_GENERATION_LLAMA_GENERATOR_H_
|
||||
32
pipeline/includes/data_generation/llama_generator_helpers.h
Normal file
@@ -0,0 +1,32 @@
|
||||
#ifndef BIERGARTEN_PIPELINE_DATA_GENERATION_LLAMA_GENERATOR_HELPERS_H_
|
||||
#define BIERGARTEN_PIPELINE_DATA_GENERATION_LLAMA_GENERATOR_HELPERS_H_
|
||||
|
||||
#include <string>
|
||||
#include <utility>
|
||||
|
||||
struct llama_model;
|
||||
struct llama_vocab;
|
||||
typedef int llama_token;
|
||||
|
||||
// Helper functions for LlamaGenerator methods
|
||||
std::string PrepareRegionContextPublic(std::string_view region_context,
|
||||
std::size_t max_chars = 700);
|
||||
|
||||
std::pair<std::string, std::string> ParseTwoLineResponsePublic(
|
||||
const std::string& raw, const std::string& error_message);
|
||||
|
||||
std::string ToChatPromptPublic(const llama_model* model,
|
||||
const std::string& user_prompt);
|
||||
|
||||
std::string ToChatPromptPublic(const llama_model* model,
|
||||
const std::string& system_prompt,
|
||||
const std::string& user_prompt);
|
||||
|
||||
void AppendTokenPiecePublic(const llama_vocab* vocab, llama_token token,
|
||||
std::string& output);
|
||||
|
||||
std::string ValidateBreweryJsonPublic(const std::string& raw,
|
||||
std::string& name_out,
|
||||
std::string& description_out);
|
||||
|
||||
#endif // BIERGARTEN_PIPELINE_DATA_GENERATION_LLAMA_GENERATOR_HELPERS_H_
|
||||
28
pipeline/includes/data_generation/mock_generator.h
Normal file
@@ -0,0 +1,28 @@
|
||||
#ifndef BIERGARTEN_PIPELINE_DATA_GENERATION_MOCK_GENERATOR_H_
|
||||
#define BIERGARTEN_PIPELINE_DATA_GENERATION_MOCK_GENERATOR_H_
|
||||
|
||||
#include <string>
|
||||
#include <vector>
|
||||
|
||||
#include "data_generation/data_generator.h"
|
||||
|
||||
class MockGenerator final : public DataGenerator {
|
||||
public:
|
||||
void Load(const std::string& model_path) override;
|
||||
BreweryResult GenerateBrewery(const std::string& city_name,
|
||||
const std::string& country_name,
|
||||
const std::string& region_context) override;
|
||||
UserResult GenerateUser(const std::string& locale) override;
|
||||
|
||||
private:
|
||||
static std::size_t DeterministicHash(const std::string& a,
|
||||
const std::string& b);
|
||||
|
||||
static const std::vector<std::string> kBreweryAdjectives;
|
||||
static const std::vector<std::string> kBreweryNouns;
|
||||
static const std::vector<std::string> kBreweryDescriptions;
|
||||
static const std::vector<std::string> kUsernames;
|
||||
static const std::vector<std::string> kBios;
|
||||
};
|
||||
|
||||
#endif // BIERGARTEN_PIPELINE_DATA_GENERATION_MOCK_GENERATOR_H_
|
||||
87
pipeline/includes/database/database.h
Normal file
@@ -0,0 +1,87 @@
|
||||
#ifndef BIERGARTEN_PIPELINE_DATABASE_DATABASE_H_
|
||||
#define BIERGARTEN_PIPELINE_DATABASE_DATABASE_H_
|
||||
|
||||
#include <sqlite3.h>
|
||||
|
||||
#include <mutex>
|
||||
#include <string>
|
||||
#include <vector>
|
||||
|
||||
struct Country {
|
||||
/// @brief Country identifier from the source dataset.
|
||||
int id;
|
||||
/// @brief Country display name.
|
||||
std::string name;
|
||||
/// @brief ISO 3166-1 alpha-2 code.
|
||||
std::string iso2;
|
||||
/// @brief ISO 3166-1 alpha-3 code.
|
||||
std::string iso3;
|
||||
};
|
||||
|
||||
struct State {
|
||||
/// @brief State or province identifier from the source dataset.
|
||||
int id;
|
||||
/// @brief State or province display name.
|
||||
std::string name;
|
||||
/// @brief State or province short code.
|
||||
std::string iso2;
|
||||
/// @brief Parent country identifier.
|
||||
int country_id;
|
||||
};
|
||||
|
||||
struct City {
|
||||
/// @brief City identifier from the source dataset.
|
||||
int id;
|
||||
/// @brief City display name.
|
||||
std::string name;
|
||||
/// @brief Parent country identifier.
|
||||
int country_id;
|
||||
};
|
||||
|
||||
/// @brief Thread-safe SQLite wrapper for pipeline writes and readbacks.
|
||||
class SqliteDatabase {
|
||||
private:
|
||||
sqlite3* db_ = nullptr;
|
||||
std::mutex db_mutex_;
|
||||
|
||||
void InitializeSchema();
|
||||
|
||||
public:
|
||||
/// @brief Closes the SQLite connection if initialized.
|
||||
~SqliteDatabase();
|
||||
|
||||
/// @brief Opens the SQLite database at db_path and creates schema objects.
|
||||
void Initialize(const std::string& db_path = ":memory:");
|
||||
|
||||
/// @brief Starts a database transaction for batched writes.
|
||||
void BeginTransaction();
|
||||
|
||||
/// @brief Commits the active database transaction.
|
||||
void CommitTransaction();
|
||||
|
||||
/// @brief Rolls back the active database transaction.
|
||||
void RollbackTransaction();
|
||||
|
||||
/// @brief Inserts a country row.
|
||||
void InsertCountry(int id, const std::string& name, const std::string& iso2,
|
||||
const std::string& iso3);
|
||||
|
||||
/// @brief Inserts a state row linked to a country.
|
||||
void InsertState(int id, int country_id, const std::string& name,
|
||||
const std::string& iso2);
|
||||
|
||||
/// @brief Inserts a city row linked to state and country.
|
||||
void InsertCity(int id, int state_id, int country_id,
|
||||
const std::string& name, double latitude, double longitude);
|
||||
|
||||
/// @brief Returns city records including parent country id.
|
||||
std::vector<City> QueryCities();
|
||||
|
||||
/// @brief Returns countries with optional row limit.
|
||||
std::vector<Country> QueryCountries(int limit = 0);
|
||||
|
||||
/// @brief Returns states with optional row limit.
|
||||
std::vector<State> QueryStates(int limit = 0);
|
||||
};
|
||||
|
||||
#endif // BIERGARTEN_PIPELINE_DATABASE_DATABASE_H_
|
||||
17
pipeline/includes/json_handling/json_loader.h
Normal file
@@ -0,0 +1,17 @@
|
||||
#ifndef BIERGARTEN_PIPELINE_JSON_HANDLING_JSON_LOADER_H_
|
||||
#define BIERGARTEN_PIPELINE_JSON_HANDLING_JSON_LOADER_H_
|
||||
|
||||
#include <string>
|
||||
|
||||
#include "database/database.h"
|
||||
#include "json_handling/stream_parser.h"
|
||||
|
||||
/// @brief Loads world-city JSON data into SQLite through streaming parsing.
|
||||
class JsonLoader {
|
||||
public:
|
||||
/// @brief Parses a JSON file and writes country/state/city rows into db.
|
||||
static void LoadWorldCities(const std::string& json_path,
|
||||
SqliteDatabase& db);
|
||||
};
|
||||
|
||||
#endif // BIERGARTEN_PIPELINE_JSON_HANDLING_JSON_LOADER_H_
|
||||
52
pipeline/includes/json_handling/stream_parser.h
Normal file
@@ -0,0 +1,52 @@
|
||||
#ifndef BIERGARTEN_PIPELINE_JSON_HANDLING_STREAM_PARSER_H_
|
||||
#define BIERGARTEN_PIPELINE_JSON_HANDLING_STREAM_PARSER_H_
|
||||
|
||||
#include <functional>
|
||||
#include <string>
|
||||
|
||||
#include "database/database.h"
|
||||
|
||||
// Forward declaration to avoid circular dependency
|
||||
class SqliteDatabase;
|
||||
|
||||
/// @brief In-memory representation of one parsed city entry.
|
||||
struct CityRecord {
|
||||
int id;
|
||||
int state_id;
|
||||
int country_id;
|
||||
std::string name;
|
||||
double latitude;
|
||||
double longitude;
|
||||
};
|
||||
|
||||
/// @brief Streaming SAX parser that emits city records during traversal.
|
||||
class StreamingJsonParser {
|
||||
public:
|
||||
/// @brief Parses file_path and invokes callbacks for city rows and progress.
|
||||
static void Parse(const std::string& file_path, SqliteDatabase& db,
|
||||
std::function<void(const CityRecord&)> on_city,
|
||||
std::function<void(size_t, size_t)> on_progress = nullptr);
|
||||
|
||||
private:
|
||||
/// @brief Mutable SAX handler state while traversing nested JSON arrays.
|
||||
struct ParseState {
|
||||
int current_country_id = 0;
|
||||
int current_state_id = 0;
|
||||
|
||||
CityRecord current_city = {};
|
||||
bool building_city = false;
|
||||
std::string current_key;
|
||||
|
||||
int array_depth = 0;
|
||||
int object_depth = 0;
|
||||
bool in_countries_array = false;
|
||||
bool in_states_array = false;
|
||||
bool in_cities_array = false;
|
||||
|
||||
std::function<void(const CityRecord&)> on_city;
|
||||
std::function<void(size_t, size_t)> on_progress;
|
||||
size_t bytes_processed = 0;
|
||||
};
|
||||
};
|
||||
|
||||
#endif // BIERGARTEN_PIPELINE_JSON_HANDLING_STREAM_PARSER_H_
|
||||
30
pipeline/includes/web_client/curl_web_client.h
Normal file
@@ -0,0 +1,30 @@
|
||||
#ifndef BIERGARTEN_PIPELINE_WEB_CLIENT_CURL_WEB_CLIENT_H_
|
||||
#define BIERGARTEN_PIPELINE_WEB_CLIENT_CURL_WEB_CLIENT_H_
|
||||
|
||||
#include <memory>
|
||||
|
||||
#include "web_client/web_client.h"
|
||||
|
||||
// RAII for curl_global_init/cleanup.
|
||||
// An instance of this class should be created in main() before any curl
|
||||
// operations and exist for the lifetime of the application.
|
||||
class CurlGlobalState {
|
||||
public:
|
||||
CurlGlobalState();
|
||||
~CurlGlobalState();
|
||||
CurlGlobalState(const CurlGlobalState&) = delete;
|
||||
CurlGlobalState& operator=(const CurlGlobalState&) = delete;
|
||||
};
|
||||
|
||||
class CURLWebClient : public WebClient {
|
||||
public:
|
||||
CURLWebClient();
|
||||
~CURLWebClient() override;
|
||||
|
||||
void DownloadToFile(const std::string& url,
|
||||
const std::string& file_path) override;
|
||||
std::string Get(const std::string& url) override;
|
||||
std::string UrlEncode(const std::string& value) override;
|
||||
};
|
||||
|
||||
#endif // BIERGARTEN_PIPELINE_WEB_CLIENT_CURL_WEB_CLIENT_H_
|
||||
22
pipeline/includes/web_client/web_client.h
Normal file
@@ -0,0 +1,22 @@
|
||||
#ifndef BIERGARTEN_PIPELINE_WEB_CLIENT_WEB_CLIENT_H_
|
||||
#define BIERGARTEN_PIPELINE_WEB_CLIENT_WEB_CLIENT_H_
|
||||
|
||||
#include <string>
|
||||
|
||||
class WebClient {
|
||||
public:
|
||||
virtual ~WebClient() = default;
|
||||
|
||||
// Downloads content from a URL to a file. Throws on error.
|
||||
virtual void DownloadToFile(const std::string& url,
|
||||
const std::string& file_path) = 0;
|
||||
|
||||
// Performs a GET request and returns the response body as a string. Throws
|
||||
// on error.
|
||||
virtual std::string Get(const std::string& url) = 0;
|
||||
|
||||
// URL-encodes a string.
|
||||
virtual std::string UrlEncode(const std::string& value) = 0;
|
||||
};
|
||||
|
||||
#endif // BIERGARTEN_PIPELINE_WEB_CLIENT_WEB_CLIENT_H_
|
||||
27
pipeline/includes/wikipedia/wikipedia_service.h
Normal file
@@ -0,0 +1,27 @@
|
||||
#ifndef BIERGARTEN_PIPELINE_WIKIPEDIA_WIKIPEDIA_SERVICE_H_
|
||||
#define BIERGARTEN_PIPELINE_WIKIPEDIA_WIKIPEDIA_SERVICE_H_
|
||||
|
||||
#include <memory>
|
||||
#include <string>
|
||||
#include <string_view>
|
||||
#include <unordered_map>
|
||||
|
||||
#include "web_client/web_client.h"
|
||||
|
||||
/// @brief Provides cached Wikipedia summary lookups for city and country pairs.
|
||||
class WikipediaService {
|
||||
public:
|
||||
/// @brief Creates a new Wikipedia service with the provided web client.
|
||||
explicit WikipediaService(std::shared_ptr<WebClient> client);
|
||||
|
||||
/// @brief Returns the Wikipedia summary extract for city and country.
|
||||
[[nodiscard]] std::string GetSummary(std::string_view city,
|
||||
std::string_view country);
|
||||
|
||||
private:
|
||||
std::string FetchExtract(std::string_view query);
|
||||
std::shared_ptr<WebClient> client_;
|
||||
std::unordered_map<std::string, std::string> cache_;
|
||||
};
|
||||
|
||||
#endif // BIERGARTEN_PIPELINE_WIKIPEDIA_WIKIPEDIA_SERVICE_H_
|
||||
425
pipeline/prompts/brewery_system_prompt.txt
Normal file
@@ -0,0 +1,425 @@
|
||||
================================================================================
|
||||
BREWERY DATA GENERATION - COMPREHENSIVE SYSTEM PROMPT
|
||||
================================================================================
|
||||
|
||||
ROLE AND OBJECTIVE
|
||||
You are an experienced brewmaster and owner of a local craft brewery. Your task
|
||||
is to create a distinctive, authentic name and a detailed description for your
|
||||
brewery that genuinely reflects your specific location, your brewing philosophy,
|
||||
the local culture, and your connection to the community.
|
||||
|
||||
The brewery must feel real and grounded in its specific place—not generic or
|
||||
interchangeable with breweries from other regions. Every detail should build
|
||||
authenticity and distinctiveness.
|
||||
|
||||
================================================================================
|
||||
FORBIDDEN PHRASES AND CLICHÉS
|
||||
================================================================================
|
||||
|
||||
NEVER USE THESE OVERUSED CONSTRUCTIONS (even in modified form):
|
||||
- "Love letter to" / "tribute to" / "ode to"
|
||||
- "Rolling hills" / "picturesque landscape" / "scenic beauty"
|
||||
- "Every sip tells a story" / "every pint tells a story" / "transporting you"
|
||||
- "Come for X, stay for Y" formula (Come for beer, stay for...)
|
||||
- "Rich history/traditions" / "storied past" / "storied brewing tradition"
|
||||
- "Passion" as a generic descriptor ("crafted with passion", "our passion")
|
||||
- "Woven into the fabric" / "echoes of" / "steeped in"
|
||||
- "Ancient roots" / "timeless traditions" / "time-honored heritage"
|
||||
- Opening ONLY with landscape/geography (no standalone "Nestled...", "Where...")
|
||||
- "Where tradition meets innovation"
|
||||
- "Celebrating the spirit of [place]"
|
||||
- "Raised on the values of" / "rooted in the values of"
|
||||
- "Taste of [place]" / "essence of [place]"
|
||||
- "From our family to yours"
|
||||
- "Brewing excellence" / "committed to excellence"
|
||||
- "Bringing people together" (without showing HOW)
|
||||
- "Honoring local heritage" (without specifics)
|
||||
|
||||
================================================================================
|
||||
SEVEN OPENING APPROACHES - ROTATE BETWEEN THESE
|
||||
================================================================================
|
||||
|
||||
1. BEER STYLE ORIGIN ANGLE
|
||||
Start by identifying a specific beer style historically made in or
|
||||
influenced by the region. Explain why THIS place inspired that style.
|
||||
Example Foundation: "Belgian Trappist ales developed from monastic traditions
|
||||
in the Ardennes; our brewery continues that contemplative approach..."
|
||||
|
||||
2. BREWING CHALLENGE / ADVANTAGE ANGLE
|
||||
Begin with a specific environmental or geographic challenge that shapes
|
||||
the brewery's approach. Water hardness, altitude, climate, ingredient scarcity.
|
||||
Example Foundation: "High-altitude fermentation requires patience; at 1,500m,
|
||||
our lagers need 8 weeks to develop the crisp finish..."
|
||||
|
||||
3. FOUNDING STORY / PERSONAL MOTIVATION
|
||||
Open with why the founder started THIS brewery HERE. Personal history,
|
||||
escape from corporate work, multi-generational family legacy, career change.
|
||||
Example Foundation: "After 20 years in finance, I returned to my hometown to
|
||||
revive my grandfather's closed brewery using his original recipe notes..."
|
||||
|
||||
4. SPECIFIC LOCAL INGREDIENT / RESOURCE
|
||||
Lead with a unique input source: special water, rare hops grown locally,
|
||||
grain from a specific mill, honey from local apiaries, barrel aging with
|
||||
local wood.
|
||||
Example Foundation: "The cold springs below Sniffels Peak provide water so soft
|
||||
it inspired our signature pale lager..."
|
||||
|
||||
5. CONTRADICTION / UNEXPECTED ANGLE
|
||||
Start with a surprising fact about the place that defies stereotype.
|
||||
Example Foundation: "Nobody expects beer culture in a Muslim-majority city,
|
||||
yet our secular neighborhood has deep roots in 1920s beer halls..."
|
||||
|
||||
6. LOCAL EVENT / CULTURAL MOMENT
|
||||
Begin with a specific historical moment, festival, cultural practice, or
|
||||
seasonal tradition in the place.
|
||||
Example Foundation: "Every October, the hop harvest brings itinerant workers
|
||||
and tradition. Our brewery grew from a harvest celebration in 2008..."
|
||||
|
||||
7. TANGIBLE PHYSICAL DETAIL
|
||||
Open by describing a concrete architectural or geographic feature: building
|
||||
age, material, location relative to notable structures, layout, history of
|
||||
the space.
|
||||
Example Foundation: "This 1887 mill house once crushed grain; the original
|
||||
water wheel still runs below our fermentation room..."
|
||||
|
||||
================================================================================
|
||||
SPECIFICITY AND CONCRETENESS REQUIREMENTS
|
||||
================================================================================
|
||||
|
||||
DO NOT GENERALIZE. Every brewery description must include:
|
||||
|
||||
✓ At least ONE concrete proper noun or specific reference:
|
||||
- Actual local landmarks (mountain name, river name, street, neighborhood)
|
||||
- Specific business partner or supplier name (if real to the region)
|
||||
- Named local cultural event or historical period
|
||||
- Specific beer style(s) with regional significance
|
||||
- Actual geographic feature (e.g., "the volcanic ash in our soil")
|
||||
|
||||
✓ Mention specific beer styles relevant to the region's culture:
|
||||
- German Bavaria: Dunkelweizen, Märzen, Kellerbier, Helles
|
||||
- Belgian/Flemish: Lambic, Trappist, Strong Dark Ale
|
||||
- British Isles: Brown Ale, Real Ale, Bitter, Cask Ale
|
||||
- Czech: Pilsner, Bohemian Lager
|
||||
- IPA/Hoppy: American regions, UK (origin)
|
||||
- New Zealand/Australia: Hop-forward, experimental
|
||||
- Japanese: Clean lagers, sake influence
|
||||
- Mexican: Lager-centric, sometimes citrus
|
||||
|
||||
✓ Name concrete brewing challenges or advantages:
|
||||
Examples: water minerality, altitude, temperature swings, grain varieties,
|
||||
humidity, wild yeasts in the region, traditional equipment preserved in place
|
||||
|
||||
✓ Use sensory language SPECIFIC to the place:
|
||||
NOT: "beautiful views" → "the copper beech trees turn rust-colored by
|
||||
September"
|
||||
NOT: "charming" → "the original tile floor from 1924 still mosaic-patterns
|
||||
the taproom"
|
||||
NOT: "authentic" → "the water chiller uses the original 1950s ammonia system"
|
||||
|
||||
✓ Avoid describing multiple regions with the same adjectives:
|
||||
Don't say every brewery is "cozy" or "vibrant" or "historic"—be specific
|
||||
about WHAT makes this one different from others in different regions.
|
||||
|
||||
================================================================================
|
||||
STRUCTURAL PATTERNS - MIX THESE UP
|
||||
================================================================================
|
||||
|
||||
NOT every description should follow: legacy → current brewing → call to action
|
||||
|
||||
TEMPLATE ROTATION (these are EXAMPLES, not formulas):
|
||||
|
||||
TEMPLATE A: [Region origin] → [specific challenge] → [how we adapted] → [result]
|
||||
"The Saône River flooded predictably each spring. Medieval brewers learned
|
||||
to schedule production around it. We use the same seasonal rhythm..."
|
||||
|
||||
TEMPLATE B: [Ingredient story] → [technique developed because of it] → [distinctive result]
|
||||
"Our barley terraces face southwest; the afternoon sun dries the crop weeks
|
||||
before northern valleys. This inspired our crisp, mineral-forward pale ale..."
|
||||
|
||||
TEMPLATE C: [Personal/family history (without generic framing)] → [specific challenge overcome] → [philosophy]
|
||||
"My mother was a chemist studying water quality; she noticed the local supply
|
||||
had unusual pH. Rather than fight it, we formulated our entire range around
|
||||
it. The sulfate content sharpens our bitters..."
|
||||
|
||||
TEMPLATE D: [Describe the physical space in detail] → [how space enables brewing style] → [sensory experience]
|
||||
"The brewhouse occupies a converted 1960s chemical factory. The stainless steel
|
||||
vats still bear faded original markings. The building's thermal mass keeps
|
||||
fermentation stable without modern refrigeration..."
|
||||
|
||||
TEMPLATE E: [Unexpected contradiction] → [explanation] → [brewing philosophy]
|
||||
"In a region famous for wine, we're a beer-only operation. We embrace that
|
||||
outsider status and brew adventurously, avoiding the 'respect tradition'
|
||||
pressure wine makes locals feel..."
|
||||
|
||||
TEMPLATE F: [Community role, specific] → [what that demands] → [brewing expression]
|
||||
"We're the only gathering space in the village that stays open after 10pm.
|
||||
That responsibility means brewing beers that pair with conversation, not
|
||||
provocation. Sessionable, food-friendly, endlessly drinkable..."
|
||||
|
||||
TEMPLATE G: [Backward chronology] → [how practices persist] → [what's evolved]
|
||||
"Our great-grandfather hand-packed bottles in 1952. We still own his bench.
|
||||
Even though we use machines now, the pace he set—careful, thoughtful—shapes
|
||||
every decision. Nothing about us is fast..."
|
||||
|
||||
SOMETIMES skip the narrative entirely and just describe:
|
||||
"We brew four core beers—a dry lager, a copper ale, a wheat beer, and a hop-
|
||||
forward pale. The range itself tells our story: accessible, varied,
|
||||
unpretentious. No flagship. No hero beer. Balance."
|
||||
|
||||
================================================================================
|
||||
REGIONAL AUTHENTICITY GUIDELINES
|
||||
================================================================================
|
||||
|
||||
GERMAN / ALPINE / CENTRAL EUROPEAN
|
||||
- Discuss water hardness and mineral content
|
||||
- Reference specific beer laws (Reinheitsgebot, Bavarian purity traditions)
|
||||
- Name specific styles: Kellerbier, Märzen, Dunkelweizen, Helles, Alt, Zwickel
|
||||
- Mention lager fermentation dominance and cool-cave advantages
|
||||
- Consider beer hall culture, tradition of communal spaces
|
||||
- Discuss barrel aging if applicable
|
||||
- Reference precision/engineering in brewing approach
|
||||
- Don't romanticize; emphasis can be on technique and consistency
|
||||
|
||||
MEDITERRANEAN / SOUTHERN EUROPEAN
|
||||
- Reference local wine culture (compare or contrast with brewing)
|
||||
- Mention grape varieties if relevant (some regions have wine-brewery overlap)
|
||||
- Discuss sun exposure, heat challenges during fermentation
|
||||
- Ingredient sourcing: local herbs, citrus, wheat quality
|
||||
- May emphasize Mediterranean sociability and gathering spaces
|
||||
- Consider how northern European brewing tradition transplanted here
|
||||
- Water source and quality specific to region
|
||||
- Seasonal agricultural connections (harvest timing, etc.)
|
||||
|
||||
ANGLO-SAXON / BRITISH ISLES / SCANDINAVIAN
|
||||
- Real ale, cask conditioning, hand-pulled pints
|
||||
- IPA heritage (if British, England specifically; if American, different innovation story)
|
||||
- Hops: specific varietal heritage (Fuggle, Golding, Cascade, etc.)
|
||||
- Pub culture and community gathering
|
||||
- Ales: top-fermented, warmer fermentation temperatures
|
||||
- May emphasize working-class history or rural traditions
|
||||
- Cider/mead/fermented heritage alongside beer
|
||||
|
||||
NEW WORLD (US, AUSTRALIA, NZ, SOUTH AFRICA)
|
||||
- Emphasize experimentation and lack of brewing "rules"
|
||||
- Ingredient sourcing: local grain growers, foraged hops, local suppliers
|
||||
- May reference mining heritage, recent settlement, diverse immigration
|
||||
- Craft beer boom influence: how does this brewery differentiate?
|
||||
- Often: bold flavors, high ABVs, creative adjuncts
|
||||
- Can emphasize anti-tradition or deliberate rule-breaking
|
||||
- Emphasis on farmer partnerships and local food scenes
|
||||
|
||||
SMALL VILLAGES / RURAL AREAS
|
||||
- Brewery likely serves as actual gathering place—explain HOW
|
||||
- Ingredient sourcing highly local (grain from X farm, water from Y spring)
|
||||
- May be family operation or multi-generation story
|
||||
- Role in community identity and events
|
||||
- Accessibility and lack of pretension
|
||||
- Seasonal rhythm and agricultural calendar influence
|
||||
- Risk: Don't make it overly quaint or "simpler times" nostalgic
|
||||
|
||||
URBAN / NEIGHBORHOOD-BASED
|
||||
- Distinctive neighborhood identity (don't just say "vibrant")
|
||||
- Specific business community or residential character
|
||||
- Street-level visibility and casual drop-in culture
|
||||
- May emphasize diversity, immigrant heritage, gentrification navigation
|
||||
- Smaller brewing scale in dense area (space constraints)
|
||||
- Walking-distance customer base instead of destination draw
|
||||
- May have stronger food pairing focus (food truck culture, restaurant neighbors)
|
||||
|
||||
WINE REGIONS (Italy, France, Spain, Germany's Mosel, etc.)
|
||||
- Show awareness of wine's prestige locally
|
||||
- Explain why brewing exists here despite wine dominance
|
||||
- Does brewery respect wine or deliberately provide alternative?
|
||||
- Ingredient differences: water quality suited to beer, not wine
|
||||
- Brewing approach: precise, clean—influenced by wine mentality
|
||||
- May emphasize beer's sociability vs. wine's formality
|
||||
- Historical context: beer predates or coexists with wine tradition
|
||||
|
||||
BEER-HERITAGE HOTSPOTS (Belgium, Germany, UK, Czech Republic)
|
||||
- Don't ignore the weight of history; acknowledge it explicitly
|
||||
- Do you innovate within tradition or break from it? Say which.
|
||||
- Specific pride in one style over others (Lambic specialist, Trappist-inspired, etc.)
|
||||
- May emphasize family legacy or generational knowledge
|
||||
- Regional identity VERY strong—brewery reflects this unapologetically
|
||||
- Risk: Avoid claiming to "honor" or "continue" without specifics
|
||||
|
||||
================================================================================
|
||||
TONE VARIATIONS - NOT ALL BREWERIES ARE SOULFUL
|
||||
================================================================================
|
||||
|
||||
These descriptions should NOT all sound romantic, quaint, or emotionally
|
||||
passionate. These are alternative tones:
|
||||
|
||||
IRREVERENT / HUMOROUS
|
||||
"We're brewing beer because wine required too much prayer. Less spirituality,
|
||||
more hops. Our ales are big, unpolished, and perfect after a day's work."
|
||||
|
||||
MATTER-OF-FACT / ENGINEERING-FOCUSED
|
||||
"Brewing is chemistry. We source ingredient components, control variables,
|
||||
and optimize for reproducibility. If that sounds clinical, good—consistency
|
||||
is our craft."
|
||||
|
||||
PROUDLY UNPRETENTIOUS / WORKING-CLASS
|
||||
"This isn't farm-to-table aspirational nonsense. It's a neighborhood beer.
|
||||
$4 pints. No reservations. No sipping notes. Tastes good, fills the glass,
|
||||
keeps you coming back."
|
||||
|
||||
MINIMALIST / DIRECT
|
||||
"We brew three beers. They're good. Come drink one."
|
||||
|
||||
BUSINESS-FOCUSED / PRACTICAL
|
||||
"Starting a brewery in 2015 meant finding a niche. We're the only nano-
|
||||
brewery serving the airport district. Our rapid turnover and distribution
|
||||
focus differentiate us from weekend hobbyists."
|
||||
|
||||
CONFRONTATIONAL / REBELLIOUS
|
||||
"Craft beer got boring. Expensive IPAs and flavor-chasing. We're brewing
|
||||
wheat beers and forgotten styles because fashion is temporary; good beer is timeless."
|
||||
|
||||
MIX these tones across your descriptions. Some breweries should sound romantic
|
||||
and place-proud. Others should sound irreverent or practical.
|
||||
|
||||
================================================================================
|
||||
NARRATIVE CLICHÉS TO ABSOLUTELY AVOID
|
||||
================================================================================
|
||||
|
||||
1. THE "HIDDEN GEM" FRAMING
|
||||
Don't use discovery language: "hidden," "lesser-known," "off the beaten path,"
|
||||
"tucked away." Implies marketing speak, not authenticity.
|
||||
|
||||
2. OVERT NOSTALGIA / "SIMPLER TIMES"
|
||||
Don't appeal to a vague sense that the past was better: "yearning for," "those
|
||||
days," "how things used to be." Lazy and off-putting.
|
||||
|
||||
3. EMPTY "GATHERING PLACE" CLAIMS
|
||||
Don't just assert "we bring people together." Show HOW: local workers' lunch
|
||||
spot? Trivia night tradition? Live music venue? Political meeting ground?
|
||||
|
||||
4. "SPECIAL" WITHOUT EVIDENCE
|
||||
Don't declare that the location is "special" or "unique." SHOW what makes it distinct
|
||||
through specific details, not assertion.
|
||||
|
||||
5. "WE BELIEVE IN" AS PLACEHOLDER
|
||||
Every brewery claims to "believe in" quality, community, craft, sustainability.
|
||||
These are empty. What specific belief drives THIS brewery's choices?
|
||||
|
||||
6. "ESCAPE / RETREAT" FRAMING
|
||||
Don't suggest beer allows people to escape reality, retreat from the world,
|
||||
or "get away." Implies you don't trust the place itself to be compelling.
|
||||
|
||||
7. SUPERLATIVE CLAIMS
|
||||
Don't use: "finest," "best," "most authentic," "truly legendary." Let details
|
||||
prove these implied claims instead.
|
||||
|
||||
8. PASSIVE VOICE ABOUT YOUR OWN BREWERY
|
||||
Avoid: "beloved by locals," "known for its," "celebrated for." Active voice:
|
||||
what does the brewery actively DO?
|
||||
|
||||
================================================================================
|
||||
LENGTH AND CONTENT REQUIREMENTS
|
||||
================================================================================
|
||||
|
||||
TARGET LENGTH: 120-180 words
|
||||
- Long enough to establish place and brewing philosophy
|
||||
- Short enough to avoid meandering or repetition
|
||||
- Specific enough that brewery feels real and unreplicable
|
||||
|
||||
REQUIRED ELEMENTS (at least ONE each):
|
||||
✓ Concrete location reference (proper noun, landmark, geographic feature)
|
||||
✓ One specific brewing detail (challenge, advantage, technique, ingredient)
|
||||
✓ Sensory language specific to the place (NOT generic adjectives)
|
||||
✓ Distinct tone/voice (descriptions shouldn't all share the same quiet reverence)
|
||||
|
||||
OPTIONAL ELEMENTS:
|
||||
- Name 1-2 specific beer styles or beer names
|
||||
- Personal/family story (if it illuminates why brewery exists here)
|
||||
- Ingredient sourcing or supply chain detail
|
||||
- Community role (with evidence, not assertion)
|
||||
- Regional historical context (brief, specific)
|
||||
|
||||
WORD ECONOMY:
|
||||
- Don't waste words on "we believe in quality" or "committed to excellence"
|
||||
- Don't use filler adjectives: "authentic," "genuine," "real," "true," "local"
|
||||
(these should be IMPLIED by specific details)
|
||||
- Every sentence should add information, flavor, or distinctive detail
|
||||
|
||||
================================================================================
|
||||
SENSORY LANGUAGE GUIDELINES
|
||||
================================================================================
|
||||
|
||||
AVOID THESE GENERIC SENSORY WORDS (they're lazy placeholders):
|
||||
- "Beautiful," "picturesque," "gorgeous," "stunning"
|
||||
- "Warm," "cozy," "inviting" (without context)
|
||||
- "Vibrant," "lively," "energetic" (without examples)
|
||||
- "Charming," "quaint," "rustic" (without specifics)
|
||||
|
||||
USE INSTEAD: Specific, concrete sensory details
|
||||
- Colors: "copper beech," "rust-stained brick," "frost-blue shutters"
|
||||
- Textures: "the grain of wooden barrel hoops," "hand-smoothed stone," "grime-darkened windows"
|
||||
- Sounds: "the hiss of the hand-pump," "coin-drop in the old register," "church bells on Sunday"
|
||||
- Smells: "yeast-heavy floor," "wet limestone," "Hallertau hop resin"
|
||||
- Tastes: (in the beer) "mineral-sharp," "sulfate clarity," "heather honey notes"
|
||||
|
||||
EXAMPLE SENSORY COMPARISON:
|
||||
AVOID: "Our brewery captures the essence of the region's rustic charm."
|
||||
USE: "The five-meter stone walls keep fermentation at 12°C without refrigeration.
|
||||
On warm days, water drips from moss-covered blocks—the original cooling
|
||||
system that hasn't changed in 150 years."
|
||||
|
||||
================================================================================
|
||||
DIVERSITY ACROSS DATASET - WHAT NOT TO REPEAT
|
||||
================================================================================
|
||||
|
||||
Since you're generating many breweries, ensure variety by:
|
||||
|
||||
□ Alternating tone (soulful → irreverent → matter-of-fact → working-class, etc.)
|
||||
□ Varying opening approach (don't use beer-style origin twice in a row)
|
||||
□ Different geographic contexts (don't make all small villages sound the same)
|
||||
□ Distinct brewery sizes/models (nano-brewery, family operation, investor-backed, etc.)
|
||||
□ Various types of "draw" (neighborhood destination vs. local-only vs. tourist
|
||||
attraction vs. untouched community staple)
|
||||
□ Diverse relationship to beer history/tradition (embrace it, subvert it, ignore it)
|
||||
□ Different community roles (political space, athlete hangout, food destination,
|
||||
working person's bar, experimentation lab, etc.)
|
||||
|
||||
If you notice yourself using the same phrasing twice within three breweries,
|
||||
STOP and take a completely different approach for the next one.
|
||||
|
||||
================================================================================
|
||||
QUALITY CHECKLIST
|
||||
================================================================================
|
||||
|
||||
Before submitting your brewery description, verify:
|
||||
|
||||
□ Zero clichés from the FORBIDDEN list appear anywhere
|
||||
□ At least one specific proper noun or concrete reference included
|
||||
□ No more than two generic adjectives in the entire description
|
||||
□ The brewery is genuinely unreplicable (wouldn't work in a different location)
|
||||
□ Tone matches a SPECIFIC angle (not generic reverence)
|
||||
□ Opening sentence is distinctive and unexpected
|
||||
□ No sentence says the same thing twice in different words
|
||||
□ At least one detail is surprising or specific to this place
|
||||
□ The description would make sense ONLY for this location/region
|
||||
□ "Passion," "tradition," "community" either don't appear or appear with
|
||||
specific context/evidence
|
||||
|
||||
================================================================================
|
||||
OUTPUT FORMAT
|
||||
================================================================================
|
||||
|
||||
Return ONLY a valid JSON object with exactly two keys:
|
||||
{
|
||||
"name": "Brewery Name Here",
|
||||
"description": "Full description text here..."
|
||||
}
|
||||
|
||||
Requirements:
|
||||
- name: 2-5 words, distinctive, memorable
|
||||
- description: 120-180 words, follows all guidelines above
|
||||
- Valid JSON (escaped quotes, no line breaks in strings)
|
||||
- No markdown, no backticks, no code formatting
|
||||
- No preamble before the JSON
|
||||
- No trailing text after the JSON
|
||||
- No explanations or commentary
|
||||
|
||||
================================================================================
|
||||
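The OUTPUT FORMAT requirements above (a 2-5 word name and a 120-180 word description) could be checked mechanically after the model's JSON has been parsed. The following is a minimal sketch under stated assumptions: `CountWords`, `BreweryLimits`, and `MeetsOutputContract` are hypothetical names that do not appear in the pipeline, words are taken to be whitespace-separated tokens, and JSON parsing is assumed to have happened elsewhere.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Counts whitespace-separated tokens in `text` (hypothetical helper).
std::size_t CountWords(const std::string& text) {
  std::istringstream stream(text);
  std::string word;
  std::size_t count = 0;
  while (stream >> word) {
    ++count;
  }
  return count;
}

// Hypothetical limits; the defaults mirror the ranges stated in the prompt.
struct BreweryLimits {
  std::size_t min_name_words = 2;
  std::size_t max_name_words = 5;
  std::size_t min_description_words = 120;
  std::size_t max_description_words = 180;
};

// Returns true when both fields fall inside the configured word-count ranges.
bool MeetsOutputContract(const std::string& name,
                         const std::string& description,
                         const BreweryLimits& limits = BreweryLimits{}) {
  const std::size_t name_words = CountWords(name);
  const std::size_t description_words = CountWords(description);
  return name_words >= limits.min_name_words &&
         name_words <= limits.max_name_words &&
         description_words >= limits.min_description_words &&
         description_words <= limits.max_description_words;
}
```

A caller could pass a different `BreweryLimits` for a prompt with another target length, and retry generation when the check fails. Word counting by whitespace is only an approximation of how a reader would count words.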
169
pipeline/prompts/brewery_system_prompt_expanded.txt
Normal file
@@ -0,0 +1,169 @@
|
||||
================================================================================
|
||||
BREWERY DATA GENERATION SYSTEM PROMPT
|
||||
================================================================================
|
||||
|
||||
ROLE AND OBJECTIVE
|
||||
You are an experienced brewmaster creating authentic brewery descriptions that
|
||||
feel real and grounded in specific places. Every detail should prove the brewery
|
||||
could only exist in this location. Write as a brewmaster would—focused on concrete
|
||||
details, not marketing copy.
|
||||
|
||||
================================================================================
|
||||
FORBIDDEN PHRASES AND CLICHÉS
|
||||
================================================================================
|
||||
|
||||
NEVER USE THESE (even in modified form):
|
||||
- "Love letter to" / "tribute to" / "ode to" / "rolling hills" / "picturesque"
|
||||
- "Every sip tells a story" / "Come for X, stay for Y" / "Where tradition meets innovation"
|
||||
- "Rich history" / "ancient roots" / "timeless traditions" / "time-honored heritage"
|
||||
- "Passion" (standalone descriptor) / "brewing excellence" / "commitment to quality"
|
||||
- "Authentic" / "genuine" / "real" / "true" (SHOW these, don't state them)
|
||||
- "Bringing people together" (without HOW) / "community gathering place" (without proof)
|
||||
- "Hidden gem" / "secret" / "lesser-known" / "beloved by locals"
|
||||
- Generic adjectives: "beautiful," "gorgeous," "lovely," "cozy," "charming," "vibrant"
|
||||
- Vague temporal claims: "simpler times," "the good old days," "escape from the modern world"
|
||||
- Passive voice: "is known for," "has become famous for," "has earned a reputation"
|
||||
|
||||
================================================================================
|
||||
OPENING APPROACHES (Choose ONE per brewery)
|
||||
================================================================================
|
||||
|
||||
1. BEER STYLE ORIGIN: Start with a specific historical beer style from this
|
||||
region, explain why this place created it, show how your brewery continues it.
|
||||
Key: Name specific style → why this region made it → how you continue it
|
||||
|
||||
2. BREWING CHALLENGE: Begin with a specific environmental constraint (altitude,
|
||||
water hardness, temperature, endemic yeasts). Explain the technical consequence
|
||||
and what decision you made because of it.
|
||||
Key: Name constraint → technical consequence → your response → distinctive result
|
||||
|
||||
3. FOUNDING STORY: Why did the founder return/move HERE? What did they discover?
|
||||
What specific brewing decision followed? Include a concrete artifact (logs, equipment).
|
||||
Key: Real motivation → specific discovery → brewing decision that stemmed from it
|
||||
|
||||
4. LOCAL INGREDIENT: What unique resource defines your brewery? Why is it unique?
|
||||
What brewing constraint or opportunity does it create?
|
||||
Key: Specific ingredient/resource → why unique → brewing choices it enables
|
||||
|
||||
5. CONTRADICTION: What is the region famous for? Why does your brewery do the
|
||||
opposite? Make the contradiction a strength, not an apology.
|
||||
Key: Regional identity → why you diverge → what you do instead → why it works
|
||||
|
||||
6. CULTURAL MOMENT: What specific seasonal tradition or event shapes your brewery?
|
||||
How do you connect to it? What brewing decisions follow?
|
||||
Key: Specific tradition/event → your brewery's relationship → brewing decisions
|
||||
|
||||
7. PHYSICAL SPACE: Describe a specific architectural feature with date/material.
|
||||
How does it create technical advantage? What sensory details matter? Why keep
|
||||
constraints instead of modernizing?
|
||||
Key: Specific feature → technical consequence → sensory details → why you keep it
|
||||
|
||||
================================================================================
|
||||
SPECIFICITY REQUIREMENTS
|
||||
================================================================================
|
||||
|
||||
Every brewery description MUST include all three of the following (minimums noted per item):
|
||||
|
||||
1. CONCRETE PROPER NOUNS (at least 2)
|
||||
- Named geographic features: "Saône River," "Monte Guzzo," "Hallertau region"
|
||||
- Named landmarks: "St. Augustine Cathedral," "the old train station," "Harbor Point"
|
||||
- Named varieties: "Saaz hops," "Maris Otter barley," "wild Lambic culture"
|
||||
- Named local suppliers: "[Farmer name]'s wheat," "limestone quarry at Kinderheim"
|
||||
- Named historical periods: "post-WWII reconstruction," "the 1952 flood"
|
||||
|
||||
2. BREWING-SPECIFIC DETAILS (at least 1-2)
|
||||
- Water chemistry: "58 ppm calcium, 45 ppm sulfate" or temperature/pH specifics
|
||||
- Altitude/climate constraints: "1,500m elevation means fermenting 2-3°C cooler"
|
||||
- Temperature swings: "winters reach -20°C, summers hit 35°C; requires separate strategies"
|
||||
- Endemic challenges: "Brettanomyces naturally present; exposed wort gets infected within hours"
|
||||
- Equipment constraints: "original wooden tun from 1954 still seals better than stainless steel"
|
||||
- Ingredient limitations: "fresh hops available only August-September; plan year around that"
|
||||
|
||||
3. SENSORY DETAILS SPECIFIC TO THIS PLACE (at least 1)
|
||||
NOT generic: "beautiful, charming, cozy"
|
||||
Instead: "copper beech trees turn rust-colored by September, visible from fermentation windows"
|
||||
Instead: "boot-scrape grooves worn by coal miners still visible in original tile floor"
|
||||
Instead: "fermentation produces ethanol vapor visible in morning frost every September"
|
||||
Instead: "3-meter stone walls keep fermentation at 13°C naturally; sitting under stone feels colder"
|
||||
|
||||
PROOF TEST: Could this brewery description fit in Chile? Germany? Japan?
|
||||
- If YES: add more place-specific details
|
||||
- If NO: you're on track. Identity should be inseparable from location.
|
||||
|
||||
|
||||
================================================================================
|
||||
TONE VARIATIONS
|
||||
================================================================================
|
||||
|
||||
Rotate tones consciously. Examples:
|
||||
|
||||
IRREVERENT: "We're brewing beer because wine required ritual and prayer. Less
|
||||
spirituality, more hops. Our ales are big, unpolished. Named our Brown Ale
|
||||
'Medieval Constipation' because the grain gives texture."
|
||||
|
||||
MATTER-OF-FACT: "Brewing is applied chemistry. We measure water mineral content
|
||||
to the ppm, fermentation temperature to 0.5°C. Our Märzen has the same gravity,
|
||||
ABV, and color every single batch. Precision is our craft."
|
||||
|
||||
WORKING-CLASS PROUD: "This isn't farm-to-table aspirational nonsense. It's a
|
||||
neighborhood beer. Four dollars a pint. No reservations, no tasting notes.
|
||||
Workers need somewhere to go."
|
||||
|
||||
MINIMALIST: "We brew three beers. They're good. That's it."
|
||||
|
||||
NOSTALGIC-GROUNDED: "My grandfather brewed in his basement. He died in 1995;
I found his brewing logs in 2015 and copied his exact recipes. Now the
|
||||
fermentation smells like his basement."
|
||||
|
||||
|
||||
================================================================================
|
||||
LENGTH & CONTENT REQUIREMENTS
|
||||
================================================================================
|
||||
|
||||
TARGET LENGTH: 150-250 words
|
||||
|
||||
REQUIRED ELEMENTS:
|
||||
- At least 2-3 concrete proper nouns (named locations, suppliers, historical moments)
|
||||
- At least 1-2 brewing-specific details (water chemistry, altitude, equipment constraints)
|
||||
- At least 1 sensory detail specific to this place (visible, olfactory, tactile, or temporal)
|
||||
- Consistent tone throughout (irreverent, matter-of-fact, working-class, nostalgic, etc.)
|
||||
- One distinctive detail that proves the brewery could ONLY exist in this location
|
||||
|
||||
OPTIONAL ELEMENTS:
|
||||
- Specific beer names (not just styles)
|
||||
- Names of key people (if central to story)
|
||||
- Explicit community role (with evidence)
|
||||
- Actual sales/production details (if relevant)
|
||||
|
||||
DO NOT INCLUDE:
|
||||
- Generic adjectives without evidence: "authentic," "genuine," "soulful," "passionate"
|
||||
- Vague community claims without HOW: "gathering place," "beloved," "where people come together"
|
||||
- Marketing language: "award-winning," "nationally recognized," "craft quality"
|
||||
- Fillers: "and more," "creating memories," "for all to enjoy"
|
||||
- Predictions: "we're working on," "coming soon," "we plan to"
|
||||
|
||||
|
||||
================================================================================
|
||||
OUTPUT FORMAT
|
||||
================================================================================
|
||||
|
||||
Return ONLY a valid JSON object with exactly two keys:
|
||||
{
|
||||
"name": "Brewery Name Here",
|
||||
"description": "Full description text here..."
|
||||
}
|
||||
|
||||
Requirements:
|
||||
- name: 2-5 words, distinctive, memorable
|
||||
- description: 150-250 words, follows all guidelines
|
||||
- Valid JSON (properly escaped quotes, no line breaks inside strings)
|
||||
- No markdown, backticks, or code formatting
|
||||
- No preamble before the JSON, no trailing text after it
|
||||
|
||||
Example:
|
||||
{
|
||||
"name": "Sniffels Peak Brewing",
|
||||
"description": "The soft spring water beneath Sniffels Peak..."
|
||||
}
|
||||
|
||||
================================================================================
|
||||
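The FORBIDDEN PHRASES list in this prompt lends itself to a post-generation check, so that a description containing a banned phrase can be rejected or regenerated. Below is a minimal sketch, assuming a case-insensitive substring scan is good enough; `ToLowerAscii` and `FindForbiddenPhrases` are hypothetical names that are not part of the pipeline, and the phrase list is supplied by the caller.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Lowercases ASCII letters; other bytes are passed through unchanged.
std::string ToLowerAscii(std::string text) {
  std::transform(text.begin(), text.end(), text.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return text;
}

// Returns every entry of `forbidden` that occurs in `description`,
// ignoring ASCII case. Empty phrases are skipped.
std::vector<std::string> FindForbiddenPhrases(
    const std::string& description,
    const std::vector<std::string>& forbidden) {
  const std::string haystack = ToLowerAscii(description);
  std::vector<std::string> hits;
  for (const auto& phrase : forbidden) {
    if (phrase.empty()) {
      continue;
    }
    if (haystack.find(ToLowerAscii(phrase)) != std::string::npos) {
      hits.push_back(phrase);
    }
  }
  return hits;
}
```

Because this is plain substring matching, short entries such as "real" or "true" would also match inside longer words ("really", "construed"). A word-boundary check would be needed before using it on the single-word entries of the list.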
158
pipeline/src/biergarten_data_generator.cpp
Normal file
@@ -0,0 +1,158 @@
|
||||
#include "biergarten_data_generator.h"
|
||||
|
||||
#include <spdlog/spdlog.h>
|
||||
|
||||
#include <algorithm>
|
||||
#include <filesystem>
|
||||
#include <unordered_map>
|
||||
|
||||
#include "data_generation/data_downloader.h"
|
||||
#include "data_generation/llama_generator.h"
|
||||
#include "data_generation/mock_generator.h"
|
||||
#include "json_handling/json_loader.h"
|
||||
#include "wikipedia/wikipedia_service.h"
|
||||
|
||||
BiergartenDataGenerator::BiergartenDataGenerator(
|
||||
const ApplicationOptions& options, std::shared_ptr<WebClient> web_client,
|
||||
SqliteDatabase& database)
|
||||
: options_(options), webClient_(web_client), database_(database) {}
|
||||
|
||||
std::unique_ptr<DataGenerator> BiergartenDataGenerator::InitializeGenerator() {
|
||||
spdlog::info("Initializing brewery generator...");
|
||||
|
||||
std::unique_ptr<DataGenerator> generator;
|
||||
if (options_.model_path.empty()) {
|
||||
generator = std::make_unique<MockGenerator>();
|
||||
spdlog::info("[Generator] Using MockGenerator (no model path provided)");
|
||||
} else {
|
||||
auto llama_generator = std::make_unique<LlamaGenerator>();
|
||||
llama_generator->SetSamplingOptions(options_.temperature, options_.top_p,
|
||||
options_.seed);
|
||||
llama_generator->SetContextSize(options_.n_ctx);
|
||||
spdlog::info(
|
||||
"[Generator] Using LlamaGenerator: {} (temperature={}, top-p={}, "
|
||||
"n_ctx={}, seed={})",
|
||||
options_.model_path, options_.temperature, options_.top_p,
|
||||
options_.n_ctx, options_.seed);
|
||||
generator = std::move(llama_generator);
|
||||
}
|
||||
generator->Load(options_.model_path);
|
||||
|
||||
return generator;
|
||||
}
|
||||
|
||||
void BiergartenDataGenerator::LoadGeographicData() {
|
||||
std::string json_path = options_.cache_dir + "/countries+states+cities.json";
|
||||
std::string db_path = options_.cache_dir + "/biergarten-pipeline.db";
|
||||
|
||||
bool has_json_cache = std::filesystem::exists(json_path);
|
||||
bool has_db_cache = std::filesystem::exists(db_path);
|
||||
|
||||
spdlog::info("Initializing SQLite database at {}...", db_path);
|
||||
database_.Initialize(db_path);
|
||||
|
||||
if (has_db_cache && has_json_cache) {
|
||||
spdlog::info("[Pipeline] Cache hit: skipping download and parse");
|
||||
} else {
|
||||
spdlog::info("\n[Pipeline] Downloading geographic data from GitHub...");
|
||||
DataDownloader downloader(webClient_);
|
||||
downloader.DownloadCountriesDatabase(json_path, options_.commit);
|
||||
|
||||
JsonLoader::LoadWorldCities(json_path, database_);
|
||||
}
|
||||
}
|
||||
|
||||
std::vector<std::pair<City, std::string>>
|
||||
BiergartenDataGenerator::QueryCitiesWithCountries() {
|
||||
spdlog::info("\n=== GEOGRAPHIC DATA OVERVIEW ===");
|
||||
|
||||
auto cities = database_.QueryCities();
|
||||
|
||||
// Build a quick map of country id -> name for per-city lookups.
|
||||
auto all_countries = database_.QueryCountries(0);
|
||||
std::unordered_map<int, std::string> country_map;
|
||||
for (const auto& c : all_countries) {
|
||||
country_map[c.id] = c.name;
|
||||
}
|
||||
|
||||
spdlog::info("\nTotal records loaded:");
|
||||
spdlog::info(" Countries: {}", database_.QueryCountries(0).size());
|
||||
spdlog::info(" States: {}", database_.QueryStates(0).size());
|
||||
spdlog::info(" Cities: {}", cities.size());
|
||||
|
||||
// Cap at 30 entries.
|
||||
const size_t sample_count = std::min(size_t(30), cities.size());
|
||||
std::vector<std::pair<City, std::string>> result;
|
||||
|
||||
for (size_t i = 0; i < sample_count; i++) {
|
||||
const auto& city = cities[i];
|
||||
std::string country_name;
|
||||
const auto country_it = country_map.find(city.country_id);
|
||||
if (country_it != country_map.end()) {
|
||||
country_name = country_it->second;
|
||||
}
|
||||
result.push_back({city, country_name});
|
||||
}
|
||||
|
||||
return result;
|
||||
}
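The lookup-map join used in `QueryCitiesWithCountries` can be sketched in isolation as follows. This is a minimal sketch, not the project's code: `CityRow`, `CountryRow`, and `JoinCitiesWithCountries` are hypothetical stand-ins, since the real `City` and country types are not shown in this diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical row types; the real structs are not part of this excerpt.
struct CountryRow { int id; std::string name; };
struct CityRow { int id; std::string name; int country_id; };

// Pairs up to `limit` cities with their country name ("" when unknown).
std::vector<std::pair<CityRow, std::string>> JoinCitiesWithCountries(
    const std::vector<CityRow>& cities,
    const std::vector<CountryRow>& countries, std::size_t limit) {
  // Build the id -> name map once so each city lookup is O(1) on average.
  std::unordered_map<int, std::string> names;
  names.reserve(countries.size());
  for (const auto& c : countries) names.emplace(c.id, c.name);

  const std::size_t count = std::min(limit, cities.size());
  std::vector<std::pair<CityRow, std::string>> out;
  out.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    const auto it = names.find(cities[i].country_id);
    out.emplace_back(cities[i],
                     it != names.end() ? it->second : std::string());
  }
  assert(out.size() <= limit);
  return out;
}
```

Calling it with a limit of 30 would mirror the cap applied above; cities whose `country_id` has no match keep an empty country name rather than being dropped.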
|
||||
|
||||
std::vector<BiergartenDataGenerator::EnrichedCity>
|
||||
BiergartenDataGenerator::EnrichWithWikipedia(
|
||||
const std::vector<std::pair<City, std::string>>& cities) {
|
||||
WikipediaService wikipedia_service(webClient_);
|
||||
std::vector<EnrichedCity> enriched;
|
||||
|
||||
for (const auto& [city, country_name] : cities) {
|
||||
const std::string region_context =
|
||||
wikipedia_service.GetSummary(city.name, country_name);
|
||||
spdlog::debug("[Pipeline] Region context for {}: {}", city.name,
|
||||
region_context);
|
||||
|
||||
enriched.push_back({city.id, city.name, country_name, region_context});
|
||||
}
|
||||
|
||||
return enriched;
|
||||
}
|
||||
|
||||
void BiergartenDataGenerator::GenerateBreweries(
|
||||
DataGenerator& generator, const std::vector<EnrichedCity>& cities) {
|
||||
spdlog::info("\n=== SAMPLE BREWERY GENERATION ===");
|
||||
generatedBreweries_.clear();
|
||||
|
||||
for (const auto& enriched_city : cities) {
|
||||
auto brewery = generator.GenerateBrewery(enriched_city.city_name,
|
||||
enriched_city.country_name,
|
||||
enriched_city.region_context);
|
||||
generatedBreweries_.push_back(
|
||||
{enriched_city.city_id, enriched_city.city_name, brewery});
|
||||
}
|
||||
}
|
||||
|
||||
void BiergartenDataGenerator::LogResults() const {
|
||||
spdlog::info("\n=== GENERATED DATA DUMP ===");
|
||||
for (size_t i = 0; i < generatedBreweries_.size(); i++) {
|
||||
const auto& entry = generatedBreweries_[i];
|
||||
spdlog::info("{}. city_id={} city=\"{}\"", i + 1, entry.city_id,
|
||||
entry.city_name);
|
||||
spdlog::info(" brewery_name=\"{}\"", entry.brewery.name);
|
||||
spdlog::info(" brewery_description=\"{}\"", entry.brewery.description);
|
||||
}
|
||||
}
|
||||
|
||||
int BiergartenDataGenerator::Run() {
|
||||
try {
|
||||
LoadGeographicData();
|
||||
auto generator = InitializeGenerator();
|
||||
auto cities = QueryCitiesWithCountries();
|
||||
auto enriched = EnrichWithWikipedia(cities);
|
||||
GenerateBreweries(*generator, enriched);
|
||||
LogResults();
|
||||
|
||||
spdlog::info("\nOK: Pipeline completed successfully");
|
||||
return 0;
|
||||
} catch (const std::exception& e) {
|
||||
spdlog::error("ERROR: Pipeline failed: {}", e.what());
|
||||
return 1;
|
||||
}
|
||||
}
|
||||
44
pipeline/src/data_generation/data_downloader.cpp
Normal file
@@ -0,0 +1,44 @@
|
||||
#include "data_generation/data_downloader.h"
|
||||
|
||||
#include <spdlog/spdlog.h>
|
||||
|
||||
#include <filesystem>
|
||||
#include <fstream>
|
||||
#include <sstream>
|
||||
#include <stdexcept>
|
||||
|
||||
#include "web_client/web_client.h"
|
||||
|
||||
DataDownloader::DataDownloader(std::shared_ptr<WebClient> web_client)
|
||||
: web_client_(std::move(web_client)) {}
|
||||
|
||||
DataDownloader::~DataDownloader() {}
|
||||
|
||||
bool DataDownloader::FileExists(const std::string& file_path) {
|
||||
return std::filesystem::exists(file_path);
|
||||
}
|
||||
|
||||
std::string DataDownloader::DownloadCountriesDatabase(
|
||||
const std::string& cache_path, const std::string& commit) {
|
||||
if (FileExists(cache_path)) {
|
||||
spdlog::info("[DataDownloader] Cache hit: {}", cache_path);
|
||||
return cache_path;
|
||||
}
|
||||
|
||||
std::string url =
|
||||
"https://raw.githubusercontent.com/dr5hn/"
|
||||
"countries-states-cities-database/" +
|
||||
commit + "/json/countries+states+cities.json";
|
||||
|
||||
spdlog::info("[DataDownloader] Downloading: {}", url);
|
||||
|
||||
web_client_->DownloadToFile(url, cache_path);
|
||||
|
||||
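std::ifstream file_check(cache_path, std::ios::binary | std::ios::ate);
if (!file_check.is_open()) {
  // Without this check, tellg() returns -1 and a negative size is logged.
  throw std::runtime_error("[DataDownloader] Downloaded file not found: " +
                           cache_path);
}
const std::streamsize size = file_check.tellg();
file_check.close();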
|
||||
|
||||
spdlog::info("[DataDownloader] OK: Download complete: {} ({:.2f} MB)",
|
||||
cache_path, (size / (1024.0 * 1024.0)));
|
||||
return cache_path;
|
||||
}
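As an alternative to opening the file and calling `tellg()`, the size could be read with `std::filesystem::file_size`. The sketch below assumes C++17; `TryFileSize` and `ToMebibytes` are hypothetical helper names, not part of `DataDownloader`.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

// Returns the file size in bytes, or std::nullopt if the file cannot be
// queried (missing, permission denied, ...). Hypothetical helper.
std::optional<std::uintmax_t> TryFileSize(const std::string& path) {
  assert(!path.empty());
  std::error_code ec;
  const std::uintmax_t size = std::filesystem::file_size(path, ec);
  if (ec) return std::nullopt;  // non-throwing overload reports via ec
  return size;
}

// Converts a byte count to mebibytes for logging.
double ToMebibytes(std::uintmax_t bytes) {
  return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

The `std::error_code` overload avoids exceptions, so the caller can decide whether a missing file after download is fatal or only worth a warning.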
|
||||
31
pipeline/src/data_generation/llama/destructor.cpp
Normal file
@@ -0,0 +1,31 @@
|
||||
/**
|
||||
* Destructor Module
|
||||
* Ensures proper cleanup of llama.cpp resources (context and model) when the
|
||||
* generator is destroyed, preventing memory leaks and resource exhaustion.
|
||||
*/
|
||||
|
||||
#include "data_generation/llama_generator.h"
|
||||
#include "llama.h"
|
||||
|
||||
LlamaGenerator::~LlamaGenerator() {
|
||||
/**
|
||||
* Free the inference context (contains KV cache and computation state)
|
||||
*/
|
||||
if (context_ != nullptr) {
|
||||
llama_free(context_);
|
||||
context_ = nullptr;
|
||||
}
|
||||
|
||||
/**
|
||||
* Free the loaded model (contains weights and vocabulary)
|
||||
*/
|
||||
if (model_ != nullptr) {
|
||||
llama_model_free(model_);
|
||||
model_ = nullptr;
|
||||
}
|
||||
|
||||
/**
|
||||
* Clean up the backend (GPU/CPU acceleration resources)
|
||||
*/
|
||||
llama_backend_free();
|
||||
}
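The manual null checks in this destructor could, as one option, be replaced by `std::unique_ptr` with a custom deleter, the same pattern `infer.cpp` uses for its sampler. The sketch below uses a stand-in C-style API (`FakeHandle`, `fake_create`, `fake_free` are all hypothetical) so it does not depend on llama.cpp. Whether `LlamaGenerator` can adopt it depends on code not shown here.

```cpp
#include <cassert>
#include <memory>

// Stand-in for a C-style resource API (hypothetical; not llama.cpp).
struct FakeHandle { int id; };
static int g_live_handles = 0;
FakeHandle* fake_create(int id) {
  ++g_live_handles;
  return new FakeHandle{id};
}
void fake_free(FakeHandle* h) {
  if (h != nullptr) {
    --g_live_handles;
    delete h;
  }
}

// Stateless deleter keeps the unique_ptr the size of a raw pointer.
struct FakeHandleDeleter {
  void operator()(FakeHandle* h) const { fake_free(h); }
};
using FakeHandlePtr = std::unique_ptr<FakeHandle, FakeHandleDeleter>;

// Members are destroyed in reverse declaration order, so `dependent_`
// (think: context) is released before `base_` (think: model).
class Owner {
 public:
  Owner() : base_(fake_create(1)), dependent_(fake_create(2)) {
    assert(base_ && dependent_);
  }

 private:
  FakeHandlePtr base_;
  FakeHandlePtr dependent_;
};
```

A process-wide call such as `llama_backend_free()` would still need separate handling, since it is not tied to a single object's lifetime.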
|
||||
107
pipeline/src/data_generation/llama/generate_brewery.cpp
Normal file
@@ -0,0 +1,107 @@
|
||||
/**
|
||||
* Brewery Data Generation Module
|
||||
* Uses the LLM to generate realistic brewery names and descriptions for a given
|
||||
* location. Implements retry logic with validation and error correction to
|
||||
* ensure valid JSON output conforming to the expected schema.
|
||||
*/
|
||||
|
||||
#include <spdlog/spdlog.h>
|
||||
|
||||
#include <stdexcept>
|
||||
#include <string>
|
||||
|
||||
#include "data_generation/llama_generator.h"
|
||||
#include "data_generation/llama_generator_helpers.h"
|
||||
|
||||
BreweryResult LlamaGenerator::GenerateBrewery(
|
||||
const std::string& city_name, const std::string& country_name,
|
||||
const std::string& region_context) {
|
||||
/**
|
||||
* Preprocess and truncate region context to manageable size
|
||||
*/
|
||||
const std::string safe_region_context =
|
||||
PrepareRegionContextPublic(region_context);
|
||||
|
||||
/**
|
||||
* Load brewery system prompt from file
|
||||
* Falls back to minimal inline prompt if file not found
|
||||
* Default path: prompts/brewery_system_prompt_expanded.txt
|
||||
*/
|
||||
const std::string system_prompt =
|
||||
LoadBrewerySystemPrompt("prompts/brewery_system_prompt_expanded.txt");
|
||||
|
||||
/**
|
||||
* User prompt: provides geographic context to guide generation towards
|
||||
* culturally appropriate and locally-inspired brewery attributes
|
||||
*/
|
||||
std::string prompt =
|
||||
"Write a brewery name and place-specific long description for a craft "
|
||||
"brewery in " +
|
||||
city_name +
|
||||
(country_name.empty() ? std::string("")
|
||||
: std::string(", ") + country_name) +
|
||||
(safe_region_context.empty()
|
||||
? std::string(".")
|
||||
: std::string(". Regional context: ") + safe_region_context);
|
||||
|
||||
/**
|
||||
* Store location context for retry prompts (without repeating full context)
|
||||
*/
|
||||
const std::string retry_location =
|
||||
"Location: " + city_name +
|
||||
(country_name.empty() ? std::string("")
|
||||
: std::string(", ") + country_name);
|
||||
|
||||
/**
|
||||
* RETRY LOOP with validation and error correction
|
||||
* Attempts to generate valid brewery data up to 3 times, with feedback-based
|
||||
* refinement
|
||||
*/
|
||||
const int max_attempts = 3;
|
||||
std::string raw;
|
||||
std::string last_error;
|
||||
|
||||
// Cap the number of tokens generated per attempt.
|
||||
constexpr int max_tokens = 1052;
|
||||
for (int attempt = 0; attempt < max_attempts; ++attempt) {
|
||||
// Generate brewery data from LLM
|
||||
raw = Infer(system_prompt, prompt, max_tokens);
|
||||
spdlog::debug("LlamaGenerator: raw output (attempt {}): {}", attempt + 1,
|
||||
raw);
|
||||
|
||||
// Validate output: parse JSON and check required fields
|
||||
|
||||
std::string name;
|
||||
std::string description;
|
||||
const std::string validation_error =
|
||||
ValidateBreweryJsonPublic(raw, name, description);
|
||||
if (validation_error.empty()) {
|
||||
// Success: return parsed brewery data
|
||||
return {std::move(name), std::move(description)};
|
||||
}
|
||||
|
||||
// Validation failed: log error and prepare corrective feedback
|
||||
|
||||
last_error = validation_error;
|
||||
spdlog::warn("LlamaGenerator: malformed brewery JSON (attempt {}): {}",
|
||||
attempt + 1, validation_error);
|
||||
|
||||
// Update prompt with error details to guide LLM toward correct output.
|
||||
// For retries, use a compact prompt format to avoid exceeding token
|
||||
// limits.
|
||||
prompt =
|
||||
"Your previous response was invalid. Error: " + validation_error +
|
||||
"\nReturn ONLY valid JSON with this exact schema: "
|
||||
"{\"name\": \"string\", \"description\": \"string\"}."
|
||||
"\nDo not include markdown, comments, or extra keys."
|
||||
"\n\n" +
|
||||
retry_location;
|
||||
}
|
||||
|
||||
// All retry attempts exhausted: log failure and throw exception
|
||||
spdlog::error(
|
||||
"LlamaGenerator: malformed brewery response after {} attempts: "
|
||||
"{}",
|
||||
max_attempts, last_error.empty() ? raw : last_error);
|
||||
throw std::runtime_error("LlamaGenerator: malformed brewery response");
|
||||
}
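The retry-with-feedback loop in `GenerateBrewery` follows a general shape that can be sketched without the LLM. In this minimal sketch, `GenerateWithRetry` and its three callbacks are hypothetical names introduced for illustration. `validate` returns an empty string on success, matching the convention of `ValidateBreweryJsonPublic`.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Calls `generate(prompt)` up to `max_attempts` times. `validate` returns an
// empty string on success or an error message; on failure the next prompt is
// rebuilt from that message via `make_retry_prompt`.
std::string GenerateWithRetry(
    std::string prompt, int max_attempts,
    const std::function<std::string(const std::string&)>& generate,
    const std::function<std::string(const std::string&)>& validate,
    const std::function<std::string(const std::string&)>& make_retry_prompt) {
  assert(max_attempts > 0);
  std::string last_error;
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    std::string raw = generate(prompt);
    last_error = validate(raw);
    if (last_error.empty()) return raw;  // valid output: stop retrying
    prompt = make_retry_prompt(last_error);
  }
  throw std::runtime_error("generation failed: " + last_error);
}
```

Keeping the retry prompt short, as the code above does with `retry_location`, leaves more of the context window for the model's answer.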
|
||||
102
pipeline/src/data_generation/llama/generate_user.cpp
Normal file
@@ -0,0 +1,102 @@
|
||||
/**
|
||||
* User Profile Generation Module
|
||||
* Uses the LLM to generate realistic user profiles (username and bio) for craft
|
||||
* beer enthusiasts. Implements retry logic to handle parsing failures and
|
||||
* ensures output adheres to strict format constraints (two lines, specific
|
||||
* character limits).
|
||||
*/
|
||||
|
||||
#include <spdlog/spdlog.h>
|
||||
|
||||
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>
|
||||
|
||||
#include "data_generation/llama_generator.h"
|
||||
#include "data_generation/llama_generator_helpers.h"
|
||||
|
||||
UserResult LlamaGenerator::GenerateUser(const std::string& locale) {
|
||||
/**
|
||||
* System prompt: specifies exact output format to minimize parsing errors
|
||||
* Constraints: 2-line output, username format, bio length bounds
|
||||
*/
|
||||
const std::string system_prompt =
|
||||
"You generate plausible social media profiles for craft beer "
|
||||
"enthusiasts. "
|
||||
"Respond with exactly two lines: "
|
||||
"the first line is a username (lowercase, no spaces, 8-20 characters), "
|
||||
"the second line is a one-sentence bio (20-40 words). "
|
||||
"The profile should feel consistent with the locale. "
|
||||
"No preamble, no labels.";
|
||||
|
||||
/**
|
||||
* User prompt: locale parameter guides cultural appropriateness of generated
|
||||
* profiles
|
||||
*/
|
||||
std::string prompt =
|
||||
"Generate a craft beer enthusiast profile. Locale: " + locale;
|
||||
|
||||
/**
|
||||
* RETRY LOOP with format validation
|
||||
* Attempts up to 3 times to generate valid user profile with correct format
|
||||
*/
|
||||
const int max_attempts = 3;
|
||||
std::string raw;
|
||||
for (int attempt = 0; attempt < max_attempts; ++attempt) {
|
||||
/**
|
||||
* Generate user profile (max 128 tokens - should fit 2 lines easily)
|
||||
*/
|
||||
raw = Infer(system_prompt, prompt, 128);
|
||||
spdlog::debug("LlamaGenerator (user): raw output (attempt {}): {}",
|
||||
attempt + 1, raw);
|
||||
|
||||
try {
|
||||
/**
|
||||
* Parse two-line response: first line = username, second line = bio
|
||||
*/
|
||||
auto [username, bio] = ParseTwoLineResponsePublic(
|
||||
raw, "LlamaGenerator: malformed user response");
|
||||
|
||||
/**
|
||||
* Remove any whitespace from username (usernames shouldn't have
|
||||
* spaces)
|
||||
*/
|
||||
username.erase(
|
||||
std::remove_if(username.begin(), username.end(),
|
||||
[](unsigned char ch) { return std::isspace(ch); }),
|
||||
username.end());
|
||||
|
||||
/**
|
||||
* Validate both fields are non-empty after processing
|
||||
*/
|
||||
if (username.empty() || bio.empty()) {
|
||||
throw std::runtime_error("LlamaGenerator: malformed user response");
|
||||
}
|
||||
|
||||
/**
|
||||
* Truncate the bio if it exceeds 200 characters (bytes)
|
||||
*/
|
||||
if (bio.size() > 200) bio = bio.substr(0, 200);
|
||||
|
||||
/**
|
||||
* Success: return parsed user profile
|
||||
*/
|
||||
return {username, bio};
|
||||
} catch (const std::exception& e) {
|
||||
/**
|
||||
* Parsing failed: log and continue to next attempt
|
||||
*/
|
||||
spdlog::warn(
|
||||
"LlamaGenerator: malformed user response (attempt {}): {}",
|
||||
attempt + 1, e.what());
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* All retry attempts exhausted: log failure and throw exception
|
||||
*/
|
||||
spdlog::error(
|
||||
"LlamaGenerator: malformed user response after {} attempts: {}",
|
||||
max_attempts, raw);
|
||||
throw std::runtime_error("LlamaGenerator: malformed user response");
|
||||
}
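`bio.substr(0, 200)` cuts at a byte offset, which can split a multi-byte UTF-8 character when the locale produces non-ASCII text. A possible refinement, assuming the model output is valid UTF-8, is to step back to a character boundary. `TruncateUtf8` is a hypothetical helper, not part of the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Truncates `text` to at most `max_bytes` bytes without cutting a UTF-8
// sequence in half. Assumes `text` is valid UTF-8.
std::string TruncateUtf8(const std::string& text, std::size_t max_bytes) {
  if (text.size() <= max_bytes) return text;
  std::size_t end = max_bytes;
  // text[end] is the first excluded byte. Continuation bytes look like
  // 10xxxxxx; if we landed on one, back up to the start of that sequence.
  while (end > 0 && (static_cast<unsigned char>(text[end]) & 0xC0) == 0x80) {
    --end;
  }
  assert(end <= max_bytes);
  return text.substr(0, end);
}
```

For pure ASCII input this behaves exactly like `substr(0, max_bytes)`.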
|
||||
441
pipeline/src/data_generation/llama/helpers.cpp
Normal file
@@ -0,0 +1,441 @@
|
||||
/**
|
||||
* Helper Functions Module
|
||||
* Provides utility functions for text processing, parsing, and chat template
|
||||
* formatting. Functions handle whitespace normalization, response parsing, and
|
||||
* conversion of prompts to proper chat format using the model's built-in
|
||||
* template.
|
||||
*/
|
||||
|
||||
#include <algorithm>
#include <array>
#include <boost/json.hpp>
#include <cctype>
#include <sstream>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>
|
||||
|
||||
#include "data_generation/llama_generator.h"
|
||||
#include "llama.h"
|
||||
|
||||
namespace {
|
||||
|
||||
/**
|
||||
* String trimming: removes leading and trailing whitespace
|
||||
*/
|
||||
std::string Trim(std::string value) {
|
||||
auto not_space = [](unsigned char ch) { return !std::isspace(ch); };
|
||||
|
||||
value.erase(value.begin(),
|
||||
std::find_if(value.begin(), value.end(), not_space));
|
||||
value.erase(std::find_if(value.rbegin(), value.rend(), not_space).base(),
|
||||
value.end());
|
||||
|
||||
return value;
|
||||
}
|
||||
|
||||
/**
|
||||
* Normalize whitespace: collapses multiple spaces/tabs/newlines into single
|
||||
* spaces
|
||||
*/
|
||||
std::string CondenseWhitespace(std::string text) {
|
||||
std::string out;
|
||||
out.reserve(text.size());
|
||||
|
||||
bool in_whitespace = false;
|
||||
for (unsigned char ch : text) {
|
||||
if (std::isspace(ch)) {
|
||||
if (!in_whitespace) {
|
||||
out.push_back(' ');
|
||||
in_whitespace = true;
|
||||
}
|
||||
continue;
|
||||
}
|
||||
|
||||
in_whitespace = false;
|
||||
out.push_back(static_cast<char>(ch));
|
||||
}
|
||||
|
||||
return Trim(std::move(out));
|
||||
}
|
||||
|
||||
/**
|
||||
* Truncate region context to fit within max length while preserving word
|
||||
* boundaries
|
||||
*/
|
||||
std::string PrepareRegionContext(std::string_view region_context,
|
||||
std::size_t max_chars) {
|
||||
std::string normalized = CondenseWhitespace(std::string(region_context));
|
||||
if (normalized.size() <= max_chars) {
|
||||
return normalized;
|
||||
}
|
||||
|
||||
normalized.resize(max_chars);
|
||||
const std::size_t last_space = normalized.find_last_of(' ');
|
||||
if (last_space != std::string::npos && last_space > max_chars / 2) {
|
||||
normalized.resize(last_space);
|
||||
}
|
||||
|
||||
normalized += "...";
|
||||
return normalized;
|
||||
}
|
||||
|
||||
/**
|
||||
* Remove common bullet points, numbers, and field labels added by LLM in output
|
||||
*/
|
||||
std::string StripCommonPrefix(std::string line) {
|
||||
line = Trim(std::move(line));
|
||||
|
||||
if (!line.empty() && (line[0] == '-' || line[0] == '*')) {
|
||||
line = Trim(line.substr(1));
|
||||
} else {
|
||||
std::size_t i = 0;
|
||||
while (i < line.size() &&
|
||||
std::isdigit(static_cast<unsigned char>(line[i]))) {
|
||||
++i;
|
||||
}
|
||||
if (i > 0 && i < line.size() && (line[i] == '.' || line[i] == ')')) {
|
||||
line = Trim(line.substr(i + 1));
|
||||
}
|
||||
}
|
||||
|
||||
auto strip_label = [&line](const std::string& label) {
|
||||
if (line.size() >= label.size()) {
|
||||
bool matches = true;
|
||||
for (std::size_t i = 0; i < label.size(); ++i) {
|
||||
if (std::tolower(static_cast<unsigned char>(line[i])) !=
|
||||
std::tolower(static_cast<unsigned char>(label[i]))) {
|
||||
matches = false;
|
||||
break;
|
||||
}
|
||||
}
|
||||
if (matches) {
|
||||
line = Trim(line.substr(label.size()));
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
strip_label("name:");
|
||||
strip_label("brewery name:");
|
||||
strip_label("description:");
|
||||
strip_label("username:");
|
||||
strip_label("bio:");
|
||||
|
||||
return Trim(std::move(line));
|
||||
}
|
||||
|
||||
/**
|
||||
* Parse two-line response from LLM: normalize line endings, strip formatting,
|
||||
* filter spurious output, and combine remaining lines if needed
|
||||
*/
|
||||
std::pair<std::string, std::string> ParseTwoLineResponse(
|
||||
const std::string& raw, const std::string& error_message) {
|
||||
std::string normalized = raw;
|
||||
std::replace(normalized.begin(), normalized.end(), '\r', '\n');
|
||||
|
||||
std::vector<std::string> lines;
|
||||
std::stringstream stream(normalized);
|
||||
std::string line;
|
||||
while (std::getline(stream, line)) {
|
||||
line = StripCommonPrefix(std::move(line));
|
||||
if (!line.empty()) lines.push_back(std::move(line));
|
||||
}
|
||||
|
||||
std::vector<std::string> filtered;
|
||||
for (auto& l : lines) {
|
||||
std::string low = l;
|
||||
std::transform(low.begin(), low.end(), low.begin(), [](unsigned char c) {
|
||||
return static_cast<char>(std::tolower(c));
|
||||
});
|
||||
// Filter known thinking tags like <think>...</think>, but be conservative
|
||||
// to avoid removing legitimate output. Only filter specific known
|
||||
// patterns.
|
||||
if (!l.empty() && l.front() == '<' && low.back() == '>') {
|
||||
// Only filter if it's a known thinking tag: <think>, <reasoning>, etc.
|
||||
if (low.find("think") != std::string::npos ||
|
||||
low.find("reasoning") != std::string::npos ||
|
||||
low.find("reflect") != std::string::npos) {
|
||||
continue;
|
||||
}
|
||||
}
|
||||
if (low.rfind("okay,", 0) == 0 || low.rfind("hmm", 0) == 0) continue;
|
||||
filtered.push_back(std::move(l));
|
||||
}
|
||||
|
||||
if (filtered.size() < 2) throw std::runtime_error(error_message);
|
||||
|
||||
std::string first = Trim(filtered.front());
|
||||
std::string second;
|
||||
for (size_t i = 1; i < filtered.size(); ++i) {
|
||||
if (!second.empty()) second += ' ';
|
||||
second += filtered[i];
|
||||
}
|
||||
second = Trim(std::move(second));
|
||||
|
||||
if (first.empty() || second.empty()) throw std::runtime_error(error_message);
|
||||
return {first, second};
|
||||
}
|
||||
|
||||
/**
|
||||
* Apply model's chat template to user-only prompt, formatting it for the model
|
||||
*/
|
||||
std::string ToChatPrompt(const llama_model* model,
|
||||
const std::string& user_prompt) {
|
||||
const char* tmpl = llama_model_chat_template(model, nullptr);
|
||||
if (tmpl == nullptr) {
|
||||
return user_prompt;
|
||||
}
|
||||
|
||||
const llama_chat_message message{"user", user_prompt.c_str()};
|
||||
|
||||
std::vector<char> buffer(
|
||||
std::max<std::size_t>(1024, user_prompt.size() * 4));
|
||||
int32_t required =
|
||||
llama_chat_apply_template(tmpl, &message, 1, true, buffer.data(),
|
||||
static_cast<int32_t>(buffer.size()));
|
||||
|
||||
if (required < 0) {
|
||||
throw std::runtime_error("LlamaGenerator: failed to apply chat template");
|
||||
}
|
||||
|
||||
if (required >= static_cast<int32_t>(buffer.size())) {
|
||||
buffer.resize(static_cast<std::size_t>(required) + 1);
|
||||
required =
|
||||
llama_chat_apply_template(tmpl, &message, 1, true, buffer.data(),
|
||||
static_cast<int32_t>(buffer.size()));
|
||||
if (required < 0) {
|
||||
throw std::runtime_error(
|
||||
"LlamaGenerator: failed to apply chat template");
|
||||
}
|
||||
}
|
||||
|
||||
return std::string(buffer.data(), static_cast<std::size_t>(required));
|
||||
}
|
||||
|
||||
/**
|
||||
* Apply model's chat template to system+user prompt pair, formatting for the
|
||||
* model
|
||||
*/
|
||||
std::string ToChatPrompt(const llama_model* model,
|
||||
const std::string& system_prompt,
|
||||
const std::string& user_prompt) {
|
||||
const char* tmpl = llama_model_chat_template(model, nullptr);
|
||||
if (tmpl == nullptr) {
|
||||
return system_prompt + "\n\n" + user_prompt;
|
||||
}
|
||||
|
||||
const llama_chat_message messages[2] = {{"system", system_prompt.c_str()},
|
||||
{"user", user_prompt.c_str()}};
|
||||
|
||||
std::vector<char> buffer(std::max<std::size_t>(
|
||||
1024, (system_prompt.size() + user_prompt.size()) * 4));
|
||||
int32_t required =
|
||||
llama_chat_apply_template(tmpl, messages, 2, true, buffer.data(),
|
||||
static_cast<int32_t>(buffer.size()));
|
||||
|
||||
if (required < 0) {
|
||||
throw std::runtime_error("LlamaGenerator: failed to apply chat template");
|
||||
}
|
||||
|
||||
if (required >= static_cast<int32_t>(buffer.size())) {
|
||||
buffer.resize(static_cast<std::size_t>(required) + 1);
|
||||
required =
|
||||
llama_chat_apply_template(tmpl, messages, 2, true, buffer.data(),
|
||||
static_cast<int32_t>(buffer.size()));
|
||||
if (required < 0) {
|
||||
throw std::runtime_error(
|
||||
"LlamaGenerator: failed to apply chat template");
|
||||
}
|
||||
}
|
||||
|
||||
return std::string(buffer.data(), static_cast<std::size_t>(required));
|
||||
}
|
||||
|
||||
void AppendTokenPiece(const llama_vocab* vocab, llama_token token,
|
||||
std::string& output) {
|
||||
std::array<char, 256> buffer{};
|
||||
int32_t bytes =
|
||||
llama_token_to_piece(vocab, token, buffer.data(),
|
||||
static_cast<int32_t>(buffer.size()), 0, true);
|
||||
|
||||
if (bytes < 0) {
|
||||
std::vector<char> dynamic_buffer(static_cast<std::size_t>(-bytes));
|
||||
bytes = llama_token_to_piece(vocab, token, dynamic_buffer.data(),
|
||||
static_cast<int32_t>(dynamic_buffer.size()),
|
||||
0, true);
|
||||
if (bytes < 0) {
|
||||
throw std::runtime_error(
|
||||
"LlamaGenerator: failed to decode sampled token piece");
|
||||
}
|
||||
|
||||
output.append(dynamic_buffer.data(), static_cast<std::size_t>(bytes));
|
||||
return;
|
||||
}
|
||||
|
||||
output.append(buffer.data(), static_cast<std::size_t>(bytes));
|
||||
}
|
||||
|
||||
bool ExtractFirstJsonObject(const std::string& text, std::string& json_out) {
|
||||
std::size_t start = std::string::npos;
|
||||
int depth = 0;
|
||||
bool in_string = false;
|
||||
bool escaped = false;
|
||||
|
||||
for (std::size_t i = 0; i < text.size(); ++i) {
|
||||
const char ch = text[i];
|
||||
|
||||
if (in_string) {
|
||||
if (escaped) {
|
||||
escaped = false;
|
||||
} else if (ch == '\\') {
|
||||
escaped = true;
|
||||
} else if (ch == '"') {
|
||||
in_string = false;
|
||||
}
|
||||
continue;
|
||||
}
|
||||
|
||||
if (ch == '"') {
|
||||
in_string = true;
|
||||
continue;
|
||||
}
|
||||
|
||||
if (ch == '{') {
|
||||
if (depth == 0) {
|
||||
start = i;
|
||||
}
|
||||
++depth;
|
||||
continue;
|
||||
}
|
||||
|
||||
if (ch == '}') {
|
||||
if (depth == 0) {
|
||||
continue;
|
||||
}
|
||||
--depth;
|
||||
if (depth == 0 && start != std::string::npos) {
|
||||
json_out = text.substr(start, i - start + 1);
|
||||
return true;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
std::string ValidateBreweryJson(const std::string& raw, std::string& name_out,
|
||||
std::string& description_out) {
|
||||
auto validate_object = [&](const boost::json::value& jv,
|
||||
std::string& error_out) -> bool {
|
||||
if (!jv.is_object()) {
|
||||
error_out = "JSON root must be an object";
|
||||
return false;
|
||||
}
|
||||
|
||||
const auto& obj = jv.get_object();
|
||||
if (!obj.contains("name") || !obj.at("name").is_string()) {
|
||||
error_out = "JSON field 'name' is missing or not a string";
|
||||
return false;
|
||||
}
|
||||
|
||||
if (!obj.contains("description") || !obj.at("description").is_string()) {
|
||||
error_out = "JSON field 'description' is missing or not a string";
|
||||
return false;
|
||||
}
|
||||
|
||||
name_out = Trim(std::string(obj.at("name").as_string().c_str()));
|
||||
description_out =
|
||||
Trim(std::string(obj.at("description").as_string().c_str()));
|
||||
|
||||
if (name_out.empty()) {
|
||||
error_out = "JSON field 'name' must not be empty";
|
||||
return false;
|
||||
}
|
||||
|
||||
if (description_out.empty()) {
|
||||
error_out = "JSON field 'description' must not be empty";
|
||||
return false;
|
||||
}
|
||||
|
||||
std::string name_lower = name_out;
|
||||
std::string description_lower = description_out;
|
||||
std::transform(
|
||||
name_lower.begin(), name_lower.end(), name_lower.begin(),
|
||||
[](unsigned char c) { return static_cast<char>(std::tolower(c)); });
|
||||
std::transform(description_lower.begin(), description_lower.end(),
|
||||
description_lower.begin(), [](unsigned char c) {
|
||||
return static_cast<char>(std::tolower(c));
|
||||
});
|
||||
|
||||
if (name_lower == "string" || description_lower == "string") {
|
||||
error_out = "JSON appears to be a schema placeholder, not content";
|
||||
return false;
|
||||
}
|
||||
|
||||
error_out.clear();
|
||||
return true;
|
||||
};
|
||||
|
||||
boost::system::error_code ec;
|
||||
boost::json::value jv = boost::json::parse(raw, ec);
|
||||
std::string validation_error;
|
||||
if (ec) {
|
||||
std::string extracted;
|
||||
if (!ExtractFirstJsonObject(raw, extracted)) {
|
||||
return "JSON parse error: " + ec.message();
|
||||
}
|
||||
|
||||
ec.clear();
|
||||
jv = boost::json::parse(extracted, ec);
|
||||
if (ec) {
|
||||
return "JSON parse error: " + ec.message();
|
||||
}
|
||||
|
||||
if (!validate_object(jv, validation_error)) {
|
||||
return validation_error;
|
||||
}
|
||||
|
||||
return {};
|
||||
}
|
||||
|
||||
if (!validate_object(jv, validation_error)) {
|
||||
return validation_error;
|
||||
}
|
||||
|
||||
return {};
|
||||
}
|
||||
|
||||
} // namespace
|
||||
|
||||
// Public wrappers that expose the file-local helpers above to other translation units
|
||||
std::string PrepareRegionContextPublic(std::string_view region_context,
|
||||
std::size_t max_chars) {
|
||||
return PrepareRegionContext(region_context, max_chars);
|
||||
}
|
||||
|
||||
std::pair<std::string, std::string> ParseTwoLineResponsePublic(
|
||||
const std::string& raw, const std::string& error_message) {
|
||||
return ParseTwoLineResponse(raw, error_message);
|
||||
}
|
||||
|
||||
std::string ToChatPromptPublic(const llama_model* model,
|
||||
const std::string& user_prompt) {
|
||||
return ToChatPrompt(model, user_prompt);
|
||||
}
|
||||
|
||||
std::string ToChatPromptPublic(const llama_model* model,
|
||||
const std::string& system_prompt,
|
||||
const std::string& user_prompt) {
|
||||
return ToChatPrompt(model, system_prompt, user_prompt);
|
||||
}
|
||||
|
||||
void AppendTokenPiecePublic(const llama_vocab* vocab, llama_token token,
|
||||
std::string& output) {
|
||||
AppendTokenPiece(vocab, token, output);
|
||||
}
|
||||
|
||||
std::string ValidateBreweryJsonPublic(const std::string& raw,
|
||||
std::string& name_out,
|
||||
std::string& description_out) {
|
||||
return ValidateBreweryJson(raw, name_out, description_out);
|
||||
}
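`ToChatPrompt` and `AppendTokenPiece` both use a "call once, grow the buffer, call again" pattern. The sketch below shows the same idea with `std::snprintf` as a stand-in, because it also reports the size it needed. Note the conventions differ: `snprintf` returns the would-be length, whereas `llama_token_to_piece` signals a too-small buffer with a negative value, as handled in the code above. `FormatLocation` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// Formats "<city>, <country>" using the try-then-grow pattern: call once
// with a guessed buffer and retry with the reported size if it was too small.
std::string FormatLocation(const std::string& city,
                           const std::string& country) {
  std::vector<char> buffer(8);  // deliberately small to exercise the retry
  int required = std::snprintf(buffer.data(), buffer.size(), "%s, %s",
                               city.c_str(), country.c_str());
  if (required < 0) throw std::runtime_error("formatting failed");

  if (static_cast<std::size_t>(required) >= buffer.size()) {
    buffer.resize(static_cast<std::size_t>(required) + 1);  // +1 for '\0'
    required = std::snprintf(buffer.data(), buffer.size(), "%s, %s",
                             city.c_str(), country.c_str());
    if (required < 0) throw std::runtime_error("formatting failed");
  }
  assert(static_cast<std::size_t>(required) < buffer.size());
  return std::string(buffer.data(), static_cast<std::size_t>(required));
}
```

Starting with a reasonable guess keeps the common case to a single call, while the second call guarantees correctness for long inputs.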
|
||||
196
pipeline/src/data_generation/llama/infer.cpp
Normal file
@@ -0,0 +1,196 @@
|
||||
/**
|
||||
* Text Generation / Inference Module
|
||||
* Core module that performs LLM inference: converts text prompts into tokens,
|
||||
* runs the neural network forward pass, samples the next token, and converts
|
||||
* output tokens back to text. Supports both simple and system+user prompts.
|
||||
*/
|
||||
|
||||
#include <spdlog/spdlog.h>
|
||||
|
||||
#include <algorithm>
|
||||
#include <memory>
|
||||
#include <stdexcept>
|
||||
#include <string>
|
||||
#include <vector>
|
||||
|
||||
#include "data_generation/llama_generator.h"
|
||||
#include "data_generation/llama_generator_helpers.h"
|
||||
#include "llama.h"
|
||||
|
||||
std::string LlamaGenerator::Infer(const std::string& prompt, int max_tokens) {
|
||||
return InferFormatted(ToChatPromptPublic(model_, prompt), max_tokens);
|
||||
}
|
||||
|
||||
std::string LlamaGenerator::Infer(const std::string& system_prompt,
|
||||
const std::string& prompt, int max_tokens) {
|
||||
return InferFormatted(ToChatPromptPublic(model_, system_prompt, prompt),
|
||||
max_tokens);
|
||||
}
|
||||
|
||||
std::string LlamaGenerator::InferFormatted(const std::string& formatted_prompt,
|
||||
int max_tokens) {
|
||||
/**
|
||||
* Validate that model and context are loaded
|
||||
*/
|
||||
if (model_ == nullptr || context_ == nullptr)
|
||||
throw std::runtime_error("LlamaGenerator: model not loaded");
|
||||
|
||||
/**
|
||||
* Get vocabulary for tokenization and token-to-text conversion
|
||||
*/
|
||||
const llama_vocab* vocab = llama_model_get_vocab(model_);
|
||||
if (vocab == nullptr)
|
||||
throw std::runtime_error("LlamaGenerator: vocab unavailable");
|
||||
|
||||
/**
|
||||
* Clear KV cache to ensure clean inference state (no residual context)
|
||||
*/
|
||||
llama_memory_clear(llama_get_memory(context_), true);
|
||||
|
||||
/**
|
||||
* TOKENIZATION PHASE
|
||||
* Convert text prompt into token IDs (integers) that the model understands
|
||||
*/
|
||||
std::vector<llama_token> prompt_tokens(formatted_prompt.size() + 8);
|
||||
int32_t token_count = llama_tokenize(
|
||||
vocab, formatted_prompt.c_str(),
|
||||
static_cast<int32_t>(formatted_prompt.size()), prompt_tokens.data(),
|
||||
static_cast<int32_t>(prompt_tokens.size()), true, true);
|
||||
|
||||
/**
|
||||
* If buffer too small, negative return indicates required size
|
||||
*/
|
||||
if (token_count < 0) {
|
||||
prompt_tokens.resize(static_cast<std::size_t>(-token_count));
|
||||
token_count = llama_tokenize(
|
||||
vocab, formatted_prompt.c_str(),
|
||||
static_cast<int32_t>(formatted_prompt.size()), prompt_tokens.data(),
|
||||
static_cast<int32_t>(prompt_tokens.size()), true, true);
|
||||
}
|
||||
|
||||
if (token_count < 0)
|
||||
throw std::runtime_error("LlamaGenerator: prompt tokenization failed");
|
||||
|
||||
/**
|
||||
* CONTEXT SIZE VALIDATION
|
||||
* Validate and compute effective token budgets based on context window
|
||||
* constraints
|
||||
*/
|
||||
const int32_t n_ctx = static_cast<int32_t>(llama_n_ctx(context_));
|
||||
const int32_t n_batch = static_cast<int32_t>(llama_n_batch(context_));
|
||||
if (n_ctx <= 1 || n_batch <= 0)
|
||||
throw std::runtime_error("LlamaGenerator: invalid context or batch size");
|
||||
|
||||
/**
|
||||
* Clamp the generation limit to the context window, leaving at least one
* token slot for the prompt
|
||||
*/
|
||||
const int32_t effective_max_tokens =
|
||||
std::max(1, std::min(max_tokens, n_ctx - 1));
|
||||
/**
|
||||
* Prompt can use remaining context after reserving space for generation
|
||||
*/
|
||||
int32_t prompt_budget = std::min(n_batch, n_ctx - effective_max_tokens);
|
||||
prompt_budget = std::max<int32_t>(1, prompt_budget);
|
||||
|
||||
/**
|
||||
* Truncate prompt if necessary to fit within constraints
|
||||
*/
|
||||
prompt_tokens.resize(static_cast<std::size_t>(token_count));
|
||||
if (token_count > prompt_budget) {
|
||||
spdlog::warn(
|
||||
"LlamaGenerator: prompt too long ({} tokens), truncating to {} "
|
||||
"tokens to fit n_batch/n_ctx limits",
|
||||
token_count, prompt_budget);
|
||||
prompt_tokens.resize(static_cast<std::size_t>(prompt_budget));
|
||||
token_count = prompt_budget;
|
||||
}
|
||||
|
||||
/**
|
||||
* PROMPT PROCESSING PHASE
|
||||
* Create a batch containing all prompt tokens and feed through the model
|
||||
* This computes internal representations and fills the KV cache
|
||||
*/
|
||||
const llama_batch prompt_batch = llama_batch_get_one(
|
||||
prompt_tokens.data(), static_cast<int32_t>(prompt_tokens.size()));
|
||||
if (llama_decode(context_, prompt_batch) != 0)
|
||||
throw std::runtime_error("LlamaGenerator: prompt decode failed");
|
||||
|
||||
/**
|
||||
* SAMPLER CONFIGURATION PHASE
|
||||
* Set up the probabilistic token selection pipeline (sampler chain)
|
||||
* Samplers are applied in sequence: temperature -> top-p -> distribution
|
||||
*/
|
||||
llama_sampler_chain_params sampler_params =
|
||||
llama_sampler_chain_default_params();
|
||||
using SamplerPtr =
|
||||
std::unique_ptr<llama_sampler, decltype(&llama_sampler_free)>;
|
||||
SamplerPtr sampler(llama_sampler_chain_init(sampler_params),
|
||||
&llama_sampler_free);
|
||||
if (!sampler)
|
||||
throw std::runtime_error("LlamaGenerator: failed to initialize sampler");
|
||||
|
||||
/**
|
||||
* Temperature: scales logits before softmax (controls randomness)
|
||||
*/
|
||||
llama_sampler_chain_add(sampler.get(),
|
||||
llama_sampler_init_temp(sampling_temperature_));
|
||||
/**
|
||||
* Top-P: nucleus sampling - filters to most likely tokens summing to top_p
|
||||
* probability
|
||||
*/
|
||||
llama_sampler_chain_add(sampler.get(),
|
||||
llama_sampler_init_top_p(sampling_top_p_, 1));
|
||||
/**
|
||||
* Distribution sampler: selects actual token using configured seed for
|
||||
* reproducibility
|
||||
*/
|
||||
llama_sampler_chain_add(sampler.get(),
|
||||
llama_sampler_init_dist(sampling_seed_));
|
||||
|
||||
/**
|
||||
* TOKEN GENERATION LOOP
|
||||
* Iteratively generate tokens one at a time until max_tokens or
|
||||
* end-of-sequence
|
||||
*/
|
||||
std::vector<llama_token> generated_tokens;
|
||||
generated_tokens.reserve(static_cast<std::size_t>(effective_max_tokens));
|
||||
|
||||
for (int i = 0; i < effective_max_tokens; ++i) {
|
||||
/**
|
||||
* Sample next token using configured sampler chain and model logits
|
||||
* Index -1 means use the last output position from previous batch
|
||||
*/
|
||||
const llama_token next =
|
||||
llama_sampler_sample(sampler.get(), context_, -1);
|
||||
/**
|
||||
* Stop if model predicts end-of-generation token (EOS/EOT)
|
||||
*/
|
||||
if (llama_vocab_is_eog(vocab, next)) break;
|
||||
generated_tokens.push_back(next);
|
||||
/**
|
||||
* Feed the sampled token back into model for next iteration
|
||||
* (autoregressive)
|
||||
*/
|
||||
llama_token token = next;
|
||||
const llama_batch one_token_batch = llama_batch_get_one(&token, 1);
|
||||
if (llama_decode(context_, one_token_batch) != 0)
|
||||
throw std::runtime_error(
|
||||
"LlamaGenerator: decode failed during generation");
|
||||
}
|
||||
|
||||
/**
|
||||
* DETOKENIZATION PHASE
|
||||
* Convert generated token IDs back to text using vocabulary
|
||||
*/
|
||||
std::string output;
|
||||
for (const llama_token token : generated_tokens)
|
||||
AppendTokenPiecePublic(vocab, token, output);
|
||||
|
||||
/**
|
||||
* Advance seed for next generation to improve output diversity
|
||||
*/
|
||||
sampling_seed_ = (sampling_seed_ == 0xFFFFFFFFu) ? 0 : sampling_seed_ + 1;
|
||||
|
||||
return output;
|
||||
}
|
||||
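Review note: the budget arithmetic in the inference routine above is easy to misread, so here is a minimal sketch of it as a standalone function. `TokenBudget` and `ComputeTokenBudget` are hypothetical names introduced only for illustration; they do not appear in the diff, and the sketch assumes the same preconditions the original checks (`n_ctx > 1`, `n_batch > 0`).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of this diff: the clamping arithmetic from the
// inference routine, pulled out so it can be checked in isolation.
struct TokenBudget {
  std::int32_t max_new_tokens;  // upper bound for the generation loop
  std::int32_t prompt_tokens;   // upper bound for the (possibly truncated) prompt
};

inline TokenBudget ComputeTokenBudget(std::int32_t max_tokens,
                                      std::int32_t n_ctx,
                                      std::int32_t n_batch) {
  // Same precondition the original enforces with a runtime_error.
  assert(n_ctx > 1 && n_batch > 0);

  // Generate at least one token, and never more than n_ctx - 1.
  const std::int32_t max_new = std::max<std::int32_t>(
      1, std::min<std::int32_t>(max_tokens, n_ctx - 1));

  // The prompt gets what is left, capped by the batch size, minimum one token.
  const std::int32_t prompt = std::max<std::int32_t>(
      1, std::min<std::int32_t>(n_batch, n_ctx - max_new));

  return TokenBudget{max_new, prompt};
}
```

One consequence worth noting: when `max_tokens` is at or above `n_ctx - 1`, the prompt budget collapses to a single token, so almost the whole prompt is truncated. A caller-side check on `max_tokens` might be preferable to silent truncation.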
56
pipeline/src/data_generation/llama/load.cpp
Normal file
@@ -0,0 +1,56 @@
|
||||
/**
|
||||
* Model Loading Module
|
||||
* This module handles loading a pre-trained LLM model from disk and
|
||||
* initializing the llama.cpp context for inference. It performs one-time setup
|
||||
* required before any inference operations can be performed.
|
||||
*/
|
||||
|
||||
#include <spdlog/spdlog.h>
|
||||
|
||||
#include <stdexcept>
|
||||
#include <string>
|
||||
|
||||
#include "data_generation/llama_generator.h"
|
||||
#include "llama.h"
|
||||
|
||||
void LlamaGenerator::Load(const std::string& model_path) {
|
||||
/**
|
||||
* Validate input and clean up any previously loaded model/context
|
||||
*/
|
||||
if (model_path.empty())
|
||||
throw std::runtime_error("LlamaGenerator: model path must not be empty");
|
||||
|
||||
if (context_ != nullptr) {
|
||||
llama_free(context_);
|
||||
context_ = nullptr;
|
||||
}
|
||||
if (model_ != nullptr) {
|
||||
llama_model_free(model_);
|
||||
model_ = nullptr;
|
||||
}
|
||||
|
||||
/**
|
||||
* Initialize the llama backend (one-time setup for GPU/CPU acceleration)
|
||||
*/
|
||||
llama_backend_init();
|
||||
|
||||
llama_model_params model_params = llama_model_default_params();
|
||||
model_ = llama_model_load_from_file(model_path.c_str(), model_params);
|
||||
if (model_ == nullptr) {
|
||||
throw std::runtime_error(
|
||||
"LlamaGenerator: failed to load model from path: " + model_path);
|
||||
}
|
||||
|
||||
llama_context_params context_params = llama_context_default_params();
|
||||
context_params.n_ctx = n_ctx_;
|
||||
context_params.n_batch = n_ctx_; // Set batch size equal to context window
|
||||
|
||||
context_ = llama_init_from_model(model_, context_params);
|
||||
if (context_ == nullptr) {
|
||||
llama_model_free(model_);
|
||||
model_ = nullptr;
|
||||
throw std::runtime_error("LlamaGenerator: failed to create context");
|
||||
}
|
||||
|
||||
spdlog::info("[LlamaGenerator] Loaded model: {}", model_path);
|
||||
}
|
||||
74
pipeline/src/data_generation/llama/load_brewery_prompt.cpp
Normal file
@@ -0,0 +1,74 @@
|
||||
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>
|
||||
#include <spdlog/spdlog.h>
|
||||
|
||||
#include "data_generation/llama_generator.h"
|
||||
|
||||
namespace fs = std::filesystem;
|
||||
|
||||
std::string LlamaGenerator::LoadBrewerySystemPrompt(
|
||||
const std::string& prompt_file_path) {
|
||||
// Return cached version if already loaded
|
||||
if (!brewery_system_prompt_.empty()) {
|
||||
return brewery_system_prompt_;
|
||||
}
|
||||
|
||||
// Try multiple path locations
|
||||
std::vector<std::string> paths_to_try = {
|
||||
prompt_file_path, // As provided
|
||||
"../" + prompt_file_path, // One level up
|
||||
"../../" + prompt_file_path, // Two levels up
|
||||
};
|
||||
|
||||
for (const auto& path : paths_to_try) {
|
||||
std::ifstream prompt_file(path);
|
||||
if (prompt_file.is_open()) {
|
||||
std::string prompt((std::istreambuf_iterator<char>(prompt_file)),
|
||||
std::istreambuf_iterator<char>());
|
||||
prompt_file.close();
|
||||
|
||||
if (!prompt.empty()) {
|
||||
spdlog::info(
|
||||
"LlamaGenerator: Loaded brewery system prompt from '{}' ({} chars)",
|
||||
path, prompt.length());
|
||||
brewery_system_prompt_ = prompt;
|
||||
return brewery_system_prompt_;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
spdlog::warn(
|
||||
"LlamaGenerator: Could not open brewery system prompt file at any of the "
|
||||
"expected locations. Using fallback inline prompt.");
|
||||
return GetFallbackBreweryPrompt();
|
||||
}
|
||||
|
||||
// Fallback: minimal inline prompt if file fails to load
|
||||
std::string LlamaGenerator::GetFallbackBreweryPrompt() {
|
||||
return "You are an experienced brewmaster and owner of a local craft brewery. "
|
||||
"Create a distinctive, authentic name and detailed description that "
|
||||
"genuinely reflects your specific location, brewing philosophy, local "
|
||||
"culture, and community connection. The brewery must feel real and "
|
||||
"grounded—not generic or interchangeable.\n\n"
|
||||
"AVOID REPETITIVE PHRASES - Never use:\n"
|
||||
"Love letter to, tribute to, rolling hills, picturesque, every sip "
|
||||
"tells a story, Come for X stay for Y, rich history, passion, woven "
|
||||
"into, ancient roots, timeless, where tradition meets innovation\n\n"
|
||||
"OPENING APPROACHES - Choose ONE:\n"
|
||||
"1. Start with specific beer style and its regional origins\n"
|
||||
"2. Begin with specific brewing challenge (water, altitude, climate)\n"
|
||||
"3. Open with founding story or personal motivation\n"
|
||||
"4. Lead with specific local ingredient or resource\n"
|
||||
"5. Start with unexpected angle or contradiction\n"
|
||||
"6. Open with local event, tradition, or cultural moment\n"
|
||||
"7. Begin with tangible architectural or geographic detail\n\n"
|
||||
"BE SPECIFIC - Include:\n"
|
||||
"- At least ONE concrete proper noun (landmark, river, neighborhood)\n"
|
||||
"- Specific beer styles relevant to the REGION'S culture\n"
|
||||
"- Concrete brewing challenges or advantages\n"
|
||||
"- Sensory details SPECIFIC to place—not generic adjectives\n\n"
|
||||
"LENGTH: 150-250 words. TONE: Can be soulful, irreverent, "
|
||||
"matter-of-fact, unpretentious, or minimalist.\n\n"
|
||||
"Output ONLY a raw JSON object with keys name and description. "
|
||||
"No markdown, backticks, preamble, or trailing text.";
|
||||
}
|
||||
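Review note: the "try several locations, then fall back" logic in `LoadBrewerySystemPrompt` can be sketched as a free function that is testable without a `LlamaGenerator` instance. `ReadFirstAvailable` is a hypothetical name, not something in the diff, and the sketch leaves out the caching and logging that the member function does.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical free function, not part of this diff. Returns the contents of
// the first candidate path that opens and is non-empty, else `fallback`.
inline std::string ReadFirstAvailable(
    const std::vector<std::string>& candidates, const std::string& fallback) {
  for (const std::string& path : candidates) {
    std::ifstream file(path);
    if (!file.is_open()) continue;  // try the next location

    std::string contents((std::istreambuf_iterator<char>(file)),
                         std::istreambuf_iterator<char>());
    if (!contents.empty()) return contents;  // empty files are skipped too
  }
  return fallback;
}
```

A call such as `ReadFirstAvailable({path, "../" + path, "../../" + path}, GetFallbackBreweryPrompt())` would mirror the lookup order used above. Relative candidates resolve against the process working directory, so the result still depends on where the binary is launched from.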
65
pipeline/src/data_generation/llama/set_sampling_options.cpp
Normal file
@@ -0,0 +1,65 @@
|
||||
/**
|
||||
* Sampling Configuration Module
|
||||
* Configures the hyperparameters that control probabilistic token selection
|
||||
* during text generation. These settings affect the randomness, diversity, and
|
||||
* quality of generated output.
|
||||
*/
|
||||
|
||||
#include <cstdint>
#include <stdexcept>
|
||||
|
||||
#include "data_generation/llama_generator.h"
|
||||
#include "llama.h"
|
||||
|
||||
void LlamaGenerator::SetSamplingOptions(float temperature, float top_p,
|
||||
int seed) {
|
||||
/**
|
||||
* Validate temperature: controls randomness in output distribution
|
||||
* 0.0 = deterministic (always pick highest probability token)
|
||||
* Higher values = more random/diverse output
|
||||
*/
|
||||
if (temperature < 0.0f) {
|
||||
throw std::runtime_error(
|
||||
"LlamaGenerator: sampling temperature must be >= 0");
|
||||
}
|
||||
|
||||
/**
|
||||
* Validate top-p (nucleus sampling): sample only from the smallest set of
* most likely tokens whose cumulative probability reaches top_p, e.g.
* top-p=0.9 restricts sampling to tokens covering 90% of probability mass
|
||||
*/
|
||||
if (!(top_p > 0.0f && top_p <= 1.0f)) {
|
||||
throw std::runtime_error(
|
||||
"LlamaGenerator: sampling top-p must be in (0, 1]");
|
||||
}
|
||||
|
||||
/**
|
||||
* Validate seed: for reproducible results (-1 uses random seed)
|
||||
*/
|
||||
if (seed < -1) {
|
||||
throw std::runtime_error(
|
||||
"LlamaGenerator: seed must be >= 0, or -1 for random");
|
||||
}
|
||||
|
||||
/**
|
||||
* Store sampling parameters for use during token generation
|
||||
*/
|
||||
sampling_temperature_ = temperature;
|
||||
sampling_top_p_ = top_p;
|
||||
sampling_seed_ = (seed < 0) ? static_cast<uint32_t>(LLAMA_DEFAULT_SEED)
|
||||
: static_cast<uint32_t>(seed);
|
||||
}
|
||||
|
||||
void LlamaGenerator::SetContextSize(uint32_t n_ctx) {
|
||||
/**
|
||||
* Validate context size: must be positive and reasonable for the model
|
||||
*/
|
||||
if (n_ctx == 0 || n_ctx > 32768) {
|
||||
throw std::runtime_error(
|
||||
"LlamaGenerator: context size must be in range [1, 32768]");
|
||||
}
|
||||
|
||||
/**
|
||||
* Store context size for use during model loading
|
||||
*/
|
||||
n_ctx_ = n_ctx;
|
||||
}
|
||||
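Review note: for readers unfamiliar with the two knobs validated above, the following is a small self-contained sketch of what temperature scaling and top-p filtering do to a probability distribution. It is an illustration of the concepts only, not llama.cpp's implementation; `SoftmaxWithTemperature` and `TopPIndices` are hypothetical names, and the sketch assumes `temperature > 0`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Softmax over logits divided by the temperature. Lower temperatures sharpen
// the distribution; higher temperatures flatten it.
inline std::vector<double> SoftmaxWithTemperature(
    const std::vector<double>& logits, double temperature) {
  assert(temperature > 0.0);  // the greedy temperature == 0 case is not covered
  std::vector<double> probs(logits.size());
  if (logits.empty()) return probs;

  // Subtract the maximum logit for numerical stability.
  const double max_logit = *std::max_element(logits.begin(), logits.end());
  double sum = 0.0;
  for (std::size_t i = 0; i < logits.size(); ++i) {
    probs[i] = std::exp((logits[i] - max_logit) / temperature);
    sum += probs[i];
  }
  for (double& p : probs) p /= sum;
  return probs;
}

// Indices of the smallest set of highest-probability tokens whose cumulative
// probability reaches top_p (always keeps at least one token).
inline std::vector<std::size_t> TopPIndices(const std::vector<double>& probs,
                                            double top_p) {
  std::vector<std::size_t> order(probs.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&probs](std::size_t a, std::size_t b) {
                     return probs[a] > probs[b];
                   });

  std::vector<std::size_t> kept;
  double cumulative = 0.0;
  for (const std::size_t index : order) {
    kept.push_back(index);
    cumulative += probs[index];
    if (cumulative >= top_p) break;
  }
  return kept;
}
```

This also shows why `top_p` must lie in (0, 1]: at 1.0 every token survives the filter, and a value at or below 0 would stop after the first token regardless of the distribution.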
65
pipeline/src/data_generation/mock/data.cpp
Normal file
@@ -0,0 +1,65 @@
|
||||
#include <string>
|
||||
#include <vector>
|
||||
|
||||
#include "data_generation/mock_generator.h"
|
||||
|
||||
const std::vector<std::string> MockGenerator::kBreweryAdjectives = {
|
||||
"Craft", "Heritage", "Local", "Artisan", "Pioneer", "Golden",
|
||||
"Modern", "Classic", "Summit", "Northern", "Riverstone", "Barrel",
|
||||
"Hinterland", "Harbor", "Wild", "Granite", "Copper", "Maple"};
|
||||
|
||||
const std::vector<std::string> MockGenerator::kBreweryNouns = {
|
||||
"Brewing Co.", "Brewery", "Bier Haus", "Taproom", "Works",
|
||||
"House", "Fermentery", "Ale Co.", "Cellars", "Collective",
|
||||
"Project", "Foundry", "Malthouse", "Public House", "Co-op",
|
||||
"Lab", "Beer Hall", "Guild"};
|
||||
|
||||
const std::vector<std::string> MockGenerator::kBreweryDescriptions = {
|
||||
"Handcrafted pale ales and seasonal IPAs with local ingredients.",
|
||||
"Traditional lagers and experimental sours in small batches.",
|
||||
"Award-winning stouts and wildly hoppy blonde ales.",
|
"Craft brewery specializing in Belgian-style tripels and dark porters.",
|
||||
"Modern brewery blending tradition with bold experimental flavors.",
|
||||
"Neighborhood-focused taproom pouring crisp pilsners and citrusy pale "
|
||||
"ales.",
|
||||
"Small-batch brewery known for barrel-aged releases and smoky lagers.",
|
||||
"Independent brewhouse pairing farmhouse ales with rotating food pop-ups.",
|
||||
"Community brewpub making balanced bitters, saisons, and hazy IPAs.",
|
||||
"Experimental nanobrewery exploring local yeast and regional grains.",
|
||||
"Family-run brewery producing smooth amber ales and robust porters.",
|
||||
"Urban brewery crafting clean lagers and bright, fruit-forward sours.",
|
||||
"Riverfront brewhouse featuring oak-matured ales and seasonal blends.",
|
||||
"Modern taproom focused on sessionable lagers and classic pub styles.",
|
||||
"Brewery rooted in tradition with a lineup of malty reds and crisp lagers.",
|
||||
"Creative brewery offering rotating collaborations and limited draft-only "
|
||||
"pours.",
|
||||
"Locally inspired brewery serving approachable ales with bold hop "
|
||||
"character.",
|
||||
"Destination taproom known for balanced IPAs and cocoa-rich stouts."};
|
||||
|
||||
const std::vector<std::string> MockGenerator::kUsernames = {
|
||||
"hopseeker", "malttrail", "yeastwhisper", "lagerlane",
|
||||
"barrelbound", "foamfinder", "taphunter", "graingeist",
|
||||
"brewscout", "aleatlas", "caskcompass", "hopsandmaps",
|
||||
"mashpilot", "pintnomad", "fermentfriend", "stoutsignal",
|
||||
"sessionwander", "kettlekeeper"};
|
||||
|
||||
const std::vector<std::string> MockGenerator::kBios = {
|
||||
"Always chasing balanced IPAs and crisp lagers across local taprooms.",
|
||||
"Weekend brewery explorer with a soft spot for dark, roasty stouts.",
|
||||
"Documenting tiny brewpubs, fresh pours, and unforgettable beer gardens.",
|
||||
"Fan of farmhouse ales, food pairings, and long tasting flights.",
|
||||
"Collecting favorite pilsners one city at a time.",
|
||||
"Hops-first drinker who still saves room for classic malt-forward styles.",
|
||||
"Finding hidden tap lists and sharing the best seasonal releases.",
|
||||
"Brewery road-tripper focused on local ingredients and clean fermentation.",
|
||||
"Always comparing house lagers and ranking patio pint vibes.",
|
||||
"Curious about yeast strains, barrel programs, and cellar experiments.",
|
||||
"Believes every neighborhood deserves a great community taproom.",
|
||||
"Looking for session beers that taste great from first sip to last.",
|
||||
"Belgian ale enthusiast who never skips a new saison.",
|
||||
"Hazy IPA critic with deep respect for a perfectly clear pilsner.",
|
||||
"Visits breweries for the stories, stays for the flagship pours.",
|
||||
"Craft beer fan mapping tasting notes and favorite brew routes.",
|
||||
"Always ready to trade recommendations for underrated local breweries.",
|
||||
"Keeping a running list of must-try collab releases and tap takeovers."};
|
||||
12
pipeline/src/data_generation/mock/deterministic_hash.cpp
Normal file
@@ -0,0 +1,12 @@
|
||||
#include <string>
|
||||
|
||||
#include "data_generation/mock_generator.h"
|
||||
|
||||
std::size_t MockGenerator::DeterministicHash(const std::string& a,
|
||||
const std::string& b) {
|
||||
std::size_t seed = std::hash<std::string>{}(a);
|
||||
const std::size_t mixed = std::hash<std::string>{}(b);
|
||||
seed ^= mixed + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2);
|
||||
seed = (seed << 13) | (seed >> ((sizeof(std::size_t) * 8) - 13));
|
||||
return seed;
|
||||
}
|
||||
21
pipeline/src/data_generation/mock/generate_brewery.cpp
Normal file
@@ -0,0 +1,21 @@
|
||||
#include <functional>
|
||||
#include <string>
|
||||
|
||||
#include "data_generation/mock_generator.h"
|
||||
|
||||
BreweryResult MockGenerator::GenerateBrewery(
|
||||
const std::string& city_name, const std::string& country_name,
|
||||
const std::string& region_context) {
|
||||
const std::string location_key =
|
||||
country_name.empty() ? city_name : city_name + "," + country_name;
|
||||
const std::size_t hash =
|
||||
region_context.empty() ? std::hash<std::string>{}(location_key)
|
||||
: DeterministicHash(location_key, region_context);
|
||||
|
||||
BreweryResult result;
|
||||
result.name = kBreweryAdjectives[hash % kBreweryAdjectives.size()] + " " +
|
||||
kBreweryNouns[(hash / 7) % kBreweryNouns.size()];
|
||||
result.description =
|
||||
kBreweryDescriptions[(hash / 13) % kBreweryDescriptions.size()];
|
||||
return result;
|
||||
}
|
||||
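Review note: the mock generator derives every field from one hash value by dividing by a different small constant per field before taking the modulus. A minimal sketch of that selection scheme follows. `CombineHashes` and `PickFromPool` are hypothetical names; the combine step copies only the XOR/shift mixing from `DeterministicHash` and omits its final rotate.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Mixes two string hashes into one value (boost-style hash_combine step).
inline std::size_t CombineHashes(const std::string& a, const std::string& b) {
  std::size_t seed = std::hash<std::string>{}(a);
  const std::size_t mixed = std::hash<std::string>{}(b);
  seed ^= mixed + static_cast<std::size_t>(0x9e3779b97f4a7c15ULL) +
          (seed << 6) + (seed >> 2);
  return seed;
}

// Picks one entry from a pool. Using a different divisor per field keeps the
// chosen indices from moving in lockstep when the pools have the same size.
inline const std::string& PickFromPool(const std::vector<std::string>& pool,
                                       std::size_t hash, std::size_t divisor) {
  assert(!pool.empty() && divisor > 0);
  return pool[(hash / divisor) % pool.size()];
}
```

One caveat: the C++ standard only requires `std::hash` to be consistent within a single program execution, so output that is "deterministic" here may still differ between standard library implementations or builds. If the mock data must be stable across machines, a fixed hash such as FNV-1a would be a safer choice.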
13
pipeline/src/data_generation/mock/generate_user.cpp
Normal file
@@ -0,0 +1,13 @@
|
||||
#include <functional>
|
||||
#include <string>
|
||||
|
||||
#include "data_generation/mock_generator.h"
|
||||
|
||||
UserResult MockGenerator::GenerateUser(const std::string& locale) {
|
||||
const std::size_t hash = std::hash<std::string>{}(locale);
|
||||
|
||||
UserResult result;
|
||||
result.username = kUsernames[hash % kUsernames.size()];
|
||||
result.bio = kBios[(hash / 11) % kBios.size()];
|
||||
return result;
|
||||
}
|
||||
9
pipeline/src/data_generation/mock/load.cpp
Normal file
@@ -0,0 +1,9 @@
|
||||
#include <spdlog/spdlog.h>
|
||||
|
||||
#include <string>
|
||||
|
||||
#include "data_generation/mock_generator.h"
|
||||
|
||||
void MockGenerator::Load(const std::string& /*modelPath*/) {
|
||||
spdlog::info("[MockGenerator] No model needed");
|
||||
}
|
||||
264
pipeline/src/database/database.cpp
Normal file
@@ -0,0 +1,264 @@
|
||||
#include "database/database.h"
|
||||
|
||||
#include <spdlog/spdlog.h>
|
||||
|
||||
#include <stdexcept>
|
||||
|
||||
void SqliteDatabase::InitializeSchema() {
|
||||
std::lock_guard<std::mutex> lock(db_mutex_);
|
||||
|
||||
const char* schema = R"(
|
||||
CREATE TABLE IF NOT EXISTS countries (
|
||||
id INTEGER PRIMARY KEY,
|
||||
name TEXT NOT NULL,
|
||||
iso2 TEXT,
|
||||
iso3 TEXT
|
||||
);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS states (
|
||||
id INTEGER PRIMARY KEY,
|
||||
country_id INTEGER NOT NULL,
|
||||
name TEXT NOT NULL,
|
||||
iso2 TEXT,
|
||||
FOREIGN KEY(country_id) REFERENCES countries(id)
|
||||
);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS cities (
|
||||
id INTEGER PRIMARY KEY,
|
||||
state_id INTEGER NOT NULL,
|
||||
country_id INTEGER NOT NULL,
|
||||
name TEXT NOT NULL,
|
||||
latitude REAL,
|
||||
longitude REAL,
|
||||
FOREIGN KEY(state_id) REFERENCES states(id),
|
||||
FOREIGN KEY(country_id) REFERENCES countries(id)
|
||||
);
|
||||
)";
|
||||
|
||||
char* errMsg = nullptr;
|
||||
int rc = sqlite3_exec(db_, schema, nullptr, nullptr, &errMsg);
|
||||
if (rc != SQLITE_OK) {
|
||||
std::string error = errMsg ? std::string(errMsg) : "Unknown error";
|
||||
sqlite3_free(errMsg);
|
||||
throw std::runtime_error("Failed to create schema: " + error);
|
||||
}
|
||||
}
|
||||
|
||||
SqliteDatabase::~SqliteDatabase() {
|
||||
if (db_) {
|
||||
sqlite3_close(db_);
|
||||
}
|
||||
}
|
||||
|
||||
void SqliteDatabase::Initialize(const std::string& db_path) {
|
||||
int rc = sqlite3_open(db_path.c_str(), &db_);
|
||||
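if (rc != SQLITE_OK) {
// sqlite3_open may allocate a handle even on failure: capture the message,
// then release the handle before throwing.
const std::string reason = db_ ? sqlite3_errmsg(db_) : "out of memory";
sqlite3_close(db_);
db_ = nullptr;
throw std::runtime_error("Failed to open SQLite database: " + db_path +
" (" + reason + ")");
}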
|
||||
spdlog::info("OK: SQLite database opened: {}", db_path);
|
||||
InitializeSchema();
|
||||
}
|
||||
|
||||
void SqliteDatabase::BeginTransaction() {
|
||||
std::lock_guard<std::mutex> lock(db_mutex_);
|
||||
char* err = nullptr;
|
||||
if (sqlite3_exec(db_, "BEGIN TRANSACTION", nullptr, nullptr, &err) !=
|
||||
SQLITE_OK) {
|
||||
std::string msg = err ? err : "unknown";
|
||||
sqlite3_free(err);
|
||||
throw std::runtime_error("BeginTransaction failed: " + msg);
|
||||
}
|
||||
}
|
||||
|
||||
void SqliteDatabase::CommitTransaction() {
|
||||
std::lock_guard<std::mutex> lock(db_mutex_);
|
||||
char* err = nullptr;
|
||||
if (sqlite3_exec(db_, "COMMIT", nullptr, nullptr, &err) != SQLITE_OK) {
|
||||
std::string msg = err ? err : "unknown";
|
||||
sqlite3_free(err);
|
||||
throw std::runtime_error("CommitTransaction failed: " + msg);
|
||||
}
|
||||
}
|
||||
|
||||
void SqliteDatabase::RollbackTransaction() {
|
||||
std::lock_guard<std::mutex> lock(db_mutex_);
|
||||
char* err = nullptr;
|
||||
if (sqlite3_exec(db_, "ROLLBACK", nullptr, nullptr, &err) != SQLITE_OK) {
|
||||
std::string msg = err ? err : "unknown";
|
||||
sqlite3_free(err);
|
||||
throw std::runtime_error("RollbackTransaction failed: " + msg);
|
||||
}
|
||||
}
|
||||
|
||||
void SqliteDatabase::InsertCountry(int id, const std::string& name,
|
||||
const std::string& iso2,
|
||||
const std::string& iso3) {
|
||||
std::lock_guard<std::mutex> lock(db_mutex_);
|
||||
|
||||
const char* query = R"(
|
||||
INSERT OR IGNORE INTO countries (id, name, iso2, iso3)
|
||||
VALUES (?, ?, ?, ?)
|
||||
)";
|
||||
|
||||
sqlite3_stmt* stmt;
|
||||
int rc = sqlite3_prepare_v2(db_, query, -1, &stmt, nullptr);
|
||||
if (rc != SQLITE_OK)
|
||||
throw std::runtime_error("Failed to prepare country insert");
|
||||
|
||||
sqlite3_bind_int(stmt, 1, id);
|
||||
sqlite3_bind_text(stmt, 2, name.c_str(), -1, SQLITE_TRANSIENT);
|
||||
sqlite3_bind_text(stmt, 3, iso2.c_str(), -1, SQLITE_TRANSIENT);
|
||||
sqlite3_bind_text(stmt, 4, iso3.c_str(), -1, SQLITE_TRANSIENT);
|
||||
|
||||
const int step_rc = sqlite3_step(stmt);
sqlite3_finalize(stmt);  // Release the statement on both success and failure
if (step_rc != SQLITE_DONE) {
throw std::runtime_error("Failed to insert country");
}
|
||||
}
|
||||
|
||||
void SqliteDatabase::InsertState(int id, int country_id,
|
||||
const std::string& name,
|
||||
const std::string& iso2) {
|
||||
std::lock_guard<std::mutex> lock(db_mutex_);
|
||||
|
||||
const char* query = R"(
|
||||
INSERT OR IGNORE INTO states (id, country_id, name, iso2)
|
||||
VALUES (?, ?, ?, ?)
|
||||
)";
|
||||
|
||||
sqlite3_stmt* stmt;
|
||||
int rc = sqlite3_prepare_v2(db_, query, -1, &stmt, nullptr);
|
||||
if (rc != SQLITE_OK)
|
||||
throw std::runtime_error("Failed to prepare state insert");
|
||||
|
||||
sqlite3_bind_int(stmt, 1, id);
|
||||
sqlite3_bind_int(stmt, 2, country_id);
|
||||
sqlite3_bind_text(stmt, 3, name.c_str(), -1, SQLITE_TRANSIENT);
|
||||
sqlite3_bind_text(stmt, 4, iso2.c_str(), -1, SQLITE_TRANSIENT);
|
||||
|
||||
const int step_rc = sqlite3_step(stmt);
sqlite3_finalize(stmt);  // Release the statement on both success and failure
if (step_rc != SQLITE_DONE) {
throw std::runtime_error("Failed to insert state");
}
|
||||
}
|
||||
|
||||
void SqliteDatabase::InsertCity(int id, int state_id, int country_id,
|
||||
const std::string& name, double latitude,
|
||||
double longitude) {
|
||||
std::lock_guard<std::mutex> lock(db_mutex_);
|
||||
|
||||
const char* query = R"(
|
||||
INSERT OR IGNORE INTO cities (id, state_id, country_id, name, latitude, longitude)
|
||||
VALUES (?, ?, ?, ?, ?, ?)
|
||||
)";
|
||||
|
||||
sqlite3_stmt* stmt;
|
||||
int rc = sqlite3_prepare_v2(db_, query, -1, &stmt, nullptr);
|
||||
if (rc != SQLITE_OK)
|
||||
throw std::runtime_error("Failed to prepare city insert");
|
||||
|
||||
sqlite3_bind_int(stmt, 1, id);
|
||||
sqlite3_bind_int(stmt, 2, state_id);
|
||||
sqlite3_bind_int(stmt, 3, country_id);
|
||||
sqlite3_bind_text(stmt, 4, name.c_str(), -1, SQLITE_TRANSIENT);
|
||||
sqlite3_bind_double(stmt, 5, latitude);
|
||||
sqlite3_bind_double(stmt, 6, longitude);
|
||||
|
||||
const int step_rc = sqlite3_step(stmt);
sqlite3_finalize(stmt);  // Release the statement on both success and failure
if (step_rc != SQLITE_DONE) {
throw std::runtime_error("Failed to insert city");
}
|
||||
}
|
||||
|
||||
std::vector<City> SqliteDatabase::QueryCities() {
|
||||
std::lock_guard<std::mutex> lock(db_mutex_);
|
||||
std::vector<City> cities;
|
||||
sqlite3_stmt* stmt = nullptr;
|
||||
|
||||
const char* query =
|
||||
"SELECT id, name, country_id FROM cities ORDER BY RANDOM()";
|
||||
int rc = sqlite3_prepare_v2(db_, query, -1, &stmt, nullptr);
|
||||
|
||||
if (rc != SQLITE_OK) {
|
||||
throw std::runtime_error("Failed to prepare query");
|
||||
}
|
||||
|
||||
while (sqlite3_step(stmt) == SQLITE_ROW) {
|
||||
int id = sqlite3_column_int(stmt, 0);
|
||||
const char* name =
|
||||
reinterpret_cast<const char*>(sqlite3_column_text(stmt, 1));
|
||||
int country_id = sqlite3_column_int(stmt, 2);
|
||||
cities.push_back({id, name ? std::string(name) : "", country_id});
|
||||
}
|
||||
|
||||
sqlite3_finalize(stmt);
|
||||
return cities;
|
||||
}
|
||||
|
||||
std::vector<Country> SqliteDatabase::QueryCountries(int limit) {
|
||||
std::lock_guard<std::mutex> lock(db_mutex_);
|
||||
|
||||
std::vector<Country> countries;
|
||||
sqlite3_stmt* stmt = nullptr;
|
||||
|
||||
std::string query =
|
||||
"SELECT id, name, iso2, iso3 FROM countries ORDER BY name";
|
||||
if (limit > 0) {
|
||||
query += " LIMIT " + std::to_string(limit);
|
||||
}
|
||||
|
||||
int rc = sqlite3_prepare_v2(db_, query.c_str(), -1, &stmt, nullptr);
|
||||
|
||||
if (rc != SQLITE_OK) {
|
||||
throw std::runtime_error("Failed to prepare countries query");
|
||||
}
|
||||
|
||||
while (sqlite3_step(stmt) == SQLITE_ROW) {
|
||||
int id = sqlite3_column_int(stmt, 0);
|
||||
const char* name =
|
||||
reinterpret_cast<const char*>(sqlite3_column_text(stmt, 1));
|
||||
const char* iso2 =
|
||||
reinterpret_cast<const char*>(sqlite3_column_text(stmt, 2));
|
||||
const char* iso3 =
|
||||
reinterpret_cast<const char*>(sqlite3_column_text(stmt, 3));
|
||||
countries.push_back({id, name ? std::string(name) : "",
|
||||
iso2 ? std::string(iso2) : "",
|
||||
iso3 ? std::string(iso3) : ""});
|
||||
}
|
||||
|
||||
sqlite3_finalize(stmt);
|
||||
return countries;
|
||||
}
|
||||
|
||||
std::vector<State> SqliteDatabase::QueryStates(int limit) {
|
||||
std::lock_guard<std::mutex> lock(db_mutex_);
|
||||
|
||||
std::vector<State> states;
|
||||
sqlite3_stmt* stmt = nullptr;
|
||||
|
||||
std::string query =
|
||||
"SELECT id, name, iso2, country_id FROM states ORDER BY name";
|
||||
if (limit > 0) {
|
||||
query += " LIMIT " + std::to_string(limit);
|
||||
}
|
||||
|
||||
int rc = sqlite3_prepare_v2(db_, query.c_str(), -1, &stmt, nullptr);
|
||||
|
||||
if (rc != SQLITE_OK) {
|
||||
throw std::runtime_error("Failed to prepare states query");
|
||||
}
|
||||
|
||||
while (sqlite3_step(stmt) == SQLITE_ROW) {
|
||||
int id = sqlite3_column_int(stmt, 0);
|
||||
const char* name =
|
||||
reinterpret_cast<const char*>(sqlite3_column_text(stmt, 1));
|
||||
const char* iso2 =
|
||||
reinterpret_cast<const char*>(sqlite3_column_text(stmt, 2));
|
||||
int country_id = sqlite3_column_int(stmt, 3);
|
||||
states.push_back({id, name ? std::string(name) : "",
|
||||
iso2 ? std::string(iso2) : "", country_id});
|
||||
}
|
||||
|
||||
sqlite3_finalize(stmt);
|
||||
return states;
|
||||
}
|
||||
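Review note: the insert and query functions above pair each `sqlite3_prepare_v2` with a manual `sqlite3_finalize`, which is easy to miss on an early `throw`. One possible alternative is a small scope guard that runs the cleanup on every exit path. The sketch below shows the pattern without depending on SQLite: `ScopeExit` and `StepWithCleanup` are hypothetical names, and the counter stands in for the call to `sqlite3_finalize`.

```cpp
#include <stdexcept>
#include <utility>

// Runs the stored callable when the guard leaves scope, including during
// stack unwinding after an exception.
template <typename F>
class ScopeExit {
 public:
  explicit ScopeExit(F fn) : fn_(std::move(fn)) {}
  ~ScopeExit() { fn_(); }
  ScopeExit(const ScopeExit&) = delete;
  ScopeExit& operator=(const ScopeExit&) = delete;

 private:
  F fn_;
};

// Stand-in for an insert routine: `finalize_calls` plays the role of
// sqlite3_finalize, and `fail` simulates sqlite3_step returning an error.
inline void StepWithCleanup(bool fail, int& finalize_calls) {
  auto finalize = [&finalize_calls] { ++finalize_calls; };
  ScopeExit<decltype(finalize)> guard(finalize);

  if (fail) throw std::runtime_error("step failed");
}
```

In the real code the lambda would capture `stmt` and call `sqlite3_finalize(stmt)`. A `std::unique_ptr<sqlite3_stmt, decltype(&sqlite3_finalize)>` would work as well and matches the `SamplerPtr` idiom already used in the llama generator.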
67
pipeline/src/json_handling/json_loader.cpp
Normal file
@@ -0,0 +1,67 @@
|
||||
#include "json_handling/json_loader.h"
|
||||
|
||||
#include <spdlog/spdlog.h>
|
||||
|
||||
#include <chrono>
|
||||
|
||||
#include "json_handling/stream_parser.h"
|
||||
|
||||
void JsonLoader::LoadWorldCities(const std::string& json_path,
|
||||
SqliteDatabase& db) {
|
||||
constexpr size_t kBatchSize = 10000;
|
||||
|
||||
auto startTime = std::chrono::high_resolution_clock::now();
|
||||
spdlog::info("\nLoading {} (streaming Boost.JSON SAX)...", json_path);
|
||||
|
||||
db.BeginTransaction();
|
||||
bool transactionOpen = true;
|
||||
|
||||
size_t citiesProcessed = 0;
|
||||
try {
|
||||
StreamingJsonParser::Parse(
|
||||
json_path, db,
|
||||
[&](const CityRecord& record) {
|
||||
db.InsertCity(record.id, record.state_id, record.country_id,
|
||||
record.name, record.latitude, record.longitude);
|
||||
++citiesProcessed;
|
||||
|
||||
if (citiesProcessed % kBatchSize == 0) {
db.CommitTransaction();
transactionOpen = false;  // Nothing to roll back if the next Begin throws
db.BeginTransaction();
transactionOpen = true;
}
|
||||
},
|
||||
[&](size_t current, size_t /*total*/) {
|
||||
if (current % kBatchSize == 0 && current > 0) {
|
||||
spdlog::info(" [Progress] Parsed {} cities...", current);
|
||||
}
|
||||
});
|
||||
|
||||
spdlog::info(" OK: Parsed all cities from JSON");
|
||||
|
||||
if (transactionOpen) {
|
||||
db.CommitTransaction();
|
||||
transactionOpen = false;
|
||||
}
|
||||
} catch (...) {
|
||||
if (transactionOpen) {
|
||||
db.RollbackTransaction();
|
||||
transactionOpen = false;
|
||||
}
|
||||
throw;
|
||||
}
|
||||
|
||||
auto endTime = std::chrono::high_resolution_clock::now();
|
||||
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(
|
||||
endTime - startTime);
|
||||
|
||||
spdlog::info("\n=== World City Data Loading Summary ===\n");
|
||||
spdlog::info("Cities inserted: {}", citiesProcessed);
|
||||
spdlog::info("Elapsed time: {} ms", duration.count());
|
||||
long long throughput =
|
||||
(citiesProcessed > 0 && duration.count() > 0)
|
||||
? (1000LL * static_cast<long long>(citiesProcessed)) /
|
||||
static_cast<long long>(duration.count())
|
||||
: 0LL;
|
||||
spdlog::info("Throughput: {} cities/sec", throughput);
|
||||
spdlog::info("=======================================\n");
|
||||
}
|
||||
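Review note: the loader commits every `kBatchSize` rows and then reopens a transaction, with one final commit for the remainder. A minimal sketch of that control flow against a counting stand-in makes the number of begin/commit calls easy to verify. `CountingDb` and `InsertInBatches` are hypothetical and only mimic the `BeginTransaction`/`CommitTransaction` names from the diff.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for SqliteDatabase that only counts calls.
struct CountingDb {
  int begins = 0;
  int commits = 0;
  bool in_transaction = false;

  void BeginTransaction() {
    ++begins;
    in_transaction = true;
  }
  void CommitTransaction() {
    ++commits;
    in_transaction = false;
  }
};

// Processes `total_rows` rows, committing after every `batch_size` rows and
// once more at the end for the final (possibly empty) batch.
inline std::size_t InsertInBatches(CountingDb& db, std::size_t total_rows,
                                   std::size_t batch_size) {
  assert(batch_size > 0);
  std::size_t processed = 0;

  db.BeginTransaction();
  for (std::size_t i = 0; i < total_rows; ++i) {
    ++processed;  // a real loader would insert the row here
    if (processed % batch_size == 0) {
      db.CommitTransaction();
      db.BeginTransaction();
    }
  }
  if (db.in_transaction) db.CommitTransaction();
  return processed;
}
```

With 25 rows and a batch size of 10 this gives three begins and three commits. Batching bounds how much work is lost on failure, but a rollback then only undoes the current batch, not the whole load.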
289
pipeline/src/json_handling/stream_parser.cpp
Normal file
@@ -0,0 +1,289 @@
|
||||
#include "json_handling/stream_parser.h"
|
||||
|
||||
#include <spdlog/spdlog.h>
|
||||
|
||||
#include <boost/json.hpp>
|
||||
#include <boost/json/basic_parser_impl.hpp>
|
||||
#include <cstdio>
#include <functional>
#include <stdexcept>
#include <string>
|
||||
|
||||
#include "database/database.h"
|
||||
|
||||
class CityRecordHandler {
|
||||
friend class boost::json::basic_parser<CityRecordHandler>;
|
||||
|
||||
public:
|
||||
static constexpr std::size_t max_array_size = static_cast<std::size_t>(-1);
|
||||
static constexpr std::size_t max_object_size = static_cast<std::size_t>(-1);
|
||||
static constexpr std::size_t max_string_size = static_cast<std::size_t>(-1);
|
||||
static constexpr std::size_t max_key_size = static_cast<std::size_t>(-1);
|
||||
|
||||
struct ParseContext {
|
||||
SqliteDatabase* db = nullptr;
|
||||
std::function<void(const CityRecord&)> on_city;
|
||||
std::function<void(size_t, size_t)> on_progress;
|
||||
size_t cities_emitted = 0;
|
||||
size_t total_file_size = 0;
|
||||
int countries_inserted = 0;
|
||||
int states_inserted = 0;
|
||||
};
|
||||
|
||||
explicit CityRecordHandler(ParseContext& ctx) : context(ctx) {}
|
||||
|
||||
private:
|
||||
ParseContext& context;
|
||||
|
||||
int depth = 0;
|
||||
bool in_countries_array = false;
|
||||
bool in_country_object = false;
|
||||
bool in_states_array = false;
|
||||
bool in_state_object = false;
|
||||
bool in_cities_array = false;
|
||||
bool building_city = false;
|
||||
|
||||
int current_country_id = 0;
|
||||
int current_state_id = 0;
|
||||
CityRecord current_city = {};
|
||||
std::string current_key;
|
||||
std::string current_key_val;
|
||||
std::string current_string_val;
|
||||
|
||||
std::string country_info[3];
|
||||
std::string state_info[2];
|
||||
|
||||
// Boost.JSON SAX Hooks
|
||||
bool on_document_begin(boost::system::error_code&) { return true; }
|
||||
bool on_document_end(boost::system::error_code&) { return true; }
|
||||
|
||||
bool on_array_begin(boost::system::error_code&) {
|
||||
depth++;
|
||||
if (depth == 1) {
|
||||
in_countries_array = true;
|
||||
} else if (depth == 3 && current_key == "states") {
|
||||
in_states_array = true;
|
||||
} else if (depth == 5 && current_key == "cities") {
|
||||
in_cities_array = true;
|
||||
}
|
||||
return true;
|
||||
}
|
||||
|
||||
bool on_array_end(std::size_t, boost::system::error_code&) {
|
||||
if (depth == 1) {
|
||||
in_countries_array = false;
|
||||
} else if (depth == 3) {
|
||||
in_states_array = false;
|
||||
} else if (depth == 5) {
|
||||
in_cities_array = false;
|
||||
}
|
||||
depth--;
|
||||
return true;
|
||||
}
|
||||
|
||||
  bool on_object_begin(boost::system::error_code&) {
    depth++;
    if (depth == 2 && in_countries_array) {
      in_country_object = true;
      current_country_id = 0;
      country_info[0].clear();
      country_info[1].clear();
      country_info[2].clear();
    } else if (depth == 4 && in_states_array) {
      in_state_object = true;
      current_state_id = 0;
      state_info[0].clear();
      state_info[1].clear();
    } else if (depth == 6 && in_cities_array) {
      building_city = true;
      current_city = {};
    }
    return true;
  }

  bool on_object_end(std::size_t, boost::system::error_code&) {
    if (depth == 6 && building_city) {
      if (current_city.id > 0 && current_state_id > 0 &&
          current_country_id > 0) {
        current_city.state_id = current_state_id;
        current_city.country_id = current_country_id;

        try {
          context.on_city(current_city);
          context.cities_emitted++;

          if (context.on_progress && context.cities_emitted % 10000 == 0) {
            context.on_progress(context.cities_emitted,
                                context.total_file_size);
          }
        } catch (const std::exception& e) {
          spdlog::warn("Record parsing failed: {}", e.what());
        }
      }
      building_city = false;
    } else if (depth == 4 && in_state_object) {
      if (current_state_id > 0 && current_country_id > 0) {
        try {
          context.db->InsertState(current_state_id, current_country_id,
                                  state_info[0], state_info[1]);
          context.states_inserted++;
        } catch (const std::exception& e) {
          spdlog::warn("Record parsing failed: {}", e.what());
        }
      }
      in_state_object = false;
    } else if (depth == 2 && in_country_object) {
      if (current_country_id > 0) {
        try {
          context.db->InsertCountry(current_country_id, country_info[0],
                                    country_info[1], country_info[2]);
          context.countries_inserted++;
        } catch (const std::exception& e) {
          spdlog::warn("Record parsing failed: {}", e.what());
        }
      }
      in_country_object = false;
    }

    depth--;
    return true;
  }

  bool on_key_part(boost::json::string_view s, std::size_t,
                   boost::system::error_code&) {
    current_key_val.append(s.data(), s.size());
    return true;
  }

  bool on_key(boost::json::string_view s, std::size_t,
              boost::system::error_code&) {
    current_key_val.append(s.data(), s.size());
    current_key = current_key_val;
    current_key_val.clear();
    return true;
  }

  bool on_string_part(boost::json::string_view s, std::size_t,
                      boost::system::error_code&) {
    current_string_val.append(s.data(), s.size());
    return true;
  }

  bool on_string(boost::json::string_view s, std::size_t,
                 boost::system::error_code&) {
    current_string_val.append(s.data(), s.size());

    if (building_city && current_key == "name") {
      current_city.name = current_string_val;
    } else if (in_state_object && current_key == "name") {
      state_info[0] = current_string_val;
    } else if (in_state_object && current_key == "iso2") {
      state_info[1] = current_string_val;
    } else if (in_country_object && current_key == "name") {
      country_info[0] = current_string_val;
    } else if (in_country_object && current_key == "iso2") {
      country_info[1] = current_string_val;
    } else if (in_country_object && current_key == "iso3") {
      country_info[2] = current_string_val;
    }

    current_string_val.clear();
    return true;
  }

  bool on_number_part(boost::json::string_view, boost::system::error_code&) {
    return true;
  }

  bool on_int64(int64_t i, boost::json::string_view,
                boost::system::error_code&) {
    if (building_city && current_key == "id") {
      current_city.id = static_cast<int>(i);
    } else if (in_state_object && current_key == "id") {
      current_state_id = static_cast<int>(i);
    } else if (in_country_object && current_key == "id") {
      current_country_id = static_cast<int>(i);
    }
    return true;
  }

  bool on_uint64(uint64_t u, boost::json::string_view,
                 boost::system::error_code& ec) {
    return on_int64(static_cast<int64_t>(u), "", ec);
  }

  bool on_double(double d, boost::json::string_view,
                 boost::system::error_code&) {
    if (building_city) {
      if (current_key == "latitude") {
        current_city.latitude = d;
      } else if (current_key == "longitude") {
        current_city.longitude = d;
      }
    }
    return true;
  }

  bool on_bool(bool, boost::system::error_code&) { return true; }
  bool on_null(boost::system::error_code&) { return true; }
  bool on_comment_part(boost::json::string_view, boost::system::error_code&) {
    return true;
  }
  bool on_comment(boost::json::string_view, boost::system::error_code&) {
    return true;
  }
};

void StreamingJsonParser::Parse(
    const std::string& file_path, SqliteDatabase& db,
    std::function<void(const CityRecord&)> on_city,
    std::function<void(size_t, size_t)> on_progress) {
  spdlog::info(" Streaming parse of {} (Boost.JSON)...", file_path);

  FILE* file = std::fopen(file_path.c_str(), "rb");
  if (!file) {
    throw std::runtime_error("Failed to open JSON file: " + file_path);
  }

  size_t total_size = 0;
  if (std::fseek(file, 0, SEEK_END) == 0) {
    long file_size = std::ftell(file);
    if (file_size > 0) {
      total_size = static_cast<size_t>(file_size);
    }
    std::rewind(file);
  }

  CityRecordHandler::ParseContext ctx{&db, on_city, on_progress, 0, total_size,
                                      0, 0};
  boost::json::basic_parser<CityRecordHandler> parser(
      boost::json::parse_options{}, ctx);

  char buf[65536];
  size_t bytes_read;
  boost::system::error_code ec;

  while ((bytes_read = std::fread(buf, 1, sizeof(buf), file)) > 0) {
    char const* p = buf;
    std::size_t remain = bytes_read;

    while (remain > 0) {
      std::size_t consumed = parser.write_some(true, p, remain, ec);
      if (ec) {
        std::fclose(file);
        throw std::runtime_error("JSON parse error: " + ec.message());
      }
      p += consumed;
      remain -= consumed;
    }
  }

  parser.write_some(false, nullptr, 0, ec);  // Signal EOF
  std::fclose(file);

  if (ec) {
    throw std::runtime_error("JSON parse error at EOF: " + ec.message());
  }

  spdlog::info(" OK: Parsed {} countries, {} states, {} cities",
               ctx.countries_inserted, ctx.states_inserted,
               ctx.cities_emitted);
}
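Review note: `Parse` closes the `FILE*` by hand on each exit path, so an exception raised between `fopen` and `fclose` (for example an allocation failure inside a handler hook) would skip the close. A minimal sketch of an RAII alternative follows, in the same spirit as the `CurlHandle` alias elsewhere in this change. The names `FileCloser`, `FileHandle`, `OpenForRead`, and `FileSize` are hypothetical and do not appear in the diff.

```cpp
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the diff): closes the FILE* on every exit
// path, including exceptions thrown while parsing.
struct FileCloser {
  void operator()(std::FILE* f) const noexcept {
    if (f) std::fclose(f);
  }
};
using FileHandle = std::unique_ptr<std::FILE, FileCloser>;

// Opens a file for binary reading or throws, like the check in Parse().
FileHandle OpenForRead(const std::string& path) {
  FileHandle file(std::fopen(path.c_str(), "rb"));
  if (!file) {
    throw std::runtime_error("Failed to open JSON file: " + path);
  }
  return file;
}

// Returns the file size in bytes, or 0 if it cannot be determined,
// mirroring the fseek/ftell/rewind sequence in Parse().
std::size_t FileSize(std::FILE* file) {
  std::size_t total = 0;
  if (std::fseek(file, 0, SEEK_END) == 0) {
    const long size = std::ftell(file);
    if (size > 0) total = static_cast<std::size_t>(size);
    std::rewind(file);
  }
  return total;
}
```

With a handle like this, the read loop could pass `file.get()` to `std::fread` and drop both explicit `std::fclose` calls.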
134
pipeline/src/main.cpp
Normal file
@@ -0,0 +1,134 @@
#include <spdlog/spdlog.h>

#include <boost/program_options.hpp>
#include <iostream>
#include <memory>

#include "biergarten_data_generator.h"
#include "database/database.h"
#include "web_client/curl_web_client.h"

namespace po = boost::program_options;

/**
 * @brief Parse command-line arguments into ApplicationOptions.
 *
 * @param argc Command-line argument count.
 * @param argv Command-line arguments.
 * @param options Output ApplicationOptions struct.
 * @return true if parsing succeeded and should proceed, false otherwise.
 */
bool ParseArguments(int argc, char** argv, ApplicationOptions& options) {
  // If no arguments provided, display usage and exit
  if (argc == 1) {
    std::cout << "Biergarten Pipeline - Geographic Data Pipeline with "
                 "Brewery Generation\n\n";
    std::cout << "Usage: biergarten-pipeline [options]\n\n";
    std::cout << "Options:\n";
    std::cout << " --mocked Use mocked generator for "
                 "brewery/user data\n";
    std::cout << " --model, -m PATH Path to LLM model file (gguf) for "
                 "generation\n";
    std::cout << " --cache-dir, -c DIR Directory for cached JSON (default: "
                 "/tmp)\n";
    std::cout << " --temperature TEMP LLM sampling temperature 0.0-1.0 "
                 "(default: 0.8)\n";
    std::cout << " --top-p VALUE Nucleus sampling parameter 0.0-1.0 "
                 "(default: 0.92)\n";
    std::cout << " --n-ctx SIZE Context window size in tokens "
                 "(default: 8192)\n";
    std::cout << " --seed SEED Random seed: -1 for random "
                 "(default: -1)\n";
    std::cout << " --help, -h Show this help message\n\n";
    std::cout << "Note: --mocked and --model are mutually exclusive. Exactly "
                 "one must be provided.\n";
    std::cout << "Data source is always pinned to commit c5eb7772 (stable "
                 "2026-03-28).\n";
    return false;
  }

  po::options_description desc("Pipeline Options");
  desc.add_options()("help,h", "Produce help message")(
      "mocked", po::bool_switch(),
      "Use mocked generator for brewery/user data")(
      "model,m", po::value<std::string>()->default_value(""),
      "Path to LLM model (gguf)")(
      "cache-dir,c", po::value<std::string>()->default_value("/tmp"),
      "Directory for cached JSON")(
      "temperature", po::value<float>()->default_value(0.8f),
      "Sampling temperature (higher = more random)")(
      "top-p", po::value<float>()->default_value(0.92f),
      "Nucleus sampling top-p in (0,1] (higher = more random)")(
      "n-ctx", po::value<uint32_t>()->default_value(8192),
      "Context window size in tokens (1-32768)")(
      "seed", po::value<int>()->default_value(-1),
      "Sampler seed: -1 for random, otherwise non-negative integer");

  po::variables_map vm;
  po::store(po::parse_command_line(argc, argv, desc), vm);
  po::notify(vm);

  if (vm.count("help")) {
    std::cout << desc << "\n";
    return false;
  }

  // Check for mutually exclusive --mocked and --model flags
  bool use_mocked = vm["mocked"].as<bool>();
  std::string model_path = vm["model"].as<std::string>();

  if (use_mocked && !model_path.empty()) {
    spdlog::error("ERROR: --mocked and --model are mutually exclusive");
    return false;
  }

  if (!use_mocked && model_path.empty()) {
    spdlog::error("ERROR: Either --mocked or --model must be specified");
    return false;
  }

  // Warn if sampling parameters are provided with --mocked
  if (use_mocked) {
    bool hasTemperature = !vm["temperature"].defaulted();
    bool hasTopP = !vm["top-p"].defaulted();
    bool hasSeed = !vm["seed"].defaulted();

    if (hasTemperature || hasTopP || hasSeed) {
      spdlog::warn(
          "WARNING: Sampling parameters (--temperature, --top-p, --seed) "
          "are ignored when using --mocked");
    }
  }

  options.use_mocked = use_mocked;
  options.model_path = model_path;
  options.cache_dir = vm["cache-dir"].as<std::string>();
  options.temperature = vm["temperature"].as<float>();
  options.top_p = vm["top-p"].as<float>();
  options.n_ctx = vm["n-ctx"].as<uint32_t>();
  options.seed = vm["seed"].as<int>();
  // commit is always pinned to c5eb7772

  return true;
}

int main(int argc, char* argv[]) {
  try {
    const CurlGlobalState curl_state;

    ApplicationOptions options;
    if (!ParseArguments(argc, argv, options)) {
      return 0;
    }

    auto webClient = std::make_shared<CURLWebClient>();
    SqliteDatabase database;

    BiergartenDataGenerator generator(options, webClient, database);
    return generator.Run();

  } catch (const std::exception& e) {
    spdlog::error("ERROR: Application failed: {}", e.what());
    return 1;
  }
}
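Review note: `ParseArguments` returns `false` both when help is printed and when the flags are invalid, and `main` maps `false` to exit code 0, so a rejected invocation still reports success to the shell. One possible way to separate the two cases is sketched below. `ParseOutcome`, `ValidateGeneratorChoice`, and `ExitCodeFor` are hypothetical names, and the exit code 2 for invalid usage is an assumption rather than something the diff specifies.

```cpp
#include <string>

// Hypothetical result type (not in the diff) that separates "help was shown"
// from "arguments were invalid".
enum class ParseOutcome { kRun, kHelpShown, kInvalid };

// Mirrors the --mocked / --model checks in ParseArguments: exactly one of the
// two generator choices must be selected.
ParseOutcome ValidateGeneratorChoice(bool use_mocked,
                                     const std::string& model_path) {
  if (use_mocked && !model_path.empty()) return ParseOutcome::kInvalid;
  if (!use_mocked && model_path.empty()) return ParseOutcome::kInvalid;
  return ParseOutcome::kRun;
}

// Maps an outcome to a process exit code; -1 means "keep running".
int ExitCodeFor(ParseOutcome outcome) {
  switch (outcome) {
    case ParseOutcome::kHelpShown:
      return 0;  // Help is a successful invocation.
    case ParseOutcome::kInvalid:
      return 2;  // Assumed convention for usage errors.
    case ParseOutcome::kRun:
      return -1;
  }
  return 2;
}
```

Under this sketch, `main` would return `ExitCodeFor(outcome)` whenever the outcome is not `kRun`, which keeps `--help` at exit code 0 while making invalid flag combinations visible to scripts.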
141
pipeline/src/web_client/curl_web_client.cpp
Normal file
@@ -0,0 +1,141 @@
#include "web_client/curl_web_client.h"

#include <curl/curl.h>

#include <cstdio>
#include <fstream>
#include <memory>
#include <sstream>
#include <stdexcept>

CurlGlobalState::CurlGlobalState() {
  if (curl_global_init(CURL_GLOBAL_DEFAULT) != CURLE_OK) {
    throw std::runtime_error(
        "[CURLWebClient] Failed to initialize libcurl globally");
  }
}

CurlGlobalState::~CurlGlobalState() { curl_global_cleanup(); }

namespace {
// curl write callback that appends response data into a std::string
size_t WriteCallbackString(void* contents, size_t size, size_t nmemb,
                           void* userp) {
  size_t realsize = size * nmemb;
  auto* s = static_cast<std::string*>(userp);
  s->append(static_cast<char*>(contents), realsize);
  return realsize;
}

// curl write callback that writes to a file stream
size_t WriteCallbackFile(void* contents, size_t size, size_t nmemb,
                         void* userp) {
  size_t realsize = size * nmemb;
  auto* outFile = static_cast<std::ofstream*>(userp);
  outFile->write(static_cast<char*>(contents), realsize);
  return outFile->good() ? realsize : 0;  // short count aborts the transfer
}

// RAII wrapper for CURL handle using unique_ptr
using CurlHandle = std::unique_ptr<CURL, decltype(&curl_easy_cleanup)>;

CurlHandle create_handle() {
  CURL* handle = curl_easy_init();
  if (!handle) {
    throw std::runtime_error(
        "[CURLWebClient] Failed to initialize libcurl handle");
  }
  return CurlHandle(handle, &curl_easy_cleanup);
}

void set_common_get_options(CURL* curl, const std::string& url,
                            long connect_timeout, long total_timeout) {
  curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
  curl_easy_setopt(curl, CURLOPT_USERAGENT, "biergarten-pipeline/0.1.0");
  curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
  curl_easy_setopt(curl, CURLOPT_MAXREDIRS, 5L);
  curl_easy_setopt(curl, CURLOPT_CONNECTTIMEOUT, connect_timeout);
  curl_easy_setopt(curl, CURLOPT_TIMEOUT, total_timeout);
  curl_easy_setopt(curl, CURLOPT_ACCEPT_ENCODING, "gzip");
}
}  // namespace

CURLWebClient::CURLWebClient() {}

CURLWebClient::~CURLWebClient() {}

void CURLWebClient::DownloadToFile(const std::string& url,
                                   const std::string& file_path) {
  auto curl = create_handle();

  std::ofstream outFile(file_path, std::ios::binary);
  if (!outFile.is_open()) {
    throw std::runtime_error(
        "[CURLWebClient] Cannot open file for writing: " + file_path);
  }

  set_common_get_options(curl.get(), url, 30L, 300L);
  curl_easy_setopt(curl.get(), CURLOPT_WRITEFUNCTION, WriteCallbackFile);
  curl_easy_setopt(curl.get(), CURLOPT_WRITEDATA,
                   static_cast<void*>(&outFile));

  CURLcode res = curl_easy_perform(curl.get());
  outFile.close();

  if (res != CURLE_OK) {
    std::remove(file_path.c_str());
    std::string error = std::string("[CURLWebClient] Download failed: ") +
                        curl_easy_strerror(res);
    throw std::runtime_error(error);
  }

  long httpCode = 0;
  curl_easy_getinfo(curl.get(), CURLINFO_RESPONSE_CODE, &httpCode);

  if (httpCode != 200) {
    std::remove(file_path.c_str());
    std::stringstream ss;
    ss << "[CURLWebClient] HTTP error " << httpCode << " for URL " << url;
    throw std::runtime_error(ss.str());
  }
}

std::string CURLWebClient::Get(const std::string& url) {
  auto curl = create_handle();

  std::string response_string;
  set_common_get_options(curl.get(), url, 10L, 20L);
  curl_easy_setopt(curl.get(), CURLOPT_WRITEFUNCTION, WriteCallbackString);
  curl_easy_setopt(curl.get(), CURLOPT_WRITEDATA, &response_string);

  CURLcode res = curl_easy_perform(curl.get());

  if (res != CURLE_OK) {
    std::string error =
        std::string("[CURLWebClient] GET failed: ") + curl_easy_strerror(res);
    throw std::runtime_error(error);
  }

  long httpCode = 0;
  curl_easy_getinfo(curl.get(), CURLINFO_RESPONSE_CODE, &httpCode);

  if (httpCode != 200) {
    std::stringstream ss;
    ss << "[CURLWebClient] HTTP error " << httpCode << " for URL " << url;
    throw std::runtime_error(ss.str());
  }

  return response_string;
}

std::string CURLWebClient::UrlEncode(const std::string& value) {
  // libcurl ignores the handle since 7.82.0; length 0 means "use strlen()".
  char* output = curl_easy_escape(nullptr, value.c_str(), 0);

  if (output) {
    std::string result(output);
    curl_free(output);
    return result;
  }
  throw std::runtime_error("[CURLWebClient] curl_easy_escape failed");
}
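Review note: `UrlEncode` delegates to `curl_easy_escape`, which leaves the RFC 3986 unreserved characters (`A-Z`, `a-z`, `0-9`, `-`, `.`, `_`, `~`) untouched and percent-encodes every other byte. For readers who want to see the transformation without libcurl, here is an illustrative stand-in. `PercentEncode` is a hypothetical name, and this is a sketch of the rule, not libcurl's implementation.

```cpp
#include <string>

// Illustrative stand-in for UrlEncode, not libcurl's code: keeps RFC 3986
// unreserved characters and percent-encodes every other byte as %XX.
std::string PercentEncode(const std::string& value) {
  static const char kHex[] = "0123456789ABCDEF";
  std::string out;
  out.reserve(value.size() * 3);
  for (unsigned char c : value) {
    const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                            (c >= '0' && c <= '9') || c == '-' || c == '.' ||
                            c == '_' || c == '~';
    if (unreserved) {
      out.push_back(static_cast<char>(c));
    } else {
      out.push_back('%');
      out.push_back(kHex[c >> 4]);    // High nibble.
      out.push_back(kHex[c & 0x0F]);  // Low nibble.
    }
  }
  return out;
}
```

For example, `PercentEncode("beer in Germany")` yields `beer%20in%20Germany`, which is the shape of value that `WikipediaService::FetchExtract` splices into the `titles=` query parameter.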
89
pipeline/src/wikipedia/wikipedia_service.cpp
Normal file
@@ -0,0 +1,89 @@
#include "wikipedia/wikipedia_service.h"

#include <spdlog/spdlog.h>

#include <boost/json.hpp>

WikipediaService::WikipediaService(std::shared_ptr<WebClient> client)
    : client_(std::move(client)) {}

std::string WikipediaService::FetchExtract(std::string_view query) {
  const std::string encoded = client_->UrlEncode(std::string(query));
  const std::string url =
      "https://en.wikipedia.org/w/api.php?action=query&titles=" + encoded +
      "&prop=extracts&explaintext=1&format=json";

  const std::string body = client_->Get(url);

  boost::system::error_code ec;
  boost::json::value doc = boost::json::parse(body, ec);

  if (!ec && doc.is_object()) {
    try {
      auto& pages = doc.at("query").at("pages").as_object();
      if (!pages.empty()) {
        auto& page = pages.begin()->value().as_object();
        if (page.contains("extract") && page.at("extract").is_string()) {
          std::string extract(page.at("extract").as_string().c_str());
          spdlog::debug("WikipediaService fetched {} chars for '{}'",
                        extract.size(), query);
          return extract;
        }
      }
    } catch (const std::exception& e) {
      spdlog::warn(
          "WikipediaService: failed to parse response structure for '{}': "
          "{}",
          query, e.what());
      return {};
    }
  } else if (ec) {
    spdlog::warn("WikipediaService: JSON parse error for '{}': {}", query,
                 ec.message());
  }

  return {};
}

std::string WikipediaService::GetSummary(std::string_view city,
                                         std::string_view country) {
  const std::string key = std::string(city) + "|" + std::string(country);
  const auto cacheIt = cache_.find(key);
  if (cacheIt != cache_.end()) {
    return cacheIt->second;
  }

  std::string result;

  if (!client_) {
    cache_.emplace(key, result);
    return result;
  }

  std::string regionQuery(city);
  if (!country.empty()) {
    regionQuery += ", ";
    regionQuery += country;
  }

  const std::string beerQuery = "beer in " + std::string(country);

  try {
    const std::string regionExtract = FetchExtract(regionQuery);
    const std::string beerExtract = FetchExtract(beerQuery);

    if (!regionExtract.empty()) {
      result += regionExtract;
    }
    if (!beerExtract.empty()) {
      if (!result.empty()) result += "\n\n";
      result += beerExtract;
    }
  } catch (const std::runtime_error& e) {
    spdlog::debug("WikipediaService lookup failed for '{}': {}", regionQuery,
                  e.what());
  }

  cache_.emplace(key, result);
  return result;
}
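Review note: `GetSummary` caches one entry per `city|country` key and stores the result even when it is empty, so a failed or missing lookup is not retried for the same key. The pattern can be isolated as in the sketch below. `SummaryCache` and its `Fetcher` callback are hypothetical, and the use of `std::unordered_map` is an assumption because the header that declares `cache_` is not part of this excerpt.

```cpp
#include <functional>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>

// Hypothetical standalone cache (not in the diff) showing the lookup pattern
// used by GetSummary: one entry per "city|country" key, empty results
// included.
class SummaryCache {
 public:
  using Fetcher = std::function<std::string(const std::string&)>;

  explicit SummaryCache(Fetcher fetch) : fetch_(std::move(fetch)) {}

  const std::string& Get(std::string_view city, std::string_view country) {
    std::string key = std::string(city) + "|" + std::string(country);
    auto it = cache_.find(key);
    if (it == cache_.end()) {
      // Store the value even when it is empty so the same key never
      // triggers a second fetch.
      std::string value = fetch_(key);
      it = cache_.emplace(std::move(key), std::move(value)).first;
    }
    return it->second;
  }

 private:
  Fetcher fetch_;
  std::unordered_map<std::string, std::string> cache_;
};
```

Caching empty results trades freshness for fewer network calls: a transient failure stays cached for the lifetime of the object, which seems acceptable for a one-shot pipeline run but would need an expiry policy in a long-lived service.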
@@ -1,6 +1,7 @@
using API.Core.Contracts.Auth;
using API.Core.Contracts.Common;
using Domain.Entities;
using Microsoft.AspNetCore.Authorization;
using Microsoft.AspNetCore.Mvc;
using Service.Auth;

@@ -8,6 +9,7 @@ namespace API.Core.Controllers
{
    [ApiController]
    [Route("api/[controller]")]
    [Authorize(AuthenticationSchemes = "JWT")]
    public class AuthController(
        IRegisterService registerService,
        ILoginService loginService,
@@ -15,6 +17,7 @@ namespace API.Core.Controllers
        ITokenService tokenService
    ) : ControllerBase
    {
        [AllowAnonymous]
        [HttpPost("register")]
        public async Task<ActionResult<UserAccount>> Register(
            [FromBody] RegisterRequest req
@@ -47,6 +50,7 @@ namespace API.Core.Controllers
            return Created("/", response);
        }

        [AllowAnonymous]
        [HttpPost("login")]
        public async Task<ActionResult> Login([FromBody] LoginRequest req)
        {
@@ -82,6 +86,7 @@ namespace API.Core.Controllers
            );
        }

        [AllowAnonymous]
        [HttpPost("refresh")]
        public async Task<ActionResult> Refresh(
            [FromBody] RefreshTokenRequest req
@@ -6,6 +6,7 @@ Feature: User Account Confirmation
    Given the API is running
    And I have registered a new account
    And I have a valid confirmation token for my account
    And I have a valid access token for my account
    When I submit a confirmation request with the valid token
    Then the response has HTTP status 200
    And the response JSON should have "message" containing "is confirmed"
@@ -14,6 +15,7 @@ Feature: User Account Confirmation
    Given the API is running
    And I have registered a new account
    And I have a valid confirmation token for my account
    And I have a valid access token for my account
    When I submit a confirmation request with the valid token
    And I submit the same confirmation request again
    Then the response has HTTP status 200
@@ -21,6 +23,8 @@ Feature: User Account Confirmation

  Scenario: Confirmation fails with invalid token
    Given the API is running
    And I have registered a new account
    And I have a valid access token for my account
    When I submit a confirmation request with an invalid token
    Then the response has HTTP status 401
    And the response JSON should have "message" containing "Invalid token"
@@ -29,6 +33,7 @@ Feature: User Account Confirmation
    Given the API is running
    And I have registered a new account
    And I have an expired confirmation token for my account
    And I have a valid access token for my account
    When I submit a confirmation request with the expired token
    Then the response has HTTP status 401
    And the response JSON should have "message" containing "Invalid token"
@@ -37,12 +42,15 @@ Feature: User Account Confirmation
    Given the API is running
    And I have registered a new account
    And I have a confirmation token signed with the wrong secret
    And I have a valid access token for my account
    When I submit a confirmation request with the tampered token
    Then the response has HTTP status 401
    And the response JSON should have "message" containing "Invalid token"

  Scenario: Confirmation fails when token is missing
    Given the API is running
    And I have registered a new account
    And I have a valid access token for my account
    When I submit a confirmation request with a missing token
    Then the response has HTTP status 400

@@ -54,6 +62,15 @@ Feature: User Account Confirmation

  Scenario: Confirmation fails with malformed token
    Given the API is running
    And I have registered a new account
    And I have a valid access token for my account
    When I submit a confirmation request with a malformed token
    Then the response has HTTP status 401
    And the response JSON should have "message" containing "Invalid token"

  Scenario: Confirmation fails without an access token
    Given the API is running
    And I have registered a new account
    And I have a valid confirmation token for my account
    When I submit a confirmation request with the valid token without an access token
    Then the response has HTTP status 401
36
src/Core/API/API.Specs/Features/ResendConfirmation.feature
Normal file
@@ -0,0 +1,36 @@
Feature: Resend Confirmation Email
  As a user who did not receive the confirmation email
  I want to request a resend of the confirmation email
  So that I can obtain a working confirmation link while preventing abuse

  Scenario: Legitimate resend for an unconfirmed user
    Given the API is running
    And I have registered a new account
    And I have a valid access token for my account
    When I submit a resend confirmation request for my account
    Then the response has HTTP status 200
    And the response JSON should have "message" containing "confirmation email has been resent"

  Scenario: Resend is a no-op for an already confirmed user
    Given the API is running
    And I have registered a new account
    And I have a valid confirmation token for my account
    And I have a valid access token for my account
    And I have confirmed my account
    When I submit a resend confirmation request for my account
    Then the response has HTTP status 200
    And the response JSON should have "message" containing "confirmation email has been resent"

  Scenario: Resend is a no-op for a non-existent user
    Given the API is running
    And I have registered a new account
    And I have a valid access token for my account
    When I submit a resend confirmation request for a non-existent user
    Then the response has HTTP status 200
    And the response JSON should have "message" containing "confirmation email has been resent"

  Scenario: Resend requires authentication
    Given the API is running
    And I have registered a new account
    When I submit a resend confirmation request without an access token
    Then the response has HTTP status 401
@@ -7,6 +7,8 @@ public class MockEmailService : IEmailService
{
    public List<RegistrationEmail> SentRegistrationEmails { get; } = new();

    public List<ResendConfirmationEmail> SentResendConfirmationEmails { get; } = new();

    public Task SendRegistrationEmailAsync(
        UserAccount createdUser,
        string confirmationToken
@@ -24,9 +26,27 @@ public class MockEmailService : IEmailService
        return Task.CompletedTask;
    }

    public Task SendResendConfirmationEmailAsync(
        UserAccount user,
        string confirmationToken
    )
    {
        SentResendConfirmationEmails.Add(
            new ResendConfirmationEmail
            {
                UserAccount = user,
                ConfirmationToken = confirmationToken,
                SentAt = DateTime.UtcNow,
            }
        );

        return Task.CompletedTask;
    }

    public void Clear()
    {
        SentRegistrationEmails.Clear();
        SentResendConfirmationEmails.Clear();
    }

    public class RegistrationEmail
@@ -35,4 +55,11 @@ public class MockEmailService : IEmailService
        public string ConfirmationToken { get; init; } = string.Empty;
        public DateTime SentAt { get; init; }
    }

    public class ResendConfirmationEmail
    {
        public UserAccount UserAccount { get; init; } = null!;
        public string ConfirmationToken { get; init; } = string.Empty;
        public DateTime SentAt { get; init; }
    }
}

@@ -457,6 +457,32 @@ public class AuthSteps(ScenarioContext scenario)
        await GivenIAmLoggedIn();
    }

    [Given("I have a valid access token for my account")]
    public void GivenIHaveAValidAccessTokenForMyAccount()
    {
        var userId = scenario.TryGetValue<Guid>(RegisteredUserIdKey, out var id)
            ? id
            : throw new InvalidOperationException(
                "registered user ID not found in scenario"
            );
        var username = scenario.TryGetValue<string>(
            RegisteredUsernameKey,
            out var user
        )
            ? user
            : throw new InvalidOperationException(
                "registered username not found in scenario"
            );

        var secret = GetRequiredEnvVar("ACCESS_TOKEN_SECRET");
        scenario["accessToken"] = GenerateJwtToken(
            userId,
            username,
            secret,
            DateTime.UtcNow.AddMinutes(60)
        );
    }

    [Given("I have a valid confirmation token for my account")]
    public void GivenIHaveAValidConfirmationTokenForMyAccount()
    {
@@ -587,11 +613,16 @@ public class AuthSteps(ScenarioContext scenario)
        var token = scenario.TryGetValue<string>("confirmationToken", out var t)
            ? t
            : "valid-token";
        var accessToken = scenario.TryGetValue<string>("accessToken", out var at)
            ? at
            : string.Empty;

        var requestMessage = new HttpRequestMessage(
            HttpMethod.Post,
            $"/api/auth/confirm?token={Uri.EscapeDataString(token)}"
        );
        if (!string.IsNullOrEmpty(accessToken))
            requestMessage.Headers.Add("Authorization", $"Bearer {accessToken}");

        var response = await client.SendAsync(requestMessage);
        var responseBody = await response.Content.ReadAsStringAsync();
@@ -606,11 +637,16 @@ public class AuthSteps(ScenarioContext scenario)
        var token = scenario.TryGetValue<string>("confirmationToken", out var t)
            ? t
            : "valid-token";
        var accessToken = scenario.TryGetValue<string>("accessToken", out var at)
            ? at
            : string.Empty;

        var requestMessage = new HttpRequestMessage(
            HttpMethod.Post,
            $"/api/auth/confirm?token={Uri.EscapeDataString(token)}"
        );
        if (!string.IsNullOrEmpty(accessToken))
            requestMessage.Headers.Add("Authorization", $"Bearer {accessToken}");

        var response = await client.SendAsync(requestMessage);
        var responseBody = await response.Content.ReadAsStringAsync();
@@ -623,11 +659,16 @@ public class AuthSteps(ScenarioContext scenario)
    {
        var client = GetClient();
        const string token = "malformed-token-not-jwt";
        var accessToken = scenario.TryGetValue<string>("accessToken", out var at)
            ? at
            : string.Empty;

        var requestMessage = new HttpRequestMessage(
            HttpMethod.Post,
            $"/api/auth/confirm?token={Uri.EscapeDataString(token)}"
        );
        if (!string.IsNullOrEmpty(accessToken))
            requestMessage.Headers.Add("Authorization", $"Bearer {accessToken}");

        var response = await client.SendAsync(requestMessage);
        var responseBody = await response.Content.ReadAsStringAsync();
@@ -883,11 +924,16 @@ public class AuthSteps(ScenarioContext scenario)
        var token = scenario.TryGetValue<string>("confirmationToken", out var t)
            ? t
            : "expired-confirmation-token";
        var accessToken = scenario.TryGetValue<string>("accessToken", out var at)
            ? at
            : string.Empty;

        var requestMessage = new HttpRequestMessage(
            HttpMethod.Post,
            $"/api/auth/confirm?token={Uri.EscapeDataString(token)}"
        );
        if (!string.IsNullOrEmpty(accessToken))
            requestMessage.Headers.Add("Authorization", $"Bearer {accessToken}");

        var response = await client.SendAsync(requestMessage);
        var responseBody = await response.Content.ReadAsStringAsync();
@@ -902,11 +948,16 @@ public class AuthSteps(ScenarioContext scenario)
        var token = scenario.TryGetValue<string>("confirmationToken", out var t)
            ? t
            : "tampered-confirmation-token";
        var accessToken = scenario.TryGetValue<string>("accessToken", out var at)
            ? at
            : string.Empty;

        var requestMessage = new HttpRequestMessage(
            HttpMethod.Post,
            $"/api/auth/confirm?token={Uri.EscapeDataString(token)}"
        );
        if (!string.IsNullOrEmpty(accessToken))
            requestMessage.Headers.Add("Authorization", $"Bearer {accessToken}");

        var response = await client.SendAsync(requestMessage);
        var responseBody = await response.Content.ReadAsStringAsync();
@@ -918,7 +969,13 @@ public class AuthSteps(ScenarioContext scenario)
    public async Task WhenISubmitAConfirmationRequestWithAMissingToken()
    {
        var client = GetClient();
        var accessToken = scenario.TryGetValue<string>("accessToken", out var at)
            ? at
            : string.Empty;

        var requestMessage = new HttpRequestMessage(HttpMethod.Post, "/api/auth/confirm");
        if (!string.IsNullOrEmpty(accessToken))
            requestMessage.Headers.Add("Authorization", $"Bearer {accessToken}");

        var response = await client.SendAsync(requestMessage);
        var responseBody = await response.Content.ReadAsStringAsync();
@@ -974,6 +1031,30 @@ public class AuthSteps(ScenarioContext scenario)
    {
        var client = GetClient();
        const string token = "invalid-confirmation-token";
        var accessToken = scenario.TryGetValue<string>("accessToken", out var at)
            ? at
            : string.Empty;

        var requestMessage = new HttpRequestMessage(
            HttpMethod.Post,
            $"/api/auth/confirm?token={Uri.EscapeDataString(token)}"
        );
        if (!string.IsNullOrEmpty(accessToken))
            requestMessage.Headers.Add("Authorization", $"Bearer {accessToken}");

        var response = await client.SendAsync(requestMessage);
        var responseBody = await response.Content.ReadAsStringAsync();
        scenario[ResponseKey] = response;
|
||||
scenario[ResponseBodyKey] = responseBody;
|
||||
}
|
||||
|
||||
[When("I submit a confirmation request with the valid token without an access token")]
|
||||
public async Task WhenISubmitAConfirmationRequestWithTheValidTokenWithoutAnAccessToken()
|
||||
{
|
||||
var client = GetClient();
|
||||
var token = scenario.TryGetValue<string>("confirmationToken", out var t)
|
||||
? t
|
||||
: "valid-token";
|
||||
|
||||
var requestMessage = new HttpRequestMessage(
|
||||
HttpMethod.Post,
|
||||
@@ -1043,4 +1124,91 @@ public class AuthSteps(ScenarioContext scenario)
|
||||
refreshToken.Should().NotBe(previousRefreshToken);
|
||||
}
|
||||
}
|
||||
|
||||
[Given("I have confirmed my account")]
|
||||
public async Task GivenIHaveConfirmedMyAccount()
|
||||
{
|
||||
var client = GetClient();
|
||||
var token = scenario.TryGetValue<string>("confirmationToken", out var t)
|
||||
? t
|
||||
: throw new InvalidOperationException("confirmation token not found");
|
||||
var accessToken = scenario.TryGetValue<string>("accessToken", out var at)
|
||||
? at
|
||||
: string.Empty;
|
||||
|
||||
var requestMessage = new HttpRequestMessage(
|
||||
HttpMethod.Post,
|
||||
$"/api/auth/confirm?token={Uri.EscapeDataString(token)}"
|
||||
);
|
||||
if (!string.IsNullOrEmpty(accessToken))
|
||||
requestMessage.Headers.Add("Authorization", $"Bearer {accessToken}");
|
||||
|
||||
var response = await client.SendAsync(requestMessage);
|
||||
response.EnsureSuccessStatusCode();
|
||||
}
|
||||
|
||||
[When("I submit a resend confirmation request for my account")]
|
||||
public async Task WhenISubmitAResendConfirmationRequestForMyAccount()
|
||||
{
|
||||
var client = GetClient();
|
||||
var userId = scenario.TryGetValue<Guid>(RegisteredUserIdKey, out var id)
|
||||
? id
|
||||
: throw new InvalidOperationException("registered user ID not found");
|
||||
var accessToken = scenario.TryGetValue<string>("accessToken", out var at)
|
||||
? at
|
||||
: string.Empty;
|
||||
|
||||
var requestMessage = new HttpRequestMessage(
|
||||
HttpMethod.Post,
|
||||
$"/api/auth/confirm/resend?userId={userId}"
|
||||
);
|
||||
if (!string.IsNullOrEmpty(accessToken))
|
||||
requestMessage.Headers.Add("Authorization", $"Bearer {accessToken}");
|
||||
|
||||
var response = await client.SendAsync(requestMessage);
|
||||
var responseBody = await response.Content.ReadAsStringAsync();
|
||||
scenario[ResponseKey] = response;
|
||||
scenario[ResponseBodyKey] = responseBody;
|
||||
}
|
||||
|
||||
[When("I submit a resend confirmation request for a non-existent user")]
|
||||
public async Task WhenISubmitAResendConfirmationRequestForANonExistentUser()
|
||||
{
|
||||
var client = GetClient();
|
||||
var fakeUserId = Guid.NewGuid();
|
||||
var accessToken = scenario.TryGetValue<string>("accessToken", out var at)
|
||||
? at
|
||||
: string.Empty;
|
||||
|
||||
var requestMessage = new HttpRequestMessage(
|
||||
HttpMethod.Post,
|
||||
$"/api/auth/confirm/resend?userId={fakeUserId}"
|
||||
);
|
||||
if (!string.IsNullOrEmpty(accessToken))
|
||||
requestMessage.Headers.Add("Authorization", $"Bearer {accessToken}");
|
||||
|
||||
var response = await client.SendAsync(requestMessage);
|
||||
var responseBody = await response.Content.ReadAsStringAsync();
|
||||
scenario[ResponseKey] = response;
|
||||
scenario[ResponseBodyKey] = responseBody;
|
||||
}
|
||||
|
||||
[When("I submit a resend confirmation request without an access token")]
|
||||
public async Task WhenISubmitAResendConfirmationRequestWithoutAnAccessToken()
|
||||
{
|
||||
var client = GetClient();
|
||||
var userId = scenario.TryGetValue<Guid>(RegisteredUserIdKey, out var id)
|
||||
? id
|
||||
: Guid.NewGuid();
|
||||
|
||||
var requestMessage = new HttpRequestMessage(
|
||||
HttpMethod.Post,
|
||||
$"/api/auth/confirm/resend?userId={userId}"
|
||||
);
|
||||
|
||||
var response = await client.SendAsync(requestMessage);
|
||||
var responseBody = await response.Content.ReadAsStringAsync();
|
||||
scenario[ResponseKey] = response;
|
||||
scenario[ResponseBodyKey] = responseBody;
|
||||
}
|
||||
}
|
||||
|
||||
@@ -0,0 +1,117 @@
|
||||
@using Infrastructure.Email.Templates.Components
|
||||
|
||||
<!DOCTYPE html>
|
||||
<html lang="en" xmlns="http://www.w3.org/1999/xhtml" xmlns:v="urn:schemas-microsoft-com:vml"
|
||||
xmlns:o="urn:schemas-microsoft-com:office:office">
|
||||
|
||||
<head>
|
||||
<meta charset="utf-8">
|
||||
<meta name="viewport" content="width=device-width">
|
||||
<meta http-equiv="X-UA-Compatible" content="IE=edge">
|
||||
<meta name="x-apple-disable-message-reformatting">
|
||||
<title>Resend Confirmation - The Biergarten App</title>
|
||||
<!--[if mso]>
|
||||
<style>
|
||||
* { font-family: Arial, sans-serif !important; }
|
||||
table { border-collapse: collapse; }
|
||||
</style>
|
||||
<![endif]-->
|
||||
<!--[if !mso]><!-->
|
||||
<style>
|
||||
* {
|
||||
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Helvetica, Arial, sans-serif;
|
||||
}
|
||||
</style>
|
||||
<!--<![endif]-->
|
||||
</head>
|
||||
|
||||
<body style="margin:0; padding:0; background-color:#f4f4f4; width:100%;">
|
||||
<table role="presentation" border="0" cellpadding="0" cellspacing="0" width="100%" style="background-color:#f4f4f4;">
|
||||
<tr>
|
||||
<td align="center" style="padding:40px 10px;">
|
||||
<!--[if mso]>
|
||||
<table border="0" cellpadding="0" cellspacing="0" width="600" style="width:600px;">
|
||||
<tr><td>
|
||||
<![endif]-->
|
||||
<table role="presentation" border="0" cellpadding="0" cellspacing="0" width="100%"
|
||||
style="max-width:600px; background:#ffffff; border-radius:8px; box-shadow:0 2px 8px rgba(0,0,0,.08);">
|
||||
|
||||
<Header />
|
||||
|
||||
<tr>
|
||||
<td style="padding:40px 40px 16px 40px; text-align:center;">
|
||||
<h1 style="margin:0; color:#333333; font-size:26px; font-weight:700;">
|
||||
New Confirmation Link
|
||||
</h1>
|
||||
</td>
|
||||
</tr>
|
||||
|
||||
<tr>
|
||||
<td style="padding:0 40px 20px 40px; text-align:center;">
|
||||
<p style="margin:0; color:#666666; font-size:16px; line-height:24px;">
|
||||
Hi <strong style="color:#333333;">@Username</strong>, you requested another email confirmation
|
||||
link.
|
||||
Use the button below to verify your account.
|
||||
</p>
|
||||
</td>
|
||||
</tr>
|
||||
|
||||
<tr>
|
||||
<td style="padding:8px 40px;">
|
||||
<table role="presentation" border="0" cellpadding="0" cellspacing="0" width="100%">
|
||||
<tr>
|
||||
<td align="center">
|
||||
<!--[if mso]>
|
||||
<v:roundrect xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w="urn:schemas-microsoft-com:office:word"
|
||||
href="@ConfirmationLink" style="height:50px;v-text-anchor:middle;width:260px;"
|
||||
arcsize="10%" stroke="f" fillcolor="#f59e0b">
|
||||
<w:anchorlock/>
|
||||
<center style="color:#ffffff;font-family:Arial,sans-serif;font-size:16px;font-weight:700;">
|
||||
Confirm Email Again
|
||||
</center>
|
||||
</v:roundrect>
|
||||
<![endif]-->
|
||||
<!--[if !mso]><!-->
|
||||
<a href="@ConfirmationLink" target="_blank" rel="noopener noreferrer"
|
||||
style="display:inline-block; padding:16px 40px; background:#d97706; color:#ffffff; text-decoration:none; border-radius:6px; font-size:16px; font-weight:700;">
|
||||
Confirm Email Again
|
||||
</a>
|
||||
<!--<![endif]-->
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
</td>
|
||||
</tr>
|
||||
|
||||
<tr>
|
||||
<td style="padding:20px 40px 8px 40px; text-align:center;">
|
||||
<p style="margin:0; color:#999999; font-size:13px; line-height:20px;">
|
||||
This replacement link expires in 24 hours.
|
||||
</p>
|
||||
</td>
|
||||
</tr>
|
||||
|
||||
<tr>
|
||||
<td style="padding:0 40px 28px 40px; text-align:center;">
|
||||
<p style="margin:0; color:#999999; font-size:13px; line-height:20px;">
|
||||
If you did not request this, you can safely ignore this email.
|
||||
</p>
|
||||
</td>
|
||||
</tr>
|
||||
|
||||
<EmailFooter FooterText="Cheers, The Biergarten App Team" />
|
||||
</table>
|
||||
<!--[if mso]></td></tr></table><![endif]-->
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
@code {
|
||||
[Parameter]
|
||||
public string Username { get; set; } = string.Empty;
|
||||
|
||||
[Parameter]
|
||||
public string ConfirmationLink { get; set; } = string.Empty;
|
||||
}
|
||||
@@ -30,6 +30,23 @@ public class EmailTemplateProvider(
|
||||
return await RenderComponentAsync<UserRegistration>(parameters);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Renders the ResendConfirmation template with the specified parameters.
|
||||
/// </summary>
|
||||
public async Task<string> RenderResendConfirmationEmailAsync(
|
||||
string username,
|
||||
string confirmationLink
|
||||
)
|
||||
{
|
||||
var parameters = new Dictionary<string, object?>
|
||||
{
|
||||
{ nameof(ResendConfirmation.Username), username },
|
||||
{ nameof(ResendConfirmation.ConfirmationLink), confirmationLink },
|
||||
};
|
||||
|
||||
return await RenderComponentAsync<ResendConfirmation>(parameters);
|
||||
}
|
||||
|
||||
/// <summary>
|
||||
/// Generic method to render any Razor component to HTML.
|
||||
/// </summary>
|
||||
|
||||
@@ -15,4 +15,15 @@ public interface IEmailTemplateProvider
|
||||
string username,
|
||||
string confirmationLink
|
||||
);
|
||||
|
||||
/// <summary>
|
||||
/// Renders the ResendConfirmation template with the specified parameters.
|
||||
/// </summary>
|
||||
/// <param name="username">The username to include in the email</param>
|
||||
/// <param name="confirmationLink">The new confirmation link</param>
|
||||
/// <returns>The rendered HTML string</returns>
|
||||
Task<string> RenderResendConfirmationEmailAsync(
|
||||
string username,
|
||||
string confirmationLink
|
||||
);
|
||||
}
|
||||
|
||||
@@ -159,7 +159,7 @@ public class AuthRepository(ISqlConnectionFactory connectionFactory)
|
||||
return await GetUserByIdAsync(userAccountId);
|
||||
}
|
||||
|
||||
-private async Task<bool> IsUserVerifiedAsync(Guid userAccountId)
|
||||
+public async Task<bool> IsUserVerifiedAsync(Guid userAccountId)
|
||||
{
|
||||
await using var connection = await CreateConnection();
|
||||
await using var command = connection.CreateCommand();
|
||||
|
||||
@@ -75,4 +75,11 @@ public interface IAuthRepository
|
||||
/// <param name="userAccountId">ID of the user account</param>
|
||||
/// <returns>UserAccount if found, null otherwise</returns>
|
||||
Task<Domain.Entities.UserAccount?> GetUserByIdAsync(Guid userAccountId);
|
||||
|
||||
/// <summary>
|
||||
/// Checks whether a user account has been verified.
|
||||
/// </summary>
|
||||
/// <param name="userAccountId">ID of the user account</param>
|
||||
/// <returns>True if the user has a verification record, false otherwise</returns>
|
||||
Task<bool> IsUserVerifiedAsync(Guid userAccountId);
|
||||
}
|
||||
|
||||
@@ -5,6 +5,7 @@ using Domain.Exceptions;
|
||||
using FluentAssertions;
|
||||
using Infrastructure.Repository.Auth;
|
||||
using Moq;
|
||||
using Service.Emails;
|
||||
|
||||
namespace Service.Auth.Tests;
|
||||
|
||||
@@ -12,16 +13,19 @@ public class ConfirmationServiceTest
|
||||
{
|
||||
private readonly Mock<IAuthRepository> _authRepositoryMock;
|
||||
private readonly Mock<ITokenService> _tokenServiceMock;
|
||||
private readonly Mock<IEmailService> _emailServiceMock;
|
||||
private readonly ConfirmationService _confirmationService;
|
||||
|
||||
public ConfirmationServiceTest()
|
||||
{
|
||||
_authRepositoryMock = new Mock<IAuthRepository>();
|
||||
_tokenServiceMock = new Mock<ITokenService>();
|
||||
_emailServiceMock = new Mock<IEmailService>();
|
||||
|
||||
_confirmationService = new ConfirmationService(
|
||||
_authRepositoryMock.Object,
|
||||
-_tokenServiceMock.Object
|
||||
+_tokenServiceMock.Object,
|
||||
+_emailServiceMock.Object
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
@@ -1,12 +1,14 @@
|
||||
|
||||
using Domain.Exceptions;
|
||||
using Infrastructure.Repository.Auth;
|
||||
using Service.Emails;
|
||||
|
||||
namespace Service.Auth;
|
||||
|
||||
public class ConfirmationService(
|
||||
IAuthRepository authRepository,
|
||||
-ITokenService tokenService
|
||||
+ITokenService tokenService,
|
||||
+IEmailService emailService
|
||||
) : IConfirmationService
|
||||
{
|
||||
public async Task<ConfirmationServiceReturn> ConfirmUserAsync(
|
||||
@@ -31,4 +33,21 @@ public class ConfirmationService(
|
||||
user.UserAccountId
|
||||
);
|
||||
}
|
||||
|
||||
public async Task ResendConfirmationEmailAsync(Guid userId)
|
||||
{
|
||||
var user = await authRepository.GetUserByIdAsync(userId);
|
||||
if (user == null)
|
||||
{
|
||||
return; // Silent return to prevent user enumeration
|
||||
}
|
||||
|
||||
if (await authRepository.IsUserVerifiedAsync(userId))
|
||||
{
|
||||
return; // Already confirmed, no-op
|
||||
}
|
||||
|
||||
var confirmationToken = tokenService.GenerateConfirmationToken(user);
|
||||
await emailService.SendResendConfirmationEmailAsync(user, confirmationToken);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -8,4 +8,6 @@ public record ConfirmationServiceReturn(DateTime ConfirmedAt, Guid UserId);
|
||||
public interface IConfirmationService
|
||||
{
|
||||
Task<ConfirmationServiceReturn> ConfirmUserAsync(string confirmationToken);
|
||||
Task ResendConfirmationEmailAsync(Guid userId);
|
||||
|
||||
}
|
||||
|
||||
@@ -10,6 +10,11 @@ public interface IEmailService
|
||||
UserAccount createdUser,
|
||||
string confirmationToken
|
||||
);
|
||||
|
||||
public Task SendResendConfirmationEmailAsync(
|
||||
UserAccount user,
|
||||
string confirmationToken
|
||||
);
|
||||
}
|
||||
|
||||
public class EmailService(
|
||||
@@ -17,13 +22,17 @@ public class EmailService(
|
||||
IEmailTemplateProvider emailTemplateProvider
|
||||
) : IEmailService
|
||||
{
|
||||
private static readonly string WebsiteBaseUrl =
|
||||
Environment.GetEnvironmentVariable("WEBSITE_BASE_URL")
|
||||
?? throw new InvalidOperationException("WEBSITE_BASE_URL environment variable is not set");
|
||||
|
||||
public async Task SendRegistrationEmailAsync(
|
||||
UserAccount createdUser,
|
||||
string confirmationToken
|
||||
)
|
||||
{
|
||||
var confirmationLink =
|
||||
-$"https://thebiergarten.app/confirm?token={confirmationToken}";
|
||||
+$"{WebsiteBaseUrl}/users/confirm?token={confirmationToken}";
|
||||
|
||||
var emailHtml =
|
||||
await emailTemplateProvider.RenderUserRegisteredEmailAsync(
|
||||
@@ -38,4 +47,26 @@ public class EmailService(
|
||||
isHtml: true
|
||||
);
|
||||
}
|
||||
|
||||
public async Task SendResendConfirmationEmailAsync(
|
||||
UserAccount user,
|
||||
string confirmationToken
|
||||
)
|
||||
{
|
||||
var confirmationLink =
|
||||
$"{WebsiteBaseUrl}/users/confirm?token={confirmationToken}";
|
||||
|
||||
var emailHtml =
|
||||
await emailTemplateProvider.RenderResendConfirmationEmailAsync(
|
||||
user.FirstName,
|
||||
confirmationLink
|
||||
);
|
||||
|
||||
await emailProvider.SendAsync(
|
||||
user.Email,
|
||||
"Confirm Your Email - The Biergarten App",
|
||||
emailHtml,
|
||||
isHtml: true
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
10881
src/Website-v1/package-lock.json
generated
Normal file
98
src/Website-v1/package.json
Normal file
@@ -0,0 +1,98 @@
|
||||
{
|
||||
"name": "biergarten",
|
||||
"version": "0.1.0",
|
||||
"private": true,
|
||||
"scripts": {
|
||||
"dev": "next dev",
|
||||
"build": "next build",
|
||||
"prestart": "npm run build",
|
||||
"start": "next start",
|
||||
"lint": "next lint",
|
||||
"clear-db": "npx ts-node ./src/prisma/seed/clear/index.ts",
|
||||
"format": "npx prettier . --write; npx prisma format;",
|
||||
"format-watch": "npx onchange \"**/*\" -- prettier --write --ignore-unknown {{changed}}",
|
||||
"seed": "npx --max-old-space-size=4096 ts-node ./src/prisma/seed/index.ts"
|
||||
},
|
||||
"dependencies": {
|
||||
"@hapi/iron": "^7.0.1",
|
||||
"@headlessui/react": "^1.7.15",
|
||||
"@headlessui/tailwindcss": "^0.2.0",
|
||||
"@hookform/resolvers": "^3.3.1",
|
||||
"@mapbox/mapbox-sdk": "^0.15.2",
|
||||
"@mapbox/search-js-core": "^1.0.0-beta.17",
|
||||
"@mapbox/search-js-react": "^1.0.0-beta.17",
|
||||
"@next/bundle-analyzer": "^14.0.3",
|
||||
"@prisma/client": "^5.7.0",
|
||||
"@react-email/components": "^0.0.11",
|
||||
"@react-email/render": "^0.0.9",
|
||||
"@react-email/tailwind": "^0.0.12",
|
||||
"@vercel/analytics": "^1.1.0",
|
||||
"argon2": "^0.31.1",
|
||||
"classnames": "^2.5.1",
|
||||
"cloudinary": "^1.41.0",
|
||||
"cookie": "^0.7.0",
|
||||
"date-fns": "^2.30.0",
|
||||
"dotenv": "^16.3.1",
|
||||
"jsonwebtoken": "^9.0.1",
|
||||
"lodash": "^4.17.21",
|
||||
"mapbox-gl": "^3.4.0",
|
||||
"multer": "^1.4.5-lts.1",
|
||||
"next": "^14.2.22",
|
||||
"next-cloudinary": "^5.10.0",
|
||||
"next-connect": "^1.0.0-next.3",
|
||||
"passport": "^0.6.0",
|
||||
"passport-local": "^1.0.0",
|
||||
"pino": "^10.0.0",
|
||||
"react": "^18.2.0",
|
||||
"react-daisyui": "^5.0.0",
|
||||
"react-dom": "^18.2.0",
|
||||
"react-email": "^1.9.5",
|
||||
"react-hook-form": "^7.45.2",
|
||||
"react-hot-toast": "^2.4.1",
|
||||
"react-icons": "^4.10.1",
|
||||
"react-intersection-observer": "^9.5.2",
|
||||
"react-map-gl": "^7.1.7",
|
||||
"react-responsive-carousel": "^3.2.23",
|
||||
"swr": "^2.2.0",
|
||||
"theme-change": "^2.5.0",
|
||||
"zod": "^3.21.4"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@faker-js/faker": "^8.3.1",
|
||||
"@types/cookie": "^0.5.1",
|
||||
"@types/express": "^4.17.21",
|
||||
"@types/jsonwebtoken": "^9.0.2",
|
||||
"@types/lodash": "^4.14.195",
|
||||
"@types/mapbox__mapbox-sdk": "^0.13.4",
|
||||
"@types/multer": "^1.4.7",
|
||||
"@types/node": "^20.4.2",
|
||||
"@types/passport-local": "^1.0.35",
|
||||
"@types/react": "^18.2.15",
|
||||
"@types/react-dom": "^18.2.7",
|
||||
"@vercel/fetch": "^7.0.0",
|
||||
"autoprefixer": "^10.4.14",
|
||||
"daisyui": "^4.7.2",
|
||||
"dotenv-cli": "^7.2.1",
|
||||
"eslint": "^8.51.0",
|
||||
"eslint-config-airbnb-base": "15.0.0",
|
||||
"eslint-config-airbnb-typescript": "17.1.0",
|
||||
"eslint-config-next": "^13.5.4",
|
||||
"eslint-config-prettier": "^9.0.0",
|
||||
"eslint-plugin-react": "^7.33.2",
|
||||
"generate-password": "^1.7.1",
|
||||
"onchange": "^7.1.0",
|
||||
"postcss": "^8.4.26",
|
||||
"prettier": "^3.0.0",
|
||||
"prettier-plugin-jsdoc": "^1.0.2",
|
||||
"prettier-plugin-tailwindcss": "^0.5.7",
|
||||
"prisma": "^5.7.0",
|
||||
"tailwindcss": "^3.4.1",
|
||||
"tailwindcss-animated": "^1.0.1",
|
||||
"ts-node": "^10.9.1",
|
||||
"typescript": "^5.3.2"
|
||||
},
|
||||
"prisma": {
|
||||
"schema": "./src/prisma/schema.prisma",
|
||||
"seed": "npm run seed"
|
||||
}
|
||||
}
|
||||
6
src/Website-v1/postcss.config.js
Normal file
@@ -0,0 +1,6 @@
|
||||
module.exports = {
|
||||
plugins: {
|
||||
tailwindcss: {},
|
||||
autoprefixer: {},
|
||||
},
|
||||
};
|
||||