mirror of
https://github.com/aaronpo97/the-biergarten-app.git
synced 2026-04-05 18:09:04 +00:00
225 lines
4.8 KiB
Markdown
Biergarten Pipeline

Overview

The pipeline orchestrates five key stages:

Download: Fetches countries+states+cities.json from a pinned GitHub commit, with optional local caching.

Parse: Streams the JSON with Boost.JSON's basic_parser to extract country, state, and city records without loading the entire file into memory.

Buffer: Routes city records through a bounded concurrent queue to decouple parsing from writes.

Store: Inserts records into an in-memory SQLite database; a mutex serializes writes so that concurrent inserts are safe.

Generate: Produces mock brewery metadata for a sample of cities (a placeholder for future LLM integration).

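The download stage combines two ideas, a local cache lookup and a commit-pinned URL, and both can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's DataDownloader: the names CachePath, HasCachedCopy, and PinnedUrl are hypothetical, the URL follows GitHub's general raw.githubusercontent.com/owner/repo/ref/path layout, and the location of the JSON file inside the upstream repository is left as a parameter because this README does not state it.

```cpp
#include <cstdint>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// File name taken from this README; everything else here is illustrative.
inline constexpr const char* kDataFile = "countries+states+cities.json";

// Hypothetical helper: where the cached copy would live.
fs::path CachePath(const fs::path& cache_dir) { return cache_dir / kDataFile; }

// Hypothetical helper: true when a non-empty cached copy already exists.
bool HasCachedCopy(const fs::path& cache_dir) {
  std::error_code ec;  // Non-throwing overloads: a missing file is not fatal.
  const fs::path path = CachePath(cache_dir);
  if (!fs::is_regular_file(path, ec)) return false;
  const std::uintmax_t size = fs::file_size(path, ec);
  return !ec && size > 0;
}

// Hypothetical helper: builds a raw-content URL pinned to one git ref.
// `path_in_repo` is an assumption supplied by the caller.
std::string PinnedUrl(const std::string& repo, const std::string& ref,
                      const std::string& path_in_repo) {
  return "https://raw.githubusercontent.com/" + repo + "/" + ref + "/" +
         path_in_repo;
}
```

A caller would skip the network fetch when HasCachedCopy(cache_dir) is true and otherwise download the pinned URL into CachePath(cache_dir). Pinning to a commit keeps the dataset identical across runs even if the upstream repository changes.
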
Architecture

Data Sources and Formats

Hierarchical structure: countries array → states per country → cities per state.

Fields: id (integer), name (string), iso2 / iso3 (codes), latitude / longitude.

Sourced from: dr5hn/countries-states-cities-database on GitHub.

Output: Structured SQLite in-memory database + console logs via spdlog.

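The nested layout described above maps onto flat rows in which each city carries the IDs of its parents. The sketch below shows that mapping under simplifying assumptions: CountryNode, StateNode, CityNode, and FlattenCities are hypothetical names, coordinates are modeled as double, and the tree is materialized in memory purely for illustration (the pipeline itself streams the file instead).

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory mirror of the nested JSON layout:
// countries -> states -> cities.
struct CityNode {
  int id;
  std::string name;
  double latitude;
  double longitude;
};

struct StateNode {
  int id;
  std::string name;
  std::string iso2;
  std::vector<CityNode> cities;
};

struct CountryNode {
  int id;
  std::string name;
  std::string iso2;
  std::string iso3;
  std::vector<StateNode> states;
};

// Flat row: the city's own fields plus the IDs of its state and country.
struct CityRecord {
  int id;
  int state_id;
  int country_id;
  std::string name;
  double latitude;
  double longitude;
};

// Walks the hierarchy and emits one CityRecord per city, in document order.
std::vector<CityRecord> FlattenCities(
    const std::vector<CountryNode>& countries) {
  std::vector<CityRecord> rows;
  for (const CountryNode& country : countries) {
    for (const StateNode& state : country.states) {
      for (const CityNode& city : state.cities) {
        rows.push_back({city.id, state.id, country.id, city.name,
                        city.latitude, city.longitude});
      }
    }
  }
  return rows;
}
```

Carrying both state_id and country_id on every row means a city can be filtered by country without first joining through its state.
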
Concurrency Architecture

The pipeline splits work across parsing and writing phases:

    Main Thread:
      parse_sax() -> Insert countries (direct)
                  -> Insert states (direct)
                  -> Push CityRecord to WorkQueue

    Worker Threads (implicit; pthread pool via sqlite3):
      Pop CityRecord from WorkQueue
        -> InsertCity(db) with mutex protection

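The "InsertCity(db) with mutex protection" step can be illustrated without SQLite. The following is a stand-in sketch, not the project's SqliteDatabase: InMemoryStore, InsertConcurrently, and db_mutex_ are hypothetical names, a std::vector replaces the database, and CityRecord is trimmed to two fields for brevity.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Trimmed, illustrative record type.
struct CityRecord {
  int id;
  std::string name;
};

// Hypothetical stand-in for a database whose every operation takes one mutex.
class InMemoryStore {
 public:
  void InsertCity(const CityRecord& city) {
    std::lock_guard<std::mutex> lock(db_mutex_);  // One writer at a time.
    cities_.push_back(city);
  }

  std::size_t CityCount() const {
    std::lock_guard<std::mutex> lock(db_mutex_);
    return cities_.size();
  }

 private:
  mutable std::mutex db_mutex_;
  std::vector<CityRecord> cities_;
};

// Spawns `worker_count` threads; each inserts `records_per_worker` cities.
void InsertConcurrently(InMemoryStore& store, int worker_count,
                        int records_per_worker) {
  std::vector<std::thread> workers;
  for (int w = 0; w < worker_count; ++w) {
    workers.emplace_back([&store, w, records_per_worker] {
      for (int i = 0; i < records_per_worker; ++i) {
        store.InsertCity({w * records_per_worker + i, "city"});
      }
    });
  }
  for (std::thread& worker : workers) worker.join();
}
```

Because every insert holds the same lock, no insert is lost regardless of how the threads interleave. The cost is that writes run one at a time, which is the trade-off the diagram above accepts.
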
Key synchronization primitives:

WorkQueue<T>: Bounded (default 1024 items) concurrent queue with blocking push/pop, guarded by a mutex and condition variables.

SqliteDatabase::dbMutex: Serializes all SQLite operations to avoid SQLITE_BUSY and ensure write safety.

Backpressure: When the WorkQueue is full (1024 city records pending at the default capacity), the parser thread blocks until workers drain items.

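A bounded blocking queue of the kind described above can be sketched with one mutex and two condition variables. This is a minimal illustration, not the project's actual WorkQueue<T>: the method names Push, Pop, and Close, and the use of std::optional to signal shutdown, are assumptions.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <utility>

template <typename T>
class WorkQueue {
 public:
  explicit WorkQueue(std::size_t capacity = 1024) : capacity_(capacity) {
    assert(capacity_ > 0);
  }

  // Blocks while the queue is full (backpressure on the producer).
  // Returns false if the queue has been closed.
  bool Push(T item) {
    std::unique_lock<std::mutex> lock(mutex_);
    not_full_.wait(lock,
                   [this] { return closed_ || items_.size() < capacity_; });
    if (closed_) return false;
    items_.push(std::move(item));
    not_empty_.notify_one();
    return true;
  }

  // Blocks while the queue is empty.
  // Returns std::nullopt once the queue is closed and fully drained.
  std::optional<T> Pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    not_empty_.wait(lock, [this] { return closed_ || !items_.empty(); });
    if (items_.empty()) return std::nullopt;
    T item = std::move(items_.front());
    items_.pop();
    not_full_.notify_one();
    return item;
  }

  // Signals that no more items will arrive and wakes every waiting thread.
  void Close() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      closed_ = true;
    }
    not_full_.notify_all();
    not_empty_.notify_all();
  }

 private:
  const std::size_t capacity_;
  std::mutex mutex_;
  std::condition_variable not_full_;
  std::condition_variable not_empty_;
  std::queue<T> items_;
  bool closed_ = false;
};
```

In this sketch the parser thread would call Push for each city and Close after the last one, while each worker loops on Pop until it returns std::nullopt. Items already queued when Close is called are still delivered, so no record is dropped at shutdown.
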
Component Responsibilities

| Component | Purpose | Thread Safety |
| --- | --- | --- |
| DataDownloader | GitHub fetch with curl; optional filesystem cache; handles retries and ETags. | Blocking I/O; safe for single-threaded startup. |
| StreamingJsonParser | Subclasses boost::json::basic_parser; emits country/state/city records via callbacks; tracks parse depth. | Single-threaded parse phase; thread-safe callbacks. |
| JsonLoader | Wraps the parser; runs country/state/city callbacks; manages the WorkQueue lifecycle. | Produces to WorkQueue; consumes from callbacks. |
| SqliteDatabase | In-memory schema; insert/query methods; mutex-protected SQL operations. | Mutex-guarded; thread-safe concurrent inserts. |
| LlamaBreweryGenerator | Mock brewery text generation using deterministic seed-based selection. | Stateless; thread-safe method calls. |

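"Deterministic seed-based selection" can be illustrated by hashing the city name into a seed and indexing fixed word lists with it. The sketch below is an assumption about how such a mock could work, not the LlamaBreweryGenerator implementation: Fnv1a, Pick, GenerateBreweryName, and the word lists are all hypothetical. FNV-1a is used rather than std::hash because its output is fixed by the algorithm, not by the standard library in use.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// FNV-1a 64-bit hash: the same input yields the same value on every platform.
std::uint64_t Fnv1a(std::string_view text) {
  std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis.
  for (unsigned char c : text) {
    hash ^= c;
    hash *= 1099511628211ULL;  // FNV prime.
  }
  return hash;
}

// Picks one option from the list; the same seed always picks the same option.
const std::string& Pick(const std::vector<std::string>& options,
                        std::uint64_t seed) {
  assert(!options.empty());
  return options[seed % options.size()];
}

// Hypothetical mock generator: no randomness and no shared mutable state,
// so concurrent calls are safe and results are reproducible.
std::string GenerateBreweryName(std::string_view city) {
  static const std::vector<std::string> kAdjectives = {
      "Golden", "Old Town", "Iron", "Riverside"};
  static const std::vector<std::string> kNouns = {
      "Brewing Co.", "Brauhaus", "Ale Works", "Taproom"};
  const std::uint64_t seed = Fnv1a(city);
  // Use the low and high halves of the seed so the two picks can differ.
  return std::string(city) + " " + Pick(kAdjectives, seed) + " " +
         Pick(kNouns, seed >> 32);
}
```

Calling GenerateBreweryName twice with the same city returns the same string, which keeps mock output stable across runs and makes it easy to compare against a later LLM-backed generator.
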
Database Schema

SQLite in-memory database with three core tables:

Countries

    CREATE TABLE countries (
      id INTEGER PRIMARY KEY,
      name TEXT NOT NULL,
      iso2 TEXT,
      iso3 TEXT
    );
    CREATE INDEX idx_countries_iso2 ON countries(iso2);

States

    CREATE TABLE states (
      id INTEGER PRIMARY KEY,
      country_id INTEGER NOT NULL,
      name TEXT NOT NULL,
      iso2 TEXT,
      FOREIGN KEY (country_id) REFERENCES countries(id)
    );
    CREATE INDEX idx_states_country ON states(country_id);

Cities

    CREATE TABLE cities (
      id INTEGER PRIMARY KEY,
      state_id INTEGER NOT NULL,
      country_id INTEGER NOT NULL,
      name TEXT NOT NULL,
      latitude REAL,
      longitude REAL,
      FOREIGN KEY (state_id) REFERENCES states(id),
      FOREIGN KEY (country_id) REFERENCES countries(id)
    );
    CREATE INDEX idx_cities_state ON cities(state_id);
    CREATE INDEX idx_cities_country ON cities(country_id);

Configuration and Extensibility

Command-Line Arguments

Boost.Program_options provides named CLI arguments:

    ./biergarten-pipeline [options]

| Arg | Default | Purpose |
| --- | --- | --- |
| --model, -m | "" | Path to LLM model (mock implementation used if left blank). |
| --cache-dir, -c | /tmp | Directory for cached JSON DB. |
| --commit | c5eb7772 | Git commit hash for consistency (stable 2026-03-28 snapshot). |
| --help, -h | - | Show help menu. |

Examples:

    ./biergarten-pipeline
    ./biergarten-pipeline --model ./models/llama.gguf --cache-dir /var/cache
    ./biergarten-pipeline -c /tmp --commit v1.2.3

Building and Running

Prerequisites

C++23 compiler (g++, clang, MSVC).

CMake 3.20+.

curl (for HTTP downloads).

sqlite3.

Boost 1.75+ (requires Boost.JSON and Boost.Program_options).

spdlog (fetched via CMake FetchContent).

Build

    mkdir -p build
    cd build
    cmake ..
    cmake --build . --target biergarten-pipeline -- -j

Run

    ./biergarten-pipeline

Output: Logs to console; caches JSON in /tmp/countries+states+cities.json.

Code Style and Static Analysis

This project is configured to use:

- clang-format with the Google C++ style guide (via .clang-format)
- clang-tidy checks focused on Google, modernize, performance, and bug-prone rules (via .clang-tidy)

After configuring CMake, use:

    cmake --build . --target format

to apply formatting, and:

    cmake --build . --target format-check

to validate formatting without modifying files.

clang-tidy runs automatically on the biergarten-pipeline target when available. You can disable it at configure time:

    cmake -DENABLE_CLANG_TIDY=OFF ..

You can also disable format helper targets:

    cmake -DENABLE_CLANG_FORMAT_TARGETS=OFF ..