
Biergarten Pipeline

Overview

The pipeline orchestrates five key stages:

Download: Fetches countries+states+cities.json from a pinned GitHub commit with optional local caching.

Parse: Streams JSON using Boost.JSON's basic_parser to extract country/state/city records without loading the entire file into memory.

Buffer: Routes city records through a bounded concurrent queue to decouple parsing from writes.

Store: Inserts records into an in-memory SQLite database. Concurrent writes are serialized by a mutex for thread safety.

Generate: Produces mock brewery metadata for a sample of cities (mockup for future LLM integration).
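The Parse stage has to decide, while streaming, whether the object it is entering is a country, a state, or a city. Below is a minimal standard-library sketch of that bookkeeping. It assumes the document shape described under Data Sources and Formats, and that some streaming parser forwards its key / object-begin / array-begin / end events to these methods. RecordClassifier and its method names are hypothetical and are not the project's StreamingJsonParser. Depth alone becomes ambiguous as soon as a record contains its own nested array of objects, so the sketch also remembers which key opened each array.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

enum class RecordKind { kCountry, kState, kCity };

// Hypothetical sketch, not the project's StreamingJsonParser. Assumes the
// shape [ {..., "states": [ {..., "cities": [ {...} ]} ]} ] and receives
// SAX-style events from some streaming JSON parser.
class RecordClassifier {
 public:
  using Callback = std::function<void(RecordKind)>;
  explicit RecordClassifier(Callback on_record)
      : on_record_(std::move(on_record)) {}

  // An object key was read; it names the value or container that follows.
  void OnKey(std::string key) { pending_key_ = std::move(key); }

  void OnArrayBegin() { Push(/*is_array=*/true); }

  void OnObjectBegin() {
    // Classify by the array that directly encloses this object.
    if (!stack_.empty() && stack_.back().is_array) {
      const std::string& parent_key = stack_.back().key;
      if (stack_.size() == 1) {
        on_record_(RecordKind::kCountry);  // element of the top-level array
      } else if (parent_key == "states") {
        on_record_(RecordKind::kState);
      } else if (parent_key == "cities") {
        on_record_(RecordKind::kCity);
      }  // objects in any other array are ignored
    }
    Push(/*is_array=*/false);
  }

  void OnArrayEnd() { Pop(); }
  void OnObjectEnd() { Pop(); }

  // Current nesting depth (0 = outside the document).
  std::size_t depth() const { return stack_.size(); }

 private:
  struct Frame {
    bool is_array;
    std::string key;  // key that opened this container ("" if none)
  };

  void Push(bool is_array) {
    stack_.push_back(Frame{is_array, std::move(pending_key_)});
    pending_key_.clear();
  }

  void Pop() {
    assert(!stack_.empty() && "unbalanced end event");
    stack_.pop_back();
    pending_key_.clear();
  }

  std::vector<Frame> stack_;
  std::string pending_key_;
  Callback on_record_;
};
```

With Boost.JSON, these methods would be driven from the handler's own callbacks; note that keys can arrive in several parts there, so a real handler would accumulate the key before calling something like OnKey.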

Architecture

Data Sources and Formats

Hierarchical structure: countries array → states per country → cities per state.

Fields: id (integer), name (string), iso2 / iso3 (codes), latitude / longitude.

Sourced from: dr5hn/countries-states-cities-database on GitHub.

Output: A structured in-memory SQLite database plus console logs via spdlog.
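To make the mapping from the nested source to the flat tables concrete, here is a minimal sketch. It assumes simplified record types: City, State, Country, FlattenCities, and the field subset shown are hypothetical. Only the name CityRecord appears in this README, and its fields here are an assumption. The streaming parser does not build this whole tree in memory; the sketch only shows how nesting becomes the state_id and country_id columns.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory shape of the nested source data (subset of fields).
struct City {
  int id;
  std::string name;
  double latitude;
  double longitude;
};
struct State {
  int id;
  std::string name;
  std::string iso2;
  std::vector<City> cities;
};
struct Country {
  int id;
  std::string name;
  std::string iso2;
  std::string iso3;
  std::vector<State> states;
};

// Flat row for the cities table: the parent ids come from the nesting.
struct CityRecord {
  int id;
  int state_id;
  int country_id;
  std::string name;
  double latitude;
  double longitude;
};

std::vector<CityRecord> FlattenCities(const std::vector<Country>& countries) {
  std::vector<CityRecord> rows;
  for (const Country& country : countries) {
    for (const State& state : country.states) {
      for (const City& city : state.cities) {
        rows.push_back({city.id, state.id, country.id, city.name,
                        city.latitude, city.longitude});
      }
    }
  }
  return rows;
}
```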

Concurrency Architecture

The pipeline splits work across parsing and writing phases:

Main Thread: parse_sax() -> Insert countries (direct) -> Insert states (direct) -> Push CityRecord to WorkQueue

Worker Threads: Pop CityRecord from WorkQueue -> InsertCity(db) with mutex protection

Key synchronization primitives:

WorkQueue: Bounded (default 1024 items) concurrent queue with blocking push/pop, guarded by mutex + condition variables.

SqliteDatabase::dbMutex: Serializes all SQLite operations to avoid SQLITE_BUSY and ensure write safety.

Backpressure: When the WorkQueue fills (≥1024 city records pending), the parser thread blocks until workers drain items.
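The interplay of WorkQueue, dbMutex, and backpressure can be illustrated with a self-contained sketch that uses only the standard library. FakeDatabase stands in for SqliteDatabase. The CityRecord fields, the Close() semantics, and the RunPipeline helper are assumptions for illustration, not the project's actual interfaces.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

struct CityRecord {  // hypothetical layout
  int id;
  int state_id;
  int country_id;
  std::string name;
};

// Bounded blocking queue: one mutex, two condition variables. Push blocks
// while full (backpressure); Pop blocks while empty. After Close(), Pop
// drains the remaining items and then returns std::nullopt.
template <typename T>
class WorkQueue {
 public:
  explicit WorkQueue(std::size_t capacity = 1024) : capacity_(capacity) {
    assert(capacity_ > 0 && "a zero-capacity queue would block forever");
  }

  void Push(T item) {
    std::unique_lock<std::mutex> lock(mutex_);
    not_full_.wait(lock, [this] { return items_.size() < capacity_ || closed_; });
    if (closed_) return;  // items pushed after Close() are dropped
    items_.push(std::move(item));
    not_empty_.notify_one();
  }

  std::optional<T> Pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    not_empty_.wait(lock, [this] { return !items_.empty() || closed_; });
    if (items_.empty()) return std::nullopt;  // closed and fully drained
    T item = std::move(items_.front());
    items_.pop();
    not_full_.notify_one();  // wake a producer blocked on a full queue
    return item;
  }

  void Close() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      closed_ = true;
    }
    not_full_.notify_all();
    not_empty_.notify_all();
  }

 private:
  const std::size_t capacity_;
  std::mutex mutex_;
  std::condition_variable not_full_;
  std::condition_variable not_empty_;
  std::queue<T> items_;
  bool closed_ = false;
};

// Stand-in for SqliteDatabase: a single mutex serializes every write.
class FakeDatabase {
 public:
  void InsertCity(const CityRecord& city) {
    std::lock_guard<std::mutex> lock(db_mutex_);
    rows_.push_back(city);
  }
  std::size_t RowCount() {
    std::lock_guard<std::mutex> lock(db_mutex_);
    return rows_.size();
  }

 private:
  std::mutex db_mutex_;
  std::vector<CityRecord> rows_;
};

// The calling thread plays the parser; workers consume until Close().
std::size_t RunPipeline(int num_cities, int num_workers, std::size_t capacity) {
  WorkQueue<CityRecord> queue(capacity);
  FakeDatabase db;
  std::vector<std::thread> workers;
  for (int w = 0; w < num_workers; ++w) {
    workers.emplace_back([&] {
      while (auto city = queue.Pop()) db.InsertCity(*city);
    });
  }
  for (int i = 0; i < num_cities; ++i) {
    // Blocks here whenever the queue is full.
    queue.Push(CityRecord{i, i / 10, 1, "city-" + std::to_string(i)});
  }
  queue.Close();
  for (auto& t : workers) t.join();
  return db.RowCount();
}
```

With a small capacity, the producer loop blocks inside Push whenever the workers fall behind, which is the backpressure described above. Because every insert takes the same mutex, extra workers let writes overlap with parsing but do not make the writes themselves run in parallel.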

Component Responsibilities

| Component | Purpose | Thread Safety |
|---|---|---|
| DataDownloader | GitHub fetch with curl; optional filesystem cache; handles retries and ETags. | Blocking I/O; safe for single-threaded startup. |
| StreamingJsonParser | Subclasses boost::json::basic_parser; emits country/state/city records via callbacks; tracks parse depth. | Runs in the single-threaded parse phase; callbacks are thread-safe. |
| JsonLoader | Wraps the parser; runs the country/state/city callbacks; manages the WorkQueue lifecycle. | Consumes parser callbacks and produces to the WorkQueue. |
| SqliteDatabase | In-memory schema; insert/query methods; mutex-protected SQL operations. | Mutex-guarded; concurrent inserts are thread-safe. |
| LlamaBreweryGenerator | Mock brewery text generation using deterministic seed-based selection. | Stateless; method calls are thread-safe. |
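Deterministic seed-based selection means the same city always receives the same mock brewery. The following is a minimal sketch, assuming the seed is the city id; MockBreweryName and its word lists are hypothetical, not LlamaBreweryGenerator's actual API or data.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Same (city_name, city_id) -> same result on every run. The raw output
// sequence of std::mt19937 is fixed by the C++ standard, whereas
// std::uniform_int_distribution may differ between standard libraries,
// so raw output is reduced with % (slight modulo bias is fine for mock data).
std::string MockBreweryName(const std::string& city_name, std::uint32_t city_id) {
  // Hypothetical word lists; initialized once, read-only afterwards.
  static const std::vector<std::string> kPrefixes = {"Old", "Golden", "Iron",
                                                     "River", "Hill"};
  static const std::vector<std::string> kSuffixes = {"Brewing Co.", "Brauhaus",
                                                     "Ale Works", "Taproom"};
  std::mt19937 rng(city_id);  // per-call engine: no shared mutable state
  const std::string& prefix = kPrefixes[rng() % kPrefixes.size()];
  const std::string& suffix = kSuffixes[rng() % kSuffixes.size()];
  return prefix + " " + city_name + " " + suffix;
}
```

Because each call builds its own engine and only reads immutable data, the function is stateless and can be called from several threads at once, matching the thread-safety column above.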

Database Schema

SQLite in-memory database with three core tables:

Countries

```sql
CREATE TABLE countries (
  id INTEGER PRIMARY KEY,
  name TEXT NOT NULL,
  iso2 TEXT,
  iso3 TEXT
);
CREATE INDEX idx_countries_iso2 ON countries(iso2);
```

States

```sql
CREATE TABLE states (
  id INTEGER PRIMARY KEY,
  country_id INTEGER NOT NULL,
  name TEXT NOT NULL,
  iso2 TEXT,
  FOREIGN KEY (country_id) REFERENCES countries(id)
);
CREATE INDEX idx_states_country ON states(country_id);
```

Cities

```sql
CREATE TABLE cities (
  id INTEGER PRIMARY KEY,
  state_id INTEGER NOT NULL,
  country_id INTEGER NOT NULL,
  name TEXT NOT NULL,
  latitude REAL,
  longitude REAL,
  FOREIGN KEY (state_id) REFERENCES states(id),
  FOREIGN KEY (country_id) REFERENCES countries(id)
);
CREATE INDEX idx_cities_state ON cities(state_id);
CREATE INDEX idx_cities_country ON cities(country_id);
```

Configuration and Extensibility

Command-Line Arguments

Boost.Program_options provides named CLI arguments:

./biergarten-pipeline [options]

| Arg | Default | Purpose |
|---|---|---|
| --model, -m | "" | Path to an LLM model file (the mock implementation is used if left blank). |
| --cache-dir, -c | /tmp | Directory for the cached JSON dataset. |
| --commit | c5eb7772 | Git commit of the source dataset to download, pinned for reproducibility (stable 2026-03-28 snapshot). |
| --help, -h | | Show the help menu. |

Examples:

```shell
./biergarten-pipeline
./biergarten-pipeline --model ./models/llama.gguf --cache-dir /var/cache
./biergarten-pipeline -c /tmp --commit v1.2.3
```

Building and Running

Prerequisites

C++23 compiler (g++, clang, MSVC).

CMake 3.20+.

curl (for HTTP downloads).

sqlite3.

Boost 1.75+ (requires Boost.JSON and Boost.Program_options).

spdlog (fetched via CMake FetchContent).

Build

```shell
mkdir -p build
cd build
cmake ..
cmake --build . --target biergarten-pipeline -- -j
```

Run

./biergarten-pipeline

Output: Logs to the console and caches the JSON dataset at /tmp/countries+states+cities.json (the default --cache-dir).
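How DataDownloader decides whether to reuse the cached file is not spelled out here. The following minimal std::filesystem sketch assumes a "non-empty file means cache hit" rule and a write-then-rename strategy. CachedJsonPath, HasUsableCache, and StoreCache are hypothetical helpers; only the file name and the --cache-dir default come from this README.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Where the dataset lives inside --cache-dir.
fs::path CachedJsonPath(const fs::path& cache_dir) {
  return cache_dir / "countries+states+cities.json";
}

// Assumed rule: a cache hit is an existing, non-empty regular file.
// Anything else (missing, empty, unreadable) would trigger a download.
bool HasUsableCache(const fs::path& cache_dir) {
  std::error_code ec;
  const fs::path file = CachedJsonPath(cache_dir);
  if (!fs::is_regular_file(file, ec)) return false;
  const auto size = fs::file_size(file, ec);
  return !ec && size > 0;
}

// Writes downloaded bytes to a temp file, then renames it into place, so on
// POSIX filesystems an interrupted write never leaves a truncated cache file.
void StoreCache(const fs::path& cache_dir, const std::string& body) {
  fs::create_directories(cache_dir);
  const fs::path final_path = CachedJsonPath(cache_dir);
  fs::path tmp_path = final_path;
  tmp_path += ".tmp";
  std::ofstream out(tmp_path, std::ios::binary | std::ios::trunc);
  out << body;
  out.close();  // sets failbit if flushing fails
  if (!out) throw std::runtime_error("failed to write " + tmp_path.string());
  fs::rename(tmp_path, final_path);
}
```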

Code Style and Static Analysis

This project is configured to use:

  • clang-format with the Google C++ style guide (via .clang-format)
  • clang-tidy checks focused on Google, modernize, performance, and bug-prone rules (via .clang-tidy)

After configuring CMake, use:

cmake --build . --target format

to apply formatting, and:

cmake --build . --target format-check

to validate formatting without modifying files.

clang-tidy runs automatically on the biergarten-pipeline target when available. You can disable it at configure time:

cmake -DENABLE_CLANG_TIDY=OFF ..

You can also disable format helper targets:

cmake -DENABLE_CLANG_FORMAT_TARGETS=OFF ..