Begin work on Runpod docker config

2026-07-16 17:47:22 +00:00 · 2026-05-03 23:32:08 -04:00
parent 26635ace84
commit 97b2ffeae4
16 changed files with 402 additions and 90 deletions
--- a/docs/pipeline/README.md
+++ b/docs/pipeline/README.md
@@ -18,6 +18,7 @@ descriptions via a local GGUF model or a deterministic mock.
  - [Build](#build)
  - [Model](#model)
  - [Run](#run)
 - [Docker / RunPod](#docker--runpod)
 - [Architecture](#architecture)
  - [Pipeline Stages](#pipeline-stages)
  - [Key Components](#key-components)
@@ -51,7 +52,7 @@ step.
 ### Build
-Requirements: C++20 compiler, CMake 3.24+, libcurl, Boost (JSON and
+Requirements: C++20 compiler, CMake 3.31+, OpenSSL, Boost (JSON and
 ProgramOptions). SQLite is fetched from the upstream amalgamation, so no system
 SQLite package is required.
@@ -60,6 +61,16 @@ cmake -S . -B build
 cmake --build build
 ```
 CMake automatically detects whether llama.cpp is already installed on the
 system (`libllama`, `libggml`, `libggml-base`, and `llama.h` visible on the
 default search paths). The check only looks for these files and does not
 verify the installed version. If found, it links against those libraries and
 skips the FetchContent build. If not found, it fetches and builds llama.cpp
 from source at tag `b9012`. No additional flags are required in either case.
 Metal is enabled automatically on Apple Silicon. CUDA or HIP/ROCm is detected
 automatically on Linux when the relevant toolkit is present.
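To predict which path CMake will take before configuring, a rough shell approximation of the detection can help. This is a minimal sketch, not CMake's `find_library()`/`find_path()` logic: the helper name `has_system_llama` is hypothetical, and it only inspects the directories you pass (it ignores `CMAKE_PREFIX_PATH` and other platform defaults).

```bash
#!/usr/bin/env bash
# Hypothetical helper approximating the CMake llama.cpp detection.
# Usage: has_system_llama "LIB_DIRS" "INCLUDE_DIRS"  (space-separated lists)
# Returns 0 if libllama, libggml, libggml-base and llama.h are all found.
has_system_llama() {
  local lib_dirs=$1 inc_dirs=$2 name dir found
  for name in llama ggml ggml-base; do
    found=0
    # Word splitting of $lib_dirs is intentional (space-separated list).
    for dir in $lib_dirs; do
      # Accept libNAME.so, versioned libNAME.so.N, or unversioned libNAME.dylib.
      if compgen -G "$dir/lib$name.so*" >/dev/null || [ -e "$dir/lib$name.dylib" ]; then
        found=1
        break
      fi
    done
    [ "$found" -eq 1 ] || return 1
  done
  for dir in $inc_dirs; do
    [ -f "$dir/llama.h" ] && return 0
  done
  return 1
}

# Example:
#   has_system_llama "/usr/local/lib /usr/lib" "/usr/local/include /usr/include" \
#     && echo "expect the system llama.cpp fast path" \
#     || echo "expect FetchContent to build llama.cpp b9012"
```

The `libNAME.so*` pattern for `ggml` deliberately does not match `libggml-base.so`, so each of the three libraries must be present on its own.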
 ### Model
 > Skip this step if you only need `--mocked`.
@@ -74,33 +85,124 @@ curl -L \
 ### Run
 Run from `build/` so the copied `locations.json` and `prompts/` are available.
-Each run also writes a fresh dated SQLite file such as
+Each run writes a fresh dated SQLite file such as
 `biergarten_seed_2026-04-19T15-30-45.123456Z.sqlite` into the output directory
 (`--output`, default `output/`, relative to the working directory).
 ```bash
 ./biergarten-pipeline --mocked
-./biergarten-pipeline --model models/google_gemma-4-E4B-it-Q6_K.gguf --temperature 1.0 --top-p 0.95 --top-k 64 --n-ctx 8192 --seed -1
+
 ./biergarten-pipeline \
  --model ../models/google_gemma-4-E4B-it-Q6_K.gguf \
  --prompt-dir prompts \
  --temperature 1.0 --top-p 0.95 --top-k 64 --n-ctx 8192 --seed -1
 ```
 #### CLI Flags
-| Flag            | Purpose                                                 |
+| Flag            | Purpose                                                                                              |
-| --------------- | ------------------------------------------------------- |
+| --------------- | ---------------------------------------------------------------------------------------------------- |
-| `--mocked`      | Deterministic mock generator, no model required.        |
+| `--mocked`      | Deterministic mock generator, no model required.                                                     |
-| `--model, -m`   | Path to a GGUF file. Required unless `--mocked` is set. |
+| `--model, -m`   | Path to a GGUF file. Required unless `--mocked` is set.                                              |
-| `--temperature` | Sampling temperature. Default: `1.0`.                   |
+| `--prompt-dir`  | Directory containing prompt files (e.g. `BREWERY_GENERATION.md`). Required unless `--mocked` is set. |
-| `--top-p`       | Nucleus sampling. Default: `0.95`.                      |
+| `--output, -o`  | Directory for generated SQLite artifacts. Default: `output`.                                         |
-| `--top-k`       | Top-k sampling. Default: `64`.                          |
+| `--log-path`    | Path for application logs. Default: `pipeline.log`.                                                  |
-| `--n-ctx`       | Context window size. Default: `8192`.                   |
+| `--temperature` | Sampling temperature. Default: `1.0`.                                                                |
-| `--seed`        | Random seed. Default: `-1` (random at runtime).         |
+| `--top-p`       | Nucleus sampling. Default: `0.95`.                                                                   |
-| `--help, -h`    | Print usage and exit.                                   |
+| `--top-k`       | Top-k sampling. Default: `64`.                                                                       |
 | `--n-ctx`       | Context window size. Default: `8192`.                                                                |
 | `--seed`        | Random seed. Default: `-1` (random at runtime).                                                      |
 | `--n-gpu-layers` | Number of model layers to offload to the GPU. Default: `0`.                                         |
 | `--help, -h`    | Print usage and exit.                                                                                |
 `--mocked` and `--model` are mutually exclusive. Omitting both exits with an
 error before the pipeline starts. Sampling flags are ignored when `--mocked` is
 set.
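Wrapper scripts that launch the binary can apply the same rules up front. A minimal bash sketch, assuming the flag spellings in the table above; `check_pipeline_flags` is a hypothetical helper, and it only recognises the space-separated `--flag value` form, not `--flag=value`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check mirroring the documented CLI rules:
#   * exactly one of --mocked or --model/-m
#   * --prompt-dir is required together with --model
# Only the "--flag value" form is recognised; "--flag=value" is not.
check_pipeline_flags() {
  local mocked=0 model=0 prompt_dir=0
  while [ $# -gt 0 ]; do
    case "$1" in
      --mocked) mocked=1 ;;
      --model|-m) model=1; shift ;;        # also consume the path argument
      --prompt-dir) prompt_dir=1; shift ;; # also consume the directory argument
    esac
    [ $# -gt 0 ] && shift
  done
  if [ "$mocked" -eq "$model" ]; then
    echo "error: pass exactly one of --mocked or --model" >&2
    return 1
  fi
  if [ "$model" -eq 1 ] && [ "$prompt_dir" -eq 0 ]; then
    echo "error: --prompt-dir is required with --model" >&2
    return 1
  fi
  return 0
}
```

The binary remains the source of truth; this check only gives a faster, friendlier error before a long container start.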
 The post-build step copies `prompts/` into `build/prompts/`. Rebuild after
-editing `prompts/system.md`.
+editing any prompt file.
 ---
 ## Docker / RunPod
 The `tooling/pipeline/runpod/` directory contains a GPU-ready container
 configuration for running the pipeline on RunPod or any Docker host with an
 NVIDIA GPU.
 ### How it works
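 The container uses a two-stage build. The first stage pulls prebuilt
 `libllama`, `libggml`, `libggml-base`, and the backend plugin libraries
 (including `libggml-cuda.so` and the CPU variant plugins) from
 `ghcr.io/ggml-org/llama.cpp:full-cuda`. The second stage copies those libraries
 into `/usr/local/lib` and runs `ldconfig` so the dynamic linker can resolve
 `libllama.so`, `libggml.so`, and `libggml-base.so`. The backend plugins are
 also copied next to the `biergarten-pipeline` executable, because
 `ggml_backend_load_all()` discovers plugins by scanning directories (such as
 the executable's directory and the current directory) rather than through
 `ldconfig`. llama.cpp headers are downloaded from the `b9012` source tarball
 and installed into `/usr/local/include`. CMake auto-detects both and skips the
 FetchContent source build entirely, keeping image build times short. The
 `full-cuda` image tag is not pinned to `b9012`, so the prebuilt libraries and
 the headers can drift apart over time.
 No environment variable is needed for backend discovery. If
 `GGML_BACKEND_PATH` is set, ggml treats it as the path to one additional
 backend library file to load, not as a directory to scan.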
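To see which plugin files the loader could pick up inside the image, a minimal sketch that lists `libggml-*.so` files per directory, in the order given. The helper name `list_ggml_plugins` is hypothetical, and it only mirrors the file-name pattern; it does not reproduce ggml's scoring of CPU variants.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print ggml backend plugin files (libggml-<backend>*.so)
# found in each directory, in argument order. libggml-base.so is a regular
# shared library rather than a backend plugin, so it is skipped.
list_ggml_plugins() {
  local dir f
  for dir in "$@"; do
    for f in "$dir"/libggml-*.so; do
      [ -e "$f" ] || continue   # glob matched nothing; pattern stays literal
      case "${f##*/}" in
        libggml-base.so) continue ;;
      esac
      printf '%s\n' "$f"
    done
  done
}
```

For example, after `docker run --rm -it --entrypoint bash biergarten-pipeline:latest` and pasting the function, `list_ggml_plugins /workspace/app/build` should include `libggml-cuda.so` if the copy step in the Dockerfile succeeded.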
 ### Build the image
 Run from the `tooling/pipeline/` directory (the CMake project root), not from
 inside `runpod/`, so the `COPY . .` step picks up the full project context.
 ```bash
 docker build -t biergarten-pipeline:latest -f runpod/Dockerfile .
 ```
 To monitor the full build output and confirm CMake selects the system llama.cpp:
 ```bash
 docker build \
  --progress=plain \
  --no-cache \
  -t biergarten-pipeline:latest \
  -f runpod/Dockerfile \
  . 2>&1 | tee build.log
 ```
 Look for `[biergarten] Found system llama.cpp — skipping FetchContent` in the
 output to confirm the fast path was taken.
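In CI, the same check can be automated. A minimal sketch, assuming the build output was captured to `build.log` as in the command above; `check_llama_fast_path` is a hypothetical helper that greps for the two status messages printed by `CMakeLists.txt`. A `BIERGARTEN_MOCK_ONLY` build prints neither message.

```bash
#!/usr/bin/env bash
# Hypothetical CI helper. Exit codes:
#   0  system llama.cpp fast path was taken
#   1  llama.cpp was fetched and built from source
#   2  no detection message found (e.g. a MOCK_ONLY build)
check_llama_fast_path() {
  local log=$1
  if grep -q 'Found system llama.cpp' "$log"; then
    echo "fast path: using prebuilt llama.cpp"
    return 0
  elif grep -q 'System llama.cpp not found' "$log"; then
    echo "slow path: llama.cpp was fetched and built from source" >&2
    return 1
  fi
  echo "no llama.cpp detection message found in $log" >&2
  return 2
}
```

Usage: `check_llama_fast_path build.log || exit 1` after the `docker build ... | tee build.log` step.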
 ### Run in mocked mode
 No model or GPU required. Useful for validating the pipeline logic and SQLite
 export path.
 ```bash
 docker run --rm \
  -e BIERGARTEN_MODE=mocked \
  -v "$PWD/output:/workspace/output" \
  -v "$PWD/logs:/workspace/logs" \
  biergarten-pipeline:latest
 ```
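After a run, the newest database can be located by name. A minimal sketch, assuming the fixed-width timestamp naming described under Run (`biergarten_seed_<UTC timestamp>.sqlite`) and the `output/` bind mount used above; `latest_seed_db` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the newest biergarten_seed_*.sqlite in a directory.
# Assumes fixed-width UTC timestamps in the file names, so the sorted glob
# order is chronological and the last match is the newest.
latest_seed_db() {
  local dir=${1:-output} newest="" f
  for f in "$dir"/biergarten_seed_*.sqlite; do
    [ -e "$f" ] || continue   # no match: the pattern stays literal
    newest=$f
  done
  if [ -z "$newest" ]; then
    echo "no biergarten_seed_*.sqlite in $dir" >&2
    return 1
  fi
  printf '%s\n' "$newest"
}
```

Usage: `latest_seed_db "$PWD/output"` on the host after the container exits.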
 ### Run in live mode
 Mount the host directory that holds your GGUF model. The launcher script
 checks that the model file exists before starting the binary and exits with an
 error (listing `/workspace/models`) if it does not.
 ```bash
 docker run --rm \
  --gpus all \
  -e BIERGARTEN_MODE=live \
  -v "$PWD/models:/workspace/models" \
  -v "$PWD/output:/workspace/output" \
  -v "$PWD/logs:/workspace/logs" \
  biergarten-pipeline:latest
 ```
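 With the default settings, the model must be present at
 `./models/google_gemma-4-E4B-it-Q6_K.gguf` on the host. To use a different
 file, set `BIERGARTEN_MODEL_PATH` to its path inside the container. See
 [Model](#model) above for the download command.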
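Because the launcher only checks that the file exists, a truncated or mis-downloaded model (for example an HTML error page saved by `curl`) is caught only when llama.cpp tries to load it. A minimal host-side sketch that checks the GGUF magic bytes before `docker run`; `check_gguf` is a hypothetical helper, and it relies on GGUF files starting with the four ASCII bytes `GGUF`.

```bash
#!/usr/bin/env bash
# Hypothetical host-side check to run before `docker run`.
# Fails if the file is missing or does not begin with the GGUF magic "GGUF".
check_gguf() {
  local path=$1
  if [ ! -f "$path" ]; then
    echo "model not found: $path" >&2
    return 1
  fi
  # The first four bytes of a GGUF file are the ASCII characters "GGUF".
  if [ "$(head -c 4 "$path")" != "GGUF" ]; then
    echo "not a GGUF file (bad magic): $path" >&2
    return 1
  fi
  return 0
}
```

Usage: `check_gguf models/google_gemma-4-E4B-it-Q6_K.gguf && docker run ...`.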
 ### RunPod deployment
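 Use a GPU pod template. Mount persistent storage for `/workspace/models`,
 `/workspace/output`, and `/workspace/logs`, and set `BIERGARTEN_MODE=live` in
 the template environment. See `tooling/pipeline/runpod/pod-template.yaml` for a
 starter template. RunPod pulls `imageName` from a container registry, so push
 the image there and update that field before using the template. The starter
 template mounts its volume at `/workspace`, which may shadow the image's
 `/workspace/app` directory (binary and prompts); if the launcher cannot find
 the executable, mount the volume elsewhere and point the `BIERGARTEN_*`
 path variables at it.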
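Before creating a pod, it can help to preview which flags the template environment turns into. A simplified sketch of the mapping performed by `runpod/start.sh` for the optional tuning variables; `preview_optional_args` is hypothetical and omits the fixed `--model`, `--prompt-dir`, `--output`, and `--log-path` arguments.

```bash
#!/usr/bin/env bash
# Hypothetical preview of the optional env-to-flag mapping in runpod/start.sh.
# Each BIERGARTEN_* variable that is set and non-empty yields "FLAG VALUE".
# Prints one argument per line.
preview_optional_args() {
  local pair var flag value
  for pair in TEMPERATURE:--temperature TOP_P:--top-p TOP_K:--top-k \
              N_CTX:--n-ctx SEED:--seed GL_LAYERS:--n-gpu-layers; do
    var="BIERGARTEN_${pair%%:*}"   # e.g. BIERGARTEN_TOP_P
    flag="${pair#*:}"              # e.g. --top-p
    value="${!var}"                # indirect expansion: the value of $var
    if [ -n "$value" ]; then
      printf '%s\n%s\n' "$flag" "$value"
    fi
  done
}
```

With the starter template's values this yields `--temperature 1.0 --top-p 0.95 --top-k 64 --n-ctx 8192 --seed -1`. The template does not set `BIERGARTEN_GL_LAYERS`, so `--n-gpu-layers` keeps its default of `0`.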
 ---
@@ -197,16 +299,18 @@ code, latitude, and longitude for each entry.
 ## Tech Stack
 - C++20
- CMake 3.24+
+- CMake 3.31+
 - Boost.JSON, Boost.ProgramOptions, Boost.DI
 - spdlog
- libcurl
+- cpp-httplib (with OpenSSL)
 - SQLite amalgamation fetched and compiled via CMake FetchContent
- llama.cpp
+- llama.cpp (auto-detected from system install or fetched via FetchContent)
 - Docker with NVIDIA CUDA 12.6 base image for GPU container builds
 - RunPod for cloud GPU inference
-The build fetches Boost.DI, spdlog, llama.cpp, and SQLite via CMake. Metal is
+The build fetches Boost.DI, spdlog, and SQLite via CMake. llama.cpp is fetched
-enabled on Apple Silicon; CUDA or HIP/ROCm is detected on Linux when the toolkit
+only when a system installation is not detected. Metal is enabled on Apple
-is present.
+Silicon; CUDA or HIP/ROCm is detected on Linux when the toolkit is present.
 > **Code Style:** Modern C++20 throughout — RAII for ownership,
 > `std::unique_ptr` for injected dependencies, `std::optional` for parse
@@ -218,7 +322,7 @@ is present.
 ## Tested Hardware
-### ARM macOS - M1 Pro
+### ARM macOS — M1 Pro
 |           |                                   |
 | --------- | --------------------------------- |
@@ -229,7 +333,7 @@ is present.
 | Model     | Gemma 4 E4B                       |
 | Inference | llama.cpp with Metal              |
-### x86_64 Linux - NVIDIA RTX 2000
+### x86_64 Linux — NVIDIA RTX 2000
 |           |                                |
 | --------- | ------------------------------ |
@@ -240,6 +344,15 @@ is present.
 | Model     | Gemma 4 E4B                    |
 | Inference | llama.cpp with CUDA 12.x       |
 ### x86_64 Linux — Docker / RunPod (NVIDIA CUDA)
 |           |                                             |
 | --------- | ------------------------------------------- |
 | Host      | RunPod GPU pod                              |
 | Base      | nvidia/cuda:12.6.3-devel-ubuntu24.04        |
 | Model     | Gemma 4 E4B Q6_K                            |
 | Inference | llama.cpp prebuilt CUDA backends via dlopen |
 ---
 ## Fixture Strategy
@@ -260,8 +373,9 @@ is present.
 | `includes/`                  | Public headers and shared models.                  |
 | `src/`                       | Implementation files.                              |
 | `locations.json`             | Curated city input copied into the build tree.     |
-| `prompts/`                   | System prompt used by the model-backed path.       |
+| `prompts/`                   | System prompts used by the model-backed path.      |
 | `diagrams/`                  | Architecture and pipeline diagrams.                |
 | `tooling/pipeline/runpod/`   | Dockerfile, launcher, and RunPod pod template.     |
 | `ETHICS-AND-KNOWN-ISSUES.md` | Ethics, bias, hallucination analysis, mitigations. |
 ---
@@ -276,6 +390,7 @@ is present.
 - `src/data_generation/llama/` — local inference, prompt loading, output
  validation.
 - `src/data_generation/mock/` — deterministic fallback.
 - `tooling/pipeline/runpod/` — container build and runtime launcher.
 ---
--- a/docs/pipeline/diagrams/current/activity.puml
+++ b/docs/pipeline/diagrams/current/activity.puml
@@ -29,7 +29,7 @@ if (Are arguments valid?) then (no)
 else (yes)
 endif
-:Init CurlGlobalState & LlamaBackendState;
+:Init OpenSSL global state & LlamaBackendState;
 :di::make_injector(...);
 :injector.create<std::unique_ptr<BiergartenDataGenerator>>();
 :BiergartenDataGenerator::Run();
--- a/docs/pipeline/diagrams/current/class.puml
+++ b/docs/pipeline/diagrams/current/class.puml
@@ -52,7 +52,7 @@ interface WebClient <<interface>> {
  + UrlEncode(value : const std::string&) : std::string
 }
-class CURLWebClient {
+class HttpWebClient {
  + Get(url : const std::string&) : std::string
  + UrlEncode(value : const std::string&) : std::string
 }
@@ -130,7 +130,7 @@ BiergartenDataGenerator *-- IExportService : owns
 IEnrichmentService <|.. WikipediaService : implements
 WikipediaService *-- WebClient : owns
-WebClient <|.. CURLWebClient : implements
+WebClient <|.. HttpWebClient : implements
 DataGenerator <|.. MockGenerator : implements
 DataGenerator <|.. LlamaGenerator : implements
--- a/docs/pipeline/diagrams/planned/activity.puml
+++ b/docs/pipeline/diagrams/planned/activity.puml
@@ -13,7 +13,7 @@ if (Invalid args?) then (yes)
  stop
 else (no)
 endif
-:Init CurlGlobalState & LlamaBackendState;
+:Init OpenSSL global state & LlamaBackendState;
 :Build DI injector;
 :Initialize SqliteExportService;
--- a/docs/pipeline/diagrams/planned/class.puml
+++ b/docs/pipeline/diagrams/planned/class.puml
@@ -356,7 +356,7 @@ package "Infrastructure: Enrichment" {
    + UrlEncode(value : const std::string&) : std::string
  }
-  class CURLWebClient {
+  class HttpWebClient {
    + Get(url : const std::string&) : std::string
    + UrlEncode(value : const std::string&) : std::string
  }
@@ -520,7 +520,7 @@ CheckinDistributionStrategy    <|.. RandomCheckinStrategy
 FollowGenerationStrategy       <|.. RandomFollowStrategy
 FollowGenerationStrategy       <|.. ActivityWeightedFollowStrategy
 EnrichmentService              <|.. WikipediaService
-WebClient                      <|.. CURLWebClient
+WebClient                      <|.. HttpWebClient
 DataGenerator                  <|.. MockGenerator
 DataGenerator                  <|.. LlamaGenerator
 PromptFormatter                <|.. Gemma4JinjaPromptFormatter
--- a/tooling/pipeline/.dockerignore
+++ b/tooling/pipeline/.dockerignore
@@ -0,0 +1,9 @@
 build/
 cmake-build-debug/
 .git/
 .idea/
 **/*.sqlite
 **/*.log
 **/*.sqlite3
 **/*.db
--- a/tooling/pipeline/CMakeLists.txt
+++ b/tooling/pipeline/CMakeLists.txt
@@ -1,41 +1,45 @@
 cmake_minimum_required(VERSION 3.31)
 project(biergarten-pipeline)
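 # CMP0169 (CMake 3.30+) deprecates calling FetchContent_Populate() with only
 # a declared dependency name. Setting it to OLD keeps that form available so
 # Boost.DI's headers can be fetched without running its CMakeLists.txt.
 cmake_policy(SET CMP0169 OLD)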
 # 1. Build Options
 option(BIERGARTEN_MOCK_ONLY "Build with mock data generators only — skips llama.cpp" OFF)
-if (BIERGARTEN_MOCK_ONLY)
+if(BIERGARTEN_MOCK_ONLY)
-    message(STATUS "[biergarten] MOCK_ONLY build — llama.cpp will not be compiled.")
+        message(STATUS "[biergarten] MOCK_ONLY build — llama.cpp will not be compiled.")
-endif ()
+endif()
 # 2. Platform & GPU Detection
-if (NOT UNIX)
+if(NOT UNIX)
-    message(FATAL_ERROR "[biergarten] Windows is not supported. Please use Linux (Fedora 43) or macOS (M1 Pro).")
+        message(FATAL_ERROR "[biergarten] Windows is not supported. Please use Linux (Fedora 43) or macOS (M1 Pro).")
-endif ()
+endif()
-if (APPLE)
+if(APPLE)
-    if (CMAKE_SYSTEM_PROCESSOR MATCHES "arm64")
+        if(CMAKE_SYSTEM_PROCESSOR MATCHES "arm64")
-        message(STATUS "[biergarten] Apple Silicon detected — enabling Metal acceleration.")
+                message(STATUS "[biergarten] Apple Silicon detected — enabling Metal acceleration.")
-        set(GGML_METAL ON CACHE BOOL "Enable Metal for Apple Silicon" FORCE)
+                set(GGML_METAL ON CACHE BOOL "Enable Metal for Apple Silicon" FORCE)
-    else ()
+        else()
-        message(STATUS "[biergarten] Intel Mac detected — using CPU / Accelerate framework.")
+                message(STATUS "[biergarten] Intel Mac detected — using CPU / Accelerate framework.")
-        set(GGML_METAL OFF CACHE BOOL "Disable Metal for Intel Macs" FORCE)
+                set(GGML_METAL OFF CACHE BOOL "Disable Metal for Intel Macs" FORCE)
-    endif ()
+        endif()
-else ()
+else()
-    find_package(CUDAToolkit QUIET)
+        find_package(CUDAToolkit QUIET)
-    find_package(hip CONFIG QUIET)
+        find_package(hip CONFIG QUIET)
-    if (CUDAToolkit_FOUND)
+        if(CUDAToolkit_FOUND)
-        message(STATUS "[biergarten] NVIDIA GPU detected — enabling CUDA acceleration.")
+                message(STATUS "[biergarten] NVIDIA GPU detected — enabling CUDA acceleration.")
-        set(GGML_CUDA ON CACHE BOOL "Enable CUDA for NVIDIA GPUs" FORCE)
+                set(GGML_CUDA ON CACHE BOOL "Enable CUDA for NVIDIA GPUs" FORCE)
-        set(CMAKE_CUDA_ARCHITECTURES native)
+                set(CMAKE_CUDA_ARCHITECTURES native)
-    elseif (hip_FOUND OR DEFINED ENV{ROCM_PATH} OR EXISTS "/opt/rocm")
+        elseif(hip_FOUND OR DEFINED ENV{ROCM_PATH} OR EXISTS "/opt/rocm")
-        message(STATUS "[biergarten] AMD GPU detected — enabling HIP/ROCm acceleration.")
+                message(STATUS "[biergarten] AMD GPU detected — enabling HIP/ROCm acceleration.")
-        set(GGML_HIPBLAS ON CACHE BOOL "Enable HIP for AMD GPUs" FORCE)
+                set(GGML_HIPBLAS ON CACHE BOOL "Enable HIP for AMD GPUs" FORCE)
-    else ()
+        else()
-        message(STATUS "[biergarten] No NVIDIA or AMD GPU found — falling back to CPU.")
+                message(STATUS "[biergarten] No NVIDIA or AMD GPU found — falling back to CPU.")
-    endif ()
+        endif()
-endif ()
+endif()
 # 3. Project-wide Settings
 set(CMAKE_CXX_STANDARD 20)
@@ -51,16 +55,23 @@ include(FetchContent)
 find_package(Boost REQUIRED COMPONENTS json program_options)
 # Boost.DI (unofficial Boost extension, must declare separately from main Boost dependency)
 # Header-only library, so we only fetch without invoking its CMakeLists.txt
 FetchContent_Declare(
        boost-di
        GIT_REPOSITORY https://github.com/boost-ext/di.git
        GIT_TAG v1.3.0
        GIT_SHALLOW TRUE
 )
-FetchContent_MakeAvailable(boost-di)
+FetchContent_GetProperties(boost-di)
-if (TARGET Boost.DI AND NOT TARGET boost::di)
+if(NOT boost-di_POPULATED)
-    add_library(boost::di ALIAS Boost.DI)
+        FetchContent_Populate(boost-di)
-endif ()
+endif()
 add_library(boost_di INTERFACE)
 add_library(boost::di ALIAS boost_di)
 target_include_directories(boost_di INTERFACE
        $<BUILD_INTERFACE:${boost-di_SOURCE_DIR}/include>
 )
 # SQLite amalgamation
 FetchContent_Declare(
        sqlite_amalgamation
@@ -69,21 +80,38 @@ FetchContent_Declare(
        EXCLUDE_FROM_ALL
 )
 FetchContent_MakeAvailable(sqlite_amalgamation)
-if (NOT TARGET sqlite3)
+if(NOT TARGET sqlite3)
-    add_library(sqlite3 STATIC ${sqlite_amalgamation_SOURCE_DIR}/sqlite3.c)
+        add_library(sqlite3 STATIC ${sqlite_amalgamation_SOURCE_DIR}/sqlite3.c)
-    target_include_directories(sqlite3 PUBLIC ${sqlite_amalgamation_SOURCE_DIR})
+        target_include_directories(sqlite3 PUBLIC ${sqlite_amalgamation_SOURCE_DIR})
-    target_compile_definitions(sqlite3 PUBLIC SQLITE_THREADSAFE=1)
+        target_compile_definitions(sqlite3 PUBLIC SQLITE_THREADSAFE=1)
-endif ()
+endif()
 # llama.cpp — skipped for mock-only builds
-if (NOT BIERGARTEN_MOCK_ONLY)
+if(NOT BIERGARTEN_MOCK_ONLY)
-    FetchContent_Declare(
+        find_library(LLAMA_LIB NAMES llama)
-            llama-cpp
+        find_library(GGML_LIB NAMES ggml)
-            GIT_REPOSITORY https://github.com/ggml-org/llama.cpp.git
+        find_library(GGML_BASE_LIB NAMES ggml-base)
-            GIT_TAG b8742
+        find_path(LLAMA_INC_DIR NAMES llama.h PATH_SUFFIXES include)
-    )
+
-    FetchContent_MakeAvailable(llama-cpp)
+        if(LLAMA_LIB AND GGML_LIB AND GGML_BASE_LIB AND LLAMA_INC_DIR)
-endif ()
+                message(STATUS "[biergarten] Found system llama.cpp — skipping FetchContent")
                add_library(llama SHARED IMPORTED)
                set_target_properties(llama PROPERTIES
                        IMPORTED_LOCATION "${LLAMA_LIB}"
                        INTERFACE_INCLUDE_DIRECTORIES "${LLAMA_INC_DIR}"
                        INTERFACE_LINK_LIBRARIES "${GGML_LIB};${GGML_BASE_LIB}"
                )
        else()
                message(STATUS "[biergarten] System llama.cpp not found — fetching via FetchContent")
                FetchContent_Declare(
                        llama-cpp
                        GIT_REPOSITORY https://github.com/ggml-org/llama.cpp.git
                        GIT_TAG b9012
                )
                FetchContent_MakeAvailable(llama-cpp)
        endif()
 endif()
 # spdlog
 FetchContent_Declare(
@@ -153,16 +181,16 @@ target_sources(${PROJECT_NAME} PRIVATE
 )
 # --- data_generation: llama (skipped for mock-only builds) ---
-if (NOT BIERGARTEN_MOCK_ONLY)
+if(NOT BIERGARTEN_MOCK_ONLY)
-    target_sources(${PROJECT_NAME} PRIVATE
+        target_sources(${PROJECT_NAME} PRIVATE
-            src/data_generation/llama/load.cc
+                src/data_generation/llama/load.cc
-            src/data_generation/llama/helpers.cc
+                src/data_generation/llama/helpers.cc
-            src/data_generation/llama/generate_brewery.cc
+                src/data_generation/llama/generate_brewery.cc
-            src/data_generation/llama/infer.cc
+                src/data_generation/llama/infer.cc
-            src/data_generation/llama/llama_generator.cc
+                src/data_generation/llama/llama_generator.cc
-            src/data_generation/llama/generate_user.cc
+                src/data_generation/llama/generate_user.cc
-    )
+        )
-endif ()
+endif()
 # --- services: wikipedia ---
 target_sources(${PROJECT_NAME} PRIVATE
@@ -189,8 +217,6 @@ target_sources(${PROJECT_NAME} PRIVATE
 # 6. Include Directories, Link Libraries & Compile Definitions
 target_include_directories(${PROJECT_NAME} PRIVATE
        includes
        $<$<NOT:$<BOOL:${BIERGARTEN_MOCK_ONLY}>>:${llama-cpp_SOURCE_DIR}/include>
        $<$<NOT:$<BOOL:${BIERGARTEN_MOCK_ONLY}>>:${llama-cpp_SOURCE_DIR}/common>
 )
 target_link_libraries(${PROJECT_NAME} PRIVATE
--- a/tooling/pipeline/includes/data_generation/llama_generator.h
+++ b/tooling/pipeline/includes/data_generation/llama_generator.h
@@ -14,10 +14,10 @@
 #include <string>
 #include <string_view>
 #include "../services/prompting/prompt_directory.h"
 #include "data_generation/data_generator.h"
 #include "data_generation/prompt_formatting/prompt_formatter.h"
 #include "data_model/models.h"
 #include "../services/prompting/prompt_directory.h"
 struct llama_model;
 struct llama_context;
@@ -129,6 +129,7 @@ class LlamaGenerator final : public DataGenerator {
  uint32_t sampling_top_k_ = kDefaultSamplingTopK;
  std::mt19937 rng_;
  uint32_t n_ctx_ = kDefaultContextSize;
  int n_gpu_layers_ = 0;
  std::unique_ptr<IPromptFormatter> prompt_formatter_;
  std::unique_ptr<IPromptDirectory> prompt_directory_;
 };
--- a/tooling/pipeline/includes/data_model/models.h
+++ b/tooling/pipeline/includes/data_model/models.h
@@ -3,7 +3,8 @@
 /**
 * @file data_model/models.h
- * @brief Core data models: locations, application configuration, and generation inputs.
+ * @brief Core data models: locations, application configuration, and generation
 * inputs.
 */
 #include <boost/program_options.hpp>
@@ -94,6 +95,9 @@ struct GeneratorOptions {
  /// @brief Use mocked generator instead of actual LLM inference.
  bool use_mocked = false;
  /// @brief Number of layers to offload to GPU.
  int n_gpu_layers = 0;
  /// @brief Specific sampling parameters for this generator.
  /// If nullopt, the application should use global defaults.
  std::optional<SamplingOptions> sampling;
--- a/tooling/pipeline/runpod/Dockerfile
+++ b/tooling/pipeline/runpod/Dockerfile
@@ -0,0 +1,67 @@
 # Phase 1: Pull prebuilt binaries
 FROM ghcr.io/ggml-org/llama.cpp:full-cuda AS llama-bin
 # Phase 2: Building environment
 FROM nvidia/cuda:12.6.3-devel-ubuntu24.04
 ENV DEBIAN_FRONTEND=noninteractive \
  CMAKE_GENERATOR=Ninja \
  APP_ROOT=/workspace/app \
  BUILD_DIR=/workspace/app/build
 RUN apt-get update && apt-get install -y --no-install-recommends \
  build-essential \
  ca-certificates \
  curl \
  git \
  libboost-json-dev \
  libboost-program-options-dev \
  libssl-dev \
  ninja-build \
  pkg-config \
  zlib1g-dev \
  && rm -rf /var/lib/apt/lists/*
 # Install CMake 3.31 from Kitware's release script: the Ubuntu 24.04 apt
 # package (3.28) is older than the 3.31 minimum in CMakeLists.txt.
 RUN curl -fsSL https://github.com/Kitware/CMake/releases/download/v3.31.0/cmake-3.31.0-linux-x86_64.sh -o cmake.sh && \
  sh cmake.sh --skip-license --prefix=/usr/local && rm cmake.sh
 # Copy backends to /usr/local/lib and register with ldconfig so the
 # runtime linker can resolve libllama.so, libggml.so, libggml-base.so etc.
 COPY --from=llama-bin /app/lib*.so* /usr/local/lib/
 RUN ldconfig
 # Headers for C++ Build
 RUN curl -L https://github.com/ggml-org/llama.cpp/archive/refs/tags/b9012.tar.gz -o /tmp/llama-src.tar.gz && \
  tar -xzf /tmp/llama-src.tar.gz -C /tmp && \
  cp -r /tmp/llama.cpp-b9012/include/* /usr/local/include/ && \
  cp -r /tmp/llama.cpp-b9012/ggml/include/* /usr/local/include/ && \
  rm -rf /tmp/llama-src.tar.gz /tmp/llama.cpp-b9012
 ENV LD_LIBRARY_PATH="/usr/local/lib:${LD_LIBRARY_PATH}"
 WORKDIR /workspace/app
 COPY . .
 # Build the C++ pipeline
 RUN cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Release && \
  cmake --build build -j$(nproc)
 # Co-locate GGML backend plugins with the executable.
 # ggml_backend_load_all() scans the executable's directory (after any
 # GGML_BACKEND_DIR compiled into ggml) for libggml-*.so plugin files, so
 # copying them here lets the loader find them without any environment variable.
 # libllama.so, libggml.so, and libggml-base.so are NOT copied here:
 # those are regular shared libraries resolved via ldconfig/LD_LIBRARY_PATH.
 # The CUDA plugin is required; CPU/BLAS/RPC plugins are copied when present.
 RUN cp /usr/local/lib/libggml-cuda.so /workspace/app/build/ && \
  for f in /usr/local/lib/libggml-cpu*.so /usr/local/lib/libggml-blas*.so /usr/local/lib/libggml-rpc*.so; do \
   if [ -e "$f" ]; then cp "$f" /workspace/app/build/; fi; \
  done
 # Setup Start Script
 COPY ./runpod/start.sh /usr/local/bin/biergarten-start
 RUN chmod +x /usr/local/bin/biergarten-start
 WORKDIR /workspace/app/build
 ENTRYPOINT ["/usr/local/bin/biergarten-start"]
--- a/tooling/pipeline/runpod/README.md
+++ b/tooling/pipeline/runpod/README.md
@@ -0,0 +1,8 @@
 Build from `tooling/pipeline/` (the CMake project root) so the full project is
 in the Docker build context:
 ```bash
 docker build \
  --progress=plain \
  -t biergarten-pipeline:latest \
  -f runpod/Dockerfile \
  . 2>&1 | tee build.log
 ```
--- a/tooling/pipeline/runpod/pod-template.yaml
+++ b/tooling/pipeline/runpod/pod-template.yaml
@@ -0,0 +1,22 @@
 name: biergarten-pipeline-live
 imageName: biergarten-pipeline:latest
 category: NVIDIA
 containerDiskInGb: 50
 volumeInGb: 50
 volumeMountPath: /workspace
 dockerEntrypoint:
  - /usr/local/bin/biergarten-start
 dockerStartCmd: []
 isPublic: false
 isServerless: false
 env:
  BIERGARTEN_MODE: live
  BIERGARTEN_MODEL_PATH: /workspace/models/google_gemma-4-E4B-it-Q6_K.gguf
  BIERGARTEN_PROMPT_DIR: /workspace/app/build/prompts
  BIERGARTEN_OUTPUT_DIR: /workspace/output
  BIERGARTEN_LOG_PATH: /workspace/logs/pipeline.log
  BIERGARTEN_TEMPERATURE: "1.0"
  BIERGARTEN_TOP_P: "0.95"
  BIERGARTEN_TOP_K: "64"
  BIERGARTEN_N_CTX: "8192"
  BIERGARTEN_SEED: "-1"
--- a/tooling/pipeline/runpod/start.sh
+++ b/tooling/pipeline/runpod/start.sh
@@ -0,0 +1,49 @@
 #!/bin/bash
 set -e
 # Configuration / Defaults (overridable via the container environment)
 MODE="${BIERGARTEN_MODE:-live}"
 MODEL_PATH="${BIERGARTEN_MODEL_PATH:-/workspace/models/google_gemma-4-E4B-it-Q6_K.gguf}"
 PROMPT_DIR="${BIERGARTEN_PROMPT_DIR:-/workspace/app/build/prompts}"
 OUTPUT_DIR="${BIERGARTEN_OUTPUT_DIR:-/workspace/output}"
 LOG_PATH="${BIERGARTEN_LOG_PATH:-/workspace/logs/pipeline.log}"
 EXECUTABLE="/workspace/app/build/biergarten-pipeline"
 echo "--- Starting Biergarten Pipeline Environment Check (mode: $MODE) ---"
 # 1. Ensure volume mount directories exist
 mkdir -p "$OUTPUT_DIR"
 mkdir -p "$(dirname "$LOG_PATH")"
 # 2. Select the generator (--mocked and --model are mutually exclusive)
 case "$MODE" in
     mocked)
         ARGS=("--mocked")
         ;;
     live)
         # Fail fast if the model file is missing
         if [ ! -f "$MODEL_PATH" ]; then
             echo "ERROR: Model not found at $MODEL_PATH"
             echo "Current /workspace/models contents:"
             ls -lh /workspace/models 2>/dev/null || echo "(directory does not exist)"
             exit 1
         fi
         ARGS=("--model" "$MODEL_PATH" "--prompt-dir" "$PROMPT_DIR")
         ;;
     *)
         echo "ERROR: BIERGARTEN_MODE must be 'live' or 'mocked' (got '$MODE')"
         exit 1
         ;;
 esac
 # 3. Common output arguments
 ARGS+=("--output" "$OUTPUT_DIR" "--log-path" "$LOG_PATH")
 # Optional hyperparameters (sampling flags are ignored with --mocked)
 [[ -n "$BIERGARTEN_TEMPERATURE" ]] && ARGS+=("--temperature"  "$BIERGARTEN_TEMPERATURE")
 [[ -n "$BIERGARTEN_TOP_P" ]]       && ARGS+=("--top-p"        "$BIERGARTEN_TOP_P")
 [[ -n "$BIERGARTEN_TOP_K" ]]       && ARGS+=("--top-k"        "$BIERGARTEN_TOP_K")
 [[ -n "$BIERGARTEN_N_CTX" ]]       && ARGS+=("--n-ctx"        "$BIERGARTEN_N_CTX")
 [[ -n "$BIERGARTEN_SEED" ]]        && ARGS+=("--seed"         "$BIERGARTEN_SEED")
 [[ -n "$BIERGARTEN_GL_LAYERS" ]]   && ARGS+=("--n-gpu-layers" "$BIERGARTEN_GL_LAYERS")
 # Append any extra custom args (split on whitespace, without glob expansion)
 if [[ -n "$BIERGARTEN_EXTRA_ARGS" ]]; then
     read -r -a EXTRA_ARGS <<< "$BIERGARTEN_EXTRA_ARGS"
     ARGS+=("${EXTRA_ARGS[@]}")
 fi
 echo "--- Executing: $EXECUTABLE ${ARGS[*]} ---"
 # Execute the binary directly, replacing the shell process
 exec "$EXECUTABLE" "${ARGS[@]}"
--- a/tooling/pipeline/src/application_options/parse_arguments.cc
+++ b/tooling/pipeline/src/application_options/parse_arguments.cc
@@ -50,6 +50,8 @@ std::optional<ApplicationOptions> ParseArguments(const int argc, char** argv) {
    opt("prompt-dir", prog_opts::value<std::string>()->default_value(""),
        "Directory containing named prompt files (e.g. BREWERY_GENERATION.md)."
        " Required when not using --mocked.");
    opt("n-gpu-layers", prog_opts::value<int>()->default_value(0),
        "Number of layers to offload to GPU");
  };
  add_sampling_options();
@@ -85,6 +87,7 @@ std::optional<ApplicationOptions> ParseArguments(const int argc, char** argv) {
    const bool use_mocked = var_map["mocked"].as<bool>();
    const std::string model_path = var_map["model"].as<std::string>();
    const int n_gpu_layers = var_map["n-gpu-layers"].as<int>();
    // Enforce mutual exclusivity before any further configuration is applied.
    if (use_mocked && !model_path.empty()) {
@@ -110,6 +113,7 @@ std::optional<ApplicationOptions> ParseArguments(const int argc, char** argv) {
    options.generator.use_mocked = use_mocked;
    options.generator.model_path = model_path;
    options.generator.n_gpu_layers = n_gpu_layers;
    // Only populate sampling config when the user explicitly overrides at
    // least one value. Leaving it as std::nullopt lets LlamaGenerator fall
--- a/tooling/pipeline/src/data_generation/llama/llama_generator.cc
+++ b/tooling/pipeline/src/data_generation/llama/llama_generator.cc
@@ -89,6 +89,7 @@ LlamaGenerator::LlamaGenerator(
  }
  n_ctx_ = sampling.n_ctx;
  n_gpu_layers_ = options.generator.n_gpu_layers;
  this->Load(model_path);
 }
--- a/tooling/pipeline/src/data_generation/llama/load.cc
+++ b/tooling/pipeline/src/data_generation/llama/load.cc
@@ -12,6 +12,7 @@
 #include <utility>
 #include "data_generation/llama_generator.h"
 #include "ggml-backend.h"
 #include "llama.h"
 // Maximum batch size for decode operations. Capping the batch prevents
@@ -22,7 +23,12 @@ void LlamaGenerator::Load(const std::string& model_path) {
  context_.reset();
  model_.reset();
-  const llama_model_params model_params = llama_model_default_params();
+  // Register the dynamically loadable ggml backends (e.g. the CUDA plugin)
  // before loading the model so its layers can be offloaded to them.
  ggml_backend_load_all();
  llama_model_params model_params = llama_model_default_params();
  model_params.n_gpu_layers = n_gpu_layers_;
  LlamaGenerator::ModelHandle loaded_model(
      llama_model_load_from_file(model_path.c_str(), model_params));
  if (!loaded_model) {