Add mock enrichment process

Add location count to application options and as a `--location-count` CLI arg
Add timeout for enrichment, refactor JSON deserialization
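
A minimal sketch of the capping behaviour behind the location count, for reviewers: the requested count is clamped to the number of available locations before sampling. `Location` here is a simplified stand-in and `SampleLocations` is a hypothetical helper name; the actual change lives in `QueryCitiesWithCountries()` and may sample differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <random>
#include <string>
#include <vector>

// Simplified stand-in for the pipeline's Location model (assumption).
struct Location {
  std::string city;
};

// Hypothetical helper: clamp the requested count to the dataset size,
// then draw that many distinct entries without replacement.
std::vector<Location> SampleLocations(const std::vector<Location>& all,
                                      std::uint32_t location_count,
                                      std::mt19937& rng) {
  const std::size_t sample_count =
      std::min(static_cast<std::size_t>(location_count), all.size());

  std::vector<Location> sampled;
  sampled.reserve(sample_count);
  std::sample(all.cbegin(), all.cend(), std::back_inserter(sampled),
              static_cast<std::ptrdiff_t>(sample_count), rng);

  // The result never exceeds either the request or the dataset.
  assert(sampled.size() == sample_count);
  return sampled;
}
```

Requesting more locations than the dataset holds returns every location once; requesting zero returns an empty vector.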
2026-06-01 01:54:00 +00:00 · 2026-05-14 13:49:59 -04:00 · 2026-05-13 22:04:48 -04:00 · 2026-05-13 12:44:30 -04:00 · 2026-05-12 01:05:07 -04:00 · 2026-05-12 00:44:09 -04:00
33 changed files with 681 additions and 232 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -1 +1 @@
-archive/* linguist-vendored
+archive/** linguist-vendored
--- a/docs/pipeline/README.md
+++ b/docs/pipeline/README.md
@@ -18,6 +18,7 @@ descriptions via a local GGUF model or a deterministic mock.
  - [Build](#build)
  - [Model](#model)
  - [Run](#run)
 - [Docker / RunPod](#docker--runpod)
 - [Architecture](#architecture)
  - [Pipeline Stages](#pipeline-stages)
  - [Key Components](#key-components)
@@ -51,7 +52,7 @@ step.
 ### Build
-Requirements: C++20 compiler, CMake 3.24+, libcurl, Boost (JSON and
+Requirements: C++20 compiler, CMake 3.31+, OpenSSL, Boost (JSON and
 ProgramOptions). SQLite is fetched from the upstream amalgamation, so no system
 SQLite package is required.
@@ -60,6 +61,16 @@ cmake -S . -B build
 cmake --build build
 ```
 CMake automatically detects whether a compatible llama.cpp installation is
 present on the system (`libllama`, `libggml`, `libggml-base`, and `llama.h`
 visible on the default search paths). If found, it links against those
 libraries and skips the FetchContent build. If not found, it fetches and builds
 llama.cpp from source at tag `b9012`. No additional flags are required in
 either case.
 Metal is enabled automatically on Apple Silicon. CUDA or HIP/ROCm is detected
 automatically on Linux when the relevant toolkit is present.
 ### Model
 > Skip this step if you only need `--mocked`.
@@ -74,20 +85,27 @@ curl -L \
 ### Run
 Run from `build/` so the copied `locations.json` and `prompts/` are available.
-Each run also writes a fresh dated SQLite file such as
+Each run writes a fresh dated SQLite file such as
 `biergarten_seed_2026-04-19T15-30-45.123456Z.sqlite` into the working directory.
 ```bash
 ./biergarten-pipeline --mocked
-./biergarten-pipeline --model models/google_gemma-4-E4B-it-Q6_K.gguf --temperature 1.0 --top-p 0.95 --top-k 64 --n-ctx 8192 --seed -1
+
 ./biergarten-pipeline \
  --model ../models/google_gemma-4-E4B-it-Q6_K.gguf \
  --prompt-dir prompts \
  --temperature 1.0 --top-p 0.95 --top-k 64 --n-ctx 8192 --seed -1
 ```
 #### CLI Flags
 | Flag            | Purpose                                                                                              |
-| --------------- | ------------------------------------------------------- |
+| --------------- | ---------------------------------------------------------------------------------------------------- |
 | `--mocked`      | Deterministic mock generator, no model required.                                                     |
 | `--model, -m`   | Path to a GGUF file. Required unless `--mocked` is set.                                              |
 | `--prompt-dir`  | Directory containing prompt files (e.g. `BREWERY_GENERATION.md`). Required unless `--mocked` is set. |
 | `--output, -o`  | Directory for generated SQLite artifacts. Default: `output`.                                         |
 | `--log-path`    | Path for application logs. Default: `pipeline.log`.                                                  |
 | `--temperature` | Sampling temperature. Default: `1.0`.                                                                |
 | `--top-p`       | Nucleus sampling. Default: `0.95`.                                                                   |
 | `--top-k`       | Top-k sampling. Default: `64`.                                                                       |
@@ -100,7 +118,91 @@ error before the pipeline starts. Sampling flags are ignored when `--mocked` is
 set.
 The post-build step copies `prompts/` into `build/prompts/`. Rebuild after
-editing `prompts/system.md`.
+editing any prompt file.
 ---
 ## Docker / RunPod
 The `tooling/pipeline/runpod/` directory contains a GPU-ready container
 configuration for running the pipeline on RunPod or any Docker host with an
 NVIDIA GPU.
 ### How it works
 The container uses a two-stage build. The builder stage starts from
 `nvidia/cuda:12.6.3-devel-ubuntu24.04`, downloads the llama.cpp headers at tag
 `b9012` into `/usr/local/include`, and copies prebuilt `libllama`, `libggml`,
 and backend plugin libraries (including `libggml-cuda.so` and the CPU variant
 plugins) from `ghcr.io/ggml-org/llama.cpp:full-cuda` into `/usr/local/lib`.
 CMake auto-detects both and skips the FetchContent source build entirely,
 keeping image build times short. The runtime stage copies the same libraries
 into `/usr/local/lib`, adds that directory to `LD_LIBRARY_PATH`, and places the
 CUDA and CPU plugins next to the binary so the `dlopen` calls from
 `ggml_backend_load_all()` can resolve them at runtime.
 `GGML_BACKEND_PATH` can name an additional backend plugin file for llama.cpp to
 load; the live-mode command sets it to the CUDA plugin.
 ### Build the image
 Run from the `tooling/pipeline/` directory (the CMake project root), not from
 inside `runpod/`, so the `COPY . .` step picks up the full project context.
 ```bash
 docker build -t biergarten-pipeline:latest -f runpod/Dockerfile .
 ```
 To monitor the full build output and confirm CMake selects the system llama.cpp:
 ```bash
 docker build \
  --progress=plain \
  --no-cache \
  -t biergarten-pipeline:latest \
  -f runpod/Dockerfile \
  . 2>&1 | tee build.log
 ```
 Look for `[biergarten] Found system llama.cpp — skipping FetchContent` in the
 output to confirm the fast path was taken.
 ### Run in mocked mode
 No model or GPU required. Useful for validating the pipeline logic and SQLite
 export path.
 ```bash
 docker run --rm \
  -e BIERGARTEN_MODE=mocked \
  -v "$PWD/output:/workspace/output" \
  -v "$PWD/logs:/workspace/logs" \
  biergarten-pipeline:latest
 ```
 ### Run in live mode
 Mount your models directory before starting. The start script downloads the
 GGUF model if it is missing at the configured path, then verifies the file
 exists before launching the binary.
 ```bash
 docker run --rm \
  --runtime=nvidia \
  -e BIERGARTEN_MODE=live \
  -e GGML_BACKEND_PATH="/usr/local/lib/libggml-cuda.so" \
  -v "$PWD/models:/workspace/models" \
  -v "$PWD/output:/workspace/output" \
  -v "$PWD/logs:/workspace/logs" \
  biergarten-pipeline:latest
 ```
 The start script looks for the model at
 `./models/google_gemma-4-E4B-it-Q6_K.gguf` on the host and downloads it there
 if it is missing. See [Model](#model) above for the manual download command.
 ### RunPod deployment
 Use a GPU pod template. Mount persistent storage for `/workspace/models`,
 `/workspace/output`, and `/workspace/logs`. Set `BIERGARTEN_MODE=live` in the
 template environment. See `tooling/pipeline/runpod/pod-template.yaml` for a
 starter template.
 ---
@@ -197,16 +299,18 @@ code, latitude, and longitude for each entry.
 ## Tech Stack
 - C++20
- CMake 3.24+
+- CMake 3.31+
 - Boost.JSON, Boost.ProgramOptions, Boost.DI
 - spdlog
- libcurl
+- cpp-httplib (with OpenSSL)
 - SQLite amalgamation fetched and compiled via CMake FetchContent
- llama.cpp
+- llama.cpp (auto-detected from system install or fetched via FetchContent)
 - Docker with NVIDIA CUDA 12.6 base image for GPU container builds
 - RunPod for cloud GPU inference
-The build fetches Boost.DI, spdlog, llama.cpp, and SQLite via CMake. Metal is
+The build fetches Boost.DI, spdlog, and SQLite via CMake. llama.cpp is fetched
-enabled on Apple Silicon; CUDA or HIP/ROCm is detected on Linux when the toolkit
+only when a system installation is not detected. Metal is enabled on Apple
-is present.
+Silicon; CUDA or HIP/ROCm is detected on Linux when the toolkit is present.
 > **Code Style:** Modern C++20 throughout — RAII for ownership,
 > `std::unique_ptr` for injected dependencies, `std::optional` for parse
@@ -218,7 +322,7 @@ is present.
 ## Tested Hardware
-### ARM macOS - M1 Pro
+### ARM macOS — M1 Pro
 |           |                                   |
 | --------- | --------------------------------- |
@@ -229,7 +333,7 @@ is present.
 | Model     | Gemma 4 E4B                       |
 | Inference | llama.cpp with Metal              |
-### x86_64 Linux - NVIDIA RTX 2000
+### x86_64 Linux — NVIDIA RTX 2000
 |           |                                |
 | --------- | ------------------------------ |
@@ -240,6 +344,15 @@ is present.
 | Model     | Gemma 4 E4B                    |
 | Inference | llama.cpp with CUDA 12.x       |
 ### x86_64 Linux — Docker / RunPod (NVIDIA CUDA)
 |           |                                             |
 | --------- | ------------------------------------------- |
 | Host      | RunPod GPU pod                              |
 | Base      | nvidia/cuda:12.6.3-devel-ubuntu24.04        |
 | Model     | Gemma 4 E4B Q6_K                            |
 | Inference | llama.cpp prebuilt CUDA backends via dlopen |
 ---
 ## Fixture Strategy
@@ -260,8 +373,9 @@ is present.
 | `includes/`                  | Public headers and shared models.                  |
 | `src/`                       | Implementation files.                              |
 | `locations.json`             | Curated city input copied into the build tree.     |
-| `prompts/`                   | System prompt used by the model-backed path.       |
+| `prompts/`                   | System prompts used by the model-backed path.      |
 | `diagrams/`                  | Architecture and pipeline diagrams.                |
 | `tooling/pipeline/runpod/`   | Dockerfile, launcher, and RunPod pod template.     |
 | `ETHICS-AND-KNOWN-ISSUES.md` | Ethics, bias, hallucination analysis, mitigations. |
 ---
@@ -276,6 +390,7 @@ is present.
 - `src/data_generation/llama/` — local inference, prompt loading, output
  validation.
 - `src/data_generation/mock/` — deterministic fallback.
 - `tooling/pipeline/runpod/` — container build and runtime launcher.
 ---
--- a/docs/pipeline/diagrams/current/activity.puml
+++ b/docs/pipeline/diagrams/current/activity.puml
@@ -29,7 +29,7 @@ if (Are arguments valid?) then (no)
 else (yes)
 endif
-:Init CurlGlobalState & LlamaBackendState;
+:Init OpenSSL global state & LlamaBackendState;
 :di::make_injector(...);
 :injector.create<std::unique_ptr<BiergartenDataGenerator>>();
 :BiergartenDataGenerator::Run();
--- a/docs/pipeline/diagrams/current/class.puml
+++ b/docs/pipeline/diagrams/current/class.puml
@@ -52,7 +52,7 @@ interface WebClient <<interface>> {
  + UrlEncode(value : const std::string&) : std::string
 }
-class CURLWebClient {
+class HttpWebClient {
  + Get(url : const std::string&) : std::string
  + UrlEncode(value : const std::string&) : std::string
 }
@@ -130,7 +130,7 @@ BiergartenDataGenerator *-- IExportService : owns
 IEnrichmentService <|.. WikipediaService : implements
 WikipediaService *-- WebClient : owns
-WebClient <|.. CURLWebClient : implements
+WebClient <|.. HttpWebClient : implements
 DataGenerator <|.. MockGenerator : implements
 DataGenerator <|.. LlamaGenerator : implements
--- a/docs/pipeline/diagrams/planned/activity.puml
+++ b/docs/pipeline/diagrams/planned/activity.puml
@@ -13,7 +13,7 @@ if (Invalid args?) then (yes)
  stop
 else (no)
 endif
-:Init CurlGlobalState & LlamaBackendState;
+:Init OpenSSL global state & LlamaBackendState;
 :Build DI injector;
 :Initialize SqliteExportService;
--- a/docs/pipeline/diagrams/planned/class.puml
+++ b/docs/pipeline/diagrams/planned/class.puml
@@ -356,7 +356,7 @@ package "Infrastructure: Enrichment" {
    + UrlEncode(value : const std::string&) : std::string
  }
-  class CURLWebClient {
+  class HttpWebClient {
    + Get(url : const std::string&) : std::string
    + UrlEncode(value : const std::string&) : std::string
  }
@@ -520,7 +520,7 @@ CheckinDistributionStrategy    <|.. RandomCheckinStrategy
 FollowGenerationStrategy       <|.. RandomFollowStrategy
 FollowGenerationStrategy       <|.. ActivityWeightedFollowStrategy
 EnrichmentService              <|.. WikipediaService
-WebClient                      <|.. CURLWebClient
+WebClient                      <|.. HttpWebClient
 DataGenerator                  <|.. MockGenerator
 DataGenerator                  <|.. LlamaGenerator
 PromptFormatter                <|.. Gemma4JinjaPromptFormatter
--- a/tooling/pipeline/.dockerignore
+++ b/tooling/pipeline/.dockerignore
@@ -0,0 +1,9 @@
 build/
 cmake-build-debug/
 .git/
 .idea/
 **/*.sqlite
 **/*.log
 **/*.sqlite3
 **/*.db
--- a/tooling/pipeline/CMakeLists.txt
+++ b/tooling/pipeline/CMakeLists.txt
@@ -1,6 +1,10 @@
 cmake_minimum_required(VERSION 3.31)
 project(biergarten-pipeline)
 # Set policy to allow FetchContent_Populate for header-only libraries
 # that have outdated CMakeLists.txt files
 cmake_policy(SET CMP0169 OLD)
 # 1. Build Options
 option(BIERGARTEN_MOCK_ONLY "Build with mock data generators only — skips llama.cpp" OFF)
@@ -51,16 +55,23 @@ include(FetchContent)
 find_package(Boost REQUIRED COMPONENTS json program_options)
 # Boost.DI (unofficial Boost extension, must declare separately from main Boost dependency)
 # Header-only library, so we only fetch without invoking its CMakeLists.txt
 FetchContent_Declare(
        boost-di
        GIT_REPOSITORY https://github.com/boost-ext/di.git
        GIT_TAG v1.3.0
        GIT_SHALLOW TRUE
 )
-FetchContent_MakeAvailable(boost-di)
+FetchContent_GetProperties(boost-di)
-if (TARGET Boost.DI AND NOT TARGET boost::di)
+if(NOT boost-di_POPULATED)
-    add_library(boost::di ALIAS Boost.DI)
+        FetchContent_Populate(boost-di)
 endif()
 add_library(boost_di INTERFACE)
 add_library(boost::di ALIAS boost_di)
 target_include_directories(boost_di INTERFACE
        $<BUILD_INTERFACE:${boost-di_SOURCE_DIR}/include>
 )
 # SQLite amalgamation
 FetchContent_Declare(
        sqlite_amalgamation
@@ -77,13 +88,30 @@ endif ()
 # llama.cpp — skipped for mock-only builds
 if(NOT BIERGARTEN_MOCK_ONLY)
        find_library(LLAMA_LIB NAMES llama)
        find_library(GGML_LIB NAMES ggml)
        find_library(GGML_BASE_LIB NAMES ggml-base)
        find_path(LLAMA_INC_DIR NAMES llama.h PATH_SUFFIXES include)
        if(LLAMA_LIB AND GGML_LIB AND GGML_BASE_LIB AND LLAMA_INC_DIR)
                message(STATUS "[biergarten] Found system llama.cpp — skipping FetchContent")
                add_library(llama SHARED IMPORTED)
                set_target_properties(llama PROPERTIES
                        IMPORTED_LOCATION "${LLAMA_LIB}"
                        INTERFACE_INCLUDE_DIRECTORIES "${LLAMA_INC_DIR}"
                        INTERFACE_LINK_LIBRARIES "${GGML_LIB};${GGML_BASE_LIB}"
                )
        else()
                message(STATUS "[biergarten] System llama.cpp not found — fetching via FetchContent")
                FetchContent_Declare(
                        llama-cpp
                        GIT_REPOSITORY https://github.com/ggml-org/llama.cpp.git
-            GIT_TAG b8742
+                        GIT_TAG b9012
                )
                FetchContent_MakeAvailable(llama-cpp)
        endif()
 endif()
 # spdlog
 FetchContent_Declare(
@@ -109,7 +137,8 @@ set(HTTPLIB_REQUIRE_OPENSSL ON CACHE BOOL "Require OpenSSL for cpp-httplib" FORC
 FetchContent_MakeAvailable(cpp-httplib)
 # 5. Executable & Sources
-add_executable(${PROJECT_NAME})
+add_executable(${PROJECT_NAME}
        includes/services/enrichment/mock_enrichment.h)
 # --- Entry point ---
 target_sources(${PROJECT_NAME} PRIVATE
@@ -166,9 +195,9 @@ endif ()
 # --- services: wikipedia ---
 target_sources(${PROJECT_NAME} PRIVATE
-        src/services/wikipedia/wikipedia_service.cc
+        src/services/enrichment/wikipedia/wikipedia_service.cc
-        src/services/wikipedia/fetch_extract.cc
+        src/services/enrichment/wikipedia/fetch_extract.cc
-        src/services/wikipedia/get_summary.cc
+        src/services/enrichment/wikipedia/get_summary.cc
 )
 # --- services: sqlite ---
@@ -189,8 +218,6 @@ target_sources(${PROJECT_NAME} PRIVATE
 # 6. Include Directories, Link Libraries & Compile Definitions
 target_include_directories(${PROJECT_NAME} PRIVATE
        includes
        $<$<NOT:$<BOOL:${BIERGARTEN_MOCK_ONLY}>>:${llama-cpp_SOURCE_DIR}/include>
        $<$<NOT:$<BOOL:${BIERGARTEN_MOCK_ONLY}>>:${llama-cpp_SOURCE_DIR}/common>
 )
 target_link_libraries(${PROJECT_NAME} PRIVATE
--- a/tooling/pipeline/includes/biergarten_data_generator.h
+++ b/tooling/pipeline/includes/biergarten_data_generator.h
@@ -12,8 +12,8 @@
 #include "data_generation/data_generator.h"
 #include "data_model/generated_models.h"
 #include "services/enrichment/enrichment_service.h"
 #include "services/database/export_service.h"
 #include "services/enrichment/enrichment_service.h"
 /**
 * @brief Main data generator class for the Biergarten pipeline.
@@ -32,7 +32,8 @@ class BiergartenDataGenerator {
   */
  BiergartenDataGenerator(std::unique_ptr<IEnrichmentService> context_service,
                          std::unique_ptr<DataGenerator> generator,
-                          std::unique_ptr<IExportService> exporter);
+                          std::unique_ptr<IExportService> exporter,
                          const ApplicationOptions& application_options);
  /**
   * @brief Run the data generation pipeline.
@@ -56,12 +57,14 @@ class BiergartenDataGenerator {
  /// @brief Storage backend for generated brewery records.
  std::unique_ptr<IExportService> exporter_;
  const ApplicationOptions application_options_;
  /**
   * @brief Load locations from JSON and sample cities.
   *
   * @return Vector of sampled locations, capped at `location_count` entries.
   */
-  static std::vector<Location> QueryCitiesWithCountries();
+  std::vector<Location> QueryCitiesWithCountries();
  /**
   * @brief Generate breweries for enriched cities.
--- a/tooling/pipeline/includes/data_generation/llama_generator.h
+++ b/tooling/pipeline/includes/data_generation/llama_generator.h
@@ -14,10 +14,10 @@
 #include <string>
 #include <string_view>
 #include "../services/prompting/prompt_directory.h"
 #include "data_generation/data_generator.h"
 #include "data_generation/prompt_formatting/prompt_formatter.h"
 #include "data_model/models.h"
 #include "../services/prompting/prompt_directory.h"
 struct llama_model;
 struct llama_context;
@@ -129,6 +129,7 @@ class LlamaGenerator final : public DataGenerator {
  uint32_t sampling_top_k_ = kDefaultSamplingTopK;
  std::mt19937 rng_;
  uint32_t n_ctx_ = kDefaultContextSize;
  int n_gpu_layers_ = 0;
  std::unique_ptr<IPromptFormatter> prompt_formatter_;
  std::unique_ptr<IPromptDirectory> prompt_directory_;
 };
--- a/tooling/pipeline/includes/data_model/models.h
+++ b/tooling/pipeline/includes/data_model/models.h
@@ -3,7 +3,8 @@
 /**
 * @file data_model/models.h
- * @brief Core data models: locations, application configuration, and generation inputs.
+ * @brief Core data models: locations, application configuration, and generation
 * inputs.
 */
 #include <boost/program_options.hpp>
@@ -82,6 +83,9 @@ struct SamplingOptions {
  /// @brief Random seed (-1 for random, otherwise non-negative).
  int seed = -1;
  /// @brief Number of layers to offload to GPU.
  int n_gpu_layers = 0;
 };
 /**
@@ -94,6 +98,8 @@ struct GeneratorOptions {
  /// @brief Use mocked generator instead of actual LLM inference.
  bool use_mocked = false;
  /// @brief Specific sampling parameters for this generator.
  /// If nullopt, the application should use global defaults.
  std::optional<SamplingOptions> sampling;
@@ -112,6 +118,10 @@ struct PipelineOptions {
  /// @brief Path for application logs.
  std::filesystem::path log_path;
  /// @brief Number of locations to sample from the dataset.
  /// More locations produce more users and more breweries.
  uint32_t location_count;
 };
 /**
--- a/tooling/pipeline/includes/services/enrichment/mock_enrichment.h
+++ b/tooling/pipeline/includes/services/enrichment/mock_enrichment.h
@@ -0,0 +1,17 @@
 //
 // Created by aaronpo on 13/05/2026.
 //
 #ifndef BIERGARTEN_PIPELINE_INCLUDES_SERVICES_ENRICHMENT_MOCK_ENRICHMENT_H_
 #define BIERGARTEN_PIPELINE_INCLUDES_SERVICES_ENRICHMENT_MOCK_ENRICHMENT_H_
 #include <string>
 #include "enrichment_service.h"
 class MockEnrichmentService final : public IEnrichmentService {
 public:
  std::string GetLocationContext(const Location& /*loc*/) override {
    return {};
  }
 };
 #endif  // BIERGARTEN_PIPELINE_INCLUDES_SERVICES_ENRICHMENT_MOCK_ENRICHMENT_H_
--- a/tooling/pipeline/includes/services/enrichment/wikipedia_service.h
+++ b/tooling/pipeline/includes/services/enrichment/wikipedia_service.h
@@ -15,10 +15,10 @@
 #include "web_client/web_client.h"
 /// @brief Provides Wikipedia summary lookups backed by cached raw extracts.
-class WikipediaService final : public IEnrichmentService {
+class WikipediaEnrichmentService final : public IEnrichmentService {
 public:
  /// @brief Creates a new Wikipedia service with the provided web client.
-  explicit WikipediaService(std::unique_ptr<WebClient> client);
+  explicit WikipediaEnrichmentService(std::unique_ptr<WebClient> client);
  /// @brief Returns the Wikipedia-derived context for a location.
  [[nodiscard]] std::string GetLocationContext(const Location& loc) override;
--- a/tooling/pipeline/includes/web_client/http_web_client.h
+++ b/tooling/pipeline/includes/web_client/http_web_client.h
@@ -42,7 +42,7 @@ public:
   * @param value Raw string to encode.
   * @return Percent-encoded string safe for use in a URL.
   */
-  std::string UrlEncode(const std::string& value) override;
+  std::string EncodeURL(const std::string& value) override;
 };
--- a/tooling/pipeline/includes/web_client/web_client.h
+++ b/tooling/pipeline/includes/web_client/web_client.h
@@ -30,7 +30,7 @@ class WebClient {
   * @param value Raw string value.
   * @return Encoded value safe for URL usage.
   */
-  virtual std::string UrlEncode(const std::string& value) = 0;
+  virtual std::string EncodeURL(const std::string& value) = 0;
 };
 #endif  // BIERGARTEN_PIPELINE_INCLUDES_WEB_CLIENT_WEB_CLIENT_H_
--- a/tooling/pipeline/runpod/.dockerignore
+++ b/tooling/pipeline/runpod/.dockerignore
@@ -0,0 +1,9 @@
 # Ignore model files!
 *.gguf
 *.bin
 models/
 weights/
 # Ignore local build folders
 build/
 .git/
--- a/tooling/pipeline/runpod/Dockerfile
+++ b/tooling/pipeline/runpod/Dockerfile
@@ -0,0 +1,72 @@
 # --- Stage 1: Build Environment (The "Heavy" Stage) ---
 FROM nvidia/cuda:12.6.3-devel-ubuntu24.04 AS builder
 ENV DEBIAN_FRONTEND=noninteractive \
  CMAKE_GENERATOR=Ninja
 RUN apt-get update && apt-get install -y --no-install-recommends \
  build-essential ca-certificates curl git libboost-json-dev \
  libboost-program-options-dev libssl-dev ninja-build pkg-config zlib1g-dev \
  && rm -rf /var/lib/apt/lists/*
 # Install modern CMake
 RUN curl -L https://github.com/Kitware/CMake/releases/download/v3.31.0/cmake-3.31.0-linux-x86_64.sh -o cmake.sh && \
  sh cmake.sh --skip-license --prefix=/usr/local && rm cmake.sh
 # Get headers for C++ build
 RUN curl -L https://github.com/ggml-org/llama.cpp/archive/refs/tags/b9012.tar.gz -o /tmp/llama-src.tar.gz && \
  tar -xzf /tmp/llama-src.tar.gz -C /tmp && \
  cp -r /tmp/llama.cpp-b9012/include/* /usr/local/include/ && \
  cp -r /tmp/llama.cpp-b9012/ggml/include/* /usr/local/include/
 # Pull llama.cpp binaries to use during build if needed
 COPY --from=ghcr.io/ggml-org/llama.cpp:full-cuda /app/lib*.so* /usr/local/lib/
 WORKDIR /app
 COPY . .
 # Build the C++ pipeline
 RUN cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Release && \
  cmake --build build -j$(nproc)
 # --- Stage 2: Runtime Environment (The "Slim" Stage) ---
 FROM nvidia/cuda:12.6.3-runtime-ubuntu24.04 AS runtime
 # Install only necessary runtime shared libraries
 RUN apt-get update && apt-get install -y --no-install-recommends \
  curl \
  ca-certificates \
  libboost-json1.83.0 \
  libboost-program-options1.83.0 \
  libgomp1 \
  libssl3 \
  zlib1g \
  && rm -rf /var/lib/apt/lists/*
 ENV APP_ROOT=/app \
  LD_LIBRARY_PATH="/usr/local/lib:${LD_LIBRARY_PATH}"
 WORKDIR /app/build
 # Copy only the compiled binaries from the builder
 COPY --from=builder /app/build/biergarten-pipeline ./
 # Copy required config files
 COPY locations.json /app/build/
 COPY beer-styles.json /app/build/
 # Copy prompt templates
 COPY prompts /app/prompts
 # Copy the prebuilt llama.cpp shared libraries from the upstream CUDA image
 COPY --from=ghcr.io/ggml-org/llama.cpp:full-cuda /app/lib*.so* /usr/local/lib/
 # Co-locate plugins
 RUN cp /usr/local/lib/libggml-cuda.so . 2>/dev/null || true && \
  cp /usr/local/lib/libggml-cpu*.so . 2>/dev/null || true
 # Setup Start Script
 COPY ./runpod/start.sh /usr/local/bin/biergarten-start
 RUN chmod +x /usr/local/bin/biergarten-start
 ENTRYPOINT ["/usr/local/bin/biergarten-start"]
--- a/tooling/pipeline/runpod/README.md
+++ b/tooling/pipeline/runpod/README.md
@@ -0,0 +1,8 @@
 ```bash
 touch runpod/start.sh
 docker build \
 --progress=plain \
 -t biergarten-pipeline:latest \
 -f runpod/Dockerfile \
 . 2>&1 | tee build.log
 ```
--- a/tooling/pipeline/runpod/pod-template.yaml
+++ b/tooling/pipeline/runpod/pod-template.yaml
@@ -0,0 +1,22 @@
 name: biergarten-pipeline-live
 imageName: biergarten-pipeline:latest
 category: NVIDIA
 containerDiskInGb: 50
 volumeInGb: 50
 volumeMountPath: /workspace
 dockerEntrypoint:
  - /usr/local/bin/biergarten-start
 dockerStartCmd: []
 isPublic: false
 isServerless: false
 env:
  BIERGARTEN_MODE: live
  BIERGARTEN_MODEL_PATH: /workspace/models/google_gemma-4-E4B-it-Q6_K.gguf
  BIERGARTEN_PROMPT_DIR: /app/prompts
  BIERGARTEN_OUTPUT_DIR: /workspace/output
  BIERGARTEN_LOG_PATH: /workspace/logs/pipeline.log
  BIERGARTEN_TEMPERATURE: "1.0"
  BIERGARTEN_TOP_P: "0.95"
  BIERGARTEN_TOP_K: "64"
  BIERGARTEN_N_CTX: "8192"
  BIERGARTEN_SEED: "-1"
--- a/tooling/pipeline/runpod/start.sh
+++ b/tooling/pipeline/runpod/start.sh
@@ -0,0 +1,58 @@
 #!/bin/bash
 set -e
 MODEL_PATH="${BIERGARTEN_MODEL_PATH:-/workspace/models/google_gemma-4-E4B-it-Q6_K.gguf}"
 OUTPUT_DIR="${BIERGARTEN_OUTPUT_DIR:-/workspace/output}"
 LOG_PATH="${BIERGARTEN_LOG_PATH:-/workspace/logs/pipeline.log}"
 EXECUTABLE="/app/build/biergarten-pipeline"
 PROMPT_DIR="/app/prompts"
 echo "--- Starting Biergarten Pipeline Environment Check ---"
 # Ensure directories exist
 mkdir -p "$OUTPUT_DIR"
 mkdir -p "$(dirname "$LOG_PATH")"
 mkdir -p "$(dirname "$MODEL_PATH")"
 # Download model if missing
 if [ ! -f "$MODEL_PATH" ]; then
    echo "Model not found. Downloading (this may take a while)..."
    curl -fL -C - \
      -o "$MODEL_PATH" \
      "https://huggingface.co/bartowski/google_gemma-4-E4B-it-GGUF/resolve/main/google_gemma-4-E4B-it-Q6_K.gguf?download=true"
    echo "Download complete."
 fi
 # Verify model exists
 if [ ! -f "$MODEL_PATH" ]; then
    echo "ERROR: Model still not found after download attempt."
    exit 1
 fi
 # Default GPU layers
 GL_LAYERS="${BIERGARTEN_GL_LAYERS:-40}"
 # Build args
 ARGS=(
    "--model"      "$MODEL_PATH"
    "--prompt-dir" "$PROMPT_DIR"
    "--output"     "$OUTPUT_DIR"
    "--log-path"   "$LOG_PATH"
    "--n-gpu-layers" "$GL_LAYERS"
 )
 # Optional params
 [[ -n "$BIERGARTEN_TEMPERATURE" ]] && ARGS+=("--temperature"  "$BIERGARTEN_TEMPERATURE")
 [[ -n "$BIERGARTEN_TOP_P" ]]       && ARGS+=("--top-p"        "$BIERGARTEN_TOP_P")
 [[ -n "$BIERGARTEN_TOP_K" ]]       && ARGS+=("--top-k"        "$BIERGARTEN_TOP_K")
 [[ -n "$BIERGARTEN_N_CTX" ]]       && ARGS+=("--n-ctx"        "$BIERGARTEN_N_CTX")
 [[ -n "$BIERGARTEN_SEED" ]]        && ARGS+=("--seed"         "$BIERGARTEN_SEED")
 # Extra args
 [[ -n "$BIERGARTEN_EXTRA_ARGS" ]] && ARGS+=($BIERGARTEN_EXTRA_ARGS)
 echo "--- Executing: $EXECUTABLE ${ARGS[*]} ---"
 exec "$EXECUTABLE" "${ARGS[@]}"
--- a/tooling/pipeline/src/application_options/parse_arguments.cc
+++ b/tooling/pipeline/src/application_options/parse_arguments.cc
@@ -30,6 +30,8 @@ std::optional<ApplicationOptions> ParseArguments(const int argc, char** argv) {
        "Context window size in tokens");
    opt("seed", prog_opts::value<int>()->default_value(sampling_defaults.seed),
        "Sampler seed: -1 for random, otherwise non-negative integer");
    opt("n-gpu-layers", prog_opts::value<int>()->default_value(0),
        "Number of layers to offload to GPU");
  };
  // --mocked and --model are mutually exclusive; validation is enforced below
@@ -50,6 +52,7 @@ std::optional<ApplicationOptions> ParseArguments(const int argc, char** argv) {
    opt("prompt-dir", prog_opts::value<std::string>()->default_value(""),
        "Directory containing named prompt files (e.g. BREWERY_GENERATION.md)."
        " Required when not using --mocked.");
    opt("location-count", prog_opts::value<uint32_t>()->default_value(10),
        "Number of locations to sample from the dataset");
  };
  add_sampling_options();
@@ -82,9 +85,12 @@ std::optional<ApplicationOptions> ParseArguments(const int argc, char** argv) {
    options.pipeline.output_path = var_map["output"].as<std::string>();
    options.pipeline.log_path = var_map["log-path"].as<std::string>();
    options.pipeline.prompt_dir = var_map["prompt-dir"].as<std::string>();
    options.pipeline.location_count =
      var_map["location-count"].as<uint32_t>();
    const bool use_mocked = var_map["mocked"].as<bool>();
    const std::string model_path = var_map["model"].as<std::string>();
    const int n_gpu_layers = var_map["n-gpu-layers"].as<int>();
    // Enforce mutual exclusivity before any further configuration is applied.
    if (use_mocked && !model_path.empty()) {
@@ -110,6 +116,7 @@ std::optional<ApplicationOptions> ParseArguments(const int argc, char** argv) {
    options.generator.use_mocked = use_mocked;
    options.generator.model_path = model_path;
    // options.generator.n_gpu_layers = n_gpu_layers;
    // Only populate sampling config when the user explicitly overrides at
    // least one value. Leaving it as std::nullopt lets LlamaGenerator fall
@@ -118,7 +125,7 @@ std::optional<ApplicationOptions> ParseArguments(const int argc, char** argv) {
    const bool user_provided_sampling =
        !var_map["temperature"].defaulted() || !var_map["top-p"].defaulted() ||
        !var_map["top-k"].defaulted() || !var_map["n-ctx"].defaulted() ||
-        !var_map["seed"].defaulted();
+        !var_map["seed"].defaulted() || !var_map["n-gpu-layers"].defaulted();
    if (user_provided_sampling) {
      // Warn but do not fail — the run is still valid, the flags are just
@@ -132,6 +139,7 @@ std::optional<ApplicationOptions> ParseArguments(const int argc, char** argv) {
        sampling.top_k = var_map["top-k"].as<uint32_t>();
        sampling.n_ctx = var_map["n-ctx"].as<uint32_t>();
        sampling.seed = var_map["seed"].as<int>();
        sampling.n_gpu_layers = var_map["n-gpu-layers"].as<int>();
        options.generator.sampling = sampling;
      }
--- a/tooling/pipeline/src/biergarten_data_generator/biergarten_data_generator.cc
+++ b/tooling/pipeline/src/biergarten_data_generator/biergarten_data_generator.cc
@@ -10,7 +10,9 @@
 BiergartenDataGenerator::BiergartenDataGenerator(
    std::unique_ptr<IEnrichmentService> context_service,
    std::unique_ptr<DataGenerator> generator,
-    std::unique_ptr<IExportService> exporter)
+    std::unique_ptr<IExportService> exporter,
    const ApplicationOptions &app_options)
    : context_service_(std::move(context_service)),
      generator_(std::move(generator)),
-      exporter_(std::move(exporter)) {}
+      exporter_(std::move(exporter)),
      application_options_(app_options) {}
--- a/tooling/pipeline/src/biergarten_data_generator/query_cities_with_countries.cc
+++ b/tooling/pipeline/src/biergarten_data_generator/query_cities_with_countries.cc
@@ -13,8 +13,6 @@
 #include "biergarten_data_generator.h"
 #include "json_handling/json_loader.h"
 static constexpr size_t kBreweryAmount = 50;
 std::vector<Location> BiergartenDataGenerator::QueryCitiesWithCountries() {
  spdlog::info("\n=== GEOGRAPHIC DATA OVERVIEW ===");
@@ -23,7 +21,9 @@ std::vector<Location> BiergartenDataGenerator::QueryCitiesWithCountries() {
  auto all_locations = JsonLoader::LoadLocations(locations_path);
  spdlog::info("  Locations available: {}", all_locations.size());
-  const size_t sample_count = std::min(kBreweryAmount, all_locations.size());
+  const size_t sample_count = std::min(
+      static_cast<size_t>(application_options_.pipeline.location_count),
+      all_locations.size());
  const auto sample_count_signed =
      static_cast<std::iter_difference_t<decltype(all_locations.cbegin())>>(
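The clamp in `QueryCitiesWithCountries()` can be looked at in isolation. The following is a minimal sketch, not the project's code: `ClampSampleCount` is a hypothetical helper, and the `std::uint32_t` parameter type is an assumption, since the diff does not show the declared type of `pipeline.location_count`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not in the diff): clamps a requested sample size to
// the number of locations actually loaded. `requested` stands in for
// application_options_.pipeline.location_count; its type is assumed.
constexpr std::size_t ClampSampleCount(std::uint32_t requested,
                                       std::size_t available) {
  // Cast first so both std::min arguments have the same type.
  return std::min(static_cast<std::size_t>(requested), available);
}
```

The cast comes before the call because `std::min` deduces a single type for both arguments; on platforms where `std::uint32_t` and `std::size_t` differ, mixing them would fail to compile.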
--- a/tooling/pipeline/src/biergarten_data_generator/run.cc
+++ b/tooling/pipeline/src/biergarten_data_generator/run.cc
@@ -21,8 +21,8 @@ bool BiergartenDataGenerator::Run() {
    for (auto& city : cities) {
      try {
        std::string region_context = context_service_->GetLocationContext(city);
-        spdlog::debug("[Pipeline] Context for '{}' ({}) gathered:\n{}",
-                      city.city, city.country, region_context);
+        // spdlog::debug("[Pipeline] Context for '{}' ({}) gathered:\n{}",
+        //               city.city, city.iso3166_2, region_context);
        enriched.push_back(
            EnrichedCity{.location = std::move(city),
--- a/tooling/pipeline/src/data_generation/llama/llama_generator.cc
+++ b/tooling/pipeline/src/data_generation/llama/llama_generator.cc
@@ -89,6 +89,7 @@ LlamaGenerator::LlamaGenerator(
  }
  n_ctx_ = sampling.n_ctx;
+  n_gpu_layers_ = sampling.n_gpu_layers;
  this->Load(model_path);
 }
--- a/tooling/pipeline/src/data_generation/llama/load.cc
+++ b/tooling/pipeline/src/data_generation/llama/load.cc
@@ -12,6 +12,7 @@
 #include <utility>
 #include "data_generation/llama_generator.h"
+#include "ggml-backend.h"
 #include "llama.h"
 // Maximum batch size for decode operations. Capping the batch prevents
@@ -22,7 +23,12 @@ void LlamaGenerator::Load(const std::string& model_path) {
  context_.reset();
  model_.reset();
-  const llama_model_params model_params = llama_model_default_params();
+  // Specifically load dynamic ggml backends (like CUDA) that are provided
+  // externally before attempting to load a model.
+  ggml_backend_load_all();
+  llama_model_params model_params = llama_model_default_params();
+  model_params.n_gpu_layers = n_gpu_layers_;
  LlamaGenerator::ModelHandle loaded_model(
      llama_model_load_from_file(model_path.c_str(), model_params));
  if (!loaded_model) {
--- a/tooling/pipeline/src/main.cc
+++ b/tooling/pipeline/src/main.cc
@@ -8,11 +8,9 @@
 #include <boost/di.hpp>
 #include <boost/program_options.hpp>
 #include <exception>
 #include <memory>
 #include <optional>
 #include <string>
 #include "biergarten_data_generator.h"
@@ -21,12 +19,13 @@
 #include "data_generation/prompt_formatting/gemma4_jinja_prompt_formatter.h"
 #include "data_model/models.h"
 #include "llama_backend_state.h"
 #include "services/enrichment/enrichment_service.h"
 #include "services/database/export_service.h"
 #include "services/prompting/prompt_directory.h"
 #include "services/database/sqlite_export_service.h"
 #include "services/datetime/timer.h"
 #include "services/enrichment/enrichment_service.h"
 #include "services/enrichment/mock_enrichment.h"
 #include "services/enrichment/wikipedia_service.h"
 #include "services/prompting/prompt_directory.h"
 #include "web_client/http_web_client.h"
 namespace di = boost::di;
@@ -43,7 +42,9 @@ int main(const int argc, char** argv) {
    spdlog::set_level(spdlog::level::debug);
 #endif
-    const auto parsed_options = ParseArguments(argc, argv);
+    const std::optional<ApplicationOptions> parsed_options =
+        ParseArguments(argc, argv);
    if (!parsed_options.has_value()) {
      return 0;
    }
@@ -65,12 +66,20 @@ int main(const int argc, char** argv) {
    }
    const auto injector = di::make_injector(
        di::bind<WebClient>().to<HttpWebClient>(),
        di::bind<ApplicationOptions>().to(options),
-        di::bind<IEnrichmentService>().to<WikipediaService>(),
+        di::bind<std::string>().to(model_path),
        di::bind<WebClient>().to<HttpWebClient>(),
        di::bind<IExportService>().to<SqliteExportService>(),
        di::bind<IPromptFormatter>().to<Gemma4JinjaPromptFormatter>(),
-        di::bind<std::string>().to(model_path),
+        di::bind<IEnrichmentService>().to(
+            [options](const auto& inj) -> std::unique_ptr<IEnrichmentService> {
+              if (options.generator.use_mocked) {
+                return std::make_unique<MockEnrichmentService>();
+              }
+              return std::make_unique<WikipediaEnrichmentService>(
+                  inj.template create<std::unique_ptr<WebClient>>());
+            }),
        di::bind<DataGenerator>().to(
            [options, model_path, sampling, &prompt_directory](
                const auto& inj) -> std::unique_ptr<DataGenerator> {
@@ -89,9 +98,11 @@ int main(const int argc, char** argv) {
                  options, model_path,
                  inj.template create<std::unique_ptr<IPromptFormatter>>(),
                  std::move(prompt_directory));
-            }));
-    auto generator =
+            })
+    );
+    const auto generator =
        injector.create<std::unique_ptr<BiergartenDataGenerator>>();
    if (!generator->Run()) {
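The `IEnrichmentService` binding above chooses between a mock and a network-backed implementation at startup. A stripped-down sketch of that selection, without Boost.DI, might look like the following. All names here (`EnrichmentSketch`, `MockEnrichmentSketch`, `NetworkEnrichmentSketch`, `MakeEnrichment`) are hypothetical stand-ins; the real interface takes a `Location` and the real Wikipedia service owns a `WebClient`, neither of which is modelled.

```cpp
#include <memory>
#include <string>

// Simplified stand-in for IEnrichmentService (assumed shape, not the real one).
class EnrichmentSketch {
 public:
  virtual ~EnrichmentSketch() = default;
  virtual std::string GetLocationContext(const std::string& city) = 0;
};

// Deterministic implementation: no network access, same output every run.
class MockEnrichmentSketch final : public EnrichmentSketch {
 public:
  std::string GetLocationContext(const std::string& city) override {
    return "mock context for " + city;
  }
};

// Placeholder for a network-backed implementation; a real one would issue
// HTTP requests instead of returning a fixed string.
class NetworkEnrichmentSketch final : public EnrichmentSketch {
 public:
  std::string GetLocationContext(const std::string& city) override {
    return "fetched context for " + city;
  }
};

// Factory: callers depend only on the interface, so the choice of
// implementation is made in exactly one place.
std::unique_ptr<EnrichmentSketch> MakeEnrichment(bool use_mocked) {
  if (use_mocked) {
    return std::make_unique<MockEnrichmentSketch>();
  }
  return std::make_unique<NetworkEnrichmentSketch>();
}
```

Keeping the branch inside the factory means the rest of the pipeline never needs to know whether it is running mocked.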
--- a/tooling/pipeline/src/services/enrichment/wikipedia/fetch_extract.cc
+++ b/tooling/pipeline/src/services/enrichment/wikipedia/fetch_extract.cc
@@ -0,0 +1,112 @@
 /**
 * @file wikipedia/fetch_extract.cc
 */
 #include <spdlog/spdlog.h>
 #include <boost/json.hpp>
 #include <chrono>
 #include <format>
 #include <string>
 #include <string_view>
 #include <thread>
 #include "services/enrichment/wikipedia_service.h"
 using namespace boost;
 std::string WikipediaEnrichmentService::FetchExtract(std::string_view query) {
  const std::string cache_key(query);
  // 1. Cache Lookup
  if (const auto cache_it = this->extract_cache_.find(cache_key);
      cache_it != this->extract_cache_.end()) {
    spdlog::debug("Wikipedia: Cache hit for {}!", cache_key);
    return cache_it->second;
  }
  const std::string encoded = this->client_->EncodeURL(cache_key);
  const std::string url = std::format(
      "https://en.wikipedia.org/w/"
      "api.php?action=query&titles={}&prop=extracts&explaintext=1&format=json",
      encoded);
  const std::string body = this->client_->Get(url);
  {
    using namespace std::literals::chrono_literals;
    std::this_thread::sleep_for(1s);
  }
  // 2. Parse JSON
  system::error_code ec;
  json::value doc = json::parse(body, ec);
  if (ec) {
    spdlog::warn("WikipediaService: JSON parse error for '{}': {}", query,
                 ec.message());
    return {};
  }
  // 3. Safe Extraction
  const json::object* obj = doc.if_object();
  if (obj == nullptr) {
    spdlog::warn("WikipediaService: Expected root object for '{}'", query);
    return {};
  }
  const json::value* query_ptr = obj->if_contains("query");
  const json::value* pages_ptr =
      ((query_ptr != nullptr) && query_ptr->is_object())
          ? query_ptr->get_object().if_contains("pages")
          : nullptr;
  if ((pages_ptr == nullptr) || !pages_ptr->is_object()) {
    spdlog::warn("WikipediaService: Missing query.pages for '{}'", query);
    return {};
  }
  const json::object& pages = pages_ptr->get_object();
  if (pages.empty()) {
    spdlog::warn("WikipediaService: No pages returned for '{}'", query);
    this->extract_cache_.emplace(cache_key, "");
    return {};
  }
  // Wikipedia returns the page under a dynamic ID key; we just want the first
  // one
  const json::value& page_val = pages.begin()->value();
  if (!page_val.is_object()) {
    spdlog::warn("WikipediaService: Unexpected page format for '{}'", query);
    return {};
  }
  const json::object& page = page_val.get_object();
  // Handle 404/Missing status
  if (page.contains("missing")) {
    spdlog::warn("WikipediaService: Page '{}' does not exist", query);
    this->extract_cache_.emplace(cache_key, "");
    return {};
  }
  const json::value* extract_ptr = page.if_contains("extract");
  if ((extract_ptr == nullptr) || !extract_ptr->is_string()) {
    spdlog::warn("WikipediaService: No extract string found for '{}'", query);
    this->extract_cache_.emplace(cache_key, "");
    return {};
  }
  // 4. Success
  std::string extract(extract_ptr->as_string());
  spdlog::info("WikipediaService: Fetched {} chars for '{}'", extract.size(),
                query);
  this->extract_cache_.insert_or_assign(cache_key, extract);
  return extract;
 }
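The caching in `FetchExtract()` above treats two kinds of empty result differently. Definitive misses (no pages, a `missing` page, no `extract` string) are cached as an empty string. Malformed responses also return an empty string but are not cached. A minimal standard-library sketch of that policy follows. `ExtractCache`, `Fetcher`, and the `std::optional` convention (`std::nullopt` meaning "do not cache") are hypothetical and not part of the diff, and the container type of `extract_cache_` is assumed.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>

// Hypothetical cache wrapper. The fetcher returns:
//   std::nullopt  -> response was unusable; return empty, do not cache
//   empty string  -> definitive miss; cache it so it is not re-fetched
//   other string  -> extract text; cache it
class ExtractCache {
 public:
  using Fetcher = std::function<std::optional<std::string>(std::string_view)>;

  explicit ExtractCache(Fetcher fetch) : fetch_(std::move(fetch)) {}

  std::string Get(std::string_view query) {
    const std::string key(query);
    // Cache lookup, scoped with an if-initializer as in the diff.
    if (const auto it = cache_.find(key); it != cache_.end()) {
      return it->second;
    }
    std::optional<std::string> fetched = fetch_(query);
    if (!fetched.has_value()) {
      return {};  // not cached, so a later call fetches again
    }
    return cache_.insert_or_assign(key, std::move(*fetched)).first->second;
  }

  std::size_t size() const { return cache_.size(); }

 private:
  Fetcher fetch_;
  std::unordered_map<std::string, std::string> cache_;
};
```

Caching the empty string for definitive misses avoids repeating a request whose answer is already known, which matters when each request is followed by a deliberate sleep.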
--- a/tooling/pipeline/src/services/enrichment/wikipedia/get_summary.cc
+++ b/tooling/pipeline/src/services/enrichment/wikipedia/get_summary.cc
@@ -0,0 +1,58 @@
 /**
 * @file wikipedia/get_summary.cc
 * @brief WikipediaService::GetLocationContext() implementation.
 */
 #include <spdlog/spdlog.h>
 #include <chrono>
 #include <format>
 #include <string>
 #include <thread>
 #include "services/enrichment/wikipedia_service.h"
 std::string WikipediaEnrichmentService::GetLocationContext(const Location& loc) {
  using namespace std::literals::chrono_literals;
  if (!this->client_) {
    spdlog::warn("Client is nullptr.");
    return {};
  }
  std::string result;
  // std::string region_query(loc.city);
  // if (!loc.country.empty()) {
  //   region_query += loc.state_province,
  //   region_query += ", ";
  //   region_query += loc.country;
  // }
  constexpr std::string_view brewing_query = "brewing";
  const std::string location_query =
      std::format("{}, {}", loc.city, loc.iso3166_2);
  const std::string beer_query = std::format("beer in {}", loc.country);
  auto append_extract = [&result](const std::string& extract) -> void {
    if (extract.empty()) {
      return;
    }
    if (!result.empty()) {
      result += "\n\n";
    }
    result += extract;
  };
  try {
    append_extract(FetchExtract(brewing_query));
    append_extract(FetchExtract(beer_query));
    spdlog::info("Done fetching for {}. Sleeping for 10 seconds.",
                 location_query);
    std::this_thread::sleep_for(10s);
  } catch (const std::runtime_error& e) {
    spdlog::debug("WikipediaService lookup failed for '{}': {}", location_query,
                  e.what());
  }
  return result;
 }
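The `append_extract` lambda above skips empty extracts and separates the rest with a blank line. The same joining rule, written as a free function for illustration, could look like this; `JoinExtracts` is a hypothetical name and does not appear in the diff.

```cpp
#include <initializer_list>
#include <string>
#include <string_view>

// Hypothetical helper: concatenates the non-empty extracts, inserting a
// blank line ("\n\n") between consecutive non-empty entries only.
std::string JoinExtracts(std::initializer_list<std::string_view> extracts) {
  std::string result;
  for (const std::string_view extract : extracts) {
    if (extract.empty()) {
      continue;  // a failed or missing lookup contributes nothing
    }
    if (!result.empty()) {
      result += "\n\n";
    }
    result += extract;
  }
  return result;
}
```

Because the separator is only added when `result` already has content, a leading or trailing empty extract never produces stray blank lines.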
--- a/tooling/pipeline/src/services/enrichment/wikipedia/wikipedia_service.cc
+++ b/tooling/pipeline/src/services/enrichment/wikipedia/wikipedia_service.cc
@@ -7,5 +7,6 @@
 #include <utility>
-WikipediaService::WikipediaService(std::unique_ptr<WebClient> client)
+WikipediaEnrichmentService::WikipediaEnrichmentService(
+    std::unique_ptr<WebClient> client)
    : client_(std::move(client)) {}
--- a/tooling/pipeline/src/services/wikipedia/fetch_extract.cc
+++ b/tooling/pipeline/src/services/wikipedia/fetch_extract.cc
@@ -1,61 +0,0 @@
 /**
 * @file wikipedia/fetch_extract.cc
 * @brief WikipediaService::FetchExtract() implementation.
 */
 #include <spdlog/spdlog.h>
 #include <boost/json.hpp>
 #include <string>
 #include <string_view>
 #include "services/enrichment/wikipedia_service.h"
 std::string WikipediaService::FetchExtract(std::string_view query) {
  const std::string cache_key(query);
  const auto cache_it = this->extract_cache_.find(cache_key);
  if (cache_it != this->extract_cache_.end()) {
    return cache_it->second;
  }
  const std::string encoded = this->client_->UrlEncode(cache_key);
  const std::string url =
      "https://en.wikipedia.org/w/api.php?action=query&titles=" + encoded +
      "&prop=extracts&explaintext=1&format=json";
  const std::string body = this->client_->Get(url);
  boost::system::error_code parse_error;
  boost::json::value doc = boost::json::parse(body, parse_error);
  if (!parse_error && doc.is_object()) {
    try {
      auto& pages = doc.at("query").at("pages").get_object();
      if (!pages.empty()) {
        auto& page = pages.begin()->value().get_object();
        if (page.contains("extract") && page.at("extract").is_string()) {
          const std::string_view extract_view = page.at("extract").as_string();
          std::string extract(extract_view);
          spdlog::debug("WikipediaService fetched {} chars for '{}'",
                        extract.size(), query);
          this->extract_cache_.emplace(cache_key, extract);
          return extract;
        }
      }
      this->extract_cache_.emplace(cache_key, std::string{});
    } catch (const std::exception& e) {
      spdlog::warn(
          "WikipediaService: failed to parse response structure for '{}': "
          "{}",
          query, e.what());
      return {};
    }
  } else if (parse_error) {
    spdlog::warn("WikipediaService: JSON parse error for '{}': {}", query,
                 parse_error.message());
  }
  return {};
 }
--- a/tooling/pipeline/src/services/wikipedia/get_summary.cc
+++ b/tooling/pipeline/src/services/wikipedia/get_summary.cc
@@ -1,47 +0,0 @@
 /**
 * @file wikipedia/get_summary.cc
 * @brief WikipediaService::GetLocationContext() implementation.
 */
 #include <spdlog/spdlog.h>
 #include <string>
 #include "services/enrichment/wikipedia_service.h"
 std::string WikipediaService::GetLocationContext(const Location& loc) {
  if (!client_) {
    return {};
  }
  std::string result;
  std::string region_query(loc.city);
  if (!loc.country.empty()) {
    region_query += ", ";
    region_query += loc.country;
  }
  const std::string beer_query = "beer in " + loc.country;
  const std::string city_beer_query = "beer in " + loc.city;
  auto append_extract = [&result](const std::string& extract) -> void {
    if (extract.empty()) {
      return;
    }
    if (!result.empty()) {
      result += "\n\n";
    }
    result += extract;
  };
  try {
    append_extract(FetchExtract(region_query));
    append_extract(FetchExtract(beer_query));
    append_extract(FetchExtract(city_beer_query));
  } catch (const std::runtime_error& e) {
    spdlog::debug("WikipediaService lookup failed for '{}': {}", region_query,
                  e.what());
  }
  return result;
 }
--- a/tooling/pipeline/src/web_client/http_web_client.cc
+++ b/tooling/pipeline/src/web_client/http_web_client.cc
@@ -12,6 +12,8 @@
 #include <string>
 #include <utility>
+#include "spdlog/spdlog.h"
 namespace {
 constexpr time_t kConnectionTimeoutSeconds = 5;
 constexpr time_t kReadTimeoutSeconds = 10;
@@ -38,8 +40,12 @@ std::string HttpWebClient::Get(const std::string& url) {
  client.set_follow_location(true);
  client.set_connection_timeout(kConnectionTimeoutSeconds);
  client.set_read_timeout(kReadTimeoutSeconds);
+  client.set_default_headers({
+     {"Accept", "application/json"},
+     {"User-Agent", "biergarten-pipeline/1.0"}
+ });
-  const auto result = client.Get(path);
+  const httplib::Result result = client.Get(path);
  if (!result) {
    throw std::runtime_error(
@@ -48,6 +54,7 @@ std::string HttpWebClient::Get(const std::string& url) {
  }
  if (result->status < kSuccessMin || result->status >= kSuccessMax) {
+    spdlog::error("[HttpWebClient] Request failed for URL: " + url);
    throw std::runtime_error(
        "[HttpWebClient] HTTP " + std::to_string(result->status) +
        " for URL: " + url);
@@ -56,6 +63,6 @@ std::string HttpWebClient::Get(const std::string& url) {
  return result->body;
 }
-std::string HttpWebClient::UrlEncode(const std::string& value) {
+std::string HttpWebClient::EncodeURL(const std::string& value) {
  return httplib::encode_uri_component(value);
 }
Author	SHA1	Message	Date
Aaron Po	5abb3f2e24	Add mock enrichment process	2026-05-14 13:49:59 -04:00
Aaron Po	a057b9197f	Add location count to application options and as a cli arg	2026-05-13 22:04:48 -04:00
Aaron Po	773e7c774b	Add timeout for enrichment, refactor json deserialization	2026-05-13 12:44:30 -04:00
Aaron Po	b7c0b1c8d4	Fix mistake in .gitattributes archive/* is incorrect as it will ignore sub-dirs	2026-05-12 01:05:07 -04:00
Aaron Po	b8ebe03921	Pipeline: Add Runpod docker configuration (#222) * Begin work on Runpod docker config * Reduce docker image size * Create .dockerignore	2026-05-12 00:44:09 -04:00