Add pipeline guide and enhance CMake configuration for llama integration

Author: Aaron Po
Date: 2026-03-28 14:16:31 -04:00
parent ad1adfeb62
commit 7f1ca2050c
4 changed files with 651 additions and 76 deletions

pipeline/README.md — new file (128 lines)

@@ -0,0 +1,128 @@
# Pipeline Guide
This guide documents the end-to-end pipeline workflow for:
- Building the C++ pipeline executable
- Installing a lightweight GGUF model for llama.cpp
- Running the pipeline with either the default or an explicit model path
- Re-running from a clean build directory
## Prerequisites
- CMake 3.20+
- A C++ compiler (Apple Clang on macOS works)
- Internet access to download model files
- Hugging Face CLI (`hf`) from `huggingface_hub`
## Build
From repository root:
```bash
cmake -S pipeline -B pipeline/dist
cmake --build pipeline/dist -j4
```
Expected executable:
- `pipeline/dist/biergarten-pipeline`
## Install Hugging Face CLI
Recommended on macOS:
```bash
brew install pipx
pipx ensurepath
pipx install huggingface_hub
```
If your shell cannot find `hf`, use the full path:
- `~/.local/bin/hf`
## Install a Lightweight Model (POC)
The recommended proof-of-concept model is:
- `Qwen/Qwen2.5-0.5B-Instruct-GGUF`
- File: `qwen2.5-0.5b-instruct-q4_k_m.gguf`
From `pipeline/dist`:
```bash
cd pipeline/dist
mkdir -p models
~/.local/bin/hf download Qwen/Qwen2.5-0.5B-Instruct-GGUF qwen2.5-0.5b-instruct-q4_k_m.gguf --local-dir models
```
## Run
### Option A: Explicit model path (recommended)
```bash
cd pipeline/dist
./biergarten-pipeline --model models/qwen2.5-0.5b-instruct-q4_k_m.gguf
```
### Option B: Default model path
To use the default startup behavior, place a model at:
- `pipeline/dist/models/llama-2-7b-chat.gguf`
Then run:
```bash
cd pipeline/dist
./biergarten-pipeline
```
## Output Files
The pipeline writes output to:
- `pipeline/dist/output/breweries.json`
- `pipeline/dist/output/beer-styles.json`
- `pipeline/dist/output/beer-posts.json`
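All three files should be well-formed JSON. A quick validity check (a minimal sketch, assuming `python3` is on `PATH` and you are in `pipeline/dist`; the file list mirrors the outputs above):

```shell
# check_outputs: report whether each expected output file parses as JSON.
# Paths are relative to pipeline/dist, matching the run instructions above.
check_outputs() {
  for f in output/breweries.json output/beer-styles.json output/beer-posts.json; do
    if python3 -m json.tool "$f" > /dev/null 2>&1; then
      echo "ok: $f"
    else
      echo "invalid or missing: $f"
    fi
  done
}
check_outputs
```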
## Clean Re-run Process
To rebuild and re-run from a clean `dist` state:
```bash
rm -rf pipeline/dist
cmake -S pipeline -B pipeline/dist
cmake --build pipeline/dist -j4
cd pipeline/dist
mkdir -p models
~/.local/bin/hf download Qwen/Qwen2.5-0.5B-Instruct-GGUF qwen2.5-0.5b-instruct-q4_k_m.gguf --local-dir models
./biergarten-pipeline --model models/qwen2.5-0.5b-instruct-q4_k_m.gguf
```
## Troubleshooting
### `zsh: command not found: huggingface-cli`
Recent versions of `huggingface_hub` install the CLI entry point as `hf`, not `huggingface-cli`.
Use:
```bash
~/.local/bin/hf --help
```
### `Model file not found ...`
- Confirm you are running from `pipeline/dist`.
- Confirm the file path passed to `--model` exists.
- If not using `--model`, ensure the default file exists at `models/llama-2-7b-chat.gguf` relative to the current working directory.
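The lookup logic above can be made explicit with a small wrapper (a sketch for debugging, not shipped with the pipeline; it assumes the default path documented above):

```shell
# resolve_model: print the model path the pipeline will use, or fail loudly.
# An explicit argument wins; otherwise fall back to the default
# models/llama-2-7b-chat.gguf relative to the current working directory.
resolve_model() {
  model="${1:-models/llama-2-7b-chat.gguf}"
  if [ ! -f "$model" ]; then
    echo "Model file not found: $model (cwd: $(pwd))" >&2
    return 1
  fi
  echo "$model"
}
```

For example, `resolve_model models/qwen2.5-0.5b-instruct-q4_k_m.gguf` either echoes the path (safe to pass to `--model`) or tells you exactly which path was checked and from where.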
### CMake cache/path mismatch
Use explicit source/build paths:
```bash
cmake -S /absolute/path/to/pipeline -B /absolute/path/to/pipeline/dist
cmake --build /absolute/path/to/pipeline/dist -j4
```