mirror of
https://github.com/aaronpo97/the-biergarten-app.git
synced 2026-06-01 10:04:00 +00:00
fix: stabilize Gemma 4 brewery generation
- remove misleading turn-token output guidance from the brewery prompt
- extract the last balanced JSON object before validation
- keep README model setup and run instructions aligned
- preserve Gemma 4 sampling defaults and local model usage
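The "extract the last balanced JSON object before validation" change can be illustrated with a stand-alone sketch. It is not the repository's implementation: `LastBalancedJsonObject` and `CheckLastBalancedJsonObject` are hypothetical names, and ignoring quotes that appear outside any object is an assumption made here so that quoted words in the reasoning text cannot hide a later object.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not the repo's function): scans the whole text and
// keeps the most recent top-level {...} span whose braces balance.
// Returns an empty string when no balanced object exists.
std::string LastBalancedJsonObject(const std::string& text) {
  std::size_t start = std::string::npos;
  int depth = 0;
  bool in_string = false;
  bool escaped = false;
  std::string candidate;

  for (std::size_t i = 0; i < text.size(); ++i) {
    const char ch = text[i];
    if (in_string) {
      // Braces inside a JSON string must not change the depth.
      if (escaped) {
        escaped = false;
      } else if (ch == '\\') {
        escaped = true;
      } else if (ch == '"') {
        in_string = false;
      }
      continue;
    }
    if (ch == '"') {
      // Assumption: only track strings once we are inside an object.
      if (depth > 0) in_string = true;
      continue;
    }
    if (ch == '{') {
      if (depth == 0) start = i;
      ++depth;
    } else if (ch == '}' && depth > 0) {
      --depth;
      if (depth == 0) {
        // Overwrite instead of returning, so the last object wins.
        candidate = text.substr(start, i - start + 1);
      }
    }
  }
  return candidate;
}

// Usage examples expressed as assertions.
void CheckLastBalancedJsonObject() {
  assert(LastBalancedJsonObject(
             R"(Draft {"name": "A"} final {"name": "B"})") ==
         R"({"name": "B"})");
  assert(LastBalancedJsonObject("no object here").empty());
}
```

Keeping the last object rather than the first matters when the model echoes a draft or a schema example in its reasoning before the final answer: the earlier span may parse but would not be the intended payload.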
@@ -1,8 +1,8 @@
 # Biergarten Pipeline
 
-Biergarten Pipeline is a C++23 command-line tool that reads a local city list, resolves contextual enrichment for each sampled city through an injected service, and generates brewery names and descriptions. The current code samples up to four locations per run, then uses either a local GGUF model or the mock generator to produce the output.
+Biergarten Pipeline is a C++23 command-line tool that reads a local city list, resolves contextual enrichment for each sampled city through an injected service, and generates brewery names and descriptions. The current code samples up to four locations per run, then uses either Gemma 4 or the mock generator to produce the output.
 
-## Hardware & GPU Config
+## Tested Hardware & OS
 
 ### x86/64 Linux, NVIDIA RTX 2000
 
@@ -10,7 +10,7 @@ Biergarten Pipeline is a C++23 command-line tool that reads a local city list, r
 - **CPU**: Intel Core Ultra 7 155H
 - **GPU**: NVIDIA RTX 2000 Ada Generation
 - **Memory**: 32GB
-- **Model**: Qwen3-8B-Q6-K
+- **Model**: Gemma 4 E4B: efficient local reasoning; released Apr 2, 2026.
 - **Inference**: llama.cpp with CUDA 12.x support
 
 ### ARM MacOS, M1 Pro
@@ -19,7 +19,7 @@ Biergarten Pipeline is a C++23 command-line tool that reads a local city list, r
 - **CPU**: Apple M1 Pro (8-core)
 - **GPU**: Apple M1 Pro (14-core) [Integrated]
 - **Memory**: 16GB
-- **Model**: gemma-4-E4B-it-Q6_K.gguf
+- **Model**: Gemma 4 E4B: efficient local reasoning; released Apr 2, 2026.
 - **Inference**: llama.cpp with Metal (MPS) support
 
 ## Pipeline
@@ -54,7 +54,7 @@ If an enrichment lookup throws, the pipeline skips that city and keeps going. If
 | libcurl | Required for Wikipedia requests. |
 | Optional GPU tooling | CUDA on NVIDIA, HIP/ROCm on supported AMD systems, Metal on Apple Silicon. |
 
-Boost, Boost.DI, spdlog, and llama.cpp are fetched by CMake. On Apple Silicon, Metal is enabled automatically. On Linux, the build looks for CUDA or HIP/ROCm when the matching toolkit is present. Windows is not supported.
+Boost, Boost.DI, spdlog, and llama.cpp are fetched by CMake. On Apple Silicon, Metal is enabled automatically. On Linux, the build looks for CUDA or HIP/ROCm when the matching toolkit is present. There are no plans to support Windows.
 
 ```bash
 cmake -S . -B build
@@ -63,19 +63,30 @@ cmake --build build
 
 If the dependency build fails on macOS, check the repo build notes.
 
+## Model
+
+Create a `models/` directory and download the GGUF file there before running the app.
+
+```bash
+mkdir -p models
+curl -L \
+-o models/google_gemma-4-E4B-it-Q6_K.gguf \
+https://huggingface.co/bartowski/google_gemma-4-E4B-it-GGUF/resolve/main/google_gemma-4-E4B-it-Q6_K.gguf?download=true
+```
+
 ## Run
 
 Run the executable from the build directory so the copied `locations.json` is available.
 
 ```bash
 ./biergarten-pipeline --mocked
-./biergarten-pipeline --model /path/to/model.gguf --temperature 1.0 --top-p 0.95 --top-k 64 --n-ctx 8192 --seed -1
+./biergarten-pipeline --model models/google_gemma-4-E4B-it-Q6_K.gguf --temperature 1.0 --top-p 0.95 --top-k 64 --n-ctx 8192 --seed -1
 ```
 
 | Flag | Purpose |
-| --------------- | -------------------------------------------- |
+| --------------- | ---------------------------------------------------------------------------- |
 | `--mocked` | Uses the mock generator instead of a model. |
-| `--model, -m` | Path to a GGUF model file. |
+| `--model, -m` | Path to a GGUF model file, such as `models/google_gemma-4-E4B-it-Q6_K.gguf`. |
 | `--temperature` | Sampling temperature. Default: `1.0`. |
 | `--top-p` | Nucleus sampling parameter. Default: `0.95`. |
 | `--top-k` | Top-k sampling parameter. Default: `64`. |
@@ -77,4 +77,12 @@ std::string ValidateBreweryJsonPublic(const std::string& raw,
 std::string& name_out,
 std::string& description_out);
 
+/**
+* @brief Extracts the last balanced JSON object from text.
+*
+* @param text Input text.
+* @return Extracted JSON object or an empty string if none exists.
+*/
+std::string ExtractLastJsonObjectPublic(const std::string& text);
+
 #endif // BIERGARTEN_PIPELINE_DATA_GENERATION_LLAMA_GENERATOR_HELPERS_H_
@@ -1,10 +1,7 @@
 <|think|>
-CRITICAL INSTRUCTION: You must use the <|think|> token to reason through the brewery's details before providing the final JSON output. Inside the think block, verify that you are not using blacklisted terms and that your technical and architectural details are unique.
-
-Structure your response as follows:
-[Your reasoning and constraint checklist go here]
-<|turn|>
-{"name": "...", "description": "..."}
+Think through the brewery details internally before answering.
+Return only one raw JSON object as the final answer, with exactly two keys: "name" and "description".
+No markdown, code fences, preamble, or extra keys.
 
 # FULL SYSTEM PROMPT
 
@@ -28,7 +25,7 @@ $$Information about local beer culture, history, or geography$$
 
 ## CRITICAL OUTPUT FORMAT (READ CAREFULLY):
 
-You have to return a reasoning block first, then ONLY raw, perfectly valid JSON after the `<|turn|>` separator. Any mistake with the JSON means the data pipeline breaks.
+You have to return a reasoning block first, then ONLY raw, perfectly valid JSON as the final answer. Any mistake with the JSON means the data pipeline breaks.
 
 ABSOLUTELY NO MARKDOWN FORMATTING. Do NOT wrap your response in json or ``` blocks.
 
@@ -26,8 +26,9 @@ auto ExtractFinalJsonPayload(std::string raw_response) -> std::string {
 return text.substr(first, last - first + 1);
 };
 
-static const std::array<std::string_view, 4> separator_tokens = {
-"<|turn|>", "<turn|>", "<channel|>", "<|channel|>"};
+static const std::array<std::string_view, 6> separator_tokens = {
+"<|think|>", "<think|>", "<|turn|>",
+"<turn|>", "<channel|>", "<|channel|>"};
 
 std::size_t separator_pos = std::string::npos;
 std::size_t separator_length = 0;
@@ -46,20 +47,15 @@ auto ExtractFinalJsonPayload(std::string raw_response) -> std::string {
 }
 
 const std::string_view trimmed = trim(raw_response);
-const std::size_t first_brace = trimmed.find('{');
-if (first_brace == std::string_view::npos) {
+std::string json_candidate =
+ExtractLastJsonObjectPublic(std::string(trimmed));
+if (!json_candidate.empty()) {
+return ExtractLastJsonObjectPublic(std::string(trimmed));
+}
+
 return std::string(trimmed);
 }
 
-const std::size_t last_brace = trimmed.find_last_of('}');
-if (last_brace == std::string_view::npos || last_brace < first_brace) {
-return std::string(trimmed.substr(first_brace));
-}
-
-return std::string(
-trimmed.substr(first_brace, last_brace - first_brace + 1));
-}
-
 } // namespace
 
 auto LlamaGenerator::GenerateBrewery(const BreweryLocation& location,
@@ -147,9 +143,11 @@ auto LlamaGenerator::GenerateBrewery(const BreweryLocation& location,
 // limits.
 prompt =
 "Your previous response was invalid. Error: " + validation_error +
-"\nReturn ONLY valid JSON with this exact schema: "
-"{\"name\": \"string\", \"description\": \"string\"}."
-"\nDo not include markdown, comments, or extra keys.";
+"\nReturn ONLY valid JSON with exactly these keys: "
+"{\"name\": \"<brewery name>\", "
+"\"description\": \"<single-paragraph description>\"}."
+"\nDo not include markdown, comments, extra keys, or literal "
+"placeholder values.";
 prompt += "\n\n";
 prompt += retry_location;
 }
@@ -263,12 +263,14 @@ static void AppendTokenPiece(const llama_vocab* vocab, llama_token token,
 output.append(buffer.data(), static_cast<std::size_t>(bytes));
 }
 
-static bool ExtractFirstJsonObject(const std::string& text,
+static bool ExtractLastJsonObject(const std::string& text,
 std::string& json_out) {
 std::size_t start = std::string::npos;
 int depth = 0;
 bool in_string = false;
 bool escaped = false;
+bool found = false;
+std::string candidate;
 
 for (std::size_t i = 0; i < text.size(); ++i) {
 const char ch = text[i];
@@ -303,15 +305,29 @@ static bool ExtractFirstJsonObject(const std::string& text,
 }
 --depth;
 if (depth == 0 && start != std::string::npos) {
-json_out = text.substr(start, i - start + 1);
-return true;
+candidate = text.substr(start, i - start + 1);
+found = true;
 }
 }
 }
 
+if (!found) {
 return false;
 }
 
+json_out = std::move(candidate);
+return true;
+}
+
+std::string ExtractLastJsonObjectPublic(const std::string& text) {
+std::string extracted;
+if (ExtractLastJsonObject(text, extracted)) {
+return extracted;
+}
+
+return {};
+}
+
 static std::string ValidateBreweryJson(const std::string& raw,
 std::string& name_out,
 std::string& description_out) {
@@ -371,7 +387,7 @@ static std::string ValidateBreweryJson(const std::string& raw,
 std::string validation_error;
 if (ec) {
 std::string extracted;
-if (!ExtractFirstJsonObject(raw, extracted)) {
+if (!ExtractLastJsonObject(raw, extracted)) {
 return "JSON parse error: " + ec.message();
 }
 
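The `ExtractFinalJsonPayload` hunks above show the extended `separator_tokens` list but elide how `separator_pos` and `separator_length` are computed. The following is a minimal sketch of one plausible approach, assuming the payload is whatever follows the last occurrence of any separator token. `kSeparatorTokens`, `TextAfterLastSeparator`, and `CheckTextAfterLastSeparator` are hypothetical names, and the "last occurrence" rule is an assumption, not something the diff confirms.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Token list copied from the diff; everything else here is illustrative.
constexpr std::array<std::string_view, 6> kSeparatorTokens = {
    "<|think|>", "<think|>",   "<|turn|>",
    "<turn|>",   "<channel|>", "<|channel|>"};

// Hypothetical helper: returns the text that follows the last separator
// token, or the whole input when no separator is present.
std::string_view TextAfterLastSeparator(std::string_view raw) {
  std::size_t payload_start = 0;
  for (const std::string_view token : kSeparatorTokens) {
    const std::size_t pos = raw.rfind(token);
    if (pos != std::string_view::npos) {
      // Track the furthest end position across all tokens.
      payload_start = std::max(payload_start, pos + token.size());
    }
  }
  return raw.substr(payload_start);
}

// Usage examples expressed as assertions.
void CheckTextAfterLastSeparator() {
  assert(TextAfterLastSeparator("<|think|>plan<|turn|>{\"name\":\"A\"}") ==
         "{\"name\":\"A\"}");
  assert(TextAfterLastSeparator("{\"name\":\"A\"}") == "{\"name\":\"A\"}");
}
```

Under this assumption, the returned text would then be trimmed and passed to the balanced-object extraction, which copes with any reasoning text that remains after the separator.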