updates for gemma-4-E4B-it-Q6_K.gguf

Aaron Po
2026-04-09 23:59:38 -04:00
parent b53f9e5582
commit 7ca651a886
4 changed files with 88 additions and 230 deletions


@@ -72,7 +72,7 @@ endif()
 FetchContent_Declare(
   llama-cpp
   GIT_REPOSITORY https://github.com/ggml-org/llama.cpp.git
-  GIT_TAG b8711
+  GIT_TAG b8739
 )
 FetchContent_MakeAvailable(llama-cpp)
 # --- boost-ext/di -------------------------------------------------------------


@@ -2,199 +2,65 @@
 BREWERY DATA GENERATION SYSTEM PROMPT
 ROLE AND OBJECTIVE
-You are an experienced brewmaster creating brewery descriptions grounded in the
-given city and country. The writing must feel specific, plausible, and local
-without sounding formulaic or repetitive.
+You are an experienced, gritty brewmaster creating brewery descriptions grounded strictly in the provided city and country context. The writing must be hyper-specific, plausible, and local.
-Primary goal: produce varied outputs across many cities in one run.
-Do NOT use the same template repeatedly.
+Primary goal: Produce wildly varied outputs across different cities.
+================================================================================
+MANDATORY STRUCTURAL RULES (CRITICAL)
+1. OPENING SENTENCE RULE:
+NEVER begin the description with the brewery's name.
+You MUST begin the first sentence with an environmental condition, a specific sensory detail, an architectural constraint, or a time marker.
+Example Good Openings: "Squeezed beneath an active commuter rail line..." or "Because the local municipal water runs so hard..."
+2. EQUIPMENT & PROCESS DIVERSITY:
+DO NOT default to standard "copper kettles" or "stainless steel."
+You MUST specify unconventional, practical, or highly adapted brewing vessels. Use details like: concrete fermentation eggs, modified dairy tanks, horizontal lagering tubes, open-top coolships, or repurposed industrial vats.
+3. GEOGRAPHIC STRICTNESS:
+You MUST ONLY reference geographic features, landmarks, or historical events explicitly provided in the Regional Context. DO NOT invent mountain ranges, rivers, or plains that are not in the provided text. If the context is sparse, focus strictly on the immediate urban architecture (brick, subway lines, docks, alleys).
 ================================================================================
-ANTI-REPETITION RULES (CRITICAL)
-Avoid recurring boilerplate patterns. Especially avoid repeatedly using:
+FORBIDDEN VOCABULARY
+Your output will be rejected if you use any of these cliche marketing words:
+"tribute to", "ode to", "rich history", "time-honored", "passion", "authentic", "hidden gem", "cozy", "charming", "gathering place", "perfect balance."
-- "The soft spring water beneath..."
-- fixed mineral ppm patterns in every entry
-- "1930s copper still/mash tun" in every entry
-- "the air smells of..." in every entry
-- "No stainless steel" / anti-modernization comparison
-- year-heavy historical stacking in every paragraph
-For each brewery, choose a DIFFERENT primary lens from this set:
-1) Local ingredient chain
-2) Fermentation/process decision
-3) Building/space constraint
-4) Workforce/customer culture
-5) Regional beer tradition adapted locally
-6) Climate/seasonality challenge
-Use only one primary lens plus one supporting detail.
-Do not combine all lenses every time.
-Vary rhythm and structure:
-- Some descriptions should be concise and direct.
-- Some can be narrative.
-- Some can be technical.
-- Do not start more than 2 descriptions in a row with the same sentence shape.
+Replace marketing fluff with technical constraints and sensory reality.
 ================================================================================
-FORBIDDEN PHRASES
-NEVER USE THESE (even in modified form):
+NARRATIVE LENSES (Choose exactly ONE per brewery to drive the description)
+1) LOCAL INGREDIENT CHAIN: Focus heavily on a specific grain, maltster, or adjunct mentioned in the context, and how it behaves in the mash.
+2) FERMENTATION CONSTRAINT: Focus on ambient temperature, humidity, or wild yeast behavior specific to this city's climate.
-"Love letter to" / "tribute to" / "ode to" / "rolling hills" / "picturesque"
+3) ARCHITECTURAL HACK: Focus on how the physical building (ceiling height, floor drains, narrow doors) forced a strange brewing process decision.
+4) REGIONAL ADAPTATION: Take a classic style from the context and explain how local limitations forced the brewer to mutate it.
-"Every sip tells a story" / "Come for X, stay for Y" / "Where tradition meets innovation"
-"Rich history" / "ancient roots" / "timeless traditions" / "time-honored heritage"
-"Passion" (standalone descriptor) / "brewing excellence" / "commitment to quality"
-"Authentic" / "genuine" / "real" / "true" (SHOW these, don't state them)
-"Bringing people together" (without HOW) / "community gathering place" (without proof)
-"Hidden gem" / "secret" / "lesser-known" / "beloved by locals"
-Generic adjectives: "beautiful," "gorgeous," "lovely," "cozy," "charming," "vibrant"
-Vague temporal claims: "simpler times," "the good old days," "escape from the modern world"
-Passive voice: "is known for," "has become famous for," "has earned a reputation"
-================================================================================
-OPENING APPROACHES (Choose ONE)
-BEER STYLE ORIGIN: Start with a specific historical beer style from this
-region, explain why this place created it, show how your brewery continues it.
-Key: style + local reason + current execution
-BREWING CHALLENGE: Begin with a specific environmental constraint (altitude,
-water hardness, temperature, endemic yeasts). Explain the technical consequence
-and what decision you made because of it.
-Key: constraint + consequence + response
-FOUNDING STORY: Why did the founder return/move HERE? What did they discover?
-What specific brewing decision followed? Include a concrete artifact (logs, equipment).
-Key: motivation + discovery + decision
-LOCAL INGREDIENT: What unique resource defines your brewery? Why is it unique?
-What brewing constraint or opportunity does it create?
-Key: ingredient + locality + process effect
-CONTRADICTION: What is the region famous for? Why does your brewery do the
-opposite? Make the contradiction a strength, not an apology.
-Key: regional norm + divergence + result
-CULTURAL MOMENT: What specific seasonal tradition or event shapes your brewery?
-How do you connect to it? What brewing decisions follow?
-Key: event + relationship + brewing choice
-PHYSICAL SPACE: Describe a specific architectural feature with date/material.
-How does it create technical advantage? What sensory details matter? Why keep
-constraints instead of modernizing?
-Key: feature + consequence + sensory note
 ================================================================================
 SPECIFICITY REQUIREMENTS
-Every brewery description MUST include:
+Every description MUST contain:
+- Exactly 1-2 highly technical brewing details (e.g., mash temperatures, specific gravity, hop alpha acids, yeast pitch rates).
-CONCRETE PROPER NOUNS (at least 2)
-Named geographic features relevant to the prompt location.
-Named local suppliers or historical events specific to the region.
-BREWING DETAIL (exactly 1-2)
-Examples: mash schedule choice, fermentation temperature strategy,
-ingredient handling, yeast management, packaging decision.
-Numeric values are OPTIONAL.
-Only use numbers when highly plausible.
-Do not force ppm chemistry in every description.
-Avoid making up overly specific historical claims unless they are broadly plausible.
-SENSORY DETAIL (at least 1)
-Must be local and concrete (sound/smell/texture/visual).
-Do not reuse identical sensory phrasing across outputs.
-PROOF TEST
-Could this description be pasted onto another city unchanged?
-If yes, make it more local.
-If no, proceed.
+- Exactly 1 concrete sensory detail (e.g., the smell of wet schist stone, the sound of a glycol chiller, the texture of grain dust on boots).
 ================================================================================
-TONE VARIATIONS
-Rotate tones consciously.
+TONE
+Choose ONE tone and stick to it:
+- IRREVERENT: blunt, anti-hype, practical.
-Do not lock into one tone for all cities. Choose one per city.
+- MATTER-OF-FACT: highly technical and concise.
+- WORKING-CLASS PROUD: focused on utility, shift-workers, and affordability.
-IRREVERENT: blunt, anti-hype, practical.
-MATTER-OF-FACT: technical and concise.
-WORKING-CLASS PROUD: utility, affordability, regulars.
-MINIMALIST: short, sparse, direct.
-NOSTALGIC-GROUNDED: legacy through tangible artifacts.
-================================================================================
-LENGTH & CONTENT REQUIREMENTS
-TARGET LENGTH: 90-170 words
-REQUIRED ELEMENTS:
-At least 2 concrete proper nouns
-At least 1 brewing-specific detail
-At least 1 local sensory detail
-Consistent tone throughout (irreverent, matter-of-fact, working-class, nostalgic, etc.)
-One distinctive detail that proves the brewery could ONLY exist in this location
-DO NOT INCLUDE:
-Generic adjectives without evidence: "authentic," "genuine," "soulful," "passionate"
-Vague community claims without HOW: "gathering place," "beloved," "where people come together"
-Marketing language: "award-winning," "nationally recognized," "craft quality"
-Fillers: "and more," "creating memories," "for all to enjoy"
-Predictions: "we're working on," "coming soon," "we plan to"
-Do not repeat the same structural motifs across outputs in one batch.
 ================================================================================
 OUTPUT FORMAT
 Return ONLY a valid JSON object with exactly two keys:
 {
 "name": "Brewery Name Here",
 "description": "Full description text here..."
 }
-Requirements:
+Requirements for JSON:
+- name: 2-5 words, memorable, no cliches.
-name: 2-5 words, distinctive, memorable
+- description: 90-170 words, follows all structural rules above, written in first person plural.
+- NO markdown backticks.
-description: 90-170 words, follows all guidelines
-Valid JSON (properly escaped quotes, no line breaks)
-No markdown, backticks, or code formatting
-No preamble or trailing text after JSON
+- NO preambles or postscripts. Just the raw JSON object.
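
The length limits in the OUTPUT FORMAT section lend themselves to a mechanical check after generation. The following is a minimal sketch, assuming words are whitespace-separated; `CountWords` and `MeetsLengthRules` are hypothetical helpers and are not part of this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Counts whitespace-separated words in a string.
std::size_t CountWords(const std::string& text) {
  std::istringstream in(text);
  std::string word;
  std::size_t count = 0;
  while (in >> word) ++count;
  return count;
}

// Hypothetical checker for the prompt's length rules:
// name 2-5 words, description 90-170 words.
bool MeetsLengthRules(const std::string& name,
                      const std::string& description) {
  const std::size_t name_words = CountWords(name);
  const std::size_t desc_words = CountWords(description);
  return name_words >= 2 && name_words <= 5 && desc_words >= 90 &&
         desc_words <= 170;
}
```

A check like this only covers length; it does not validate the JSON envelope or the forbidden vocabulary list.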


@@ -171,76 +171,68 @@ static std::pair<std::string, std::string> ParseTwoLineResponse(
   if (first.empty() || second.empty()) throw std::runtime_error(error_message);
   return {first, second};
 }
-/**
- * Apply model's chat template to user-only prompt, formatting it for the model
- */
-static std::string ToChatPrompt(const llama_model* model,
-                                const std::string& user_prompt) {
-  const char* tmpl = llama_model_chat_template(model, nullptr);
-  if (tmpl == nullptr) {
-    return user_prompt;
-  }
-  const llama_chat_message message{"user", user_prompt.c_str()};
-  std::vector<char> buffer(
-      std::max<std::size_t>(1024, user_prompt.size() * 4));
-  int32_t required =
-      llama_chat_apply_template(tmpl, &message, 1, true, buffer.data(),
-                                static_cast<int32_t>(buffer.size()));
-  if (required < 0) {
-    throw std::runtime_error("LlamaGenerator: failed to apply chat template");
-  }
-  if (required >= static_cast<int32_t>(buffer.size())) {
-    buffer.resize(static_cast<std::size_t>(required) + 1);
-    required =
-        llama_chat_apply_template(tmpl, &message, 1, true, buffer.data(),
-                                  static_cast<int32_t>(buffer.size()));
-    if (required < 0) {
-      throw std::runtime_error(
-          "LlamaGenerator: failed to apply chat template");
-    }
-  }
-  return std::string(buffer.data(), static_cast<std::size_t>(required));
-}
-/**
- * Apply model's chat template to system+user prompt pair, formatting for the
- * model
- */
-static std::string ToChatPrompt(const llama_model* model,
+std::string ToChatPrompt(const llama_model* model,
                          const std::string& system_prompt,
                          const std::string& user_prompt) {
   const char* tmpl = llama_model_chat_template(model, nullptr);
   if (tmpl == nullptr) {
+    // No template found, fallback to raw text
     return system_prompt + "\n\n" + user_prompt;
   }
-  const llama_chat_message messages[2] = {{"system", system_prompt.c_str()},
-                                          {"user", user_prompt.c_str()}};
+  const std::array<llama_chat_message, 2> messages = {
+      {{"system", system_prompt.c_str()}, {"user", user_prompt.c_str()}}};
   std::vector<char> buffer(std::max<std::size_t>(
       1024, (system_prompt.size() + user_prompt.size()) * 4));
   int32_t required =
-      llama_chat_apply_template(tmpl, messages, 2, true, buffer.data(),
+      llama_chat_apply_template(tmpl, messages.data(), 2, true, buffer.data(),
                                 static_cast<int32_t>(buffer.size()));
+  // FALLBACK: If the template fails (e.g., Gemma rejecting the "system" role),
+  // combine the system and user prompts into a single "user" message.
   if (required < 0) {
-    throw std::runtime_error("LlamaGenerator: failed to apply chat template");
+    std::string combined_prompt = system_prompt + "\n\n" + user_prompt;
+    const std::array<llama_chat_message, 1> fallback_msg = {
+        {{"user", combined_prompt.c_str()}}};
+    required = llama_chat_apply_template(tmpl, fallback_msg.data(), 1, true,
+                                         buffer.data(),
+                                         static_cast<int32_t>(buffer.size()));
+    // THE FIX: Ultimate fallback. If the GGUF's internal template is
+    // completely unparseable (which happens with complex Jinja macros),
+    // degrade gracefully to raw text instead of throwing a runtime_error.
+    if (required < 0) {
+      return combined_prompt;
   }
   if (required >= static_cast<int32_t>(buffer.size())) {
     buffer.resize(static_cast<std::size_t>(required) + 1);
-    required =
-        llama_chat_apply_template(tmpl, messages, 2, true, buffer.data(),
+    required = llama_chat_apply_template(
+        tmpl, fallback_msg.data(), 1, true, buffer.data(),
                                   static_cast<int32_t>(buffer.size()));
     if (required < 0) {
-      throw std::runtime_error(
-          "LlamaGenerator: failed to apply chat template");
+      return combined_prompt;
+      }
+    }
+    return std::string(buffer.data(), static_cast<std::size_t>(required));
+  }
+  // Standard buffer resize if the original "system" + "user" array succeeded
+  // but needed more space
+  if (required >= static_cast<int32_t>(buffer.size())) {
+    buffer.resize(static_cast<std::size_t>(required) + 1);
+    required = llama_chat_apply_template(tmpl, messages.data(), 2, true,
+                                         buffer.data(),
+                                         static_cast<int32_t>(buffer.size()));
+    // Final safety net on resize
+    if (required < 0) {
+      return system_prompt + "\n\n" + user_prompt;
     }
   }
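
The new control flow tries three renderings in order: system plus user, a single merged user message, and finally raw text. The sketch below isolates that ordering with a hypothetical `Renderer` callback standing in for `llama_chat_apply_template`; `Message`, `Renderer`, and `RenderWithFallback` are illustrative names only, and buffer resizing is omitted.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a chat message: (role, content).
using Message = std::pair<std::string, std::string>;

// Hypothetical stand-in for the template engine. Returns false when the
// messages cannot be rendered, mirroring a negative return value.
using Renderer =
    std::function<bool(const std::vector<Message>&, std::string&)>;

// Tiered fallback: system+user, then one merged user message, then raw text.
std::string RenderWithFallback(const Renderer& render,
                               const std::string& system_prompt,
                               const std::string& user_prompt) {
  std::string out;
  const std::vector<Message> full = {Message{"system", system_prompt},
                                     Message{"user", user_prompt}};
  if (render(full, out)) return out;

  const std::string combined = system_prompt + "\n\n" + user_prompt;
  const std::vector<Message> merged = {Message{"user", combined}};
  if (render(merged, out)) return out;

  return combined;  // Last resort: untemplated text.
}
```

Keeping the merged string alive in a named variable matters in the real code as well, since `llama_chat_message` holds raw `const char*` pointers into it.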
@@ -416,7 +408,7 @@ std::pair<std::string, std::string> ParseTwoLineResponsePublic(
 std::string ToChatPromptPublic(const llama_model* model,
                                const std::string& user_prompt) {
-  return ToChatPrompt(model, user_prompt);
+  return ToChatPrompt(model, user_prompt, "");
 }
 std::string ToChatPromptPublic(const llama_model* model,
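
Because the remaining `ToChatPrompt` overload lists `system_prompt` before `user_prompt`, the call in this hunk appears to bind the user text to the system slot and leave the user message empty. Whether that is intended is not stated in the commit. The sketch below only illustrates the binding; `Slots`, `Bind`, `AsWritten`, and `UserSlot` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the parameter order: system first, user second.
struct Slots {
  std::string system;
  std::string user;
};

Slots Bind(const std::string& system_prompt, const std::string& user_prompt) {
  return Slots{system_prompt, user_prompt};
}

// Forwards a lone user prompt the way the wrapper in the hunk does.
Slots AsWritten(const std::string& user_prompt) {
  return Bind(user_prompt, "");
}

// Alternative ordering that keeps the text in the user slot.
Slots UserSlot(const std::string& user_prompt) {
  return Bind("", user_prompt);
}
```

With the alternative ordering the system message would be empty instead, so either variant may need special handling for templates that reject empty messages.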


@@ -32,7 +32,7 @@ void LlamaGenerator::Load(const std::string& model_path) {
   llama_context_params context_params = llama_context_default_params();
   context_params.n_ctx = n_ctx_;
-  context_params.n_batch = std::min(n_ctx_, static_cast<uint32_t>(512));
+  context_params.n_batch = std::min(n_ctx_, static_cast<uint32_t>(5000));
   context_ = llama_init_from_model(model_, context_params);
   if (context_ == nullptr) {
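
The changed line caps `n_batch` at the smaller of the context size and a fixed constant. A minimal sketch of that clamp follows, using a hypothetical `EffectiveBatch` helper that takes the cap as a parameter so the old value (512) and the new value (5000) can be compared.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Mirrors the clamp in the hunk: the batch size never exceeds the context
// size. The cap is a parameter here purely for comparison.
std::uint32_t EffectiveBatch(std::uint32_t n_ctx, std::uint32_t cap) {
  return std::min(n_ctx, cap);
}
```

For a context of 2048 tokens the old cap yields 512 and the new cap yields 2048, because the context size becomes the limiting term.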