Commit Graph

504 Commits

Author SHA1 Message Date
Aaron Po
f07d48f810 Add missing includes, update readme 2026-04-11 14:31:24 -04:00
Aaron Po
bcfde856fe Split data models into dedicated headers 2026-04-11 13:21:50 -04:00
Aaron Po
5946356083 Style audit: update code to strictly follow Google Style Guide 2026-04-11 11:56:45 -04:00
Aaron Po
ae67fa8566 refactor: consolidate and rename data generation and service files 2026-04-11 00:06:23 -04:00
Aaron Po
8c572a2d07 fix: stabilize Gemma 4 brewery generation
remove misleading turn-token output guidance from the brewery prompt
extract the last balanced JSON object before validation
keep README model setup and run instructions aligned
preserve Gemma 4 sampling defaults and local model usage
2026-04-10 22:25:26 -04:00
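Note on 8c572a2d07: "extract the last balanced JSON object before validation" can be illustrated with a small scanner. This is a minimal sketch and not the repository's code. The function name `ExtractLastBalancedJsonObject` is hypothetical, and it assumes the model output may contain prose or earlier draft objects ahead of the final answer.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not from the repository): returns the last top-level
// `{...}` span in `text` whose braces balance, or std::nullopt if none exists.
std::optional<std::string> ExtractLastBalancedJsonObject(
    std::string_view text) {
  std::optional<std::string> last;
  std::size_t start = 0;
  int depth = 0;
  bool in_string = false;
  bool escaped = false;

  for (std::size_t i = 0; i < text.size(); ++i) {
    const char c = text[i];
    if (in_string) {
      // Inside a JSON string, braces are data; only track escapes and the
      // closing quote.
      if (escaped) {
        escaped = false;
      } else if (c == '\\') {
        escaped = true;
      } else if (c == '"') {
        in_string = false;
      }
      continue;
    }
    if (c == '"' && depth > 0) {
      // Quotes in surrounding prose (depth == 0) are ignored on purpose.
      in_string = true;
    } else if (c == '{') {
      if (depth == 0) start = i;
      ++depth;
    } else if (c == '}' && depth > 0) {
      --depth;
      if (depth == 0) {
        // A complete top-level object ended here; later ones overwrite it.
        last = std::string(text.substr(start, i - start + 1));
      }
    }
  }
  return last;
}
```

The result is only a candidate: it still has to go through a real JSON parser and schema validation, since balanced braces do not guarantee valid JSON.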
Aaron Po
902bda6eb9 feat: make Gemma 4 the default model, enable thinking mode 2026-04-10 21:43:18 -04:00
Aaron Po
61d5077a95 update readme 2026-04-10 00:03:45 -04:00
Aaron Po
7ca651a886 updates for gemma-4-E4B-it-Q6_K.gguf 2026-04-09 23:59:38 -04:00
Aaron Po
b53f9e5582 fix: llama backend lifetime, Wikipedia enrichment depth, and misc cleanup 2026-04-09 21:59:46 -04:00
Aaron Po
824f5b2b4f Refactor BiergartenDataGenerator to use dependency injection container 2026-04-09 20:46:20 -04:00
Aaron Po
5d93d76e99 Refactor data generator constructor and update web client handling; enhance README with detailed pipeline overview and class diagram 2026-04-09 18:19:12 -04:00
Aaron Po
028786b8b5 updates 2026-04-09 17:26:49 -04:00
Aaron Po
d7a31b5264 Create one method per file 2026-04-09 17:19:04 -04:00
Aaron Po
b31be494d7 Update documentation 2026-04-08 22:24:23 -04:00
Aaron Po
7807f0bc2a Add beer styles json 2026-04-08 21:26:35 -04:00
Aaron Po
772ef0cdfb Update CMakeLists.txt 2026-04-08 21:25:11 -04:00
Aaron Po
a6e2ea21d0 fix include 2026-04-08 21:24:59 -04:00
Aaron Po
a7cbf7507f fix location.h 2026-04-08 21:07:28 -04:00
Aaron Po
3c7e74e3c1 update readme 2026-04-08 11:27:37 -04:00
Aaron Po
b1ac3a6068 fix: remove outdated data source information from help message 2026-04-07 18:02:21 -04:00
Aaron Po
06d329cac5 refactor 2026-04-07 17:55:15 -04:00
Aaron Po
54c403526b fix: improve error handling and logging in data generation pipeline 2026-04-07 13:36:59 -04:00
Aaron Po
b8e96a6d45 replace SQLite geo pipeline with curated in-memory locations 2026-04-07 02:28:15 -04:00
Aaron Po
60ee2ecf74 add prompts 2026-04-03 15:53:04 -04:00
Aaron Po
e4e16a5084 fix: address critical correctness, reliability, and design issues in pipeline
CORRECTNESS FIXES:
- json_loader: Add RollbackTransaction() and call it on exception instead of
  CommitTransaction(). Prevents partial data corruption on parse/disk errors.
- wikipedia_service: Fix invalid MediaWiki API parameter explaintext=true ->
  explaintext=1. Now returns plain text instead of HTML markup in contexts.
- helpers: Fix ParseTwoLineResponse filter to only remove known thinking tags
  (<think>, <reasoning>, <reflect>) instead of any <...> pattern. Prevents
  silently removing legitimate output like <username>content</username>.

RELIABILITY & DESIGN IMPROVEMENTS:
- load/main: Make n_ctx (context window size) configurable via --n-ctx flag
  (default 2048, range 1-32768) to support larger models like Qwen3-14B.
- generate_brewery: Prevent retry prompt growth by extracting location context
  into constant and using compact retry format (error + schema + location only).
  Avoids token truncation on final retry attempts.
- database: Fix data representativeness by changing QueryCities from
  ORDER BY name (alphabetic bias) to ORDER BY RANDOM() for unbiased sampling.
  Convert all SQLITE_STATIC to SQLITE_TRANSIENT to prevent use-after-free risks.

POLISH:
- infer: Advance sampling seed between generation calls to improve diversity
  across brewery and user generation.
- data_downloader: Remove unnecessary commit hash truncation; use full hash.
- json_loader: Fix misleading log message from "RapidJSON" to "Boost.JSON".
2026-04-03 11:58:00 -04:00
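Note on e4e16a5084: the `ParseTwoLineResponse` fix restricts filtering to the known thinking tags instead of any `<...>` pattern. The sketch below shows one way such an allow-list filter could look. It is an assumption-laden illustration, not the repository's implementation: the function name `StripThinkingBlocks` is hypothetical, and it assumes each tag pair is removed together with its enclosed content.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not from the repository): removes
// <think>...</think>, <reasoning>...</reasoning> and <reflect>...</reflect>
// blocks, and leaves every other <...> tag untouched.
std::string StripThinkingBlocks(std::string text) {
  static constexpr std::array<std::string_view, 3> kThinkingTags = {
      "think", "reasoning", "reflect"};

  for (std::string_view tag : kThinkingTags) {
    const std::string open = "<" + std::string(tag) + ">";
    const std::string close = "</" + std::string(tag) + ">";

    std::size_t begin = 0;
    while ((begin = text.find(open, begin)) != std::string::npos) {
      const std::size_t end = text.find(close, begin + open.size());
      if (end == std::string::npos) {
        break;  // Unterminated block: leave the text as-is.
      }
      // Erase the opening tag, the content, and the closing tag.
      text.erase(begin, end + close.size() - begin);
    }
  }
  return text;
}
```

With this approach, output such as `<username>content</username>` passes through unchanged, which is the behavior the commit message describes.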
Aaron Po
8d306bf691 Update documentation for llama 2026-04-02 23:24:06 -04:00
Aaron Po
077f6ab4ae edit prompt 2026-04-02 22:56:18 -04:00
Aaron Po
534403734a Refactor BiergartenDataGenerator and LlamaGenerator 2026-04-02 22:46:00 -04:00
Aaron Po
3af053f0eb format codebase 2026-04-02 21:46:46 -04:00
Aaron Po
ba165d8aa7 Separate llama generator class src file into method files 2026-04-02 21:37:46 -04:00
Aaron Po
eb9a2767b4 Refactor web client interface and related components 2026-04-02 18:55:58 -04:00
Aaron Po
29ea47fdb6 update cli arg handling 2026-04-02 18:41:25 -04:00
Aaron Po
52e2333304 Reorganize directory structure 2026-04-02 18:27:01 -04:00
Aaron Po
a1f0ca5b20 Refactor DataDownloader and CURLWebClient: update constructor and modify FileExists method signature 2026-04-02 18:06:40 -04:00
Aaron Po
2ea8aa52b4 update readme and add clangformat and clang tidy 2026-04-02 17:12:22 -04:00
Aaron Po
98083ab40c Pipeline: add CURL/WebClient & Wikipedia service
- Introduce a pluggable web client interface and concrete CURL implementation: adds IWebClient, CURLWebClient, and CurlGlobalState (headers + curl_web_client.cpp).
- DataDownloader now accepts an IWebClient and delegates downloads.
- Add WikipediaService for cached Wikipedia summary lookups.
- Refactor SqliteDatabase to return full City records and update consumers accordingly.
- Improve JsonLoader to use batched transactions during streaming parses.
- Enhance LlamaGenerator with sampling options, increased token limits, JSON extraction/validation, and other parsing helpers.
- Modernize CMake: set policy/version, add project_options, simplify FetchContent usage (spdlog), require Boost components (program_options/json), list pipeline sources explicitly, and tweak post-build/memcheck targets.
- Update README to match implementation changes and new CLI/config conventions.
2026-04-02 16:29:16 -04:00
Aaron Po
ac136f7179 Enhance brewery generation: add country name parameter and improve prompt handling 2026-04-02 01:04:41 -04:00
Aaron Po
280c9c61bd Implement Llama-based brewery and user data generation; remove mock generator and related files 2026-04-01 23:29:16 -04:00
Aaron Po
248a51b35f cleanup 2026-04-01 21:35:02 -04:00
Aaron Po
35aa7bc0df Begin work on biergarten data generator pipeline 2026-04-01 21:18:45 -04:00
Aaron Po
581863d69b Website updates: add new app scaffold, archive legacy site, and refresh docs/tooling (#173) 2026-03-15 22:56:14 -04:00
Aaron Po
9238036042 Add resend confirmation email feature (#166) 2026-03-07 23:03:31 -05:00
Aaron Po
431e11e052 Add WEBSITE_BASE_URL environment variable and update email confirmation link (#165) 2026-03-07 20:11:50 -05:00
Aaron Po
f1194d3da8 Feature: Add token validation, basic confirmation workflow (#164) 2026-03-06 23:23:43 -05:00
Aaron Po
17eb04e20c Update diagrams 2026-02-21 05:04:04 -05:00
Aaron Po
50c2f5dfda Update documentation (#156) 2026-02-21 05:02:22 -05:00
Aaron Po
c5683df4b6 add IEmailService to the DI container (#154) 2026-02-19 22:04:30 -05:00
Aaron Po
2cad88e3f6 Service refactor (#153)
* remove email out of register service

* Update auth service, move JWT handling out of controller

* add docker config for service auth test

* Update mock email system

* Format: ./src/Core/Service

* Refactor authentication payloads and services for registration and login processes

* Format: src/Core/API, src/Core/Service
2026-02-16 15:12:59 -05:00
Aaron Po
0d52c937ce Adding service layer testing (#151) 2026-02-14 21:17:39 -05:00
Aaron Po
6b66f5680f Add user registration emails + email infrastructure (#150)
* Add email functionality

* Add email template project and rendering service

* Update email template dir structure

* Add email header and footer components for user registration template

* update example env

* Refactor email templates namespace and components

* Format email dir
2026-02-13 21:46:19 -05:00