Commit Graph

527 Commits

Author SHA1 Message Date
Aaron Po
385dcd2095 Merge branch 'main-2.0' into feat/add-sqllite-to-cpp-pipeline 2026-04-18 19:37:48 -04:00
Aaron Po
2fd2a35233 Squashed commit of the following:
commit 898cc8971b
Author: Aaron Po <apo2@uwo.ca>
Date:   Sat Apr 18 19:19:14 2026 -0400

    Create biergarten brewery pipeline project (#199)

commit fd3c172e35
Author: Aaron Po <apo2@uwo.ca>
Date:   Sat Mar 28 20:35:50 2026 -0400

    Schema updates (#191)
2026-04-18 19:34:23 -04:00
Aaron Po
898cc8971b Create biergarten brewery pipeline project (#199) 2026-04-18 19:19:14 -04:00
Aaron Po
1b242e86b5 Improve type safety, update logging, remove unused paths 2026-04-18 19:18:21 -04:00
Aaron Po
8a6cbe5efd Fix stale/inaccurate documentation 2026-04-18 19:00:13 -04:00
Aaron Po
056fb47b93 documentation updates 2026-04-18 18:23:30 -04:00
Aaron Po
88527f7709 make prompt formatter unique ptr 2026-04-18 18:21:00 -04:00
Aaron Po
49f4ed6787 Add activity diagram 2026-04-18 16:01:53 -04:00
Aaron Po
4d4b897d02 add activity diagram 2026-04-18 15:59:25 -04:00
Aaron Po
f71e4ddc83 refactor prompt placeholders for consistency 2026-04-18 15:49:58 -04:00
Aaron Po
212077793e add example to readme 2026-04-18 15:45:31 -04:00
Aaron Po
e6d1954506 update readme/prompts 2026-04-18 15:27:27 -04:00
Aaron Po
ce56532728 Update readme 2026-04-18 12:56:34 -04:00
Aaron Po
9649c993e8 Add local language handling 2026-04-18 01:38:50 -04:00
Aaron Po
f782fdb51d Add localized name/description to data models 2026-04-17 22:08:26 -04:00
Aaron Po
fcc7a5dc8b Enhance ValidateBreweryJson to include reasoning output and update GenerateBrewery to use user_prompt
Add gemma parser
2026-04-17 16:41:14 -04:00
Aaron Po
44a74ed2ad update chatprompt and llama prompt handling 2026-04-16 15:34:47 -04:00
Aaron Po
6682b5de01 fix llama grammar 2026-04-15 23:28:27 -04:00
Aaron Po
62dfb5e14a Add llama grammar to ensure proper json output 2026-04-15 13:39:01 -04:00
Aaron Po
ddf4bcb981 cleanup 2026-04-15 00:22:15 -04:00
Aaron Po
15853c62fd remove const to enable use of std::move 2026-04-13 22:02:31 -04:00
Aaron Po
ff4b7f2578 Use unique_ptr with custom deleter for llama 2026-04-13 21:45:00 -04:00
Aaron Po
3c70c46957 fix include order 2026-04-13 10:03:23 -04:00
Aaron Po
c7abc808ea Fix naming violations, use of magic numbers in web client get 2026-04-13 00:33:48 -04:00
Aaron Po
ef4f47d415 Update all .cpp files to use .cc extension (google style) 2026-04-13 00:14:20 -04:00
Aaron Po
035b30abba updates 2026-04-13 00:14:20 -04:00
Aaron Po
1cd30488eb Code format updates 2026-04-11 23:51:08 -04:00
Aaron Po
823599a96f Fix style guide errors 2026-04-11 23:46:16 -04:00
Aaron Po
56ec728ba7 Refactor Llama generator, helpers, and build assets
make Gemma 4 the default model, enable thinking mode
style updates
2026-04-11 23:35:17 -04:00
Aaron Po
7ca651a886 updates for gemma-4-E4B-it-Q6_K.gguf 2026-04-09 23:59:38 -04:00
Aaron Po
b53f9e5582 fix: llama backend lifetime, Wikipedia enrichment depth, and misc cleanup 2026-04-09 21:59:46 -04:00
Aaron Po
824f5b2b4f Refactor BiergartenDataGenerator to use dependency injection container 2026-04-09 20:46:20 -04:00
Aaron Po
5d93d76e99 Refactor data generator constructor and update web client handling; enhance README with detailed pipeline overview and class diagram 2026-04-09 18:19:12 -04:00
Aaron Po
028786b8b5 updates 2026-04-09 17:26:49 -04:00
Aaron Po
d7a31b5264 Create one method per file 2026-04-09 17:19:04 -04:00
Aaron Po
b31be494d7 Update documentation 2026-04-08 22:24:23 -04:00
Aaron Po
7807f0bc2a Add beer styles json 2026-04-08 21:26:35 -04:00
Aaron Po
772ef0cdfb Update CMakeLists.txt 2026-04-08 21:25:11 -04:00
Aaron Po
a6e2ea21d0 fix include 2026-04-08 21:24:59 -04:00
Aaron Po
a7cbf7507f fix location.h 2026-04-08 21:07:28 -04:00
Aaron Po
3c7e74e3c1 update readme 2026-04-08 11:27:37 -04:00
Aaron Po
b1ac3a6068 fix: remove outdated data source information from help message 2026-04-07 18:02:21 -04:00
Aaron Po
06d329cac5 refactor 2026-04-07 17:55:15 -04:00
Aaron Po
54c403526b fix: improve error handling and logging in data generation pipeline 2026-04-07 13:36:59 -04:00
Aaron Po
b8e96a6d45 replace SQLite geo pipeline with curated in-memory locations 2026-04-07 02:28:15 -04:00
Aaron Po
60ee2ecf74 add prompts 2026-04-03 15:53:04 -04:00
Aaron Po
e4e16a5084 fix: address critical correctness, reliability, and design issues in pipeline
CORRECTNESS FIXES:
- json_loader: Add RollbackTransaction() and call it on exception instead of
  CommitTransaction(). Prevents partial data corruption on parse/disk errors.
- wikipedia_service: Fix invalid MediaWiki API parameter explaintext=true ->
  explaintext=1. Now returns plain text instead of HTML markup in contexts.
- helpers: Fix ParseTwoLineResponse filter to only remove known thinking tags
  (<think>, <reasoning>, <reflect>) instead of any <...> pattern. Prevents
  silently removing legitimate output like <username>content</username>.

RELIABILITY & DESIGN IMPROVEMENTS:
- load/main: Make n_ctx (context window size) configurable via --n-ctx flag
  (default 2048, range 1-32768) to support larger models like Qwen3-14B.
- generate_brewery: Prevent retry prompt growth by extracting location context
  into constant and using compact retry format (error + schema + location only).
  Avoids token truncation on final retry attempts.
- database: Fix data representativeness by changing QueryCities from
  ORDER BY name (alphabetic bias) to ORDER BY RANDOM() for unbiased sampling.
  Convert all SQLITE_STATIC to SQLITE_TRANSIENT to prevent use-after-free risks.

POLISH:
- infer: Advance sampling seed between generation calls to improve diversity
  across brewery and user generation.
- data_downloader: Remove unnecessary commit hash truncation; use full hash.
- json_loader: Fix misleading log message from "RapidJSON" to "Boost.JSON".
2026-04-03 11:58:00 -04:00
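Note on the helpers fix in e4e16a5084: the following is a minimal sketch of an allow-list filter of the kind that commit message describes, not the project's actual ParseTwoLineResponse code. The function name StripThinkingBlocks is hypothetical, and so are two behaviours assumed here: that a thinking block is removed together with the text it encloses, and that an unterminated block is dropped through to the end of the string. Only the three tag names come from the commit message.

```cpp
#include <array>
#include <string>
#include <string_view>

// Hypothetical helper (not taken from the repository): removes <think>,
// <reasoning>, and <reflect> blocks, including the text they enclose, and
// leaves every other <...> tag untouched.
std::string StripThinkingBlocks(std::string text) {
  static constexpr std::array<std::string_view, 3> kThinkingTags = {
      "think", "reasoning", "reflect"};
  for (std::string_view tag : kThinkingTags) {
    const std::string open = "<" + std::string(tag) + ">";
    const std::string close = "</" + std::string(tag) + ">";
    std::size_t start = 0;
    while ((start = text.find(open, start)) != std::string::npos) {
      const std::size_t end = text.find(close, start + open.size());
      if (end == std::string::npos) {
        // Assumption: an unterminated block is all reasoning, so drop the rest.
        text.erase(start);
        break;
      }
      // Erase the opening tag, the enclosed text, and the closing tag.
      text.erase(start, end + close.size() - start);
    }
  }
  return text;
}
```

Because the filter matches a fixed list of tag names rather than any <...> pattern, a string such as <username>content</username> passes through unchanged, which is the case the commit message calls out.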
Aaron Po
8d306bf691 Update documentation for llama 2026-04-02 23:24:06 -04:00
Aaron Po
077f6ab4ae edit prompt 2026-04-02 22:56:18 -04:00
Aaron Po
534403734a Refactor BiergartenDataGenerator and LlamaGenerator 2026-04-02 22:46:00 -04:00