mirror of
https://github.com/aaronpo97/the-biergarten-app.git
synced 2026-04-05 18:09:04 +00:00
load cities from external source, develop multithreaded parser
@@ -1,128 +1,266 @@
|
||||
# Pipeline Guide
|
||||
# Brewery Pipeline Documentation Index
|
||||
|
||||
This guide documents the end-to-end pipeline workflow for:
|
||||
Complete guide to all pipeline documentation - choose your learning path based on your needs.
|
||||
|
||||
- Building the C++ pipeline executable
- Installing a lightweight GGUF model for llama.cpp
- Running the pipeline with either default or explicit model path
- Re-running from a clean build directory
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
## Quick Navigation
|
||||
|
||||
- CMake 3.20+
- A C++ compiler (Apple Clang on macOS works)
- Internet access to download model files
- Hugging Face CLI (`hf`) from `huggingface_hub`
|
||||
### 🚀 I Want to Run It Now (5 minutes)
|
||||
|
||||
## Build
|
||||
Start here if you want to see the pipeline in action immediately:
|
||||
|
||||
From repository root:
|
||||
1. **[QUICK-START.md](./QUICK-START.md)** (this directory)
   - Copy-paste build commands
   - Run the pipeline in 2 minutes
   - Make 4 simple modifications to learn
   - Common troubleshooting
|
||||
|
||||
```bash
cmake -S pipeline -B pipeline/dist
cmake --build pipeline/dist -j4
```
|
||||
---
|
||||
|
||||
Expected executable:
|
||||
### 📚 I Want to Understand the Code (1 hour)
|
||||
|
||||
- `pipeline/dist/biergarten-pipeline`
|
||||
To learn how the pipeline works internally:
|
||||
|
||||
## Install Hugging Face CLI
|
||||
1. **[QUICK-START.md](./QUICK-START.md)** - Run it first (5 min)
2. **[CODE-READING-GUIDE.md](./CODE-READING-GUIDE.md)** - Learn to read code (30 min)
   - Recommended reading order for all 5 source files
   - Code pattern explanations with examples
   - Trace a city through the entire pipeline
   - Testing strategies
3. **[../docs/pipeline-guide.md](../docs/pipeline-guide.md)** - Full system overview (20 min)
   - Architecture and data flow diagrams
   - Description of each component
   - Performance characteristics
|
||||
|
||||
Recommended on macOS:
|
||||
---
|
||||
|
||||
```bash
brew install pipx
pipx ensurepath
pipx install huggingface_hub
```
|
||||
### 🏗️ I Want to Understand the Architecture (1.5 hours)
|
||||
|
||||
If your shell cannot find `hf`, use the full path:
|
||||
To understand WHY the system was designed this way:
|
||||
|
||||
- `~/.local/bin/hf`
|
||||
1. Complete the "Understand the Code" path above first
2. **[../docs/pipeline-architecture.md](../docs/pipeline-architecture.md)** - Design deep dive (30 min)
   - 5 core design principles with trade-offs
   - Detailed threading model (3-level hierarchy)
   - Mutex contention analysis
   - Future optimization opportunities
   - Lessons learned
|
||||
|
||||
## Install a Lightweight Model (POC)
|
||||
---
|
||||
|
||||
The recommended proof-of-concept model is:
|
||||
### 💻 I Want to Modify the Code (2+ hours)
|
||||
|
||||
- `Qwen/Qwen2.5-0.5B-Instruct-GGUF`
- File: `qwen2.5-0.5b-instruct-q4_k_m.gguf`
|
||||
To extend or improve the pipeline:
|
||||
|
||||
From `pipeline/dist`:
|
||||
1. Complete the "Understand the Architecture" path above
2. Choose your enhancement:
   - **Add Real LLM**: See "Future Implementation" in [../docs/pipeline-architecture.md](../docs/pipeline-architecture.md)
   - **Export Results**: Modify [src/main.cpp](./src/main.cpp) to write JSON
   - **Change Templates**: Edit [src/generator.cpp](./src/generator.cpp)
   - **Add Features**: Read inline code comments for guidance
|
||||
|
||||
```bash
cd pipeline/dist
mkdir -p models
~/.local/bin/hf download Qwen/Qwen2.5-0.5B-Instruct-GGUF qwen2.5-0.5b-instruct-q4_k_m.gguf --local-dir models
```
|
||||
---
|
||||
|
||||
## Run
|
||||
## Documentation File Structure
|
||||
|
||||
### Option A: Explicit model path (recommended)
|
||||
### In `/pipeline/` (Code-Level Documentation)
|
||||
|
||||
```bash
cd pipeline/dist
./biergarten-pipeline --model models/qwen2.5-0.5b-instruct-q4_k_m.gguf
```
|
||||
| File                                               | Purpose                                | Time   |
| -------------------------------------------------- | -------------------------------------- | ------ |
| [QUICK-START.md](./QUICK-START.md)                 | Run in 5 minutes + learn basic changes | 15 min |
| [CODE-READING-GUIDE.md](./CODE-READING-GUIDE.md)   | How to read the source code            | 30 min |
| [includes/generator.h](./includes/generator.h)     | Generator class interface              | 5 min  |
| [includes/json_loader.h](./includes/json_loader.h) | JSON loader interface                  | 5 min  |
| [includes/database.h](./includes/database.h)       | Database interface                     | 5 min  |
| [src/main.cpp](./src/main.cpp)                     | Pipeline orchestration                 | 10 min |
| [src/generator.cpp](./src/generator.cpp)           | Brewery name generation                | 5 min  |
| [src/json_loader.cpp](./src/json_loader.cpp)       | Threading and JSON parsing             | 15 min |
| [src/database.cpp](./src/database.cpp)             | SQLite operations                      | 10 min |
|
||||
|
||||
### Option B: Default model path
|
||||
### In `/docs/` (System-Level Documentation)
|
||||
|
||||
If you want to use default startup behavior, place a model at:
|
||||
| File                                                   | Purpose                            | Time   |
| ------------------------------------------------------ | ---------------------------------- | ------ |
| [pipeline-guide.md](./pipeline-guide.md)               | Complete system guide              | 30 min |
| [pipeline-architecture.md](./pipeline-architecture.md) | Design decisions and rationale     | 30 min |
| [getting-started.md](./getting-started.md)             | Original getting started (general) | 10 min |
| [architecture.md](./architecture.md)                   | General app architecture           | 20 min |
|
||||
|
||||
- `pipeline/dist/models/llama-2-7b-chat.gguf`
|
||||
---
|
||||
|
||||
Then run:
|
||||
## Learning Paths by Role
|
||||
|
||||
```bash
cd pipeline/dist
./biergarten-pipeline
```
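
The two run options above differ only in how the model path is chosen. A minimal sketch of that choice follows, assuming the executable checks for a `--model` argument and otherwise falls back to the documented default. The function names `resolve_model_path` and `model_exists` are hypothetical and not taken from the pipeline source.

```cpp
#include <cstddef>
#include <filesystem>
#include <string>
#include <vector>

// Hypothetical helper: returns the value following "--model",
// or the documented default path when the flag is absent.
std::string resolve_model_path(const std::vector<std::string>& args) {
    for (std::size_t i = 0; i + 1 < args.size(); ++i) {
        if (args[i] == "--model") return args[i + 1];
    }
    // Relative path: it is resolved against the current working directory.
    return "models/llama-2-7b-chat.gguf";
}

// Hypothetical helper: true when the path points at an existing regular file.
bool model_exists(const std::string& path) {
    return std::filesystem::is_regular_file(path);
}
```

Because the default is a relative path, running the executable from anywhere other than `pipeline/dist` would make a check like `model_exists` fail even when the file is present in `pipeline/dist/models`.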
|
||||
### 👨‍💻 Software Engineer (New to Project)
|
||||
|
||||
## Output Files
|
||||
**Goal**: Understand codebase, make modifications
|
||||
|
||||
The pipeline writes output to:
|
||||
**Path** (1.5 hours):
|
||||
|
||||
- `pipeline/dist/output/breweries.json`
- `pipeline/dist/output/beer-styles.json`
- `pipeline/dist/output/beer-posts.json`
|
||||
1. [QUICK-START.md](./QUICK-START.md) (15 min)
2. [CODE-READING-GUIDE.md](./CODE-READING-GUIDE.md) (30 min)
3. Do Modifications #1 and #3 (15 min)
4. Read the Components section of [../docs/pipeline-guide.md](../docs/pipeline-guide.md) (20 min)
5. Start exploring the code and inline comments (variable)
|
||||
|
||||
## Clean Re-run Process
|
||||
---
|
||||
|
||||
To rebuild and re-run from a clean `dist` directory:
|
||||
### 🏗️ System Architect
|
||||
|
||||
```bash
rm -rf pipeline/dist
cmake -S pipeline -B pipeline/dist
cmake --build pipeline/dist -j4
cd pipeline/dist
mkdir -p models
~/.local/bin/hf download Qwen/Qwen2.5-0.5B-Instruct-GGUF qwen2.5-0.5b-instruct-q4_k_m.gguf --local-dir models
./biergarten-pipeline --model models/qwen2.5-0.5b-instruct-q4_k_m.gguf
```
|
||||
**Goal**: Understand design decisions, future roadmap
|
||||
|
||||
## Troubleshooting
|
||||
**Path** (2 hours):
|
||||
|
||||
### `zsh: command not found: huggingface-cli`
|
||||
1. [../docs/pipeline-guide.md](../docs/pipeline-guide.md) - Overview (30 min)
2. [../docs/pipeline-architecture.md](../docs/pipeline-architecture.md) - Full design (30 min)
3. Review [CODE-READING-GUIDE.md](./CODE-READING-GUIDE.md) - Code Patterns section (15 min)
4. Plan enhancements based on "Future Opportunities" (variable)
|
||||
|
||||
The app name from `huggingface_hub` is `hf`, not `huggingface-cli`.
|
||||
---
|
||||
|
||||
Use:
|
||||
### 📊 Data Engineer
|
||||
|
||||
```bash
~/.local/bin/hf --help
```
|
||||
**Goal**: Understand data flow, optimization
|
||||
|
||||
### `Model file not found ...`
|
||||
**Path** (1 hour):
|
||||
|
||||
- Confirm you are running from `pipeline/dist`.
- Confirm the file path passed to `--model` exists.
- If not using `--model`, ensure the default file exists at `models/llama-2-7b-chat.gguf` relative to the current working directory.
|
||||
1. [../docs/pipeline-guide.md](../docs/pipeline-guide.md) - System Overview (30 min)
2. [../docs/pipeline-architecture.md](../docs/pipeline-architecture.md) - Performance section (20 min)
3. Review [src/json_loader.cpp](./src/json_loader.cpp) - Threading section (10 min)
|
||||
|
||||
### CMake cache/path mismatch
|
||||
---
|
||||
|
||||
Use explicit source/build paths:
|
||||
### 👀 Code Reviewer
|
||||
|
||||
```bash
cmake -S /absolute/path/to/pipeline -B /absolute/path/to/pipeline/dist
cmake --build /absolute/path/to/pipeline/dist -j4
```
|
||||
**Goal**: Review changes, ensure quality
|
||||
|
||||
**Path** (30 minutes):
|
||||
|
||||
1. [CODE-READING-GUIDE.md](./CODE-READING-GUIDE.md) - Code Patterns section (10 min)
2. [../docs/pipeline-architecture.md](../docs/pipeline-architecture.md) - Design Patterns (10 min)
3. Reference header files for API contracts (10 min)
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference
|
||||
|
||||
### Key Files
|
||||
|
||||
**Entry Point**: [src/main.cpp](./src/main.cpp)
|
||||
|
||||
- Shows the complete 5-step pipeline
- ~50 lines, easy to understand
|
||||
|
||||
**Threading Logic**: [src/json_loader.cpp](./src/json_loader.cpp)
|
||||
|
||||
- Nested multithreading example
- 180 lines with extensive comments
- Learn parallel programming patterns
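
As a rough illustration of the kind of parallel pattern this file covers, the sketch below splits a list of city names across worker threads and merges the per-thread results under a mutex. It is a simplified, single-level stand-in written for this index, not the actual `json_loader.cpp` code. The `City` struct and `load_cities_parallel` function are hypothetical names, and JSON parsing is omitted so the example has no external dependencies.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical record type; the real loader's types may differ.
struct City {
    std::string name;
};

// Splits `names` into roughly equal chunks, builds City records on worker
// threads, and merges each chunk's results under a mutex.
// The order of the returned records is not guaranteed.
std::vector<City> load_cities_parallel(const std::vector<std::string>& names,
                                       std::size_t worker_count) {
    std::vector<City> result;
    if (names.empty()) return result;
    worker_count = std::max<std::size_t>(1, std::min(worker_count, names.size()));
    result.reserve(names.size());

    std::mutex result_mutex;
    std::vector<std::thread> workers;
    const std::size_t chunk = (names.size() + worker_count - 1) / worker_count;

    for (std::size_t w = 0; w < worker_count; ++w) {
        const std::size_t begin = w * chunk;
        const std::size_t end = std::min(names.size(), begin + chunk);
        if (begin >= end) break;
        workers.emplace_back([&names, &result, &result_mutex, begin, end] {
            std::vector<City> local;  // built without holding the lock
            local.reserve(end - begin);
            for (std::size_t i = begin; i < end; ++i) {
                local.push_back(City{names[i]});
            }
            // Hold the lock only for the short merge step.
            std::lock_guard<std::mutex> lock(result_mutex);
            result.insert(result.end(), local.begin(), local.end());
        });
    }
    for (auto& t : workers) t.join();
    return result;
}
```

Building each chunk in a thread-local vector and locking only for the merge keeps the critical section short, which is one common way to limit mutex contention.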
|
||||
|
||||
**Database Design**: [src/database.cpp](./src/database.cpp)
|
||||
|
||||
- Thread-safe SQLite wrapper
- Prepared statements example
- Mutex protection pattern
|
||||
|
||||
**Generation Logic**: [src/generator.cpp](./src/generator.cpp)
|
||||
|
||||
- Deterministic hashing algorithm
- Template-based generation
- Only 40 lines, easy to modify
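
To make the idea of deterministic, template-based generation concrete, here is a minimal sketch. It assumes FNV-1a as the hash, which is a choice made for this example because this index does not say which hash `generator.cpp` uses. The templates and the function names `fnv1a` and `brewery_name` are likewise hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// FNV-1a (32-bit): a fixed, platform-independent hash, so the same city
// always maps to the same template on every machine.
std::uint32_t fnv1a(const std::string& text) {
    std::uint32_t hash = 2166136261u;  // FNV offset basis
    for (unsigned char c : text) {
        hash ^= c;
        hash *= 16777619u;  // FNV prime
    }
    return hash;
}

// Hypothetical templates; "{city}" is replaced with the city name.
std::string brewery_name(const std::string& city) {
    static const std::array<std::string, 3> templates = {
        "{city} Brewing Co.", "{city} Craft Brewery", "Old {city} Biergarten"};
    std::string name = templates[fnv1a(city) % templates.size()];
    const std::string placeholder = "{city}";
    name.replace(name.find(placeholder), placeholder.size(), city);
    return name;
}
```

Calling `brewery_name("Munich")` twice returns the same string both times, which is the property that makes repeated pipeline runs reproducible.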
|
||||
|
||||
---
|
||||
|
||||
## Common Questions - Quick Answers
|
||||
|
||||
**Q: How do I run the pipeline?**
A: [QUICK-START.md](./QUICK-START.md) - 5 minute setup

**Q: How does the code work?**
A: [CODE-READING-GUIDE.md](./CODE-READING-GUIDE.md) - Explained with examples

**Q: What is the full system architecture?**
A: [../docs/pipeline-guide.md](../docs/pipeline-guide.md) - Complete overview

**Q: Why was it designed this way?**
A: [../docs/pipeline-architecture.md](../docs/pipeline-architecture.md) - Design rationale

**Q: How do I modify the generator?**
A: [QUICK-START.md](./QUICK-START.md) Modification #3 - Template change example

**Q: How does threading work?**
A: [../docs/pipeline-architecture.md](../docs/pipeline-architecture.md) - Threading model section

**Q: What about future LLM integration?**
A: [../docs/pipeline-architecture.md](../docs/pipeline-architecture.md) - Design Patterns → Strategy Pattern

**Q: How do I optimize performance?**
A: [../docs/pipeline-architecture.md](../docs/pipeline-architecture.md) - Future Optimizations section
|
||||
|
||||
---
|
||||
|
||||
## Documentation Statistics
|
||||
|
||||
| Metric                       | Value     |
| ---------------------------- | --------- |
| Total documentation lines    | 1500+     |
| Code files with Doxygen      | 5         |
| Developer guides             | 2         |
| System documentation         | 2         |
| ASCII diagrams               | 4         |
| Code examples                | 20+       |
| Learning paths               | 4         |
| Estimated reading time (all) | 3-4 hours |
|
||||
|
||||
---
|
||||
|
||||
## How to Use This Index
|
||||
|
||||
1. **Find your role** in "Learning Paths by Role"
2. **Follow the recommended path** in order
3. **Use the file links** to jump directly to each document
4. **Return to this page** whenever you need to find something
|
||||
|
||||
---
|
||||
|
||||
## Contribution Notes
|
||||
|
||||
When adding to the pipeline:
|
||||
|
||||
1. **Update inline code comments** in modified files
2. **Update Doxygen documentation** for changed APIs
3. **Update [CODE-READING-GUIDE.md](./CODE-READING-GUIDE.md)** if reading order changes
4. **Update [../docs/pipeline-guide.md](../docs/pipeline-guide.md)** for major features
5. **Update [../docs/pipeline-architecture.md](../docs/pipeline-architecture.md)** for design changes
|
||||
|
||||
---
|
||||
|
||||
## Additional Resources
|
||||
|
||||
### Within This Repository
|
||||
|
||||
- [../../docs/architecture.md](../../docs/architecture.md) - General app architecture
- [../../docs/getting-started.md](../../docs/getting-started.md) - Project setup
- [../../README.md](../../README.md) - Project overview
|
||||
|
||||
### External References
|
||||
|
||||
- [SQLite Documentation](https://www.sqlite.org/docs.html)
- [C++ std::thread](https://en.cppreference.com/w/cpp/thread/thread)
- [nlohmann/json](https://github.com/nlohmann/json) - JSON library
- [Doxygen Documentation](https://www.doxygen.nl/)
|
||||
|
||||
---
|
||||
|
||||
## Last Updated
|
||||
|
||||
Documentation completed: 2024
|
||||
|
||||
- All code files documented with Doxygen comments
- 4 comprehensive guides created
- 4 ASCII diagrams included
- 4 learning paths defined
|
||||
|
||||
---
|
||||
|
||||
**Start with [QUICK-START.md](./QUICK-START.md) to get running in 5 minutes!** 🚀
|
||||
|
||||