add example to readme

Aaron Po
2026-04-18 15:45:31 -04:00
parent e6d1954506
commit 212077793e
2 changed files with 311 additions and 8 deletions


@@ -231,8 +231,6 @@ Generate rating events with a strong positive skew and a long tail of lower scor
| `prompts/` | System prompt used by the model-backed path. |
| `diagrams/` | Architecture and pipeline diagrams. |
## Known Issues
### Language Generation Quality
The generation pipeline passes local language codes to the model to retrieve a translated `description_local`.
@@ -294,16 +292,18 @@ Output quality is reliable for high-resource languages such as French, though it
]
```
### Low-Resource Language Hallucination
#### Output:
See [./out-sample/french-cities.log.example](out-sample/french-cities.log.example).
### Known Issues
#### Low-Resource Language Hallucination
For low-resource languages such as Welsh (Wales), Maori (Aotearoa/New Zealand), or Sicilian (Sicily, Italy), the model can generate text that looks syntactically plausible but is semantically incoherent. The cause is limited training-data coverage for these languages, not the prompt.
#### Proposed Mitigations
##### Proposed Mitigations
- Prevention via allowlist: introduce a high-resource language allowlist. If a location's code is unlisted, skip `description_local` generation and fall back to English.
- Upstream sanitization: strip known low-resource language codes from the `locations.json` payload before generation.
- Downstream flagging: add a `description_local_confidence` column to the SQLite schema so downstream applications can filter or flag potentially hallucinated text by language tier.
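
The three mitigations above could be combined roughly as follows. This is a minimal Python sketch under stated assumptions, not the project's implementation: the allowlist contents, the `language_code` and `name` keys, the `locations` table name, and the `high`/`low` tier labels are all hypothetical. For simplicity, the sanitization step reuses the allowlist rather than a separate list of known low-resource codes.

```python
import sqlite3

# Hypothetical allowlist: the README does not say which codes count as high-resource.
HIGH_RESOURCE_LANGS = {"en", "fr", "de", "es"}


def is_high_resource(code):
    """Return True when the language code is on the allowlist."""
    return bool(code) and code.strip().lower() in HIGH_RESOURCE_LANGS


def sanitize_locations(locations):
    """Upstream sanitization: blank out codes that are not allowlisted.

    Assumes each location is a dict with a "language_code" key (hypothetical name).
    A code of None signals "skip description_local and fall back to English".
    """
    return [
        {
            **loc,
            "language_code": loc.get("language_code")
            if is_high_resource(loc.get("language_code"))
            else None,
        }
        for loc in locations
    ]


def confidence_tier(code):
    """Downstream flagging: map a language code to a coarse confidence tier."""
    return "high" if is_high_resource(code) else "low"


def store(conn, locations):
    """Persist locations with a description_local_confidence column (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS locations ("
        "name TEXT, language_code TEXT, description_local_confidence TEXT)"
    )
    conn.executemany(
        "INSERT INTO locations VALUES (?, ?, ?)",
        [
            (loc["name"], loc["language_code"], confidence_tier(loc["language_code"]))
            for loc in locations
        ],
    )
    conn.commit()


if __name__ == "__main__":
    sample = [
        {"name": "Lyon", "language_code": "fr"},
        {"name": "Cardiff", "language_code": "cy"},
    ]
    print(sanitize_locations(sample))

    conn = sqlite3.connect(":memory:")
    store(conn, sample)
    print(conn.execute(
        "SELECT name, description_local_confidence FROM locations"
    ).fetchall())
```

Keeping the allowlist check in one helper means the prevention, sanitization, and flagging paths cannot disagree about which languages are trusted.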
```
```