Admin
Prompt evals
Historical scoring by prompt version. MVP evals are operator ratings and rejection reasons from real/test recommendation runs; shadow comparisons can be added once v2 exists.
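The per-version scoring described above is essentially a group-by over operator reviews. Below is a minimal sketch of one way to do it. It assumes a hypothetical `RunEval` record holding a 1-5 operator rating and an optional rejection reason. The record, its fields, and the sample rows are illustrative only and are not the app's actual schema or data.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class RunEval:
    # Hypothetical record: one operator review of one recommendation run.
    prompt_version: str
    rating: int                             # assumed 1-5 operator rating
    rejection_reason: Optional[str] = None  # None when the pick was accepted


def score_by_version(evals):
    """Group reviews by prompt version: run count, mean rating, rejection reasons."""
    grouped = defaultdict(list)
    for e in evals:
        grouped[e.prompt_version].append(e)
    summary = {}
    for version, rows in grouped.items():
        reasons = Counter(r.rejection_reason for r in rows if r.rejection_reason)
        summary[version] = {
            "runs": len(rows),
            "mean_rating": round(mean(r.rating for r in rows), 2),
            "rejections": dict(reasons),
        }
    return summary


if __name__ == "__main__":
    # Illustrative rows only; not real eval results.
    sample = [
        RunEval("v3", 2, "under budget floor"),
        RunEval("v3", 4),
        RunEval("v4", 5),
    ]
    for version, stats in sorted(score_by_version(sample).items()):
        print(version, stats)
```

Counting rejection reasons alongside the mean rating keeps the "why" visible. Two versions with similar average ratings can still fail for very different reasons.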
Code-managed prompt versions
Reframes the budget cap as a TIER SIGNAL, not just a ceiling. Raises the hard floor on product subtotal to 65% of cap (up from 56%) and the aim point to 85% (up from 78%), with explicit rules: cheap-and-thoughtful loses to right-priced-and-thoughtful; if the best pick costs under the floor, pair it with a complementary add-on or pick again. Evals showed v3 routinely undershooting (e.g., a $25 book on a $75 cap).
User prompt now includes every customer-supplied field that should shape the pick: sender name, personal P.S., shipping name, shipping city/state/zip (for seasonal/regional fit), in addition to the existing recipient + occasion + budget set. Adds explicit rules for handling form-vs-description age conflicts and tone signals from the gift note.
Strengthens output rules: no markdown fences, no prose, bare numeric cost fields. Same task definition and merchant constraints as v1.
Adds explicit prompt versioning, all-order-fact grounding, approved merchant constraints, direct product URLs, budget math, URL review, and anti-generic fallback guidance.
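The output rules (bare JSON, numeric cost fields) and the budget-tier math (65% floor, 85% aim) can be checked mechanically after each run. Below is a minimal sketch that does so. It assumes a hypothetical response shape of `{"items": [{"name": ..., "cost": ...}]}`. The field names, messages, and helper functions are illustrative and are not the app's actual schema or validator.

```python
import json

FLOOR_SHARE = 0.65  # hard floor from the budget-tier rule: subtotal >= 65% of cap
AIM_SHARE = 0.85    # aim point from the budget-tier rule: 85% of cap


def budget_band(cap: float) -> tuple[float, float]:
    """Return (floor, aim) in dollars for a given budget cap."""
    return round(FLOOR_SHARE * cap, 2), round(AIM_SHARE * cap, 2)


def check_output(raw: str, cap: float) -> list[str]:
    """Return problems found in one raw model response (empty list means it passes)."""
    try:
        data = json.loads(raw)  # fails if the model wrapped JSON in fences or prose
    except json.JSONDecodeError:
        return ["output is not bare JSON (markdown fence or prose?)"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]

    problems = []
    subtotal = 0.0
    for item in data.get("items", []):  # "items" is a hypothetical field name
        if not isinstance(item, dict):
            problems.append(f"item is not an object: {item!r}")
            continue
        cost = item.get("cost")
        # A bare number only: reject strings like "$25" and booleans.
        if isinstance(cost, bool) or not isinstance(cost, (int, float)):
            problems.append(f"cost is not a bare number: {cost!r}")
            continue
        subtotal += cost

    if subtotal > cap:
        problems.append(f"subtotal {subtotal:.2f} exceeds cap {cap:.2f}")
    elif subtotal < FLOOR_SHARE * cap:
        problems.append(f"subtotal {subtotal:.2f} is under the floor {FLOOR_SHARE * cap:.2f}")
    return problems


if __name__ == "__main__":
    print(budget_band(75))
    print(check_output('{"items": [{"name": "book", "cost": 25}]}', 75))
```

On a $75 cap, the band works out to a $48.75 floor and a $63.75 aim. The $25 book cited in the eval note therefore fails the floor check, while a $60 pick passes.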
No runs to evaluate yet.
Generate recommendations from orders, the tester, or the QA smoke suite.
