Prompt evals

Historical scoring by prompt version. MVP evals consist of operator ratings and rejection reasons from real and test recommendation runs; shadow comparisons can be added once v2 exists.
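A minimal sketch of how per-version scoring might be aggregated, assuming each run is stored as a record with a prompt version, an optional operator rating, and an optional rejection reason. The field names (`prompt_version`, `rating`, `rejection_reason`) and the numeric rating scale are assumptions for illustration, not the application's actual schema.

```python
from collections import Counter, defaultdict
from statistics import mean


def summarize_runs(runs):
    """Group run records by prompt version and summarize operator feedback.

    Each run is a dict with hypothetical keys: "prompt_version" (str),
    "rating" (number or None), and "rejection_reason" (str or None).
    """
    by_version = defaultdict(list)
    for run in runs:
        by_version[run["prompt_version"]].append(run)

    summary = {}
    for version, items in by_version.items():
        # Only rated runs contribute to the mean; unrated runs still count as runs.
        ratings = [r["rating"] for r in items if r.get("rating") is not None]
        reasons = Counter(
            r["rejection_reason"] for r in items if r.get("rejection_reason")
        )
        summary[version] = {
            "runs": len(items),
            "mean_rating": mean(ratings) if ratings else None,
            "rejection_reasons": dict(reasons),
        }
    return summary
```

With no stored runs, `summarize_runs([])` returns an empty dict, which corresponds to the empty state this page shows when there is nothing to evaluate.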

Showing local beta recommendation runs.

Code-managed prompt versions

gift-recommendation-v4

active / created 2026-05-26

Reframes the budget cap as a TIER SIGNAL, not just a ceiling. Adds a hard floor on product subtotal (now 65% of cap, up from 56%) and an aim point at 85% of cap (up from 78%), with explicit rules: cheap-and-thoughtful loses to right-priced-and-thoughtful, and if the best pick costs less than the floor, pair it with a complementary add-on or pick again. Evals showed v3 routinely undershooting (e.g. a $25 book on a $75 cap).
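The floor and aim point are simple percentages of the cap, so the arithmetic can be sketched as below. This is an illustration only: the function names and the `under_floor` / `in_range` / `over_cap` labels are hypothetical, and treating the cap and floor as inclusive bounds is an assumption. The v4 rules themselves live in the prompt text, not in code like this.

```python
from decimal import Decimal

FLOOR_RATIO = Decimal("0.65")  # v4 hard floor on product subtotal
AIM_RATIO = Decimal("0.85")    # v4 aim point


def budget_targets(cap: Decimal) -> dict:
    """Return the floor, aim point, and cap for a given budget cap."""
    return {"floor": cap * FLOOR_RATIO, "aim": cap * AIM_RATIO, "cap": cap}


def classify_subtotal(subtotal: Decimal, cap: Decimal) -> str:
    """Label a product subtotal relative to the v4 floor and cap.

    Assumes both bounds are inclusive, which the changelog does not specify.
    """
    targets = budget_targets(cap)
    if subtotal > targets["cap"]:
        return "over_cap"
    if subtotal < targets["floor"]:
        return "under_floor"
    return "in_range"
```

For the $75 cap in the example above, the floor works out to $48.75 and the aim point to $63.75, so a $25 book is classified as `under_floor` and would need a complementary add-on or a different pick.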

gift-recommendation-v3

archived / created 2026-05-26

The user prompt now includes every customer-supplied field that should shape the pick: sender name, personal P.S., shipping name, and shipping city/state/zip (for seasonal and regional fit), in addition to the existing recipient, occasion, and budget fields. Adds explicit rules for handling form-vs-description age conflicts and tone signals from the gift note.

gift-recommendation-v2

archived / created 2026-05-26

Strengthens output rules: no markdown fences, no prose, bare numeric cost fields. Same task definition and merchant constraints as v1.
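A rough sketch of how these output rules could be checked after the fact, assuming the model is expected to return a single JSON object. The JSON format and the `product_cost` field name are assumptions for illustration; the changelog does not state the real output schema.

```python
import json

FENCE = "`" * 3  # markdown code-fence marker, built this way to keep the example readable


def check_output(raw: str, cost_fields=("product_cost",)) -> list:
    """Return a list of rule violations for one raw model response.

    `cost_fields` names hypothetical keys that must hold bare numbers.
    """
    problems = []
    text = raw.strip()
    if FENCE in text:
        problems.append("contains a markdown fence")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Surrounding prose or fences make the payload unparseable as JSON.
        problems.append("response is not bare JSON")
        return problems
    if not isinstance(data, dict):
        problems.append("top-level value is not an object")
        return problems
    for name in cost_fields:
        value = data.get(name)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name} is not a bare number: {value!r}")
    return problems
```

Under these assumptions, a response such as `{"product_cost": 42.5}` passes with no problems, while `{"product_cost": "$42.50"}` is flagged because the cost is a formatted string rather than a bare number.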

gift-recommendation-v1

archived / created 2026-05-25

Adds explicit prompt versioning, grounding in all order facts, approved merchant constraints, direct product URLs, budget math, URL review, and anti-generic fallback guidance.

No runs to evaluate yet.

Generate recommendations from orders, the tester, or the QA smoke suite.