Live demo
Lead enrichment, built two ways
Paste a LinkedIn-style profile and a company description. The app scores fit against a fixed B2B SaaS ICP and returns source-quoted claims, an outreach hook, and a routing decision: auto-add, propose, or discard.
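The routing step can be pictured as a threshold function over the fit score. This is a minimal sketch that assumes a 0-100 fit score. The thresholds `AUTO_ADD_MIN` and `PROPOSE_MIN`, the `route_lead` function, and the rule that a lead with no grounded claims is never auto-added are all hypothetical. They illustrate the idea and are not the app's actual logic.

```python
from enum import Enum

# Hypothetical thresholds: the page does not say how fit scores map to routes.
AUTO_ADD_MIN = 80
PROPOSE_MIN = 50


class Route(Enum):
    AUTO_ADD = "auto-add"
    PROPOSE = "propose"
    DISCARD = "discard"


def route_lead(fit_score: int, grounded_claims: int) -> Route:
    """Map a 0-100 fit score to one of the three routing decisions.

    Assumption: a high score with zero grounded claims is sent to a human
    (propose) instead of being auto-added.
    """
    if not 0 <= fit_score <= 100:
        raise ValueError("fit_score must be in [0, 100]")
    if fit_score >= AUTO_ADD_MIN and grounded_claims > 0:
        return Route.AUTO_ADD
    if fit_score >= PROPOSE_MIN:
        return Route.PROPOSE
    return Route.DISCARD
```

Under these assumptions, a score of 90 with three grounded claims is auto-added, while the same score with no grounded claims only reaches the propose queue.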
Two builds run the same workflow against the same eval set. The integrated build uses a strict tool schema, extended thinking, and a per-claim grounding rule. The chat build uses a task-describing system prompt and nothing else. The scorecard measures the gap between the two architectures.
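One way to read the per-claim grounding rule is as a verbatim-quote check. Each claim carries the source span it relies on, and any claim whose quote cannot be found in the pasted profile or company description is rejected. The sketch below assumes that interpretation. `Claim`, `normalize`, and `ground_claims` are hypothetical names, and the whitespace and case normalization is an illustrative choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    text: str   # the model's assertion about the lead
    quote: str  # the source span the model says supports it


def normalize(s: str) -> str:
    # Collapse whitespace and case so line wraps in pasted text don't break matches.
    return " ".join(s.split()).casefold()


def ground_claims(
    claims: list[Claim], sources: list[str]
) -> tuple[list[Claim], list[Claim]]:
    """Split claims into (grounded, rejected).

    A claim counts as grounded only if its quote is non-empty and appears
    verbatim (after normalization) in at least one source text.
    """
    haystacks = [normalize(s) for s in sources]
    grounded: list[Claim] = []
    rejected: list[Claim] = []
    for claim in claims:
        q = normalize(claim.quote)
        if q and any(q in h for h in haystacks):
            grounded.append(claim)
        else:
            rejected.append(claim)
    return grounded, rejected
```

Rejecting a claim outright, rather than asking the model to rephrase it, keeps the check deterministic and easy to score against gold labels. The trade-off is that paraphrased but accurate claims are also dropped.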
What’s different
Explore
- 01
Integrated build
Lead queue with structured outputs, source-quoted claims, extended thinking deltas, and an under-the-hood telemetry drawer.
Open the queue →
- 02
Chat build
Productized chat against the same model, without tools or schema. Conversations stay in the browser, and diagnostics live behind a toggle.
Open the chat →
- 03
Eval scorecard
A 73-item test set with both modes scored against the same gold labels, two grounding judges, and a dated pre/post-fix snapshot.
See the numbers →
- 04
Methodology
What's measured, how the labels were built, the criteria revision log, judge models, and the known limits to read before the numbers.
Read the methodology →