SHUR IQ · System Demo

The Grammar Engine

This week we wrote down the editorial rules that make a SHUR IQ report read the way it should. Those rules now run as a repeatable agent program that reviews every report against them and proposes the fixes. Here is how it works, and what it did on its first full run.

Validated tonight · going online tomorrow
Internal demo · prepared for the SHUR team · 8 June 2026

The shift

From a habit in our heads to a program on the rails

A week ago, keeping a report clean depended on the editor remembering the rules: no inverted phrasing, no section that narrates itself, no idea argued in two places.

Every correction we made got written down, first as grammar rules with real IDs, then as standing feedback the system carries forward. The next step was to stop applying them by hand. The rules now drive an agent program that reads a finished report, scores it against the full rulebook, and returns the exact changes that bring it into line. The judgment is the same. The application is now fast and identical on every run.

The method

One improvement loop, run on every report

Each round of feedback feeds the next. Nothing we learn has to be re-learned.

01

Feedback

A correction during a real build — phrasing, structure, a repeated idea.

02

Rule

Written down with an ID and a detection test, added to the rulebook.

03

Program

The agent pipeline reads the rulebook and reviews against every rule.

04

Verified result

Exact fixes applied, deployed, and checked on the live page.

The program

Six agents, three stages, one apply-ready answer

The reviewers never touch the file. They read and judge; the edits are applied in one controlled pass afterward, so two agents can never collide on the same report.
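The single apply pass could look something like the sketch below. It assumes each edit is an exact quote paired with its replacement; `Edit` and `apply_edits` are hypothetical names chosen for illustration, and the real apply step may work differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """One proposed change (hypothetical shape): an exact quote and its replacement."""
    old: str
    new: str


def apply_edits(text: str, edits: list[Edit]) -> str:
    """Apply all edits in one pass. Reject the whole batch if a quote is
    missing, ambiguous, or overlaps another edit."""
    spans = []
    for edit in edits:
        if not edit.old or text.count(edit.old) != 1:
            raise ValueError(f"quote not found exactly once: {edit.old!r}")
        start = text.index(edit.old)
        spans.append((start, start + len(edit.old), edit.new))

    spans.sort()
    for (_, end_a, _), (start_b, _, _) in zip(spans, spans[1:]):
        if start_b < end_a:
            raise ValueError("two edits overlap")

    # Rebuild the text left to right from untouched pieces and replacements.
    pieces, cursor = [], 0
    for start, end, new in spans:
        pieces.append(text[cursor:start])
        pieces.append(new)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


report = "Become a daily presence, not an annual campaign"
print(apply_edits(report, [Edit(report, "Become a daily presence women use year-round")]))
```

Because the batch is validated before any text is rebuilt, a stale or overlapping edit stops the pass instead of leaving the report half-changed.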

Stage 1 · Compile

Build the canonical rulebook

One agent reads the grammar specification and every standing piece of feedback, then distills them into a single rubric — each rule with an ID, what it bans, and how to detect it. Every reviewer scores against the same reference.
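As a rough illustration of what a compiled rubric might hold, the sketch below models a rule as an ID, what it bans, and a detection note, then collapses duplicates by ID. The `Rule` and `compile_rubric` names, the detection wording, and the first-occurrence-wins policy are all assumptions. The real compiler is an agent that makes its own consolidation decisions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str  # e.g. "R-LINT.1"
    bans: str     # what the rule forbids
    detect: str   # how a reviewer can spot a violation (illustrative wording)


def compile_rubric(candidates: list[Rule]) -> dict[str, Rule]:
    """Collapse candidate rules into one canonical entry per ID.
    Assumption: the first occurrence of an ID wins."""
    rubric: dict[str, Rule] = {}
    for rule in candidates:
        rubric.setdefault(rule.rule_id, rule)
    return rubric


# Rule IDs come from the examples in this demo; the detect notes are made up.
candidates = [
    Rule("R-LINT.1", "inverted 'X, not Y' phrasing", "a clause, a comma, then 'not'"),
    Rule("R-DIST.1", "the same idea argued in two places", "two cards making one claim"),
    Rule("R-LINT.1", "inverted phrasing (restated in feedback)", "same as the spec"),
    Rule("no-self-reference", "the report describing its own structure", "phrases like 'that follows'"),
]
rubric = compile_rubric(candidates)
print(sorted(rubric))
```

Keying the rubric by ID gives every reviewer the same lookup table, which is one simple way to guarantee they score against a single reference.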

rubric-compiler
[Screenshot: rubric-compiler agent output, 15 distinct rules compiled from the grammar spec and feedback memories]
Stage 1, live: the compiler reads the v0.7 grammar spec and 11 feedback memories, deduplicates them to 15 canonical rules, and names its consolidation decisions.

Stage 2 · Review

Four reviewers, four lenses, in parallel

Each reviewer owns a cluster of rules and reads the whole report, returning every violation with the exact text, the problem, and a proposed fix that keeps the meaning and the voice.
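The parallel review stage can be sketched with a thread pool, assuming each reviewer is a function that takes the report text and returns findings. Everything named here is hypothetical: the `Finding` fields mirror the description above, the stub reviewers stand in for real model calls, and the pairing of rules to lenses is a guess.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    lens: str
    rule_id: str
    quote: str    # exact text from the report
    problem: str
    fix: str      # proposed replacement


Reviewer = Callable[[str], list[Finding]]


def make_stub_reviewer(lens: str, rule_id: str, quote: str, problem: str, fix: str) -> Reviewer:
    """Build a stand-in reviewer that flags one known phrase.
    A real reviewer would be an agent call; this stub only reads the text."""
    def review(report: str) -> list[Finding]:
        if quote in report:
            return [Finding(lens, rule_id, quote, problem, fix)]
        return []
    return review


def run_reviewers(report: str, reviewers: list[Reviewer]) -> list[Finding]:
    """Run every reviewer on the same text at once and pool the findings,
    keeping the reviewers' order."""
    with ThreadPoolExecutor(max_workers=max(1, len(reviewers))) as pool:
        results = pool.map(lambda reviewer: reviewer(report), reviewers)
        return [finding for findings in results for finding in findings]


reviewers = [
    make_stub_reviewer("inversion + slop", "R-LINT.1",
                       "Become a daily presence, not an annual campaign",
                       "inverted phrasing", "Become a daily presence women use year-round"),
    make_stub_reviewer("self-reference + naming", "no-self-reference",
                       "One split sits under every gap that follows.",
                       "describes the layout",
                       "One split sits under everything the Heart Association now faces."),
]
report = "Become a daily presence, not an annual campaign. One split sits under every gap that follows."
for finding in run_reviewers(report, reviewers):
    print(finding.rule_id, "|", finding.quote)
```

Each reviewer receives only the report string and returns data, so no reviewer has a way to modify the file.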

self-reference + naming · inversion + slop · argument progression · scaffolding + headlines
[Screenshot: four review agents running in parallel, all completed on Opus 4.8]
Stage 2: the four reviewers run at the same time, each on its own lens.

Stage 3 · Synthesize

Resolve, dedupe, verify against the live file

A final agent merges overlapping findings, drops false positives, re-checks every quote against the current text, and returns one ordered edit list ready to apply.
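One way to picture the synthesis step is the sketch below. It assumes findings carry an exact quote and a fix. The `Finding` shape, the `synthesize` name, and the overlap policy (the earliest edit wins, and the longer one wins a tie) are illustrative assumptions. The real synthesizer is an agent that also judges false positives, which plain code cannot do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule_id: str
    quote: str  # exact text the reviewer flagged
    fix: str    # proposed replacement


def synthesize(report: str, findings: list[Finding]) -> list[Finding]:
    """Turn raw findings into an ordered, non-overlapping edit list."""
    located, seen = [], set()
    for finding in findings:
        start = report.find(finding.quote) if finding.quote else -1
        if start == -1:            # quote is no longer in the live text
            continue
        if finding.quote in seen:  # another reviewer already flagged it
            continue
        seen.add(finding.quote)
        located.append((start, start + len(finding.quote), finding))

    # Order by position; when two findings start together, try the longer first.
    located.sort(key=lambda item: (item[0], -(item[1] - item[0])))

    edits, last_end = [], 0
    for start, end, finding in located:
        if start < last_end:       # overlaps an edit that was already kept
            continue
        edits.append(finding)
        last_end = end
    return edits


report = "Become a daily presence, not an annual campaign. One split sits under every gap that follows."
raw = [
    Finding("no-self-reference", "One split sits under every gap that follows.",
            "One split sits under everything the Heart Association now faces."),
    Finding("R-LINT.1", "Become a daily presence, not an annual campaign",
            "Become a daily presence women use year-round"),
    Finding("R-LINT.1", "Become a daily presence, not an annual campaign",
            "Become a daily presence women use year-round"),
    Finding("R-DIST.1", "a sentence already edited away", "unused"),
]
for edit in synthesize(report, raw):
    print(edit.rule_id, "|", edit.quote)
```

Re-checking every quote against the current text is what keeps the edit list safe to apply even if the report changed while the reviewers were running.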

synthesizer
[Screenshot: synthesizer agent output, 6 of 6 agents done in 5 minutes 28 seconds, 20 findings verified into a 12-edit list]
Stage 3: the synthesizer verifies all 20 raw findings against the live file and returns a 12-edit, non-overlapping apply list. Six agents, start to finish, in 5m 28s.

The proof

First full run: the AHA gold-standard report

We pointed the program at the v07 American Heart Association brief — the build that tracks our gold standard. It read the whole report, surfaced twenty raw findings, and resolved them into twelve verified changes. Every fix kept the facts and figures intact.

6 · agents, end to end
20→12 · findings to verified edits
0 · rule violations left in the live prose
~5½ min · full review, start to finish
Inverted phrasing in an action headline · R-LINT.1
was: Become a daily presence, not an annual campaign
now: Become a daily presence women use year-round
The "X, not Y" construction is the most recognizable machine tell. The fix leads with the affirmative and keeps the year-round contrast.

The same idea argued in two places · R-DIST.1
was: Women read their hearts on screens the Heart Association is not on
now: The companies women check every morning carry no Heart Association mark
This card was repeating the claim its neighbor already made. The rewrite gives it its own point — who owns the daily relationship, and who earns the trust — so each card carries one idea.

The report describing its own structure · no-self-reference
was: One split sits under every gap that follows.
now: One split sits under everything the Heart Association now faces.
"Every gap that follows" points at the document's layout instead of making the point. The fix names the subject and states the idea directly.

What the reviewers actually caught

Each reviewer returns its findings with the rule ID, the severity, the exact text, and a fix that keeps the voice. The full reasoning for every agent is archived, so any change traces back to the rule that caught it.
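To make the shape of a finding concrete, here is a small sketch of a detection test for the inverted-phrasing rule (R-LINT.1) that returns records carrying a rule ID, a severity, and the exact text. The regular expression is a crude heuristic written for this illustration, the `Finding` fields and the "high" severity label are assumptions, and the actual reviewers are agents that also judge meaning and propose the fix.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule_id: str
    severity: str  # hypothetical label; the real severity scale is not shown here
    quote: str     # exact text that triggered the rule


# A clause, a comma, then "not" and a second clause: the "X, not Y" shape.
INVERSION = re.compile(r"\b[^,.;:!?\n]+,\s+not\s+[^,.;:!?\n]+", re.IGNORECASE)


def detect_inversions(text: str) -> list[Finding]:
    """Flag every "X, not Y" span with its rule ID and exact quote."""
    return [
        Finding("R-LINT.1", "high", match.group(0).strip())
        for match in INVERSION.finditer(text)
    ]


for headline in (
    "Become a daily presence, not an annual campaign",
    "Become a daily presence women use year-round",
):
    print(headline, "->", detect_inversions(headline))
```

A pattern this simple will also flag legitimate uses of "not" after a comma, so it is better read as a first filter than as the rule itself.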

Why it matters

What the team gets from this

Corrections become permanent

A note made once becomes a rule the program enforces on every report from then on. We never lose ground.

Every report, the same bar

The same rulebook reviews each build, so quality no longer rides on who happened to edit it.

Fully auditable

Every agent's reasoning is archived. We can show exactly which rule caught what, and why each change was made.

Status

Finishing touches tonight, online tomorrow

The pipeline ran clean end to end and the AHA report is live with its changes verified. We are putting the final touches on it now. From tomorrow, new reports run through the engine as a standard step — the writing improves on its own, every time the rulebook grows.