SHUR IQ · System Demo

The Grammar Engine

This week we wrote down the editorial rules that make a SHUR IQ report read the way it should. Those rules now run as a repeatable agent program that reviews every report against them and proposes the fixes. Here is how it works, and what it did on its first full run.

Validated tonight · going online tomorrow

Internal demo · prepared for the SHUR team · 8 June 2026

The shift

From a habit in our heads to a program on the rails

A week ago, keeping a report clean depended on whoever was editing it remembering the rules: no inverted phrasing, no section that narrates itself, no idea argued in two places.

Every correction we made got written down, first as grammar rules with real IDs, then as standing feedback the system carries forward. The next step was to stop applying them by hand. The rules now drive an agent program that reads a finished report, scores it against the full rulebook, and returns the exact changes that bring it into line. The judgment is the same. The application is now fast and identical every time.

The method

One improvement loop, run on every report

Each round of feedback feeds the next. Nothing we learn has to be re-learned.

Feedback

A correction during a real build — phrasing, structure, a repeated idea.

Rule

Written down with an ID and a detection test, added to the rulebook.

Program

The agent pipeline reads the rulebook and reviews against every rule.

Verified result

Exact fixes applied, deployed, and checked on the live page.

The program

Six agents, three stages, one apply-ready answer

The reviewers only read and judge; the edits are applied in one controlled pass afterward, so two agents can never collide on the same report.
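A minimal sketch of what a single controlled apply pass could look like, assuming edits arrive as (quote, fix) pairs. The function name `apply_edits` and the all-or-nothing check are illustrative assumptions, not the engine's actual code.

```python
def apply_edits(text, edits):
    """Apply (quote, fix) pairs in one pass.

    Hypothetical helper: the whole batch is refused if any quote is
    missing or ambiguous, so a half-applied edit list never reaches the page.
    Assumes the quotes do not overlap each other.
    """
    for quote, _ in edits:
        if text.count(quote) != 1:
            raise ValueError(f"quote not found exactly once: {quote!r}")
    for quote, fix in edits:
        text = text.replace(quote, fix, 1)
    return text


page = "Become a daily presence, not an annual campaign"
updated = apply_edits(
    page,
    [("Become a daily presence, not an annual campaign",
      "Become a daily presence women use year-round")],
)
```

Because only this one function writes to the text, read-only reviewers can run side by side without any locking.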

Stage 1 · Compile

Build the canonical rulebook

One agent reads the grammar specification and every standing piece of feedback, then distills them into a single rubric — each rule with an ID, what it bans, and how to detect it. Every reviewer scores against the same reference.

rubric-compiler

Stage 1, live: the compiler reads the v0.7 grammar spec and 11 feedback memories, deduplicates them to 15 canonical rules, and names its consolidation decisions.
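As an illustration of the rubric shape described above (an ID, what the rule bans, and a detection test), here is a small sketch. The `Rule` class, the regex detectors, and `compile_rubric` are assumptions made for the example; the real compiler is an agent and its detection tests may not be regexes at all.

```python
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class Rule:
    rule_id: str  # e.g. "R-LINT.1"
    bans: str     # what the rule forbids, in plain words
    detect: str   # hypothetical detection test, written here as a regex


def compile_rubric(raw_rules):
    """Deduplicate raw rules by ID; the first definition of an ID wins."""
    rubric = {}
    for rule in raw_rules:
        rubric.setdefault(rule.rule_id, rule)
    return list(rubric.values())


raw = [
    Rule("R-LINT.1", "inverted 'X, not Y' phrasing", r",\s+not\s+\w+"),
    Rule("no-self-reference", "prose that describes its own layout", r"\bthat follows\b"),
    # The same rule arriving a second time, e.g. from a feedback memory.
    Rule("R-LINT.1", "inverted 'X, not Y' phrasing", r",\s+not\s+\w+"),
]
rubric = compile_rubric(raw)

headline = "Become a daily presence, not an annual campaign"
hits = [r.rule_id for r in rubric if re.search(r.detect, headline)]
```

Here the three raw entries collapse to two canonical rules, and only R-LINT.1 fires on the sample headline.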

Stage 2 · Review

Four reviewers, four lenses, in parallel

Each reviewer owns a cluster of rules and reads the whole report, returning every violation with the exact text, the problem, and a proposed fix that keeps the meaning and the voice.

self-reference + naming · inversion + slop · argument progression · scaffolding + headlines

Four review agents running in parallel, all completed on Opus 4.8

Stage 2: the four reviewers run at the same time, each on its own lens.
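The parallel, read-only review can be sketched with a thread pool. This is a stand-in under stated assumptions: the `LENSES` table, the regex patterns, and the finding fields are hypothetical, and simple pattern matching replaces the judgment the real review agents apply.

```python
from concurrent.futures import ThreadPoolExecutor
import re

# Hypothetical lenses: each reviewer owns a cluster of (rule_id, regex) pairs.
LENSES = {
    "inversion + slop": [("R-LINT.1", r"[^.]*,\s+not\s+[^.]*")],
    "self-reference + naming": [("no-self-reference", r"[^.]*\bthat follows\b[^.]*")],
}


def review(lens, rules, report):
    """Read-only pass: collect findings, never modify the report."""
    findings = []
    for rule_id, pattern in rules:
        for match in re.finditer(pattern, report):
            findings.append(
                {"lens": lens, "rule": rule_id, "quote": match.group(0).strip()}
            )
    return findings


def review_in_parallel(report):
    """Run every lens at the same time and flatten the results in lens order."""
    with ThreadPoolExecutor(max_workers=len(LENSES)) as pool:
        futures = [
            pool.submit(review, lens, rules, report) for lens, rules in LENSES.items()
        ]
        return [finding for future in futures for finding in future.result()]


report = (
    "Become a daily presence, not an annual campaign. "
    "One split sits under every gap that follows."
)
findings = review_in_parallel(report)
```

Each reviewer receives the whole report but only its own rules, which mirrors the one-lens-per-reviewer split.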

Stage 3 · Synthesize

Resolve, dedupe, verify against the live file

A final agent merges overlapping findings, drops false positives, re-checks every quote against the current text, and returns one ordered edit list ready to apply.

synthesizer

Stage 3: the synthesizer verifies all 20 raw findings against the live file and returns a 12-edit, non-overlapping apply list. Six agents, start to finish, in 5m 28s.
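One way to picture the synthesis step is the sketch below: it drops duplicate findings, discards any quote that no longer appears in the live text, and keeps only spans that do not overlap. The `synthesize` function and the finding fields are assumptions for illustration; it also locates each quote by its first occurrence only, which a real synthesizer would need to handle more carefully.

```python
def synthesize(findings, live_text):
    """Dedupe findings, verify quotes against the live text, drop overlaps."""
    seen = set()
    located = []
    for finding in findings:
        start = live_text.find(finding["quote"])
        if start == -1:  # quote is not in the current file: stale or false positive
            continue
        key = (start, finding["quote"])
        if key in seen:  # two reviewers reported the same span
            continue
        seen.add(key)
        located.append({**finding, "start": start, "end": start + len(finding["quote"])})

    located.sort(key=lambda f: f["start"])
    edits, last_end = [], 0
    for finding in located:
        if finding["start"] >= last_end:  # keep spans that clear the previous edit
            edits.append(finding)
            last_end = finding["end"]
    return edits


live = (
    "Become a daily presence, not an annual campaign. "
    "One split sits under every gap that follows."
)
raw_findings = [
    {"rule": "R-LINT.1", "quote": "Become a daily presence, not an annual campaign"},
    {"rule": "R-LINT.1", "quote": "Become a daily presence, not an annual campaign"},
    {"rule": "no-self-reference", "quote": "One split sits under every gap that follows."},
    {"rule": "R-DIST.1", "quote": "a sentence that was already edited away"},
]
edits = synthesize(raw_findings, live)
```

In this toy input, four raw findings reduce to two ordered, non-overlapping edits, the same kind of narrowing as the 20 to 12 reduction reported for the real run.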

The proof

First full run: the AHA gold-standard report

We pointed the program at the v07 American Heart Association brief — the build that tracks our gold standard. It read the whole report, surfaced twenty raw findings, and resolved them into twelve verified changes. Every fix kept the facts and figures intact.

6

agents, end to end

20→12

findings to verified edits

rule violations left in the live prose

~5½ min

full review, start to finish

Inverted phrasing in an action headline · R-LINT.1

was: Become a daily presence, not an annual campaign

now: Become a daily presence women use year-round

The "X, not Y" construction is the most recognizable machine tell. The fix leads with the affirmative and keeps the year-round contrast.

The same idea argued in two places · R-DIST.1

was: Women read their hearts on screens the Heart Association is not on

now: The companies women check every morning carry no Heart Association mark

This card was repeating the claim its neighbor already made. The rewrite gives it its own point — who owns the daily relationship, and who earns the trust — so each card carries one idea.

The report describing its own structure · no-self-reference

was: One split sits under every gap that follows.

now: One split sits under everything the Heart Association now faces.

"Every gap that follows" points at the document's layout instead of making the point. The fix names the subject and states the idea directly.

What the reviewers actually caught

Each reviewer returns its findings with the rule ID, the severity, the exact text, and a fix that keeps the voice. The full reasoning for every agent is archived, so any change traces back to the rule that caught it.

Argument progression: one strong claim had been spread across too many sections. Seven findings, the structural ones that mattered most.

Self-reference + section naming: six spots where the prose described its own structure instead of making the point.

Inversion + slop: two "X, not Y" constructions flagged, including one in a prominent headline.

Why it matters

What the team gets from this

Corrections compound

Each new rule joins the same rulebook, so a report reviewed today is held to every standard we have ever set, and each new correction adds one more.

Every report, the same bar

The same rulebook reviews each build, so quality no longer rides on who happened to edit it.

Fully auditable

Every agent's reasoning is archived. We can show exactly which rule caught what, and why each change was made.

Status

Finishing touches tonight, online tomorrow

The pipeline ran clean end to end and the AHA report is live with its changes verified. We are putting the final touches on it now. From tomorrow, new reports run through the engine as a standard step — the writing improves on its own, every time the rulebook grows.