Plumbline.ai: An AI Co-Pilot for Construction Lenders, Built in Five Days with Opus 4.7

🏗️ Inside our submission to Anthropic's Built with Opus 4.7 hackathon: a seven-agent pipeline plus a Claude Managed Agents Supervisor that turns a CRMC's draw-verification workflow into a structured, citable pipeline.

Asaf Erez • May 3, 2026 • 9 min read

From April 21–26, 2026 we joined Anthropic's Built with Opus 4.7 hackathon and shipped Plumbline.ai, an AI co-pilot for construction lenders. The product targets a real and underserved corner of regulated finance: the monthly draw verification that banks and Construction Risk Management Consultants (CRMCs) perform when they release loan tranches against milestone completion on a construction project.

🏦 Why this product needed to exist

Construction lending is a paper-driven, milestone-driven business. Every month, the contractor submits an AIA G702 / G703 application for payment claiming a percentage complete on each line of the Schedule of Values, and the lender has to decide whether to release the next tranche. Today that decision is mostly manual: an inspector drives to site, eyeballs progress, and writes a narrative.

It is also a domain that regulators keep flagging. The OCC and FDIC have repeatedly cited inadequate Change Order tracking and weak draw-verification controls as safety-and-soundness issues at construction-lending banks. The same playbook keeps surfacing in MRA findings: missing evidence, retainage drift, tranche over-allocation, scope deviations approved at the draw meeting that nobody traced back to the approved plans.

Plumbline's thesis: this workflow is structured enough, and cited well enough in 50 years of CRMC practice, that an AI co-pilot can turn it into a defensible, citation-ready pipeline rather than a narrative.

🧱 Three principles drawn straight from CRMC practice

🤖 The seven-agent pipeline

The project-setup pipeline is six Claude tool-use calls plus one form-driven ingester. Every Claude step uses forced tool use with a Zod-validated output schema, so each agent emits a single structured object the next stage can rely on, with no free-text postprocessing.
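As a rough illustration of the forced-tool-use pattern described above, here is a minimal sketch for Agent 1. It builds a Messages API request body whose `tool_choice` names a single tool, then checks the tool input before handing it to the next stage. The tool name `record_plan_classification`, the field names, and the upper-case discipline labels are assumptions made for this example, and the hand-rolled `parsePlanClassification` guard stands in for the Zod schema so the snippet has no dependencies. A real call would also attach the plan PDF rather than plain text.

```typescript
// Discipline labels are assumed spellings of the four hackathon disciplines.
const PLAN_DISCIPLINES = ["ARCHITECTURE", "STRUCTURAL", "ELECTRICAL", "PLUMBING"] as const;
type PlanDiscipline = (typeof PLAN_DISCIPLINES)[number];

// Hypothetical output shape for Agent 1 (PlanClassifier).
interface PlanClassification {
  discipline: PlanDiscipline;
  sheetRole: string;
  titleBlock: string;
}

// Tool definition in the Messages API shape: name, description, JSON Schema input.
const recordPlanClassificationTool = {
  name: "record_plan_classification",
  description: "Record the discipline, sheet role, and title block of one plan sheet.",
  input_schema: {
    type: "object",
    properties: {
      discipline: { type: "string", enum: [...PLAN_DISCIPLINES] },
      sheetRole: { type: "string" },
      titleBlock: { type: "string" },
    },
    required: ["discipline", "sheetRole", "titleBlock"],
  },
};

// A tool_choice of type "tool" forces the model to answer by calling this one tool.
function buildClassifierRequest(model: string, sheetText: string) {
  return {
    model, // caller supplies the model id; none is assumed here
    max_tokens: 1024,
    tools: [recordPlanClassificationTool],
    tool_choice: { type: "tool", name: recordPlanClassificationTool.name },
    messages: [{ role: "user", content: sheetText }],
  };
}

function isPlanDiscipline(value: unknown): value is PlanDiscipline {
  return typeof value === "string" && (PLAN_DISCIPLINES as readonly string[]).indexOf(value) !== -1;
}

// Stand-in for the Zod parse step: throw unless the tool input matches the shape.
function parsePlanClassification(toolInput: unknown): PlanClassification {
  if (typeof toolInput !== "object" || toolInput === null) {
    throw new Error("tool input must be an object");
  }
  const { discipline, sheetRole, titleBlock } = toolInput as Record<string, unknown>;
  if (!isPlanDiscipline(discipline)) {
    throw new Error("unknown discipline");
  }
  if (typeof sheetRole !== "string" || typeof titleBlock !== "string") {
    throw new Error("sheetRole and titleBlock must be strings");
  }
  return { discipline, sheetRole, titleBlock };
}
```

Because the model can only respond through the named tool, the downstream stage receives either a validated object or an error, never prose that needs scraping.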

Upload plan PDFs ──▶ Agent 1  PlanClassifier         (Claude · forced tool use)
                       discipline + sheet role + title-block
                                ▼
                     Agent 2  PlanFormatExtractor    (Claude)
                       per-discipline structured PlanFormat
                                ▼
Upload finance  ──▶  Agent 3  FinancePlan ingestion
plan (JSON form)       form + Zod cross-field validation
                       (no model call; hackathon cut)
                                ▼
                     Agent 4  PhotoGuidance          (Claude)
                       tells the inspector what to shoot
                                ▼
Upload photos   ──▶  Agent 5  PhotoQuality           (Claude)
                       blur / exposure / framing gate
                                ▼
                     Agent 6  PhotoToPlanFormat      (Claude)
                       structured observation per photo,
                       bound to plan element kinds
                                ▼
Request draw    ──▶  Agent 7  ComparisonAndGap       (Claude)
report                 SOV line-by-line comparison →
                       APPROVE / APPROVE_WITH_CONDITIONS /
                       HOLD / REJECT

Agents 1 and 2 chain as a background pipeline triggered by a plan upload. Agents 5 and 6 chain per photo (Agent 6 only fires if Agent 5 returns quality: "GOOD"). Agent 4 runs on demand once a Draw is approved and caches per draw. Agent 7 runs synchronously on POST /reports. Every Claude step writes an AgentRun row so the UI can stream status to the inspector and to the lender.
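The per-photo chaining rule above can be sketched as a small gate. This is an illustrative outline only: the `AgentRunRow` fields, the `PhotoAgents` interface, and the `elementKind` field are hypothetical, and the agent calls are injected as synchronous stubs to keep the example short (the real steps would be asynchronous model calls).

```typescript
// Only "GOOD" is named in the text; other quality labels are left open here.
interface PhotoQualityResult {
  quality: string;
}

// Hypothetical observation shape produced by Agent 6.
interface PhotoObservation {
  photoId: string;
  elementKind: string;
}

// Hypothetical AgentRun row; the real schema is not shown in the text.
interface AgentRunRow {
  agent: "PhotoQuality" | "PhotoToPlanFormat";
  photoId: string;
}

// Injected agent calls, so the gate can be exercised without a model.
interface PhotoAgents {
  assessQuality(photoId: string): PhotoQualityResult;
  extractObservation(photoId: string): PhotoObservation;
}

function runPhotoChain(
  photoId: string,
  agents: PhotoAgents,
  runs: AgentRunRow[],
): PhotoObservation | null {
  // Agent 5 always runs and records its AgentRun row.
  const assessment = agents.assessQuality(photoId);
  runs.push({ agent: "PhotoQuality", photoId });

  // Gate: Agent 6 fires only when Agent 5 reports quality "GOOD".
  if (assessment.quality !== "GOOD") {
    return null;
  }

  const observation = agents.extractObservation(photoId);
  runs.push({ agent: "PhotoToPlanFormat", photoId });
  return observation;
}
```

Keeping the gate in one function means a rejected photo never reaches Agent 6, and the list of runs shows which steps ran for each photo.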

📑 The monthly draw cycle and the eighth agent

Project setup runs once. The monthly draw cycle is what construction lenders actually live in, and it sits outside the numbered pipeline. On every monthly draw the contractor submits a G702 cover sheet and a G703 continuation sheet, and the G703 is the financial document that drives everything Plumbline does that month.

An eighth Claude agent, G703Extractor, parses the contractor's uploaded G703 into structured rows, maps each row to the project Gantt, and attaches an aiConfidence score per line. The contractor reviews and overrides any low-confidence rows on a side-by-side screen; once they approve, the Draw becomes the canonical "claim" that Agents 4–7 verify against site evidence. A Draw walks parsing → ready_for_review → approved (or → rejected, → failed), and Agent 4's photo guidance now consumes the approved Draw directly so every shot the inspector takes carries referenceLineNumbers back to the specific G703 row it verifies.
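The Draw lifecycle described above lends itself to an explicit transition table. The sketch below is one possible reading, not the shipped implementation: it assumes that `failed` is reached only from `parsing` and `rejected` only from `ready_for_review`, which the text does not spell out. The `G703Row` shape and the confidence threshold passed to `rowsNeedingReview` are likewise hypothetical.

```typescript
type DrawStatus = "parsing" | "ready_for_review" | "approved" | "rejected" | "failed";

// Assumed transitions: parsing can fail, review can approve or reject,
// and the three end states allow no further moves.
const DRAW_TRANSITIONS: Record<DrawStatus, readonly DrawStatus[]> = {
  parsing: ["ready_for_review", "failed"],
  ready_for_review: ["approved", "rejected"],
  approved: [],
  rejected: [],
  failed: [],
};

// Return the next status, or throw if the move is not in the table.
function advanceDraw(current: DrawStatus, next: DrawStatus): DrawStatus {
  if (DRAW_TRANSITIONS[current].indexOf(next) === -1) {
    throw new Error(`illegal Draw transition: ${current} -> ${next}`);
  }
  return next;
}

// Hypothetical extracted row; only aiConfidence is named in the text.
interface G703Row {
  lineNumber: number;
  aiConfidence: number;
}

// Rows below the (caller-chosen) threshold go to the side-by-side review screen.
function rowsNeedingReview(rows: readonly G703Row[], threshold: number): G703Row[] {
  return rows.filter((row) => row.aiConfidence < threshold);
}
```

With the table in one place, a request to verify an unapproved Draw can be refused by checking its status before any photo guidance is generated.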

🧠 The Supervisor: Claude Managed Agents for the open-ended part

The seven-agent pipeline is great at structured extraction, but draw verification has an open-ended layer on top: the senior CRMC who reads the gap report, decides whether the claim is defensible, and may demand a re-inspection. That reasoning loop (look at the G703, look at the photos, look at the plan scope, decide what to investigate next, decide when to stop, write a signed finding) is exactly the shape that Claude Managed Agents is built for.

So we layered a Supervisor on top: an Opus 4.7 Managed Agent with the standard agent_toolset_20260401 plus four custom tools (read_draw_state, read_photo_evidence, read_plan_scope, and generate_reinspection_request) and a single prompt-bound record_finding call that emits the verdict. The agent runs server-side in Anthropic's managed harness with persistent session, built-in bash / file / web_search / web_fetch tools, and a clean SSE stream the UI can render as a live trace.

Crucially, the Supervisor enforces a strict "do no harm" contract: it never mutates any existing Plumbline collection. It reads from Draw, GapReport, Observation, PhotoAssessment, Document, PlanFormat, AgentRun, and writes only to three new collections (SupervisorSession, SupervisorFinding, ReinspectionRequest). Disable the Supervisor and the rest of the product behaves identically, which is the property a regulated lender actually wants from an autonomous reasoning layer.
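One way to make a contract like this checkable in code is an allow-list guard in front of every data access the Supervisor's tools perform. The sketch below uses the collection names listed above; the function names are hypothetical, and it assumes the Supervisor may also read back the three collections it owns.

```typescript
// Existing Plumbline collections: the Supervisor may read these, never write.
const SUPERVISOR_READ_ONLY = [
  "Draw",
  "GapReport",
  "Observation",
  "PhotoAssessment",
  "Document",
  "PlanFormat",
  "AgentRun",
] as const;

// New collections owned by the Supervisor: the only legal write targets.
const SUPERVISOR_OWNED = [
  "SupervisorSession",
  "SupervisorFinding",
  "ReinspectionRequest",
] as const;

type SupervisorAccessMode = "read" | "write";

function canSupervisorAccess(collection: string, mode: SupervisorAccessMode): boolean {
  const owned = (SUPERVISOR_OWNED as readonly string[]).indexOf(collection) !== -1;
  if (mode === "write") {
    return owned;
  }
  // Assumption: reads are allowed on owned collections as well as the read-only set.
  return owned || (SUPERVISOR_READ_ONLY as readonly string[]).indexOf(collection) !== -1;
}

// Throwing variant for use at the top of each custom tool handler.
function assertSupervisorAccess(collection: string, mode: SupervisorAccessMode): void {
  if (!canSupervisorAccess(collection, mode)) {
    throw new Error(`Supervisor may not ${mode} collection ${collection}`);
  }
}
```

A guard of this kind turns the "do no harm" property into something a test suite can assert, rather than a convention that depends on every tool handler being written carefully.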

🛠️ Stack, scope, and what shipped

The hackathon scope is deliberately narrow: four disciplines (Architecture, Structural, Electrical, Plumbing), sealed PDF drawings only, phone photos with EXIF provenance, residential loans as the demo path, and form-first finance-plan ingestion. The full product premise (drone / 360° / fixed cameras / IoT telemetry, Mechanical / Civil / Landscape / Fire-protection lanes, CAD / BIM / IFC ingestion, Excel/PDF G703 OCR, and a full commercial / HUD 221(d)(4) productisation) is schema-ready but not exercised in the demo pipeline.

🧪 What we learned in five days

🎬 Closing

Five days, seven agents, one Managed-Agents Supervisor, and a draw verdict that cites G703 line items back to authenticated photo evidence. If you're building AI for regulated finance, two takeaways: (1) forced tool use plus Zod gets you 80% of the structured-extraction quality you need without inventing a framework; (2) when the work turns open-ended, switch platforms, because Claude Managed Agents is what the autonomous reasoning layer should sit on, not a 47th tool-use call.

For a deeper read on AI accountability and audit trails in regulated workflows, see our EU AI Act guide for fintech and banking and our take on fiduciary-grade AI in financial services.