Plumbline.ai: An AI Co-Pilot for Construction Lenders, Built in Five Days with Opus 4.7

From April 21–26, 2026, we took part in Anthropic's Built with Opus 4.7 hackathon and shipped Plumbline.ai, an AI co-pilot for construction lenders. The product targets a real and underserved corner of regulated finance: the monthly draw verification that banks and Construction Risk Management Consultants (CRMCs) perform when they release loan tranches against milestone completion on a construction project.

🏦 Why this product needed to exist

Construction lending is a paper-driven, milestone-driven business. Every month, the contractor submits an AIA G702 / G703 application for payment claiming a percentage complete on each line of the Schedule of Values, and the lender has to decide whether to release the next tranche. Today that decision is mostly manual: an inspector drives to site, eyeballs progress, and writes a narrative.

It is also a domain that regulators keep flagging. The OCC and FDIC have repeatedly cited inadequate Change Order tracking and weak draw-verification controls as safety-and-soundness issues at construction-lending banks. The same playbook keeps surfacing in MRA findings: missing evidence, retainage drift, tranche over-allocation, scope deviations approved at the draw meeting that nobody traced back to the approved plans.

Plumbline's thesis: this workflow is structured enough, and documented well enough across 50 years of CRMC practice, that an AI co-pilot can turn it into a defensible, citation-ready pipeline rather than a narrative.

🧱 Three principles drawn straight from CRMC practice

Multi-source, chain-of-custody evidence. A draw decision is only as defensible as the evidence behind it. The platform is designed to accept any sealed, time-stamped, geo-bound capture — phone photos, drone orthomosaic, 360° walkthrough rigs, fixed-position construction cameras, and IoT telemetry (concrete maturity sensors, moisture probes, strain gauges). Each artefact carries authenticated provenance: EXIF, C2PA / Content Credentials, device attestation, signed timestamp, GPS fence. Hackathon cut: phone photos only; the other modalities are schema-ready but not wired into the demo pipeline.
Full-fidelity plan ingestion — structured and unstructured. Approved construction documents arrive in every format the industry uses: sealed PDF drawings (Architectural, Structural, MEP), CSI Masterformat specs, RFIs, redlined markups; plus structured Bill of Quantities, AIA G703 continuation sheets, project schedules. We normalise all of it into a single discipline-keyed PlanFormat that downstream agents cite line-by-line.
Finance modelled on lender and AIA best practice. The FinancePlan mirrors what a construction-loan administrator already recognises: AIA G702/G703 application-and-certificate-for-payment structure, SOV line items keyed to CSI codes, retainage with step-down (10% → 5% at 50% completion), Change Order thresholds tuned to OCC/FDIC expectations, cure periods for monetary vs non-monetary defaults, and milestone tranches whose plannedReleasePct values are monotonic and sum to 100. Cross-field Zod validation catches the findings CRMCs most often raise at the draw meeting before the first photo is uploaded.
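
The cross-field checks named in the third principle could be sketched roughly as follows. This is a minimal, dependency-free illustration rather than Plumbline's actual schema: only plannedReleasePct comes from the text, while FinancePlanSketch, contractSum, scheduledValue, and the helper names are hypothetical. It covers the sum-to-100 rule, the SOV total check, and the retainage step-down, and it assumes the 5% rate applies from exactly 50% completion onward. In a Zod schema the same logic would typically live inside a superRefine callback.

```typescript
// Hypothetical shapes: only plannedReleasePct is named in the article.
interface Tranche {
  milestone: string;
  plannedReleasePct: number;
}

interface SovLine {
  csiCode: string;
  scheduledValue: number;
}

interface FinancePlanSketch {
  contractSum: number;
  sovLines: SovLine[];
  tranches: Tranche[];
}

// Returns human-readable findings; an empty array means every check passed.
function validateFinancePlan(plan: FinancePlanSketch): string[] {
  const findings: string[] = [];
  const EPS = 0.005; // tolerance for floating-point sums

  // Tranche over-allocation: planned release percentages must sum to 100.
  const trancheTotal = plan.tranches.reduce((sum, t) => sum + t.plannedReleasePct, 0);
  if (Math.abs(trancheTotal - 100) > EPS) {
    findings.push(`Tranche percentages sum to ${trancheTotal}, expected 100`);
  }

  // SOV sum mismatch: line items must add up to the contract sum.
  const sovTotal = plan.sovLines.reduce((sum, l) => sum + l.scheduledValue, 0);
  if (Math.abs(sovTotal - plan.contractSum) > EPS) {
    findings.push(`SOV total ${sovTotal} does not match contract sum ${plan.contractSum}`);
  }

  return findings;
}

// Retainage step-down: 10% below 50% completion, 5% from 50% onward (assumed boundary).
function retainageRate(pctComplete: number): number {
  return pctComplete < 50 ? 0.10 : 0.05;
}
```

Returning a list of findings instead of throwing on the first failure lets a reviewer see every problem in one pass, which matches how findings are raised at a draw meeting.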

🤖 The seven-agent pipeline

The project-setup pipeline is six Claude tool-use calls plus one form-driven ingester. Every Claude step uses forced tool use with a Zod-validated output schema, so each agent emits a single structured object the next stage can rely on — no free-text postprocessing.

Upload plan PDFs ──▶ Agent 1  PlanClassifier         (Claude · forced tool use)
                       discipline + sheet role + title-block
                                ▼
                     Agent 2  PlanFormatExtractor    (Claude)
                       per-discipline structured PlanFormat
                                ▼
Upload finance  ──▶  Agent 3  FinancePlan ingestion
plan (JSON form)       form + Zod cross-field validation
                       (no model call — hackathon cut)
                                ▼
                     Agent 4  PhotoGuidance          (Claude)
                       tells the inspector what to shoot
                                ▼
Upload photos   ──▶  Agent 5  PhotoQuality           (Claude)
                       blur / exposure / framing gate
                                ▼
                     Agent 6  PhotoToPlanFormat      (Claude)
                       structured observation per photo,
                       bound to plan element kinds
                                ▼
Request draw    ──▶  Agent 7  ComparisonAndGap       (Claude)
report                 SOV line-by-line comparison →
                       APPROVE / APPROVE_WITH_CONDITIONS /
                       HOLD / REJECT

Agents 1 and 2 chain as a background pipeline triggered by a plan upload. Agents 5 and 6 chain per photo (Agent 6 only fires if Agent 5 returns quality: "GOOD"). Agent 4 runs on demand once a Draw is approved and caches per draw. Agent 7 runs synchronously on POST /reports. Every Claude step writes an AgentRun row so the UI can stream status to the inspector and to the lender.
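
The per-photo gate between Agents 5 and 6 might look something like the sketch below. The agents are passed in as plain async functions so the control flow can be read without any model call. QualityAssessment, ObservationSketch, AgentRunRow, and processPhoto are hypothetical names, and the only quality value taken from the text is "GOOD".

```typescript
// Hypothetical shapes; the article only names the quality value "GOOD".
interface QualityAssessment {
  quality: string;
}

interface ObservationSketch {
  photoId: string;
  elementKind: string;
}

interface AgentRunRow {
  agent: string;
  photoId: string;
}

type QualityAgent = (photoId: string) => Promise<QualityAssessment>;
type ObservationAgent = (photoId: string) => Promise<ObservationSketch>;

// Runs Agent 5, then Agent 6 only when the photo passes the quality gate.
// Each step that actually runs appends a row to agentRuns.
async function processPhoto(
  photoId: string,
  runQuality: QualityAgent,
  runObservation: ObservationAgent,
  agentRuns: AgentRunRow[],
): Promise<ObservationSketch | null> {
  const assessment = await runQuality(photoId);
  agentRuns.push({ agent: "PhotoQuality", photoId });

  if (assessment.quality !== "GOOD") {
    return null; // gate closed: Agent 6 never fires for this photo
  }

  const observation = await runObservation(photoId);
  agentRuns.push({ agent: "PhotoToPlanFormat", photoId });
  return observation;
}
```

Injecting the two agents as parameters keeps the gate testable with stubs and leaves the choice of model call to the caller.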

How to verify a monthly construction draw with AI

📑 The monthly draw cycle and the eighth agent

Project setup runs once. The monthly draw cycle is what construction lenders actually live in, and it sits outside the numbered pipeline. On every monthly draw the contractor submits a G702 cover sheet and a G703 continuation sheet — and the G703 is the financial document that drives everything Plumbline does that month.

An eighth Claude agent — G703Extractor — parses the contractor's uploaded G703 into structured rows, maps each row to the project Gantt, and attaches an aiConfidence score per line. The contractor reviews and overrides any low-confidence rows on a side-by-side screen; once they approve, the Draw becomes the canonical "claim" that Agents 4–7 verify against site evidence. A Draw walks parsing → ready_for_review → approved (or → rejected, → failed), and Agent 4's photo guidance now consumes the approved Draw directly so every shot the inspector takes carries referenceLineNumbers back to the specific G703 row it verifies.
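
The Draw lifecycle and the low-confidence review step can be illustrated with a small state table. The status names and aiConfidence come from the text. Everything else is an assumption for illustration: that failed is reached from parsing, that rejected is reached from ready_for_review, that aiConfidence is on a 0 to 1 scale, and that 0.8 is a sensible review threshold.

```typescript
type DrawStatus = "parsing" | "ready_for_review" | "approved" | "rejected" | "failed";

// Allowed transitions. Which state leads to rejected or failed is assumed here.
const DRAW_TRANSITIONS: Record<DrawStatus, DrawStatus[]> = {
  parsing: ["ready_for_review", "failed"],
  ready_for_review: ["approved", "rejected"],
  approved: [],
  rejected: [],
  failed: [],
};

// Returns the next status, or throws if the move is not in the table.
function advanceDraw(current: DrawStatus, next: DrawStatus): DrawStatus {
  if (DRAW_TRANSITIONS[current].indexOf(next) === -1) {
    throw new Error(`Illegal Draw transition: ${current} -> ${next}`);
  }
  return next;
}

// Hypothetical row shape; only aiConfidence is named in the article.
interface G703RowSketch {
  lineNumber: number;
  description: string;
  aiConfidence: number; // assumed to be in the range 0..1
}

// Rows the contractor should check on the side-by-side review screen.
function rowsNeedingReview(rows: G703RowSketch[], threshold: number = 0.8): G703RowSketch[] {
  return rows.filter((row) => row.aiConfidence < threshold);
}
```

Keeping the transitions in one table means an approved Draw cannot silently slip back into parsing, which matters once Agents 4–7 treat it as the canonical claim.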

🧠 The Supervisor — Claude Managed Agents for the open-ended part

The seven-agent pipeline is great at structured extraction, but draw verification has an open-ended layer on top: the senior CRMC who reads the gap report, decides whether the claim is defensible, and may demand a re-inspection. That reasoning loop — look at the G703, look at the photos, look at the plan scope, decide what to investigate next, decide when to stop, write a signed finding — is exactly the shape that Claude Managed Agents is built for.

So we layered a Supervisor on top: an Opus 4.7 Managed Agent with the standard agent_toolset_20260401 plus four custom tools — read_draw_state, read_photo_evidence, read_plan_scope, and generate_reinspection_request — and a single prompt-bound record_finding call that emits the verdict. The agent runs server-side in Anthropic's managed harness with persistent session, built-in bash / file / web_search / web_fetch tools, and a clean SSE stream the UI can render as a live trace.

Crucially, the Supervisor enforces a strict "do no harm" contract: it never mutates any existing Plumbline collection. It reads from Draw, GapReport, Observation, PhotoAssessment, Document, PlanFormat, AgentRun, and writes only to three new collections (SupervisorSession, SupervisorFinding, ReinspectionRequest). Disable the Supervisor and the rest of the product behaves identically — which is the property a regulated lender actually wants from an autonomous reasoning layer.
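
One way to make a contract like this enforceable in code is a small access guard in front of the data layer. The collection names below are the ones listed above, while assertSupervisorAccess and the decision to let the Supervisor read back its own three collections are assumptions made for this sketch.

```typescript
// Collections the Supervisor may only read (names taken from the article).
const SUPERVISOR_READ_ONLY = new Set<string>([
  "Draw",
  "GapReport",
  "Observation",
  "PhotoAssessment",
  "Document",
  "PlanFormat",
  "AgentRun",
]);

// Collections the Supervisor may write (names taken from the article).
const SUPERVISOR_WRITABLE = new Set<string>([
  "SupervisorSession",
  "SupervisorFinding",
  "ReinspectionRequest",
]);

type AccessMode = "read" | "write";

// Throws unless the requested access is permitted under the contract.
// Assumption: the Supervisor may also read its own writable collections.
function assertSupervisorAccess(collection: string, mode: AccessMode): void {
  if (mode === "write") {
    if (!SUPERVISOR_WRITABLE.has(collection)) {
      throw new Error(`Supervisor may not write to ${collection}`);
    }
    return;
  }
  if (!SUPERVISOR_READ_ONLY.has(collection) && !SUPERVISOR_WRITABLE.has(collection)) {
    throw new Error(`Supervisor may not read from ${collection}`);
  }
}
```

Calling the guard inside every custom tool handler turns the policy into something a reviewer can audit in one file.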

When should you use Claude's Messages API with forced tool use vs. Claude Managed Agents?

Use forced tool use when each step is a deterministic structured-extraction problem with a known output schema and a known stopping point — that is the right tool for our seven-agent project-setup pipeline. Use Claude Managed Agents when the work is open-ended reasoning that decides what to look at next and when to stop, and when you want the harness to handle the agent loop, persistent session, container tools, and live event stream for you. Plumbline does both: structured extraction in the pipeline, autonomous reasoning in the Supervisor.
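
For readers who have not used forced tool use, the sketch below shows the request shape it relies on: a tools array with a JSON Schema input_schema, plus tool_choice set to { type: "tool", name } so the model must answer with that tool. The tool name classify_plan_sheet, its fields, and the helper functions are hypothetical. The objects are plain data, so nothing here calls the network; with the Anthropic SDK the request would be passed to client.messages.create.

```typescript
interface ToolDefinition {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
}

interface ForcedToolRequest {
  model: string;
  max_tokens: number;
  tools: ToolDefinition[];
  tool_choice: { type: "tool"; name: string };
  messages: { role: "user"; content: string }[];
}

type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

// Hypothetical tool for a plan-classification step.
const classifyPlanSheetTool: ToolDefinition = {
  name: "classify_plan_sheet",
  description: "Record the discipline and sheet role of one plan sheet.",
  input_schema: {
    type: "object",
    properties: {
      discipline: {
        type: "string",
        enum: ["Architecture", "Structural", "Electrical", "Plumbing"],
      },
      sheetRole: { type: "string" },
    },
    required: ["discipline", "sheetRole"],
  },
};

// Builds a request that forces the model to respond with the given tool.
function buildForcedToolRequest(tool: ToolDefinition, userText: string): ForcedToolRequest {
  return {
    model: "claude-opus-4-7",
    max_tokens: 1024,
    tools: [tool],
    tool_choice: { type: "tool", name: tool.name },
    messages: [{ role: "user", content: userText }],
  };
}

// Pulls the structured input out of the response's tool_use block.
function extractToolInput(content: ContentBlock[], toolName: string): unknown {
  for (const block of content) {
    if (block.type === "tool_use" && block.name === toolName) {
      return block.input;
    }
  }
  throw new Error(`No tool_use block found for ${toolName}`);
}
```

The extracted input is still untyped at this point, which is where a schema validator such as Zod takes over before the next stage consumes it.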

🛠️ Stack, scope, and what shipped

Backend: Node 20+, TypeScript, Fastify 4, Mongoose 8, Zod, Anthropic SDK, pdf-to-img, heic-convert, exifr
Frontend: React 19, Vite 8, Tailwind 3, shadcn/ui patterns over Radix primitives, TanStack Query, Zustand, react-router, framer-motion, dnd-kit
Database: MongoDB 7
Model: claude-opus-4-7 (configurable via ANTHROPIC_MODEL; we ran the demo on Opus 4.7 and tested fallbacks on Haiku 4.5 for cost-sensitive paths)
Agent platforms: Messages API + forced tool use for the seven-agent pipeline; Claude Managed Agents (beta) for the Supervisor
Cost per full draw cycle: ≈ $0.05–$0.10 in Anthropic credits on Opus 4.7 (Agents 1+2 on plan upload, 4 on guidance, 5+6 per photo, 7 on report)

The hackathon scope is deliberately narrow: four disciplines (Architecture, Structural, Electrical, Plumbing), sealed PDF drawings only, phone photos with EXIF provenance, residential loans as the demo path, and form-first finance-plan ingestion. The full product premise — drone / 360° / fixed cameras / IoT telemetry, Mechanical / Civil / Landscape / Fire-protection lanes, CAD / BIM / IFC ingestion, Excel/PDF G703 OCR, and a full commercial / HUD 221(d)(4) productisation — is schema-ready but not exercised in the demo pipeline.

🧪 What we learned in five days

Forced tool use is the right default for regulated extraction. Each of our six Claude calls returns a Zod-validated object. We never had to write a single regex or fallback parser. When validation failed, the loop retried with the validation error in context — and almost always recovered on the second turn.
Cross-field Zod validation pays for itself before the first model call. SOV sum mismatch, tranche over-allocation, retainage drift, planDocRef integrity — every CRMC finding category we hand-coded as a Zod refinement caught real bugs in the sample finance plans we ingested. The most expensive AI mistake is the one you didn't need an AI to catch.
Two agent platforms beat one. Trying to force the Supervisor's open-ended reasoning into our seven-step pipeline would have meant re-inventing the agent loop, the session store, and the SSE event protocol. Trying to force the pipeline's structured extraction into the Managed Agents harness would have meant giving up Zod schemas at every step. Picking the right platform per layer was the single biggest architectural lever we pulled.
"Do no harm" is a feature, not a constraint. The Supervisor only writes to three new collections; it never re-runs pipeline agents. That property is what makes it shippable into a real lender's ops without a six-month risk review.

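The retry behaviour described in the first lesson, where a failed validation is fed back to the model on the next turn, can be sketched generically. This is an illustration under assumptions, not the project's code: callWithValidationRetry, Checked, and the two-attempt default are hypothetical, and the model call is injected so the loop runs without any SDK. With Zod, the validate function would typically wrap schema.safeParse.

```typescript
// Result of validating one raw model output.
type Checked<T> =
  | { kind: "valid"; value: T }
  | { kind: "invalid"; error: string };

// The model call receives the previous validation error (or null on the first try)
// so it can include that error in the next prompt.
type ModelCall = (priorError: string | null) => Promise<unknown>;

async function callWithValidationRetry<T>(
  callModel: ModelCall,
  validate: (raw: unknown) => Checked<T>,
  maxAttempts: number = 2,
): Promise<T> {
  let priorError: string | null = null;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callModel(priorError);
    const checked = validate(raw);
    if (checked.kind === "valid") {
      return checked.value;
    }
    // Keep the error so the next attempt can show it to the model.
    priorError = checked.error;
  }

  throw new Error(`Validation failed after ${maxAttempts} attempts: ${priorError}`);
}
```

Capping the attempts keeps cost bounded, and surfacing the last validation error in the thrown message gives the AgentRun record something concrete to log.
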
🎬 Closing

Five days, seven agents, one Managed-Agents Supervisor, and a draw verdict that cites G703 line items back to authenticated photo evidence. If you're building AI for regulated finance, two takeaways: (1) forced tool use plus Zod gets you 80% of the structured-extraction quality you need without inventing a framework; (2) when the work turns open-ended, switch platforms — Claude Managed Agents is what the autonomous reasoning layer should sit on, not a 47th tool-use call.

For a deeper read on AI accountability and audit trails in regulated workflows, see our EU AI Act guide for fintech and banking and our take on fiduciary-grade AI in financial services.