Workflow Benchmark

Lease Termination → New Lease → Lender Consent

Analyze an early lease termination, replacement lease economics, and whether lender consent conditions are satisfied.

Office · Multi-document · 5-step workflow · 18 scored fields

What This Benchmark Tests

A real asset management problem: a tenant wants to terminate a Class A office lease early. Can the model calculate the remaining obligations, evaluate a replacement tenant's letter of intent (LOI), compute net effective rent (NER) for both leases, and determine whether a specific 2-of-3 lender consent test from the loan agreement is satisfied? The task requires reading across five documents and chaining five analytical steps.

The Prompt

The exact system + user prompt every model runs against.

Reference Documents

Executed Lease (Class A Office)

Original tenant · remaining term · rent schedule

Termination Request Emails

Tenant counsel · proposed termination timeline

Replacement Tenant LOI

Proposed new lease economics

Loan Agreement Excerpt

Lender consent conditions and covenants

Manhattan Commission Schedule

Tiered commission rates with abatement treatment

Task Structure

1. Calculate remaining lease obligations
   Remaining rent through expiration, unamortized TI, leasing commissions, free rent.
2. Calculate replacement tenant costs
   Downtime rent, free rent, TI, landlord work, commissions for the replacement tenant.
3. Calculate net effective rent
   NER for both leases using the methodology prescribed in the loan agreement.
4. Apply lender consent conditions
   Test whether at least 2 of 3 loan covenant conditions are satisfied.
5. Make a recommendation
   Should the landlord proceed with the termination?
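
Steps 3 and 4 chain directly: the NER figures feed the consent test. The sketch below is an illustration only, not the benchmark's method. The NER methodology prescribed in the loan agreement is not reproduced on this page, so the code assumes a common simplified, undiscounted definition (contractual rent minus free rent, TI, and commissions, spread over rentable area and term). The class, the function names, the three example conditions, and every number are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class LeaseTerms:
    """Hypothetical inputs; field names are illustrative, not from the benchmark."""
    rentable_sf: float    # rentable square feet
    term_years: float     # lease term in years
    total_rent: float     # contractual base rent summed over the term ($)
    free_rent: float      # value of abated rent ($)
    ti_allowance: float   # tenant improvement allowance ($)
    commissions: float    # leasing commissions ($)


def net_effective_rent(lease: LeaseTerms) -> float:
    """Simplified, undiscounted NER in $/SF/year (an assumed definition)."""
    concessions = lease.free_rent + lease.ti_allowance + lease.commissions
    return (lease.total_rent - concessions) / (lease.rentable_sf * lease.term_years)


def k_of_n_satisfied(conditions: Mapping[str, bool], k: int = 2) -> bool:
    """Return True when at least k of the named conditions hold."""
    return sum(conditions.values()) >= k


# Made-up numbers chosen for easy arithmetic.
existing = LeaseTerms(10_000, 5, 4_000_000, 0, 0, 0)
replacement = LeaseTerms(10_000, 10, 9_000_000, 500_000, 1_000_000, 500_000)

existing_ner = net_effective_rent(existing)
replacement_ner = net_effective_rent(replacement)

# Placeholder conditions: the real covenant tests live in the loan agreement excerpt.
conditions = {
    "replacement_ner_at_least_90pct_of_existing": replacement_ner >= 0.9 * existing_ner,
    "replacement_term_at_least_existing_term": replacement.term_years >= existing.term_years,
    "replacement_area_at_least_existing_area": replacement.rentable_sf >= existing.rentable_sf,
}
consent_satisfied = k_of_n_satisfied(conditions, k=2)
```

Keeping the conditions in a named mapping makes it easy to report which tests passed, not just the final yes or no.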

How It's Scored

Eighteen numeric outputs across the five steps are compared to a hand-validated answer key. Dollar figures must fall within tight tolerances; the lender-consent conclusion must match both the binary outcome and the supporting test logic. Completion rate is tracked separately because some models refuse or fail mid-workflow.
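
A tolerance-based comparison like the one described above might look roughly like the following. This is a sketch under stated assumptions, not the benchmark's scorer: the 0.5% relative tolerance, the field names, and the example values are all hypothetical, and the check of the supporting test logic is not modeled.

```python
def score_run(outputs: dict, answer_key: dict, rel_tol: float = 0.005) -> float:
    """Return the fraction of answer-key fields that the outputs match.

    Numeric fields match when they fall within rel_tol of the key value;
    boolean fields (such as a consent conclusion) must match exactly.
    The rel_tol default is an assumed value, not a published tolerance.
    """
    matched = 0
    for field, expected in answer_key.items():
        actual = outputs.get(field)
        if actual is None:
            continue  # a missing field counts as a miss
        if isinstance(expected, bool):
            ok = isinstance(actual, bool) and actual == expected
        else:
            ok = abs(actual - expected) <= rel_tol * abs(expected)
        matched += ok
    return matched / len(answer_key)


# Hypothetical three-field example (the real key has eighteen fields).
answer_key = {"remaining_rent": 1_000_000.0, "replacement_ner": 70.0, "consent_satisfied": True}
outputs = {"remaining_rent": 1_002_000.0, "replacement_ner": 71.0, "consent_satisfied": True}

score = score_run(outputs, answer_key)  # remaining_rent and consent match; NER is off by more than 0.5%
```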

See full methodology →

Results Snapshot (Top 5)

Model	Score	Notes
Gemini 3.1 Pro	100.0%	5 of 5 runs perfect
Claude Opus 4.6	100.0%	2 of 2 completed perfect · 40% completion rate
GPT-5 Mini	90.0%	2 of 5 perfect · 100% completion
GPT-5.4	88.9%	2 of 5 perfect · 100% completion
GPT-5	86.1%	1 of 3 completed perfect
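
The Notes column mixes two measures: how well completed runs scored, and how many runs completed at all. A minimal sketch of keeping them separate is shown below. It assumes the score is averaged over completed runs only, which is consistent with the Claude Opus 4.6 row but is not confirmed on this page; the function name and the `None` convention for incomplete runs are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize(run_scores: Sequence[Optional[float]]) -> dict:
    """Summarize runs, where None marks a run that refused or failed mid-workflow."""
    completed = [s for s in run_scores if s is not None]
    return {
        "completion_rate": len(completed) / len(run_scores),
        "mean_score": mean(completed) if completed else None,
        "perfect_runs": sum(1 for s in completed if s == 1.0),
    }


# Five runs, two completed and both perfect: 100% score, 40% completion.
summary = summarize([1.0, 1.0, None, None, None])
```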

See full analysis →

Run it yourself

Pick a model, run the benchmark, and see where it holds up and where it breaks.

Run this demo → · See full analysis →