Workflow Benchmark

Lease Termination → New Lease → Lender Consent

Analyze an early lease termination, replacement lease economics, and whether lender consent conditions are satisfied.

Office · Multi-document · 5-step workflow · 18 scored fields

What This Benchmark Tests

A real-world asset-management problem: a tenant wants to terminate a Class A office lease early. Can the model calculate the remaining lease obligations, evaluate a replacement tenant's LOI, compute net effective rent (NER) for both leases, and determine whether the loan agreement's specific 2-of-3 lender consent test is satisfied? The task requires reading across five documents and chaining five analytical steps.
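The "2-of-3" wording matters because a model can reach the right yes/no answer for the wrong reasons. As a minimal sketch, an N-of-M consent test can be evaluated as below. The three condition names, thresholds, and inputs are hypothetical placeholders, not the covenants in the benchmark's loan agreement excerpt.

```python
from dataclasses import dataclass


@dataclass
class ConsentCondition:
    """One lender consent condition. Names and thresholds here are hypothetical."""
    name: str
    satisfied: bool


def consent_test(conditions: list[ConsentCondition], required: int = 2) -> tuple[bool, list[str]]:
    """Evaluate an N-of-M test; return (passes, names of the conditions that were met)."""
    met = [c.name for c in conditions if c.satisfied]
    return len(met) >= required, met


# Hypothetical inputs (not taken from the benchmark documents).
original_ner = 62.00          # $/RSF/yr
replacement_ner = 60.50       # $/RSF/yr
replacement_term_years = 10
remaining_term_years = 4.5
replacement_credit_ok = False

conditions = [
    ConsentCondition("NER >= 95% of original", replacement_ner >= 0.95 * original_ner),
    ConsentCondition("term >= remaining term", replacement_term_years >= remaining_term_years),
    ConsentCondition("acceptable tenant credit", replacement_credit_ok),
]

passes, met = consent_test(conditions)
print(passes, met)  # True ['NER >= 95% of original', 'term >= remaining term']
```

Returning the list of satisfied conditions alongside the boolean is what makes it possible to check the supporting logic, not just the outcome.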

The Prompt

The exact system + user prompt every model runs against.

Reference Documents

Executed Lease (Class A Office)

Original tenant · remaining term · rent schedule

Termination Request Emails

Tenant counsel · proposed termination timeline

Replacement Tenant LOI

Proposed new lease economics

Loan Agreement Excerpt

Lender consent conditions and covenants

Manhattan Commission Schedule

Tiered commission rates with abatement treatment
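Tiered rates and abatement treatment interact: whether free-rent months count as commissionable rent changes the fee. A minimal sketch, assuming a hypothetical tier table (5% of year-1 rent, 4% for years 2 to 5, 3.5% for years 6 to 10, 2.5% thereafter) and assuming abated months are excluded from commissionable rent. The actual rates and abatement rule come from the benchmark's schedule and may differ.

```python
# Hypothetical tier table: (first lease year, last lease year, commission rate).
TIERS = [(1, 1, 0.05), (2, 5, 0.04), (6, 10, 0.035), (11, 99, 0.025)]


def rate_for_year(year: int) -> float:
    """Look up the commission rate for a 1-based lease year."""
    for first, last, rate in TIERS:
        if first <= year <= last:
            return rate
    raise ValueError(f"no tier covers lease year {year}")


def tiered_commission(annual_rent: list[float], free_months: int,
                      exclude_abated: bool = True) -> float:
    """Total commission on a lease, given base rent for each lease year.

    When exclude_abated is True (an assumed treatment), the first
    `free_months` months of rent are not commissionable.
    """
    total = 0.0
    free_left = free_months if exclude_abated else 0
    for i, rent in enumerate(annual_rent):
        free_this_year = min(free_left, 12)
        free_left -= free_this_year
        commissionable = rent * (12 - free_this_year) / 12
        total += commissionable * rate_for_year(i + 1)
    return total


# 10-year lease, 10,000 RSF at a flat $80/RSF (hypothetical), 12 months free.
rent_by_year = [10_000 * 80.0] * 10
print(round(tiered_commission(rent_by_year, 12), 2))
print(round(tiered_commission(rent_by_year, 12, exclude_abated=False), 2))
```

Under these assumed tiers, the commission is $268,000 with the abated year excluded versus $308,000 if abated rent were commissionable.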

Task Structure

  1. Calculate remaining lease obligations

     Remaining rent through expiration, plus unamortized tenant improvements (TI), leasing commissions, and free rent.

  2. Calculate replacement tenant costs

     Downtime rent, free rent, TI, landlord work, and leasing commissions for the replacement tenant.

  3. Calculate net effective rent

     NER for both leases, using the methodology prescribed in the loan agreement.

  4. Apply lender consent conditions

     Test whether at least 2 of the 3 loan covenant conditions are satisfied.

  5. Make a recommendation

     Should the landlord proceed with the termination?
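Steps 1 through 3 are plain arithmetic once the inputs are extracted. The sketch below assumes flat base rent, straight-line amortization of up-front costs over the lease term, downtime priced at the original tenant's rent, and an undiscounted NER. The benchmark instead requires the NER methodology prescribed in its loan agreement, which may discount cash flows or define concessions differently. The `Lease` fields and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    """Hypothetical lease summary; all money in dollars."""
    rsf: float            # rentable square feet
    rent_psf: float       # base rent, $/RSF/yr (assumed flat)
    term_months: int
    free_months: int
    ti_psf: float         # tenant improvement allowance, $/RSF
    landlord_work: float  # total cost of landlord work
    commissions: float    # total leasing commissions

    @property
    def monthly_rent(self) -> float:
        return self.rsf * self.rent_psf / 12


def unamortized(cost: float, total_months: int, remaining_months: int) -> float:
    """Straight-line unamortized balance of an up-front cost (assumed convention)."""
    return cost * remaining_months / total_months


def remaining_obligations(lease: Lease, remaining_months: int) -> float:
    """Step 1: rent through expiration plus unamortized TI, commissions, and free rent."""
    rent = lease.monthly_rent * remaining_months
    ti = unamortized(lease.ti_psf * lease.rsf, lease.term_months, remaining_months)
    lc = unamortized(lease.commissions, lease.term_months, remaining_months)
    free = unamortized(lease.monthly_rent * lease.free_months, lease.term_months, remaining_months)
    return rent + ti + lc + free


def replacement_costs(new: Lease, downtime_months: int, downtime_monthly_rent: float) -> float:
    """Step 2: lost rent during downtime plus the replacement lease's concessions."""
    downtime = downtime_monthly_rent * downtime_months
    free = new.monthly_rent * new.free_months
    return downtime + free + new.ti_psf * new.rsf + new.landlord_work + new.commissions


def net_effective_rent(lease: Lease) -> float:
    """Step 3: undiscounted NER in $/RSF/yr (a placeholder for the loan agreement's method)."""
    gross = lease.monthly_rent * lease.term_months
    concessions = (lease.monthly_rent * lease.free_months + lease.ti_psf * lease.rsf
                   + lease.landlord_work + lease.commissions)
    return (gross - concessions) / (lease.term_months / 12) / lease.rsf


old = Lease(rsf=20_000, rent_psf=90, term_months=120, free_months=10,
            ti_psf=100, landlord_work=0, commissions=1_000_000)
new = Lease(rsf=20_000, rent_psf=84, term_months=120, free_months=12,
            ti_psf=120, landlord_work=200_000, commissions=1_020_000)

print(remaining_obligations(old, remaining_months=48))              # 9000000.0
print(replacement_costs(new, downtime_months=9,
                        downtime_monthly_rent=old.monthly_rent))     # 6650000.0
print(net_effective_rent(old), net_effective_rent(new))             # 67.5 57.5
```

Keeping each step a separate function mirrors how the workflow chains: a misread input in step 1 carries into every later figure.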

How It's Scored

Eighteen scored fields across the five steps are compared against a hand-validated answer key. Dollar figures must fall within tight tolerances, and the lender-consent conclusion must match both the binary outcome and the supporting test logic. Completion rate is tracked separately, because some models refuse or fail mid-workflow.
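A minimal sketch of that grading scheme follows. The 0.5% relative tolerance, the field names, and the choice to average scores over completed runs only are assumptions for illustration; the page does not publish its exact tolerances or aggregation rule.

```python
import math
from typing import Optional


def field_correct(expected, actual, rel_tol: float = 0.005) -> bool:
    """Numbers pass within a relative tolerance (assumed 0.5%); everything else must match exactly."""
    if isinstance(expected, bool) or not isinstance(expected, (int, float)):
        return expected == actual
    if isinstance(actual, bool) or not isinstance(actual, (int, float)):
        return False
    return math.isclose(expected, actual, rel_tol=rel_tol)


def score_run(answer_key: dict, output: Optional[dict]) -> Optional[float]:
    """Fraction of fields correct, or None for a run that refused or failed mid-workflow."""
    if output is None:
        return None
    correct = sum(field_correct(v, output.get(k)) for k, v in answer_key.items())
    return correct / len(answer_key)


# Hypothetical answer key: two numeric fields plus the consent outcome and its logic.
key = {
    "remaining_obligations": 9_000_000.0,
    "replacement_ner": 57.5,
    "consent_granted": True,
    "conditions_met": frozenset({"ner_test", "term_test"}),
}
output = {
    "remaining_obligations": 9_020_000.0,  # within 0.5%, counts as correct
    "replacement_ner": 57.5,
    "consent_granted": True,               # right outcome...
    "conditions_met": frozenset({"ner_test", "credit_test"}),  # ...wrong supporting logic
}

runs = [score_run(key, output), score_run(key, None)]
completed = [s for s in runs if s is not None]
print(sum(completed) / len(completed), len(completed) / len(runs))  # 0.75 0.5
```

Representing the consent logic as a set of satisfied conditions means a run that gets the binary outcome right but cites the wrong conditions still loses that field.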


Results Snapshot (Top 5)

| Model | Score | Notes |
|---|---|---|
| Gemini 3.1 Pro | 100.0% | 5 of 5 runs perfect |
| Claude Opus 4.6 | 100.0% | 2 of 2 completed runs perfect · 40% completion rate |
| GPT-5 Mini | 90.0% | 2 of 5 runs perfect · 100% completion |
| GPT-5.4 | 88.9% | 2 of 5 runs perfect · 100% completion |
| GPT-5 | 86.1% | 1 of 3 completed runs perfect |
