Workflow Benchmark

Lease Abstract → NER Calculation

Extract economic terms from a full industrial lease, then calculate Net Effective Rent.

Industrial · Multi-step workflow · 9 scored fields

What This Benchmark Tests

Can a model read a real ~1M SF, single-tenant, triple-net (NNN) industrial lease and produce the numbers a commercial real estate (CRE) analyst would produce: premises size, term, base rent, escalations, free rent, tenant improvements (TI), landlord (LL) work, and commissions? And can it then correctly compute Net Effective Rent (NER) using the methodology provided? Two prompt conditions test whether models need the methodology spelled out or can reason about NPV conventions on their own.

The Prompt

The exact system + user prompt every model runs against. Both prompt variants shown.

Reference Documents

Industrial Lease (NNN)

~1M SF single-tenant · full lease with Addendum 1 rent schedule

Task Structure

  1. Extract lease economics

     Premises, term, base rent, escalations, free rent, TI, LL work, commissions.

  2. Calculate Net Effective Rent

     NPV of monthly cash flows over the full term, annualized to $/SF/yr.
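To make step 2 concrete, here is a minimal Python sketch of one common way to compute NER. It is not the benchmark's own methodology, which is given in the detailed prompt. Everything in it is an assumption for illustration: the `LeaseTerms` container and its field names, free rent falling at the start of the term, annual escalations on each anniversary, TI, LL work, and commissions all paid at time zero, a nominal monthly rate of `annual_rate / 12`, and annualization through a level-annuity factor. The example lease values and the 8% rate are made up and do not come from the benchmark lease.

```python
from dataclasses import dataclass


@dataclass
class LeaseTerms:
    """Hypothetical container. All amounts are per SF, so premises size cancels out."""
    term_months: int
    base_rent_psf_yr: float   # year-1 base rent, $/SF/yr
    escalation_pct: float     # annual escalation, e.g. 0.03 for 3%
    free_rent_months: int     # assumed to fall at the start of the term
    ti_psf: float             # tenant improvement allowance, $/SF
    ll_work_psf: float        # landlord work, $/SF
    commissions_psf: float    # leasing commissions, $/SF


def monthly_rent_psf(lease):
    """Monthly base rent per SF for every month of the term (0 during free rent)."""
    flows = []
    for month in range(lease.term_months):
        if month < lease.free_rent_months:
            flows.append(0.0)
        else:
            annual = lease.base_rent_psf_yr * (1 + lease.escalation_pct) ** (month // 12)
            flows.append(annual / 12)
    return flows


def net_effective_rent(lease, annual_rate, beginning_of_period=False):
    """NPV of monthly flows net of upfront costs, spread as a level $/SF/yr amount."""
    r = annual_rate / 12                      # assumption: nominal monthly rate
    shift = 0 if beginning_of_period else 1   # when each month's rent is received
    pv_rent = sum(cf / (1 + r) ** (m + shift)
                  for m, cf in enumerate(monthly_rent_psf(lease)))
    # Assumption: all landlord concessions are paid at time zero.
    npv = pv_rent - (lease.ti_psf + lease.ll_work_psf + lease.commissions_psf)
    n = lease.term_months
    if r == 0:
        annuity = float(n)
    else:
        annuity = (1 - (1 + r) ** -n) / r     # PV of 1 per month, end of period
        if beginning_of_period:
            annuity *= 1 + r
    return 12 * npv / annuity


# Made-up lease for illustration only.
example = LeaseTerms(term_months=120, base_rent_psf_yr=10.0, escalation_pct=0.03,
                     free_rent_months=6, ti_psf=5.0, ll_work_psf=2.0, commissions_psf=3.0)
ner_end = net_effective_rent(example, annual_rate=0.08)
ner_begin = net_effective_rent(example, annual_rate=0.08, beginning_of_period=True)
print(f"NER, end of period:       {ner_end:.2f} $/SF/yr")
print(f"NER, beginning of period: {ner_begin:.2f} $/SF/yr")
```

The two printed values differ only in the timing convention, which is the kind of choice the less detailed prompt leaves to the model.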

How It's Scored

Nine numeric fields are compared to a hand-validated answer key. Extraction fields (premises, term, rent schedule) must match exactly. NER is scored with a tolerance for legitimate convention differences (annual vs. monthly, end vs. beginning of period). Workflow score weights extraction and calculation 50/50.
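The scoring rule described above could be sketched roughly as follows. This is an illustration, not the benchmark's actual scorer: the function name `score_workflow`, the field names, the 2% relative tolerance, and the pass/fail treatment of NER are all assumptions, since the page states only that extraction is exact-match, NER has a tolerance, and the two halves are weighted 50/50. The answer-key values are made up.

```python
def score_workflow(predicted, answer_key, extraction_fields,
                   ner_field="ner_psf_yr", ner_tolerance=0.02):
    """Return a 0..1 workflow score: 50% extraction, 50% calculation.

    Assumptions (not from the source): every extraction field counts equally,
    NER is pass/fail, and the tolerance is relative to the answer-key value.
    """
    # Extraction: exact match against the answer key, field by field.
    hits = [predicted.get(name) == answer_key[name] for name in extraction_fields]
    extraction_score = sum(hits) / len(hits)

    # Calculation: NER passes if it lands within the relative tolerance.
    ner_pred = predicted.get(ner_field)
    ner_true = answer_key[ner_field]
    ner_ok = ner_pred is not None and abs(ner_pred - ner_true) <= ner_tolerance * abs(ner_true)
    calculation_score = 1.0 if ner_ok else 0.0

    return 0.5 * extraction_score + 0.5 * calculation_score


# Made-up values for illustration only.
key = {"premises_sf": 1_000_000, "term_months": 120, "base_rent_psf_yr": 10.0,
       "free_rent_months": 6, "ner_psf_yr": 8.50}
fields = ["premises_sf", "term_months", "base_rent_psf_yr", "free_rent_months"]

model_output = dict(key, free_rent_months=5, ner_psf_yr=8.55)
print(score_workflow(model_output, key, fields))
```

In this made-up run, one of the four extraction fields is wrong and NER falls inside the tolerance, so extraction contributes three quarters of its half and calculation contributes its full half.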

See full methodology →

Results Snapshot (Top 5)

Detailed prompt

Full methodology spelled out — discount rate, monthly cash flow mechanics, annualization.

Model                Score     Notes
Gemini 3.1 Pro       100.0%
Claude Opus 4.6      98.7%
GPT-5 Mini           98.2%
Claude Sonnet 4.6    97.9%
Claude Haiku 4.5     95.0%

Less detailed prompt

Methodology left open: the model decides which NPV convention to apply.

Model                Score     Notes
GPT-5                93.5%
Claude Sonnet 4.6    78.3%
Gemini 3.1 Pro       77.5%
GPT-5 Mini           73.9%
Claude Opus 4.6      68.4%

See full analysis →

Run it yourself

Pick a model, run the benchmark, and see where it holds up and where it breaks.