TypeBench: Measuring Human Taste

Nov 15, 2025

By TypeOS Research

TypeBench is the first benchmark built specifically to measure human taste in writing. It is derived from real editing behavior rather than artificial ratings.

TypeBench evaluates models on style alignment, tone control, document structure, argument quality, and personalization.

Methodology

Unlike static benchmarks that rely on multiple-choice questions or abstract reasoning, TypeBench uses "Taste Tasks"—complex rewriting and drafting instructions that require stylistic nuance.

We use Bradley-Terry scoring to rank models based on pairwise human preference comparisons from expert editors.
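
As a rough illustration of how Bradley-Terry scoring turns pairwise preferences into a ranking, here is a minimal sketch. The model assigns each system a positive strength p, with the probability that i is preferred over j given by p_i / (p_i + p_j). The function names, model labels, and vote counts below are hypothetical and are not taken from TypeBench. The fitting routine is the textbook minorization-maximization update, which may differ from the procedure actually used for the reported scores.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, max_iter=1000, tol=1e-10):
    """Fit Bradley-Terry strengths from a non-empty list of (winner, loser) pairs.

    Uses the classic minorization-maximization update. Assumes every model
    wins at least once and the comparison graph is connected; otherwise the
    maximum-likelihood estimate is not well defined.
    """
    wins = defaultdict(int)         # total wins per model
    pair_counts = defaultdict(int)  # number of comparisons per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    models = sorted({m for pair in comparisons for m in pair})
    strength = {m: 1.0 for m in models}

    for _ in range(max_iter):
        updated = {}
        for i in models:
            denom = sum(
                pair_counts.get(frozenset((i, j)), 0) / (strength[i] + strength[j])
                for j in models
                if j != i
            )
            updated[i] = wins.get(i, 0) / denom if denom > 0 else strength[i]
        # Strengths are only identified up to scale, so fix their mean at 1.
        scale = len(models) / sum(updated.values())
        updated = {m: s * scale for m, s in updated.items()}
        change = max(abs(updated[m] - strength[m]) for m in models)
        strength = updated
        if change < tol:
            break
    return strength


def win_probability(strength, a, b):
    """Modelled probability that output from `a` is preferred over `b`."""
    return strength[a] / (strength[a] + strength[b])


# Hypothetical editor votes: each tuple is (preferred model, other model).
votes = (
    [("model_a", "model_b")] * 8 + [("model_b", "model_a")] * 2
    + [("model_b", "model_c")] * 7 + [("model_c", "model_b")] * 3
    + [("model_a", "model_c")] * 9 + [("model_c", "model_a")] * 1
)
strengths = fit_bradley_terry(votes)
ranking = sorted(strengths, key=strengths.get, reverse=True)
```

The fitted strengths are relative rather than absolute. Any published score would need a further rescaling step, which this sketch does not attempt to reproduce.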

Taste Tasks

  • Rewrite: Transform a rough email into a polished executive summary.
  • Tone Shift: Change a defensive message to an apologetic yet firm one.
  • Style Mimicry: Write a paragraph in the style of The Economist.
  • Summarize: Condense a legal brief without losing key clauses.

Initial Results

Rank  Model              HPSR Score
1     Hemingway 1        89.4
2     Claude 3.5 Sonnet  84.2
3     GPT-4o             82.8