Nov 15, 2025
TypeBench: Measuring Human Taste
By TypeOS Research
The first benchmark specifically measuring human taste in writing, derived from real editing behavior, not artificial ratings.
TypeBench evaluates models on style alignment, tone control, document structure, argument quality, and personalization.
Methodology
Unlike static benchmarks that rely on multiple-choice questions or abstract reasoning, TypeBench uses "Taste Tasks"—complex rewriting and drafting instructions that require stylistic nuance.
We use Bradley-Terry scoring to rank models based on pairwise human preference comparisons from expert editors.
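To make the scoring step concrete, here is a minimal sketch of how Bradley-Terry strengths can be fitted from pairwise preferences. It is not the TypeBench implementation: the function names (`fit_bradley_terry`, `win_probability`), the model labels, and the toy comparison data are all hypothetical. The fitting loop is the standard minorization-maximization update for the Bradley-Terry model. How the fitted strengths map onto the reported scores is not specified here, so the sketch stops at the strengths.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, iters=500, tol=1e-10):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Illustrative only: assumes every pair involves two distinct models.
    Returns a dict mapping model name -> strength, scaled so that the
    strengths average 1.0.
    """
    wins = defaultdict(float)         # total wins per model
    pair_counts = defaultdict(float)  # comparisons per unordered pair
    models = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        models.update((winner, loser))

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        updated = {}
        for i in models:
            # Sum n_ij / (p_i + p_j) over every opponent j that i has faced.
            denom = sum(
                n / (strength[i] + strength[j])
                for pair, n in pair_counts.items() if i in pair
                for j in pair if j != i
            )
            updated[i] = wins[i] / denom if denom > 0 else strength[i]
        # Rescale: strengths are only identified up to a constant factor.
        total = sum(updated.values())
        updated = {m: s * len(models) / total for m, s in updated.items()}
        delta = max(abs(updated[m] - strength[m]) for m in models)
        strength = updated
        if delta < tol:
            break
    return strength


def win_probability(strength, a, b):
    """Modeled probability that editors prefer a's output over b's."""
    return strength[a] / (strength[a] + strength[b])


# Hypothetical editor judgments: each tuple is (preferred, rejected).
toy_comparisons = (
    [("model_a", "model_b")] * 3 + [("model_b", "model_a")] * 1
    + [("model_b", "model_c")] * 3 + [("model_c", "model_b")] * 1
    + [("model_a", "model_c")] * 3 + [("model_c", "model_a")] * 1
)
toy_strengths = fit_bradley_terry(toy_comparisons)
toy_ranking = sorted(toy_strengths, key=toy_strengths.get, reverse=True)
```

On this toy data, `toy_ranking` orders the models by fitted strength, and `win_probability(toy_strengths, "model_a", "model_c")` gives the modeled chance that an editor prefers the first model's output. The practical appeal of the approach is that editors only ever choose between two outputs, and the model turns those choices into a single consistent ranking.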
Taste Tasks
- Rewrite: Transform a rough email into a polished executive summary.
- Tone Shift: Change a defensive message to an apologetic yet firm one.
- Style Mimicry: Write a paragraph in the style of The Economist.
- Summarize: Condense a legal brief without losing key clauses.
Initial Results
| Rank | Model | HPSR Score |
|---|---|---|
| 1 | Hemingway 1 | 89.4 |
| 2 | Claude 3.5 Sonnet | 84.2 |
| 3 | GPT-4o | 82.8 |


