Nov 29, 2025
Accept/Reject Preference Learning
By TypeOS Research
The world's first fine-grained writing preference dataset created from inline accept/reject decisions inside real documents.
TypeOS captures micro-edits (word choice, tone, structure, formatting, semantics), builds taste vectors, updates reward models, and enables personalized writing models. Unlike traditional RLHF, which relies on binary "thumbs up/down" ratings of entire responses, our approach leverages the granular edits users make to their own documents.
Micro-Level Preference Signals
Each edit generates micro-level preference supervision. No other platform collects real writing edits at this resolution. This constitutes a new category of RLHF signal: Inline Revision Feedback (IRF).
Figure 1: An inline revision capturing a preference for "active voice" and "impactful vocabulary".
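A single inline revision can be recorded as a DPO-style preference pair: the text the user replaced is the rejected completion, and their replacement is the accepted one. The schema below is a minimal sketch; the field names and `pair_from_edit` helper are illustrative assumptions, not the actual TypeOS data format.

```python
from dataclasses import dataclass, field

@dataclass
class RevisionPair:
    """One micro-level preference record from an inline edit (hypothetical schema)."""
    context: str                       # surrounding document text
    rejected: str                      # original phrasing the user edited away
    accepted: str                      # the user's final replacement
    tags: list = field(default_factory=list)  # inferred preference dimensions

def pair_from_edit(context, before, after, tags=None):
    """Turn one accept/reject decision into a preference pair."""
    return RevisionPair(context=context, rejected=before, accepted=after,
                        tags=tags or [])

# Example matching Figure 1: a passive-to-active-voice revision.
pair = pair_from_edit(
    context="Quarterly results section",
    before="The report was written by the team",
    after="The team wrote the report",
    tags=["active_voice"],
)
```

Thousands of such pairs per user form the training corpus for the personalized reward model described below.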
Personalized Taste Vectors
By aggregating thousands of these micro-decisions, TypeOS constructs a high-dimensional "Taste Vector" for each user or organization. This vector informs the model about preferred:
- Tone: Formal vs. Casual, Direct vs. Diplomatic
- Structure: Dense paragraphs vs. Bulleted lists
- Vocabulary: Simple vs. Academic
- Formatting: Oxford commas, capitalization rules, spacing
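One simple way to aggregate micro-decisions into a stable per-user vector is an exponential moving average over embeddings of accepted edits. This is an illustrative sketch, not the production algorithm: the 64-dimensional embeddings stand in for the output of an edit encoder, and the smoothing factor `alpha` is an assumed hyperparameter.

```python
import numpy as np

def update_taste_vector(taste, edit_embedding, alpha=0.05):
    """Fold one accepted edit's embedding into the running Taste Vector.

    A small alpha keeps the vector stable while still tracking
    slow drift in a user's preferences.
    """
    if taste is None:
        return edit_embedding.copy()
    return (1 - alpha) * taste + alpha * edit_embedding

# Simulate a stream of accepted-edit embeddings (random stand-ins).
rng = np.random.default_rng(0)
taste = None
for _ in range(1000):
    taste = update_taste_vector(taste, rng.normal(size=64))
```

The resulting vector can then condition the writing model, e.g. as a soft prompt or a reward-model input.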
Core Metrics
- RAR (Revision Acceptance Rate): The percentage of AI suggestions accepted without modification.
- TSI (Taste Stability Index): Consistency of a user's preferences over time.
- PDS (Preference Divergence Score): The difference between the model's output and the user's final edit.
- LPA (Local Preference Accuracy): Accuracy in predicting specific word-level choices.
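The metrics above can be computed in several ways; the definitions below are one plausible reading, not the exact TypeOS formulas. RAR is a simple acceptance fraction, PDS is taken here as one minus a sequence-similarity ratio, and TSI as the cosine similarity between taste vectors from consecutive time windows.

```python
import difflib
import numpy as np

def revision_acceptance_rate(decisions):
    """RAR: share of suggestions accepted without modification (list of bools)."""
    return sum(decisions) / len(decisions)

def preference_divergence_score(model_output, final_edit):
    """PDS: 1 - similarity between the model's output and the user's final text."""
    return 1 - difflib.SequenceMatcher(None, model_output, final_edit).ratio()

def taste_stability_index(vec_prev, vec_curr):
    """TSI: cosine similarity between taste vectors from two time windows."""
    return float(np.dot(vec_prev, vec_curr) /
                 (np.linalg.norm(vec_prev) * np.linalg.norm(vec_curr)))

rar = revision_acceptance_rate([True, True, False, True])
pds = preference_divergence_score("The team wrote the report.",
                                  "The team wrote the report.")
tsi = taste_stability_index(np.array([1.0, 0.0]), np.array([1.0, 0.0]))
```

Identical texts give a PDS of 0, and identical taste vectors a TSI of 1, so both metrics are easy to sanity-check against known inputs.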