The Taste Treadmill: Why Better AI Models Make Old Ones Feel Worse

Knowledge on this page was mainly distilled from The Problem With “Open Models Are Last Year’s Frontier”.

Reference Points Move Faster Than Models

After a few days with a significantly better AI model, your internal standard recalibrates. The older model does not feel "slightly weaker." It feels like a loss. This is not irrational. Research on loss aversion shows that people evaluate outcomes relative to a reference point, and that shortfalls below that reference loom larger than equivalent gains above it.

The same mechanism explains why the claim "open models are basically last year's frontier" rarely lands with people who have already used the current frontier. The distance is measured in calendar time, but the pain is measured in taste. Once taste upgrades, older output looks worse even though it has not changed.

Q&A

What is the taste treadmill in AI model usage?

It is the cycle where exposure to a better model raises your internal standard, making previously acceptable models feel inadequate. The term borrows from the hedonic treadmill concept in psychology. Each upgrade resets your baseline, so 'good enough' is a moving target that rarely stays stable long enough to finish a project without temptation to switch.

Why does 'almost as good' stop being persuasive after an upgrade?

'Almost as good' is an argument made from outside the experience. From inside, you feel the gap on every third or fifth prompt as a loss relative to your new reference point. Loss aversion research suggests that a loss feels roughly twice as painful as an equivalent gain feels rewarding. So a small capability gap registers as a disproportionately large experiential downgrade.
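
To make the arithmetic concrete, here is a minimal sketch of a reference-dependent value function. It assumes a simple piecewise-linear form with a loss-aversion multiplier of 2.0, matching the "roughly twice" figure above. The function name and the success-rate numbers are hypothetical, chosen only for illustration.

```python
def felt_value(outcome, reference, loss_aversion=2.0):
    """Felt value of an outcome relative to a reference point.

    Assumption: gains count at face value, and losses are scaled by
    `loss_aversion` (2.0 mirrors the "roughly twice" rule of thumb).
    """
    delta = outcome - reference
    if delta >= 0:
        return delta
    return loss_aversion * delta


# Hypothetical numbers: the old model nails 80 of 100 prompts, the new one 90.
old_model, new_model = 80, 90

# Upgrading: your reference point is still the old model.
upgrade = felt_value(new_model, reference=old_model)

# Going back: your reference point is now the new model.
downgrade = felt_value(old_model, reference=new_model)

print(upgrade)    # 10
print(downgrade)  # -20.0
```

The same 10-point difference registers as +10 on the way up and -20 on the way down, which is why going back feels worse than the upgrade felt good.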

How does loss aversion apply to AI model switching?

Once your reference point becomes 'the model that usually nails the instruction,' using a model that nails it only sometimes does not feel like a neutral step down. It feels like something was taken away. This is consistent with prospect theory, which holds that people are more sensitive to losses from a reference level than to gains of the same size.

How does Paul Graham's concept of taste relate to model evaluation?

Graham describes taste as the ability to recognize what is good, a faculty that improves with exposure to great work. In the AI context, using a strong model trains your eye for what good output looks like. Once that faculty sharpens, older output that was previously fine starts looking obviously flawed. The work did not change; the evaluator did.

Is the taste treadmill a problem or an advantage?

Both. Sharper taste helps you demand better output and ship higher-quality work. But it also means your definition of 'good enough' keeps moving, which can turn 'shipping' into 'shopping for output quality.' The practical discipline is to decide deliberately when you are upgrading your standards versus when you just need to finish something.

What does 'the reference point keeps moving' mean for open-weight models?

It means that even if open models close the capability gap on benchmarks, users who regularly sample frontier models keep resetting their expectations. The relevant question is not just how fast open models improve, but how fast user taste improves from exposure to the best available model. If taste moves faster, the perceived gap persists regardless of calendar distance.
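
The race between the two rates can be sketched as a toy model. It assumes, purely for illustration, that both user taste and open-model quality improve linearly on the same arbitrary quality scale. The function name, the rates, and the starting gap are all hypothetical.

```python
def perceived_gap(years, taste_rate, open_rate, initial_gap=0):
    """Perceived gap (reference point minus open-model quality), year by year.

    Assumption: taste and open-model quality both grow linearly, in the
    same arbitrary units per year.
    """
    return [initial_gap + (taste_rate - open_rate) * t for t in range(years + 1)]


# Taste keeps pace with open models: the gap never closes.
print(perceived_gap(3, taste_rate=10, open_rate=10, initial_gap=10))  # [10, 10, 10, 10]

# Open models improve faster than taste: the gap shrinks.
print(perceived_gap(3, taste_rate=10, open_rate=12, initial_gap=10))  # [10, 8, 6, 4]

# Taste improves faster than open models: the gap widens.
print(perceived_gap(3, taste_rate=12, open_rate=10, initial_gap=10))  # [10, 12, 14, 16]
```

In this sketch the absolute quality of the open model rises in every scenario. What the user feels is only the difference between the two rates.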