Execution is worthless
#165 - How the end of scaling puts ideas back at the center of innovation
Hello friends, I hope you had a great week!
Last weekend I ran a marathon in Florence, and while it was only my second, it was much more fun (and less painful) than the first! It also gave me a lot of time running, and thus listening to podcasts, and one of those podcasts was the inspiration for this post.
In 2011 I spent a few weeks in Silicon Valley trying to be a founder. I wasn’t even an entrepreneur, really, more of an enthusiastic outsider absorbing whatever the ecosystem believed at the time. And there was one sentence I heard more than any other. You’d get it from mentors, investors, other twenty-something founders at Philz Coffee, people at meetups, sometimes even in the first five minutes of a pitch conversation:
“Ideas are worthless. Execution is everything.”
It was delivered with the confidence of a natural law. If you questioned it, people smiled the way you smile at someone who hasn’t understood the rules of the game. And in that era, it wasn’t completely wrong. Capital was scarce unless you were already “in the network.” Distribution was controlled by a handful of platforms. Engineering talent was thin. A good idea didn’t matter much if you couldn’t out-ship competitors or build something users actually wanted.
I feel that narrative has flipped upside down recently.
Last week I was listening to Ilya Sutskever talk about the shift from the age of scaling back to the age of research, and he made a striking observation: the world he describes is one where execution has become cheap and ideas have become scarce. Compute, infra and training pipelines have been commoditized. There are “more companies than ideas.” And the bottleneck in AI isn’t shipping faster; it’s having a genuinely new hypothesis that isn’t a cosmetic variation of everyone else’s.
From “ideas are worthless” to a world with more companies than ideas
That 2011 mantra came from a specific economic environment. Most consumer startups had similar ideas anyway (social apps, marketplaces, mobile utilities, SaaS for X) so investors learned that the idea was rarely the differentiator. What mattered was whether a team could ship fast enough, iterate before running out of cash, and find distribution before someone else did. The real risk wasn’t that your idea was bad. The real risk was that you couldn’t get anything done. Ideas were abundant; execution was scarce.
AI today is the exact inverse. The bottleneck has moved.
Sutskever’s line that there are now “more companies than ideas” captures something you can feel across the industry. Every lab is training roughly the same family of models, with the same broad architectures, on overlapping data, with broadly similar RL stacks. They’re competing on capex, not concepts. That’s the paradox: the sector has never had more money, more researchers, or more compute, and yet it feels intellectually narrow. The age of scaling did that. Once the recipe became obvious, the only meaningful lever became scale itself. You didn’t need a new idea; you needed more GPUs and a budget that could survive a few nine-figure invoices.
It worked for a while. If everyone scales the same recipe, you get predictable progress. You get better evals. You get product improvements you can show to customers. And you avoid the risk of betting on unproven theories that might blow up after $80M of training.
But the trade-off is that the ecosystem gets trapped in a local optimum. Scaling is a safe bet, so everyone piles into it. And once everyone piles in, the marginal value of execution collapses. You can’t out-execute a competitor who is running the same training playbook at the same scale. You can only out-think them… which brings the bottleneck back to ideas.
The execution age wasn’t wrong. It was the result of a specific landscape. That landscape is not what we see in AI today.
The eval mirage, and why the old mantra made sense back then
Before getting into Sutskever’s eval critique, it’s worth pausing on why the “ideas are worthless” mantra existed in the first place.
In the 2000s and early 2010s most startup ideas were variations of the same themes: another photo-sharing app, another marketplace, another tool built on top of the Facebook API. The false-positive rate on “good ideas” was absurdly high. More importantly, an idea had no value unless you could execute it under brutal constraints. Fundraising was slow. Hiring was a grind. Cloud infrastructure wasn’t as forgiving. Distribution was controlled by Apple, Google, and a few social platforms that could kill your company with one policy change. The real moat was operational: shipping faster, iterating faster, and surviving long enough to reach some form of traction. That was the deeper meaning of the mantra. Not that ideas didn’t matter, but that ideas alone couldn’t save you.
Frontier AI flips this logic. Today, execution is the part everyone can buy. The hard part is the hypothesis. And the clearest place where this shows up is in the gap between what models achieve on evals and what they deliver in practice.
Sutskever captures it with a simple example: a model that beats coding benchmarks yet oscillates between two broken bug fixes when used in real workflows. It apologizes, repairs the problem, reintroduces another bug, apologizes again, and loops. The eval performance looks superhuman; the real capability is inconsistent. The same pattern appears in RL. Teams design environments, reward functions and curricula that are supposed to generalize, but in practice they end up training systems that optimize for whatever makes the benchmark look better. Not intentionally, but the incentives push in that direction.
This is the downside of a world that over-indexed on execution. When you have a recipe, you optimize for what’s measurable. And in AI, what’s measurable is always a benchmark. The result is inflated scores, brittle behavior, and a growing recognition that new ideas (and not more training runs) are the real constraint.
Long before the age of scaling peaked, economists were already circling around a broader problem. If you look at productivity data across multiple industries, the pattern is consistent: research output hasn’t collapsed, but research productivity has. The amount of progress you get per unit of R&D keeps shrinking (I recently wrote a post about this).
How scaling hid the scarcity problem
The years from 2020 to 2024 felt unusually smooth for anyone working around AI. If you had access to capital, access to GPUs, and a team that could run large training jobs reliably, you could show progress that looked linear and inevitable. Scaling laws gave everyone a roadmap. Bigger models → lower loss → better evals → better demos. There were still engineering challenges, but they were the kind you could solve with competent teams and a bigger capex envelope. Nothing about the process required a strong theory of intelligence.
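That roadmap can be sketched as a simple power law. The snippet below is purely illustrative: the constants are invented, not fitted values from any published scaling-law paper, but the shape is the point — loss keeps falling as compute grows, while each doubling buys a smaller absolute improvement than the last.

```python
# Toy power-law scaling curve (illustrative constants, not real fits).
def toy_loss(compute, irreducible=1.7, coeff=400.0, exponent=0.05):
    """Hypothetical training loss as a function of compute (FLOPs)."""
    return irreducible + coeff / (compute ** exponent)

# Each doubling of compute yields a smaller gain than the previous one.
prev = toy_loss(1e21)
for doublings in range(1, 5):
    cur = toy_loss(1e21 * 2 ** doublings)
    print(f"doubling {doublings}: loss {cur:.4f}, gain {prev - cur:.4f}")
    prev = cur
```

Seen this way, the 2020–2024 playbook was rational: as long as you sat on the steep part of the curve, buying compute reliably bought capability, and no new theory was required.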
This environment created a sense that AI was operating under different economic rules than the rest of tech. Every other industry shows diminishing returns to R&D. AI, during those years, looked like the exception. You didn’t need to invent a new architecture. You didn’t need a new conceptual breakthrough. You just needed to scale the recipe. “Execution” could be purchased. Money converted directly into capability.
But that stability hid the fact that the underlying inputs were running out of headroom. High-quality pre-training data is finite, and we’ve already consumed most of what the internet has to offer. You can scrape more, but you get more noise than insight. The returns on parameter count also flatten: doubling compute no longer delivers the intuitive leap forward users expect. And RL, which looked like the natural next lever, turns out to be expensive, unstable, and easy to overfit to benchmarks rather than real-world tasks.
This is the moment when the real constraint shows itself again. When the predictable slope of scaling weakens, you’re forced back into the harder questions:
How do we make models generalize?
How do we design value functions that genuinely guide learning?
Which parts of the pre-training process matter, and which are dead weight?
What does a more sample-efficient architecture even look like?
These aren’t execution questions. They’re idea questions. And now that the scaling curve is flattening, the field is back where every other mature technology sits: progress requires insight, not procurement.
Age of research 2.0: taste becomes the real moat
If the next phase of AI is an age of research rather than an age of scaling, the scarce resource shifts from hardware to taste. Sutskever’s definition is surprisingly practical: beauty, simplicity and a top-down conviction that “this has to work.” It sounds soft, but it’s a functional description of how good researchers filter ideas before touching a GPU. Taste is a way to decide which hypotheses are worth tens of millions of dollars in training runs, and which ones should stay in a notebook.
The reason taste matters now is that pure execution has lost its edge. Training infrastructure is standardized. Large-scale RL pipelines are familiar. Every frontier lab knows how to run mixed precision, distributed training, and complex evaluation suites. The gap between “bad execution” and “good execution” still exists, but the gap between “good execution” and “great execution” is narrowing. What remains differentiated is the ability to choose the right problem and the right abstraction.
Sutskever’s line about having “more companies than ideas” captures how narrow the conceptual toolkit has become. Everyone is scaling transformers with small variations. Everyone is fine-tuning with similar RL recipes. Everyone is chasing the same evals.
Taste breaks that stagnation. A team with good taste bets on directions that look strange before they look correct. They pursue ideas that do not optimize for the next leaderboard release, but for the underlying mechanics of generalization, reasoning or value learning. They can keep pushing even when early results look unstable, because the top-down reasoning is strong enough to justify the search.
Execution still matters once you have the idea. But execution is no longer the scarce commodity. In an age of research, the real moat is the ability to identify the right hill to climb, not how fast you can sprint once you’re on it.
Where the bottleneck moves next
When I think back to that 2011 Silicon Valley mantra, “ideas are worthless, execution is everything,” it feels like a relic from a different economic regime. It reflected a world where capital was scarce, distribution was concentrated, and even a good team could struggle to get anything in front of users. In that environment, the idea wasn’t the limiting factor. Survival was.
Frontier AI has flipped this logic. Execution is still necessary, but it isn’t scarce. The playbooks are known. The infra is standardized. Every serious lab can train large models, run RL pipelines, and hit strong evals. The hard part is no longer getting things done. The hard part is knowing what is worth doing.
We’re back in the age of research. GPUs are cheap. Ideas are not.
As I wrapped this piece, I realized how often I’ve been circling the same ideas lately. In The real gap between Western and Asian growth, I wrote about how systems fall behind when they stop generating new concepts and rely only on scale. In Have we finished good ideas?, I looked at the data showing how much harder progress has become when invention slows. Both themes point to the same conclusion here.
So as we head into resolution season, I’ll make one for myself: try to make 2026 a year dedicated to cultivating bigger ideas, stepping back from relentless execution, and giving myself more space to study, explore and invent. It looks like the ability to do that will be more and more valuable in the future!
Have a fantastic weekend!
Giovanni
Didn't expect this take. Absolutely spot on!
You need, of course, both: a constant flow of good ideas and strong execution. I'd think of it as a pyramid, similar to Maslow's hierarchy, where each layer builds on the next (sufficient capital -> space of feasible ideas -> strong execution -> judgement/taste -> ?)
For instance, rockets are earlier in the lifecycle than AI; they are still in the capital-intensive stage.