You need, of course, both, constant flow of good ideas and strong execution. I'd think of it a pyramid, similar to Maslow's hierarchy where each layer builds on the next (sufficient capital -> space of feasible ideas -> strong execution -> judgement/taste -> ?)
For instance, rockets are earlier in the lifecycle than AI, they are still in capital-intensive stage.
The part about evals killing taste is spot on. When everyone optimizes for the same benchmarks, you end up with models that game the metrics instead of actually solving problems. The eval system trains the labs just as much as the models themselves, and that's kinda dangerous when the real world doesn't run on leaderboards.
Didn't expect this take. Absolutly spot on!
You need, of course, both, constant flow of good ideas and strong execution. I'd think of it a pyramid, similar to Maslow's hierarchy where each layer builds on the next (sufficient capital -> space of feasible ideas -> strong execution -> judgement/taste -> ?)
For instance, rockets are earlier in the lifecycle than AI, they are still in capital-intensive stage.
The part about evals killing taste is spot on. When everyone optimizes for the same benchmarks, you end up with models that game the metrics instead of actually solving problems. The eval system trains the labs just as much as the models themselves, and that's kinda dangerous when the real world doesn't run on leaderboards.
Very true. It’s the old saying of “follow the incentives”. It’s inevitable that evals end up shaping reality more than assessing it only
right on: execution will of course continue to matter, but ideas are back, baby! thx for writing and sharing your ideas!
Indeed!! Ideas are back!