Updating Your Priors [Edition 1]: The AI Scaling Shift is Happening Now
Highlights, notes, and deep-dives into the most interesting startup/tech podcasts
I know you expected Triple Shot Saturday in your inbox tomorrow. Plot twist: Your usual Triple Shot Saturday is evolving into something more ambitious. Welcome to the first edition of 'Updating Your Priors.'
Triple Shot Saturday, though it started just six months ago, already feels out of place in early 2025, when the quality of AI apps like ChatGPT and Claude has skyrocketed. If you know the right tool or prompt, surfacing highlights from a podcast takes just a few clicks. The models have become very good, and AI-generated highlights are easily an 8/10 or 9/10, which is good enough for the time they save.
The constraint shifts downstream from ‘can AI surface the most important highlights?’ to ‘can you find the most interesting episode and spend enough time absorbing its insights and anecdotes?’
Also, in an ocean of podcast content, I believe any episode is worth your time only if it helps you update your priors: discover a perspective you hadn’t considered, revisit an outdated assumption, or sharpen your understanding of a complex topic.
Which brings us to ‘Updating Your Priors.’
In ‘Updating Your Priors,’ I’ll explore one (or perhaps two) startup/tech podcast episode(s) in depth, combining human-curated insights and anecdotes with my commentary to maximize the episode’s value for you. My guardrail for choosing episodes is that they should help you revisit what you already know and relearn or unlearn it.
Of course, each edition will be longer than Triple Shots and, hopefully, something you’ll enjoy reading with a warm cup of tea on a lazy Sunday morning.
Happy reading!
In this first edition, I go deep into the 400th episode of Invest Like The Best, where Patrick O'Shaughnessy hosted Chetan Puttagunta, GP at the legendary Valley venture fund Benchmark, and Modest Proposal, an anonymous Twitter account known for deep insights into US public markets.
The core theme is how AI scaling will change in 2025 and what it means for startups, incumbents, hyperscalers, and VCs.
I have read this episode twice already and will most likely read it one more time. Here are the best bits that I found incredibly valuable. Enjoy.
Chetan Puttagunta and Modest Proposal - Capital, Compute & AI Scaling - [Invest Like the Best, EP.400]
A shift in AI model scaling is underway, and few are talking about it.
AI model development has hit a significant turning point. The traditional approach of improving models through pre-training with more data and compute power has reached its limits, as we've exhausted available human-generated text data, and synthetic data isn't proving effective enough. The industry is now shifting to "test-time compute," where models improve by exploring multiple solutions in parallel and using verifiers to iterate toward better answers. This marks a fundamental change in how AI capabilities will scale going forward.
Yeah, I think we're now at a point where it's either consensus or universally known that all the labs have hit some kind of plateauing effect on how we perceive scaling for the last two years, which was specifically in the pre-training world. And the power laws of scaling stipulated that the more you could increase compute in pre-training, the better model you were going to get. And everything was thought of in orders of magnitude. So throw 10x more compute at the problem, and you get a step function in model performance and intelligence. And this certainly led to incredible breakthroughs here. And we saw from all of the labs, really terrific models. The overhang on all of this, even starting in late 2022, was at some point we were going to run out of text data that was generated by human beings. And we were going to enter the world of synthetic data fairly quickly. All of the world's knowledge effectively had been tokenized and had been digested by these models. And sure, there were niche data and private data and all these little repositories that hadn't been tokenized. But in terms of orders of magnitude, it wasn't going to increase the amount of available data for these models particularly significantly. As we looked out in 2022, you saw this big question of was synthetic data going to enable these models to continue to scale? Everybody assumed, as you saw that line, this problem was going to really come to the forefront in 2024. And here we are. We're here and we're all trying to train on synthetic data, the large model providers. And now, as it's been reported in the press and as all these AI lab leaders have gone on the record, we're now hitting limits because of synthetic data. The synthetic data as generated by the LLMs themselves are not enabling the scaling and pre-training to continue. So we're now shifting to a new paradigm called test time compute. And what test time compute is in a very basic way is you actually ask the LLM to look at the problem, come up with a set of potential solutions to it, and pursue multiple solutions in parallel. You create this thing called a verifier, and you pass through the solution over and over again iteratively. And the new paradigm of scaling, if you will, the x-axis is time measured in logarithmic scale, and intelligence is on the y-scale. And that's where we are today, where it seems that almost everybody is moving from a world of scaling on pre-training and training to scaling on what's now being called reasoning, or that is inference time, test time, however you want to call it. And that's where we are as of Q4 2024.
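To make the mechanics concrete, here is a toy sketch of the loop Chetan describes: sample several candidate solutions, score them with a verifier, and keep iterating from the best one. This is my illustration, not the labs' actual algorithm; a random proposer chasing a target number stands in for the LLM, and the shape of the loop is the point.

```python
# Toy test-time-compute loop: propose candidates, verify, iterate.
# (Illustrative only; the "model" and "verifier" are stand-ins.)
import random

TARGET = 42  # stands in for "the correct answer" the verifier can recognize

def propose(seed_guess: int, n: int = 8) -> list[int]:
    """Stand-in for sampling n candidate solutions from an LLM."""
    return [seed_guess + random.randint(-10, 10) for _ in range(n)]

def verifier(candidate: int) -> float:
    """Stand-in for a learned verifier: higher score means a better solution."""
    return -abs(candidate - TARGET)

def test_time_search(initial_guess: int, rounds: int = 5) -> int:
    """Spend more inference-time compute (more rounds) to refine the answer."""
    best = initial_guess
    for _ in range(rounds):
        candidates = propose(best)
        best = max(candidates + [best], key=verifier)
    return best

print(test_time_search(initial_guess=0))  # converges toward 42 as rounds grow
```

More rounds and more parallel candidates mean more compute spent at inference time rather than at training time, which is exactly the axis the new scaling curves are drawn on.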
What is test-time compute scaling, and why should you care as long as the models get better?
The shift to test-time compute isn't just an academic distinction - it comes with a real implication: Model quality doesn't scale linearly with more compute.
Unlike pre-training where throwing more compute improved model quality, test-time compute faces fundamental constraints: algorithms can quickly exhaust the solution space, verifiers don't scale linearly with compute, and some problems remain challenging regardless of computing power. These limitations mean we can't assume a linear relationship between the compute thrown at a model and its capabilities.
…is it reasonable to say based on what you know now that the switch to test time scaling where time is the variable is like a who cares? As long as these things keep getting more and more capable, isn't that all that matters? And the fact that we're doing it in a different way than just based on pre-training, does anyone really care? Does it matter?
There's two things that come up pretty quickly in test time or reasoning paradigm, which is as LLMs explore the space for potential solutions, very quickly as a model developer or somebody working on models, you quickly realize that algorithms used for test time compute might exhaust the useful search space for solutions quite quickly. That's number one. Number two, you have this thing called a verifier that's looking at what's potentially a good solution, what's potentially a bad solution, what should you pursue. And the ability to figure out what's a good solution, what's a bad solution, or what's an optimal path and not an optimal path, it's unclear that that scales linearly with infinite compute. And then finally, tasks themselves can be complex, ambiguous, and the limiting factor there may or may not be compute. So it's always really interesting to think of these problems as if you were to have infinite compute to solve this problem, could you go faster? And there's going to be a number of problems in reasoning where you could go faster if you just scaled compute. But oftentimes we're starting to see evidence that it's not necessarily something that scales with compute linearly with the technology we have today. Now, can we solve all of that? Of course. There's going to be algorithmic improvement. There's going to be data improvement. There's going to be hardware improvement. There's going to be all sorts of optimization improvements here. The other thing we're still finding is the inherent knowledge or data available to the underlying model that you're using for reasoning still continues to be limited. And just because you're pursuing test time, it doesn't mean that you can break through all previous data limitations by just scaling compute at test time. So it's not that we're hitting walls on reasoning or we're hitting walls on test time. It's just the problem set and the challenges and the computer science problems are starting to evolve.
What happens to data center capex if scaling fully moves to test-time compute?
The shift to test-time compute could fundamentally change AI infrastructure investments. Instead of massive $20-50B data centers built upfront for pre-training, companies might shift to smaller, distributed inference centers that scale with actual usage. This works very well for incumbent hyperscalers, as the cost aligns better with revenue creation and they don't have to slide $20B line items into their cash flow statements. The big change: a move from huge centralized facilities in low-cost locations to a network of smaller, lower-latency centers spread across regions.
And in my mind, it was easy to talk about that when the cost of anteing up was a billion dollars or $5 billion. But we were rapidly approaching the point in time where the ante was going to be $20 billion or $50 billion. And you can look at the cash flow statements of these companies, it's hard to sneak in a $30 billion training run. And so the success of GPT-5 class models globally, let's apply that to all the various labs, I think was going to be a big proof point as to whether or not the amount of capital was committed, because these are three, four year commitments. If you go back to when the article was written on Stargate, which is the hypothesized $100 billion data center that OpenAI and Microsoft were talking about, that was a 2028 delivery. But at some point here in the next six to nine months, it's a go, no go. We already know that the 300,000 to 400,000 chip supercluster is going to be delivered end of next year, early 2026. But we probably need to see some evidence of success on this next model in order to get the next round of commitment.
this is a really powerful shift if we move from pre-training to inference time. And there are a couple of big ramifications. One, it better aligns revenue generation and expenditure. I think that is a really, really beneficial outcome for the industry at large, which is in the pre-training world, you are going to spend 20, 30, $40 billion on capex, train the model over 9 to 12 months, do post-training, then roll it out, then hope to generate revenue off of that in inference. In a test time compute scaling world, you are now aligning your expenditures with the underlying usage of the model. So just from a pure efficiency and scalability on a financial side, this is much, much better for the hyperscalers. I think a second big implication, again, we have to say, we don't know that pre-training scaling is going to stop. But if you do see this shift towards inference time, I think that you need to start to think about how do you re-architect the network design? Do you need million chip superclusters in energy low-cost land locations, or do you need smaller, lower latency, more efficient inference time data centers scattered throughout the country? And as you re-architect the network, the implications on power utilization, grid design, a lot of the, I would say, narratives that have underpinned huge swaths of the investment world, I think have to be rethought.
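To see why this alignment matters financially, here is a back-of-the-envelope sketch with hypothetical numbers I chose for illustration (nothing here comes from the episode): in the pre-training world the capex is sunk before a single query is served, while in a test-time-compute world spend can grow roughly in line with usage.

```python
# Hypothetical unit economics; every number below is an assumption for illustration.

def pretraining_world(queries: float, upfront_capex: float = 30e9,
                      net_revenue_per_query: float = 0.002) -> float:
    """Cumulative cash flow ($) when a ~$30B training cluster is paid for upfront."""
    return queries * net_revenue_per_query - upfront_capex

def test_time_world(queries: float, net_revenue_per_query: float = 0.002,
                    inference_capex_per_query: float = 0.0008) -> float:
    """Cumulative cash flow ($) when inference capacity is added as usage grows."""
    return queries * (net_revenue_per_query - inference_capex_per_query)

for q in (1e9, 1e12, 1e13):
    print(f"{q:.0e} queries | pre-training world: ${pretraining_world(q) / 1e9:+.1f}B"
          f" | test-time world: ${test_time_world(q) / 1e9:+.1f}B")
```

The absolute numbers are made up; the shape is the argument: one curve starts $30B in the hole and needs enormous volume to recover, the other is modestly positive from the first query.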
Here are some more notes on how test-time compute could affect data center capex momentum, from the podcast Machine Learning Street Talk, where German researcher Jonas Hübotter delves into his research on test-time compute (link to the episode).
Resource Allocation Based on Complexity: Test-time inference suggests a future where compute resources are allocated dynamically based on task complexity rather than fixed model size. Less complex tasks may require fewer resources, while more complex problems could leverage greater computational power at inference time. This dynamic allocation contrasts with current monolithic models that demand substantial resources regardless of task complexity.
Hybrid Deployment Strategies: Hybrid deployment strategies that merge local and cloud computation come into the picture. Less demanding tasks could be processed locally on laptops, while resource-intensive tasks would be delegated to cloud-based data centers. This approach optimizes resource utilization and reduces costs (see the sketch after these notes).
Reduced Need for Massive Pre-training: The adoption of test-time inference could reduce the need for extremely large pre-trained models. Smaller models, strategically augmented with data at test time, can outperform significantly larger models, reducing the energy consumption and infrastructure requirements of data centers.
Continuous Learning and Adaptation: Test-time inference facilitates the creation of AI systems that continuously learn and adapt. Rather than fixed models, this approach supports architectures that evolve and adjust based on experience, leading to more efficient and adaptable data center designs that can handle changing workloads and demands.
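To make the hybrid deployment idea concrete, here is a minimal routing sketch. This is my illustration, not Hübotter's method: a cheap difficulty estimate decides whether a request stays on a small local model or gets escalated to a frontier model in the cloud.

```python
# Minimal complexity-based router (illustrative; tier names and the
# difficulty threshold are assumptions, not from the episode).
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    est_difficulty: float  # 0.0 = trivial, 1.0 = very hard, from a cheap classifier

def route(req: Request, threshold: float = 0.6) -> str:
    """Decide which tier should serve the request."""
    if req.est_difficulty < threshold:
        return "local-small-model"    # e.g. a 1B-3B model running on-device
    return "cloud-frontier-model"     # reserve expensive compute for hard cases

print(route(Request("What is 2 + 2?", est_difficulty=0.05)))           # local-small-model
print(route(Request("Draft a merger agreement", est_difficulty=0.9)))  # cloud-frontier-model
```

The same pattern shows up again later in the episode, when Chetan describes the intelligent routing systems application developers are building.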
AI's impact on public markets is bigger than most people estimate.
AI's influence on public markets extends far beyond tech companies. An estimated 40-45% of US market capitalization is now a direct play on AI in some way, including not just the obvious tech players but also industrials and utilities. Extend the lens globally to ASML, TSMC, and the Japanese chip sector, and the AI-linked market cap is even larger, showing how deeply AI has become embedded in the broader market's valuation story.
…so much of this story has been the spend, CapEx, the strategic positioning, the quote unquote ROI on all this spend and how they're going to earn a return on this insane outlay of capital. Do you think that everything Chetan just said is well reflected in the stance and the pricing and the valuations of the public tech companies?
AI has permeated far broader into industrials, into utilities, and really, I would argue, somewhere between 40% and 45% of the market cap is a direct play on this. And if you even abstract to the rest of the world, you start bringing in ASML, you bring in TSMC, you bring in the entire Japanese chip sector. And so if you look at the cumulative market cap that is a direct play on artificial intelligence right now, it's enormous.
And if you go back to when we talked probably four months ago, I would say that the distribution of outcomes has shifted. And at that point in time, pre-training and scaling on that axis was definitely the way…
What are the key trends in frontier models due to open-source models (like Llama) and test-time compute?
US venture funds are seeing increased democratization in the model layer, with small teams of 2-5 people now able to reach the model frontier at a fraction of the previous cost, thanks to two key shifts. First, Meta's open-source Llama models have provided a powerful foundation that developers can freely modify and fine-tune for specific use cases. Second, the shift to test-time compute means teams no longer need billion-dollar budgets for training runs; they can now build competitive solutions with minimal capital, marking a return to the classic Silicon Valley garage startup model in the AI space.
…the story of technology innovation has been there's always been two to three people in a garage somewhere in Palo Alto doing something to catch up to incumbents very, very quickly. I think we're seeing that now in the model layer in a way that we haven't seen, frankly, in two years
Specifically, I think we still don't know 100% that pre-training and training scaling isn't coming back. We don't know that yet. But at the moment, at this plateauing time, we're starting to see these small teams catch up to the frontier. And what I mean by frontier is where are the state-of-the-art models, especially around text, performing? We're seeing these small teams of quite literally two to five people jumping to the frontier with spend that is not one order, but multiple orders of magnitude less than what these large labs were spending to get there.
part of what's happened is the incredible proliferation of open source models. Specifically, what Meta has been doing with Llama has been an extraordinary force here. Llama 3.1 comes in three flavors, 405 billion, 70 billion, 8 billion. And then Llama 3.2 comes in 1 billion, 3 billion, 11 billion, and 90 billion. And you can take these models, download them, put them on a local machine. You can put them in a cloud. You can put them on a server. And you can use these models to distill, fine-tune, train on top of, modify, etc., etc., and catch up to the frontier with pretty interesting algorithmic techniques. And because you don't need massive amounts of compute or you don't need massive amounts of data, you could be particularly clever and innovative about a specific vertical space or a specific technique or a particular use case to jump to the frontier very, very quickly. That is largely changing how I personally think about the model layer and potential early stage investments in the model layer.
And literally in six weeks, none of this could be true anymore. But if this state holds, which is that pre-training isn't scaling because of synthetic data, it just means that you can now do a lot more, jump to the frontier very quickly with a minimum amount of capital, find your use case, find where you're most powerful. And then from that point onward, the hyperscalers frankly become best friends. Because today, if you are at the frontier, you're powering your use case, you're not particularly GPU constrained anymore, especially if you're going to pursue test time inference or test time compute or something like that. And you're serving, let's say, 10 enterprise customers, or maybe it's a consumer solution that's optimized for a particular use case. The compute side of it just doesn't become as challenging as it was in 2022. In 2022, you would talk to these developers and it just became a question of, well, could you get a 100,000 cluster together because we need to go train and then we have to go buy all this data. And then even if you knew all the techniques, all of a sudden you would pencil it out and say, like, I need a billion dollars to get the first training run to go. And that just is not a model historically that's been the venture capital model. The venture capital model has been, could you get together a team of extraordinary people, have a technology breakthrough, be capital light, and jump way ahead of incumbents very quickly and then somehow get a distribution foothold and go. At the model layer for the last two years, that certainly didn't seem like it was possible. And literally in the last six, eight weeks, that's definitively changed.
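As a concrete example of the 'download, distill, fine-tune' workflow Chetan describes, here is a minimal sketch that assumes the Hugging Face `transformers` and `peft` libraries and the gated `meta-llama/Llama-3.2-1B` checkpoint; the model ID and LoRA settings are my illustrative choices, not anything prescribed in the episode.

```python
# Sketch: adapt an open Llama checkpoint for a vertical use case with LoRA
# adapters. Assumes `pip install transformers peft` and access to the gated
# meta-llama weights on Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-3.2-1B"   # any open Llama size works the same way
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Attach small low-rank adapters instead of updating all of the weights; this
# is what makes vertical-specific fine-tuning cheap relative to pre-training.
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()     # typically well under 1% of parameters

# From here, a standard supervised fine-tuning loop (e.g. transformers.Trainer)
# over a domain-specific dataset yields a specialized model on modest hardware.
```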
Test-time compute is significantly cheaper than pre-training for scaling models. The massive capital expenditures required for pre-training, potentially reaching tens of billions of dollars, are highlighted as a key limitation. In contrast, test-time compute aligns spending more directly with usage, making it a more financially sustainable approach, especially for smaller organizations. The cost of inference is mentioned as having dropped dramatically, even by a factor of 100x or 200x, making it negligible compared to the huge upfront costs of pre-training.
Llama gives Meta an unfair advantage (that it can milk over the long term)
By getting the developer ecosystem to standardize on Llama's transformer architecture, Meta can effectively set the technical standards for the entire AI stack, from hardware vendors to hyperscalers, making its architecture the default foundation for new AI development, similar to how Windows became the PC standard in the 90s.
What's really interesting about that is that regardless of whether Llama 4 is a step function from Llama 3, it kind of doesn't matter. If they push the boundaries of efficiency and get to a point where even if it's incrementally better, what it does to the developer landscape is pretty profound because the force of Llama today has been two things, and I think this has been very beneficial to Meta, is one, the transformer architecture that Llama is using is a sort of standard architecture, but it has its own nuances. And if the entire developer ecosystem that's building on top of Llama is starting to just assume that that Llama 3 transformer architecture is the foundational and sort of standard way of doing things, it's sort of standardizing the entire stack towards this Llama way of thinking, all the way from how the hardware vendors will support your training runs to the hyperscalers and on and on and on. And so standardizing on Llama itself is starting to become more and more prevalent. And so if you were to start a new model company, what ends up happening is starting with Llama today is not only great because Llama is open source, it's also extraordinarily efficient because the entire ecosystem is standardizing on that architecture.
OpenAI is the ‘standard’ AI app for consumers, but can it outrun ‘free’?
OpenAI's investment thesis is built around ChatGPT's dominant consumer mindshare and brand recognition (even children know ChatGPT but not Claude or Grok), rather than enterprise APIs. While this could make them wildly profitable if training costs decrease, they face a fundamental challenge: Google and Meta can give away models with similar AI capabilities for free to their billions of users. This creates a classic "can you outrun free?" dilemma, especially when competing against Google's Gemini, which will be integrated into Search and other Google products that billions already use daily.
So I think the interesting part for OpenAI was because they just raised the recent round and there was some fairly public commentary around what the investment case was. You're right. A lot of it oriented around the idea that they had escape velocity on the consumer side and that ChatGPT was now the cognitive referent. And that over time, they would be able to aggregate an enormous consumer demand side and charge appropriately for that. And that it was much less a play on the enterprise API and application building. And that's super interesting. If you actually play out what we've talked about, when you look at their financials, if you take out training runs, if you take out the need for this massive upfront expenditure, this actually becomes a wildly profitable company quite quickly in their projections.
Now, then the question becomes, what's the defensibility of a company that is no longer step function advancing on the frontier? And there, I think this is ultimately going to come down to one, Google is also advancing on the frontier and they most likely will give the product away for free. And Meta, I think we could probably spend an entire episode just talking about Meta and the embedded optionality that they have on both the enterprise side and the consumer side, but let's stick to the consumer side. This is a business that has over 3 billion consumer touch points. They are clearly rolling Meta AI out into various surfaces. It is not very difficult to see them building a search functionality. I've joked they should buy Perplexity, but you've also just had the DOJ come out and say that Google should be forced to license their search index. I can think of no bigger beneficiary in the world than Meta having the opportunity, at marginal cost, to take on Google's search index. But the point is that I think there will be two very large scaled internet players giving away what essentially looks like ChatGPT for free. So it will be a fascinating case study in: can this product that has dominant consumer mindshare outrun free? My children know what ChatGPT is. They have no idea what Claude is. My family knows what ChatGPT is. They have no idea what Grok is. So I think for OpenAI, the question is, can you outrun free? And if you can, and training becomes less of an expense, this is going to be a really profitable company really quickly.
Anthropic is stuck in a no-man’s land.
Despite having superior technology and top talent, Anthropic is caught in a strategic no-man's land: they can't compete with OpenAI's consumer brand recognition, while Meta's open-source Llama makes it hard to capture enterprise value. Their recent $4B raise, insufficient for traditional pre-training scaling, suggests they're searching for a new direction.
Anthropic, I think they have an interesting dilemma, which is people think Sonnet 3.5 is possibly the best model out there. They have incredible technical talent. They keep ingesting more and more of OpenAI's researchers, and I think they're going to build great models, but they're kind of stuck. They don't have the consumer mindshare. And on the enterprise side, I think that Llama is going to make things very difficult for the frontier model builders to try to grab great value creation there. So they're stuck in the middle, wonderful technologists, great products, but not really a viable strategy. And you see they raise another $4 billion. To me, that's indicative that pre-training is not scaling so well because $4 billion is not anywhere close to what they're going to need if the scaling vector is pre-training. I don't have a good sense for what their strategic path forward is. I think they're stuck in the middle.
What is their go-to strategy when you have people who have staked their claim on the consumer side, and then you have an open source entity on the enterprise side that's every bit as formidable?
AI Apps are eating the SaaS platforms as stable models enable AI app developers to build with more degrees of freedom.
AI applications are seeing rapid enterprise adoption, with sales cycles collapsing from months to a demo plus a pilot before a contract is signed. AI apps offer 10x improvements over existing SaaS solutions, not just incremental gains. This disruption mirrors previous tech revolutions (the App Store, the internet), with VCs investing at similar historic rates. Traditional SaaS incumbents aren't sitting still, but while they can bolt AI onto their products, they cannot fundamentally re-architect them unless they stop selling for two years. The stabilization of the model layer (the shift from adding more features to focusing on reasoning) is actually helping application developers, who previously hesitated to build because model capabilities were changing so rapidly.
the application vendors that have come out with production AI applications for both consumer and for enterprise have found that those solutions, which can now only exist because of AI, are unlocking distribution in ways that were frankly not possible in the world of SaaS or prosumer SaaS or whatever. I'll give you a very specific example. With an AI-powered application, we're now going to CIOs at Fortune 500 companies showing these demos. And two years ago, there were really nice demos. Today, it's a really nice demo combined with five customer references of peers that are using it in production and experiencing great success. And what becomes very clear in that conversation is that what we're presenting is not a 5% improvement over an existing SaaS solution. It's about we can eliminate significant amounts of software spend and human capital spend and move this to this AI solution. And your 10x traditional ROI definition of software is easily justified and people get it within 30 minutes. And so you're starting to see these, what used to be a very long sales cycle for SaaS, in AI applications, it's 15 minutes to a yes, 30 minutes to a yes. And then the procurement process for an enterprise has completely changed. Now the CIO says something like, let's put this in as quickly as possible. We're going to run a 30-day pilot. The minute that's successful, we're signing a contract and we're deploying right away.
We've made 25 investments in AI companies. And for a $500 million fund with five partners, that's an extraordinary pace. The last time we had that kind of a pace was, surprise, when the App Store came out in 2009. And then the pace that we had, that kind of pace was again in 95, 96 with the internet. And in between those, you see us in our pace being pretty slow. We average around maybe five to seven investments a year in non-disruptive times.
These are things that three, four years ago in SaaS were just completely out of the realm of possibility, because you were competing against incumbents, you were competing against their distribution advantage, their service advantage, and all this kind of stuff. And it was very hard to prove why your particular product was unique.
And it's not that the incumbent software vendors are standing still. It's just that the innovator's dilemma in enterprise software is playing out much more aggressively in front of our eyes today than it is in consumer. I think in consumer, the consumer players recognize it and are moving on it and are doing stuff about it. Whereas I think in enterprise, it's just, even if you recognize it, even if you have the desire to do something, the solutions are just not built in a way that is responsive to dramatic re-architecture. Now, could we see this happening? Could a giant SaaS company just pause selling for two years and completely re-architect their application stack? Sure, but I just don't see that happening.
And frankly, the model layer stabilizing is a huge boon for this application layer, primarily because as an application developer, you were sitting there watching the model layer take step function leaps every year. And you kind of didn't know what to build and what you should just wait on building. Because obviously you wanted to be completely aligned with the model layer. Because the model layers are now moving to reasoning, this is a great place for an application developer.
AI inference costs have dropped by 100x to 200x in the last two years.
Inference costs have dropped by 100-200x since 2022, making them nearly negligible for most use cases. Companies are intelligently routing tasks between small and frontier models, using basic models for simple tasks and reserving expensive frontier models only for complex prompts, achieving SaaS-beating 95% gross margins.
In the private markets, one of the things that's happening is just the dramatic drop in prices of just compute, whether it's inference or training or whatever, because it's just becoming way more available. If you're sitting here today as an application developer versus two years ago, the cost of inference of these models is down 100x, 200x.
We were looking at cost curves in the first wave of application companies that we funded in 2022. You look at the inference costs and it would be like $15 to $20 per million tokens on the latest frontier models. And today, most companies don't even think about inference costs because it's just like, well, we've broken this task up and then we're using these small models for these tasks that are pretty basic. And then we're like, the stuff we're hitting with the most frontier models are these like very few prompts. And the rest of the stuff, we've just created this intelligent routing system. And so our cost of inference is essentially zero and our gross margin for this task is 95%. You just look at that and you're just like, wow, that is a totally different way to think about application gross margins than what we've had to do with SaaS and what we've had to do with basically software for the last decade plus.
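Here is that routing-margin arithmetic made explicit, with prices I assumed for illustration; the episode only gives the $15-20 per million token starting point and the roughly 95% gross margin endpoint.

```python
# Blended inference cost under intelligent routing (all prices are assumptions).
SMALL_MODEL_COST = 0.10      # $ per million tokens on a small model
FRONTIER_MODEL_COST = 15.00  # $ per million tokens on a frontier model
FRONTIER_SHARE = 0.05        # fraction of tokens escalated to the frontier model
PRICE_PER_MILLION = 20.00    # what the application charges per million tokens

blended_cost = (1 - FRONTIER_SHARE) * SMALL_MODEL_COST + FRONTIER_SHARE * FRONTIER_MODEL_COST
gross_margin = 1 - blended_cost / PRICE_PER_MILLION

print(f"blended cost: ${blended_cost:.2f} per million tokens")  # about $0.85
print(f"gross margin: {gross_margin:.0%}")                      # ~96% with these inputs
```

Route even a small share of traffic away from the frontier model and the blended cost per million tokens collapses, which is how the "95% gross margin" claim pencils out.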
And it starts with people that provide inference. It starts with the tooling and the orchestration layer. So we have a portfolio company that's extremely popular called LangChain, and at the inference layer, we have Fireworks. These kinds of companies are seeing extraordinary usage by developers. And then all the way up the stack to the applications themselves.
What should you look for in the next 6 months in the AI training/inference landscape?
Two potential game-changers: a breakthrough in synthetic data that could revive massive pre-training (and $100B training clusters) or major advances in video and audio AI models. While text data is largely exhausted, video and audio still have vast untapped potential, making them the likely frontier for the next wave of AI innovations.
On the positive side, if somebody came out with the results that pre-training was back on and there was a huge breakthrough on synthetic data and all of a sudden it's go-go again and $10 billion and $100 billion clusters were back on the table, you would go back, but all of a sudden the paradigm shift would be wild. All of a sudden, we would now be talking about $100 billion super cluster that was going to pre-train. And then obviously, if my expectation comes out that next year we're going to call AGI, we're going to have AGI and we're building a $100 billion cluster because we had a breakthrough on synthetic data and it all just works. And we can just simulate everything.
I think another scenario is it's pretty clear now that while we've exhausted data on text, we are not close to exhausting data on video and audio. And I think that it's still TBD what these models are capable of in these new modalities. And so we just don't know because the focus hasn't been there. But now you're starting to see large labs talk more about audio and video, and what these models will be capable of from a human interaction perspective, I think, is going to be pretty amazing.
Hope you found this edition of ‘Updating Your Priors’ valuable!