It took about a month for the finance world to start freaking out about DeepSeek, but when it did, it took more than half a trillion dollars — or one entire Stargate — off Nvidia’s market cap. It wasn’t just Nvidia, either: Tesla, Google, Amazon, and Microsoft tanked.
DeepSeek’s two AI models, released in quick succession, put it on par with the best available from American labs, according to Scale AI CEO Alexandr Wang. And DeepSeek seems to be working within constraints that mean it trained much more cheaply than its American peers. One of its new models is said to have cost just $5.6 million in its final training run, which is about the salary an American AI expert can command. Last year, Anthropic CEO Dario Amodei said the cost of training models ranged from $100 million to $1 billion. OpenAI’s GPT-4 cost more than $100 million, according to CEO Sam Altman. DeepSeek seems to have just upended our idea of how much AI costs, with potentially enormous implications across the industry.
This has all happened over just a few weeks. On Christmas Day, DeepSeek released a reasoning model (v3) that caused a lot of buzz. Its second model, R1, released last week, has been called “one of the most amazing and impressive breakthroughs I’ve ever seen” by Marc Andreessen, VC and adviser to President Donald Trump. The advances from DeepSeek’s models show that “the AI race will be very competitive,” says Trump’s AI and crypto czar David Sacks. Both models are partially open source, minus the training data.
DeepSeek’s successes call into question whether billions of dollars in compute are actually required to win the AI race. The conventional wisdom has been that big tech will dominate AI simply because it has the spare cash to chase advances. Now, it looks like big tech has simply been lighting money on fire. Figuring out how much the models actually cost is a little tricky because, as Scale AI’s Wang points out, DeepSeek may not be able to talk honestly about what kind and how many GPUs it has — as the result of sanctions.
Even if critics are correct and DeepSeek isn’t being truthful about what GPUs it has on hand (napkin math suggests the optimization techniques used mean they are being truthful), it won’t take long for the open-source community to find out, according to Hugging Face’s head of research, Leandro von Werra. His team started working over the weekend to replicate and open-source the R1 recipe, and once researchers can make their own version of the model, “we’re going to find out pretty quickly if numbers add up.”
Led by CEO Liang Wenfeng, the two-year-old DeepSeek is China’s premier AI startup. It spun out from a hedge fund founded by engineers from Zhejiang University and is focused on “potentially game-changing architectural and algorithmic innovations” to build artificial general intelligence (AGI) — or at least, that’s what Liang says. Unlike OpenAI, it also claims to be profitable.
In 2021, Liang started buying thousands of Nvidia GPUs (just before the US put sanctions on chips) and launched DeepSeek in 2023 with the goal to “explore the essence of AGI,” or AI that’s as intelligent as humans. Liang follows a lot of the same lofty talking points as OpenAI CEO Altman and other industry leaders. “Our destination is AGI,” Liang said in an interview, “which means we need to study new model structures to realize stronger model capability with limited resources.”
So, that’s exactly what DeepSeek did. With a few innovative technical approaches that allowed its model to run more efficiently, the team claims its final training run for R1 cost $5.6 million. That’s a 95 percent cost reduction from OpenAI’s o1. Instead of starting from scratch, DeepSeek built its AI by using existing open-source models as a starting point — specifically, researchers used Meta’s Llama model as a foundation. While the company’s training data mix isn’t disclosed, DeepSeek did mention it used synthetic data, or artificially generated information (which might become more important as AI labs seem to hit a data wall).
Without the training data, it isn’t exactly clear how much of a “copy” this is of o1 — did DeepSeek use o1 to train R1? Around the time that the first paper was released in December, Altman posted that “it is (relatively) easy to copy something that you know works” and “it is extremely hard to do something new, risky, and difficult when you don’t know if it will work.” So the claim is that DeepSeek isn’t going to create new frontier models; it’s simply going to replicate old ones. OpenAI investor Joshua Kushner also seemed to say that DeepSeek “was trained off of leading US frontier models.”
R1 used two key optimization tricks, former OpenAI policy researcher Miles Brundage told The Verge: more efficient pre-training and reinforcement learning on chain-of-thought reasoning. DeepSeek found smarter ways to use cheaper GPUs to train its AI, and part of what helped was using a new-ish technique of requiring the AI to “think” step by step through problems using trial and error (reinforcement learning) instead of copying humans. This combination allowed the model to achieve o1-level performance while using far less computing power and money.
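To make the trial-and-error idea concrete, here is a toy, self-contained sketch of outcome-based reinforcement learning on step-by-step reasoning. It is purely illustrative: the problems, the two "reasoning styles," and the reward rule are all invented here, and none of it reflects DeepSeek's actual training code. The point is only that the final answer is checked, and whichever way of reasoning tends to produce correct answers gets reinforced.

```python
# Toy sketch of outcome-based RL on chain-of-thought reasoning (illustrative only).
# The "policy" chooses between two made-up reasoning styles; only the final
# answer is scored, and styles that produce correct answers get reinforced.

import random

PROBLEMS = [("12 + 7", 19), ("3 * 8", 24), ("30 - 14", 16)]

def careful_steps(expr):
    steps = [f"Parse '{expr}'", "Compute step by step", "Check the result"]
    return steps, eval(expr)                              # always correct

def rushed_guess(expr):
    steps = [f"Guess an answer for '{expr}'"]
    return steps, eval(expr) + random.choice([0, 0, 1])   # sometimes off by one

strategies = {"careful": careful_steps, "rushed": rushed_guess}
weights = {"careful": 1.0, "rushed": 1.0}  # the "parameters" being learned

def sample_strategy():
    # Sample a reasoning style in proportion to its current weight.
    total = sum(weights.values())
    r = random.uniform(0, total)
    for name, w in weights.items():
        r -= w
        if r <= 0:
            return name
    return name

for _ in range(200):
    expr, target = random.choice(PROBLEMS)
    name = sample_strategy()
    chain, answer = strategies[name](expr)
    reward = 1.0 if answer == target else 0.0    # outcome-only reward, no human demo
    # Naive update: reinforce styles that got the answer right, dampen the rest.
    weights[name] = max(0.1, weights[name] + 0.1 * (reward - 0.5))

print("Learned preference over reasoning styles:", weights)
```

In real systems the "styles" are the model's own sampled chains of thought and the update is a proper policy-gradient step, but the feedback loop is the same shape: try, check the outcome, reinforce.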
“DeepSeek v3 and also DeepSeek v2 before that are basically the same kind of models as GPT-4, but just with more clever engineering tricks to get more bang for their buck in terms of GPUs,” Brundage said.
To be clear, other labs employ these techniques (DeepSeek used “mixture of experts,” which only activates parts of the model for certain queries; GPT-4 did that, too). The DeepSeek version innovated on this concept by creating more finely tuned expert categories and developing a more efficient way for them to communicate, which made the training process itself more efficient. The DeepSeek team also developed something called DeepSeekMLA (Multi-Head Latent Attention), which dramatically reduced the memory required to run AI models by compressing how the model stores and retrieves information.
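As an illustration of the basic mixture-of-experts idea, here is a minimal sketch in Python. The expert count, sizes, and routing are invented for the example and are not DeepSeek's architecture; the point is simply that a router picks a few experts per token, so only a fraction of the model's parameters do any work on a given query. DeepSeek's reported contribution, finer-grained experts and cheaper communication between them, goes beyond this basic routing step.

```python
# Minimal mixture-of-experts routing sketch (illustrative values, not DeepSeek's design).
# A router scores every expert for a token; only the top-k experts run.

import numpy as np

rng = np.random.default_rng(0)

d_model = 8          # toy embedding size
num_experts = 16     # total experts in the layer
top_k = 2            # experts activated per token

# Each "expert" here is just a small feed-forward matrix.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(num_experts)]
router = rng.standard_normal((d_model, num_experts)) * 0.1   # routing weights

def moe_layer(token: np.ndarray) -> np.ndarray:
    scores = token @ router                       # one score per expert
    top = np.argsort(scores)[-top_k:]             # indices of the k best-scoring experts
    gate = np.exp(scores[top]) / np.exp(scores[top]).sum()   # softmax over chosen experts
    # Only the selected experts do any work; the other 14 are skipped entirely.
    return sum(g * (token @ experts[i]) for g, i in zip(gate, top))

token = rng.standard_normal(d_model)
print("Output:", moe_layer(token))
print(f"Used {top_k}/{num_experts} experts for this token.")
```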
What is shocking the world isn’t just the architecture that led to these models but how quickly DeepSeek was able to replicate OpenAI’s achievements: within months, rather than the year-plus gap typically seen between major AI advances, Brundage added.
OpenAI positioned itself as uniquely capable of building advanced AI, and this public image just won it the support of investors to build the world’s biggest AI data center infrastructure. But DeepSeek’s rapid replication shows that technical advantages don’t last long — even when companies try to keep their methods secret.
“These closed source companies, to some degree, they obviously live off people thinking they’re doing the greatest things and that’s how they can keep their valuation. And maybe they overhyped a little bit to raise more money or build more projects,” von Werra says. “Whether they overclaimed what they have internally, nobody knows, obviously it’s to their advantage.”
The investment community has been delusionally bullish on AI for some time now — pretty much since OpenAI released ChatGPT in 2022. The question has been less whether we are in an AI bubble and more, “Are bubbles actually good?” (“Bubbles get an unfairly negative connotation,” wrote DeepWater Asset Management in 2023.)
It’s not clear that investors understand how AI works, but they nevertheless expect it to provide, at minimum, broad cost savings. Two-thirds of investors surveyed by PwC expect productivity gains from generative AI, and a similar number expect an increase in profits as well, according to a December 2024 report.
The public company that has benefited most from the hype cycle has been Nvidia, which makes the sophisticated chips AI companies use. The idea has been that, in the AI gold rush, buying Nvidia stock was investing in the company that was making the shovels. No matter who came out dominant in the AI race, they’d need a stockpile of Nvidia’s chips to run the models. On December 27th, the shares closed at $137.01 — almost 10 times what Nvidia stock was worth at the beginning of January 2023.
DeepSeek’s success upends the investment thesis that drove Nvidia to sky-high prices. If the company is indeed using chips more efficiently — rather than simply buying more chips — other companies will start doing the same. That may mean less of a market for Nvidia’s most advanced chips, as companies try to cut their spending.
“Nvidia’s growth expectations were definitely a little ‘optimistic’ so I see this as a necessary reaction,” says Naveen Rao, Databricks VP of AI. “The current revenue that Nvidia makes is not likely under threat; but the massive growth experienced over the last couple of years is.”
Nvidia wasn’t the only company boosted by this investment thesis. The Magnificent Seven — Nvidia, Meta, Amazon, Tesla, Apple, Microsoft, and Alphabet — outperformed the rest of the market in 2023, inflating in value by 75 percent. They continued this staggering bull run in 2024, with every company but Microsoft outperforming the S&P 500 index. Of these, only Apple and Meta were untouched by the DeepSeek-related rout.
The craze hasn’t been limited to the public markets. Startups such as OpenAI and Anthropic have also hit dizzying valuations — $157 billion and $60 billion, respectively — as VCs have dumped money into the sector. Profitability hasn’t been as much of a concern. OpenAI expected to lose $5 billion in 2024, even though it estimated revenue of $3.7 billion.
DeepSeek’s success suggests that just splashing out a ton of money isn’t as protective as many companies and investors thought. It hints that small startups can be much more competitive with the behemoths — even disrupting the known leaders through technical innovation. So while it’s been bad news for the big boys, it might be good news for small AI startups, particularly since its models are open source.
Just as the bull run was at least partially psychological, the sell-off may be, too. Hugging Face’s von Werra argues that a cheaper training model won’t actually reduce GPU demand. “If you can build a super strong model at a smaller scale, why wouldn’t you again scale it up?” he asks. “The natural thing that you do is you figure out how to do something cheaper, why not scale it up and build a more expensive version that’s even better.”
Optimization as a necessity
But DeepSeek isn’t just rattling the investment landscape — it’s also a clear shot across the US’s bow by China. The advances made by the DeepSeek models suggest that China can catch up easily to the US’s state-of-the-art tech, even with export controls in place.
The export controls on state-of-the-art chips, which began in earnest in October 2023, are comparatively new, and their full effect has not yet been felt, according to RAND expert Lennart Heim and Sihao Huang, a PhD candidate at Oxford who specializes in industrial policy.
The US and China are taking opposite approaches. While China’s DeepSeek shows you can innovate through optimization despite limited compute, the US is betting big on raw power — as seen in Altman’s $500 billion Stargate project with Trump.
“Reasoning models like DeepSeek’s R1 require a lot of GPUs to use, as shown by DeepSeek quickly running into trouble in serving more users with their app,” Brundage said. “Given this and the fact that scaling up reinforcement learning will make DeepSeek’s models even stronger than they already are, it’s more important than ever for the US to have effective export controls on GPUs.”
DeepSeek’s chatbot has surged past ChatGPT in app store rankings, but it comes with serious caveats. Startups in China are required to submit a data set of 5,000 to 10,000 questions that the model will decline to answer, about half of which relate to political ideology and criticism of the Communist Party, The Wall Street Journal reported. The app blocks discussion of sensitive topics like Taiwan’s democracy and Tiananmen Square, while user data flows to servers in China — raising both censorship and privacy concerns.
Some people are skeptical that DeepSeek’s achievements were done in the way described. “We question the notion that its feats were done without the use of advanced GPUs to fine tune it and/or build the underlying LLMs the final model is based on,” says Citi analyst Atif Malik in a research note. “It seems categorically false that ‘China duplicated OpenAI for $5M’ and we don’t think it really bears further discussion,” says Bernstein analyst Stacy Rasgon in her own note.
For others, it feels like the export controls backfired: instead of slowing China down, they forced innovation. While the US restricted access to advanced chips, Chinese companies like DeepSeek and Alibaba’s Qwen found creative workarounds — optimizing training techniques and leveraging open-source technology while developing their own chips.
Doubtless someone will want to know what this means for AGI, which is understood by the savviest AI experts as a pie-in-the-sky pitch meant to woo capital. (In December, OpenAI’s Altman notably lowered the bar for what counted as AGI from something that could “elevate humanity” to something that will “matter much less” than people think.) Because AI superintelligence is still pretty much just imaginary, it’s hard to know whether it’s even possible — much less something DeepSeek has made a reasonable step toward. In this sense, the whale logo checks out; this is an industry full of Ahabs. The end game on AI is still anyone’s guess.
The future AI leaders asked for
AI has been a story of excess: data centers consuming energy on the scale of small countries, billion-dollar training runs, and a narrative that only tech giants could play this game. For many, it feels like DeepSeek just blew that idea apart.
While it might seem that models like DeepSeek, by reducing training costs, can solve environmentally ruinous AI — it isn’t that simple, unfortunately. Both Brundage and von Werra agree that more efficient resources mean companies are likely to use even more compute to get better models. Von Werra also says this means smaller startups and researchers will be able to more easily access the best models, so the need for compute will only rise.
DeepSeek’s use of synthetic data isn’t revolutionary, either, though it does show that it’s possible for AI labs to create something useful without robbing the entire internet. But that harm has already been done; there is only one internet, and it has already trained models that will be foundational to the next generation. Synthetic data isn’t a complete solution to finding more training data, but it’s a promising approach.
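For readers wondering what “synthetic data” means in practice, here is a deliberately tiny sketch; the task and format are invented for illustration, and real labs typically generate such examples with another model rather than a few lines of arithmetic.

```python
# Toy illustration of synthetic training data: examples produced by a program
# (or, in practice, by another model) rather than scraped from the web.
# The task and JSON format here are invented for the example.

import json
import random

def make_example():
    a, b = random.randint(1, 99), random.randint(1, 99)
    return {
        "prompt": f"What is {a} + {b}? Think step by step.",
        "response": f"{a} + {b} = {a + b}. The answer is {a + b}.",
    }

synthetic_dataset = [make_example() for _ in range(1000)]
print(json.dumps(synthetic_dataset[0], indent=2))
```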
The most important thing DeepSeek did was simply: be cheaper. You don’t have to be technically inclined to understand that powerful AI tools might soon be much more affordable. AI leaders have promised that progress is going to happen quickly. One possible change may be that somebody can now make frontier models in their garage.
The race for AGI is mostly imaginary. Money, however, is real enough. DeepSeek has commandingly demonstrated that money alone isn’t what puts a company at the top of the field. The longer-term implications of that may reshape the AI industry as we know it.