Matt Sheehan and Sam Winter-Levy
Why is DeepSeek causing global technology shockwaves?
Matt Sheehan: DeepSeek is a Chinese AI startup that recently released a series of very impressive generative AI models. One of those models, DeepSeek R1, is a “reasoning model” that takes its time to think through an extended chain of logic before it gives an answer. This type of reasoning is a relatively new paradigm that was pioneered by OpenAI last year, and it is viewed by many as the most promising way forward for AI research. In terms of performance, DeepSeek’s new model is roughly on par with OpenAI’s o1 model from last September.
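To make the "extended chain of logic" concrete, here is a minimal sketch of querying a reasoning model through DeepSeek's OpenAI-compatible API, which returns the reasoning trace separately from the final answer. The endpoint, the "deepseek-reasoner" model name, and the reasoning_content field follow DeepSeek's public documentation at the time of writing, but treat all of them as assumptions that may have changed.

```python
# A minimal sketch of querying a reasoning model via DeepSeek's
# OpenAI-compatible API. Endpoint, model name, and the separate
# `reasoning_content` field are assumptions based on DeepSeek's public docs.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",             # placeholder
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Is 9.11 larger than 9.9?"}],
)

msg = response.choices[0].message
print("Reasoning trace:", msg.reasoning_content)  # the extended chain of logic
print("Final answer:", msg.content)               # the answer given afterward
```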
The shockwaves come from how DeepSeek did it: quickly, cheaply, and openly. DeepSeek had finished an initial version of R1 just a couple of months after OpenAI's release, a far faster catch-up to U.S. models than Chinese companies managed in previous years. Perhaps most shocking was that DeepSeek achieved this performance using far less computing power, a key input for training a model, than U.S. companies. That extraordinary efficiency is likely a knock-on effect of U.S. export controls on chips: Chinese companies have been forced to get very creative with their limited computing resources. And finally, DeepSeek released its model in a relatively open-source fashion, allowing anyone with a laptop and an internet connection to download it for free. That has thrown into doubt many assumptions about AI companies' business models and led to turmoil in U.S. stock markets.
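As an illustration of what "download it for free" involves in practice, the sketch below fetches the weights of one of the smaller distilled R1 variants from the Hugging Face Hub. The repo id is an assumption based on DeepSeek's published releases; the full R1 model is far too large to run on a typical laptop, which is why a distilled variant is shown.

```python
# A minimal sketch of downloading DeepSeek's openly released weights.
# The repo id is an assumption: it points at one of the smaller distilled
# R1 variants DeepSeek published, chosen because it can fit on a laptop.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
print(f"Model weights downloaded to: {local_dir}")
```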
Sam Winter-Levy: Just to give you a sense of DeepSeek's efficiency, the company claims it trained its model for less than $6 million, using only about 2,000 chips. That's an order of magnitude less money than what Meta, for example, spent training its latest system, which used more than 16,000 chips. Now, DeepSeek's cost estimate almost certainly captures only the marginal cost: it ignores the company's expenditures on building data centers, buying the chips in the first place, and hiring a large technical team. But regardless, it's clear that DeepSeek managed to train a highly capable model more efficiently than its U.S. competitors.
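The back-of-envelope arithmetic behind that comparison can be made explicit. The sketch below uses the GPU-hour count and the $2-per-GPU-hour rental rate that DeepSeek itself reported, alongside a publicly reported GPU-hour figure for Meta's Llama 3.1 405B; all of these inputs are illustrative assumptions drawn from the companies' own disclosures, not audited costs.

```python
# Back-of-envelope arithmetic behind the "order of magnitude" comparison.
# GPU-hour counts and the $2/GPU-hour rate are assumptions taken from the
# companies' public reporting; none of these are audited figures.
def marginal_training_cost(gpu_hours: float, price_per_gpu_hour: float) -> float:
    """Marginal compute cost only: excludes data centers, chip purchases, staff."""
    return gpu_hours * price_per_gpu_hour

# DeepSeek reported roughly 2.79M H800 GPU-hours at a notional $2/GPU-hour.
deepseek_cost = marginal_training_cost(2.79e6, 2.0)
print(f"DeepSeek (claimed):  ~${deepseek_cost / 1e6:.1f}M")  # ~$5.6M, i.e. "< $6M"

# Meta's Llama 3.1 405B reportedly used ~30.8M GPU-hours; at the same
# notional rate, that is roughly an order of magnitude more.
meta_cost = marginal_training_cost(30.8e6, 2.0)
print(f"Meta (estimated):    ~${meta_cost / 1e6:.1f}M")      # ~$62M
```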