13 February 2025

AI’s Efficiency Wars Have Begun - Analysis

Sarosh Nagar and David Eaves

The rapid release of DeepSeek-R1, one of the newest models by Chinese AI firm DeepSeek, sent the world into a frenzy and the Nasdaq into a dramatic plunge. The reason is simple: DeepSeek-R1, an artificial intelligence reasoning model that takes time to “think” before it answers questions, is up to 50 times cheaper to run than many U.S. AI models. Distilled versions of it can even run on the computing power of a laptop, while other models require several of Nvidia’s most expensive chips. But what has really turned heads is DeepSeek’s claim that it spent only about $6 million on the final training run of its model, far less than OpenAI spent on o1. While this figure is misleading and does not include the substantial costs of prior research, refinement, and more, even partial cost reductions and efficiency gains may have significant geopolitical implications.

So, why is DeepSeek-R1 so much cheaper to train, run, and use? The answer lies in several computational efficiency improvements made to the R1 model. First, R1 uses a machine learning architecture called “mixture of experts,” which divides a larger AI model into smaller subnetworks, or “experts.” When given a prompt, R1 activates only the experts relevant to the task at hand, greatly decreasing its computational costs.
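The idea can be sketched in a few lines of toy code. This is a minimal, illustrative mixture-of-experts routine, not DeepSeek’s actual implementation: the expert functions, router weights, and `top_k` value below are all made-up placeholders. The point it shows is that although the model contains many experts, each input only ever runs a small subset of them.

```python
import math

def softmax(scores):
    # Standard numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class MixtureOfExperts:
    """Toy sparse mixture of experts (illustrative only)."""

    def __init__(self, experts, router_weights, top_k=2):
        self.experts = experts                # list of callables (the subnetworks)
        self.router_weights = router_weights  # one routing score weight per expert
        self.top_k = top_k                    # how many experts to activate per input
        self.calls = [0] * len(experts)       # track which experts actually ran

    def __call__(self, x):
        # The router scores *every* expert cheaply, but only the top-k
        # highest-scoring experts are actually executed for this input.
        scores = [w * x for w in self.router_weights]
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:self.top_k]
        gates = softmax([scores[i] for i in top])
        out = 0.0
        for g, i in zip(gates, top):
            self.calls[i] += 1
            out += g * self.experts[i](x)  # weighted sum over activated experts only
        return out

# Four experts exist, but each forward pass computes only two of them,
# so roughly half the model's compute is skipped per input.
experts = [lambda x, a=a: a * x for a in (1.0, 2.0, 3.0, 4.0)]
moe = MixtureOfExperts(experts, router_weights=[0.1, 0.2, 0.3, 0.4], top_k=2)
y = moe(1.0)
```

Real mixture-of-experts models route per token with learned neural routers and thousands of experts, but the compute-saving mechanism is the same: parameters that are not routed to are never touched.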
