Kai-Shen Huang
In December 2024, the Hangzhou-based AI company DeepSeek released its V3 model, igniting a firestorm of debate that has been dubbed “China’s AI Shock.”
DeepSeek-V3’s comparable performance to U.S. counterparts such as GPT-4 and Claude 3, achieved at far lower cost, casts doubt on U.S. dominance in AI capabilities, a dominance undergirded by the United States’ current export controls on advanced chips. It also calls into question the entrenched industry paradigm that prioritizes heavy hardware investment in computing power. To echo U.S. President Donald Trump’s remarks, the emergence of DeepSeek represents not just “a wake-up call” for the tech industry but also a critical juncture for the United States and its allies to reassess their technology policy strategies.
What, then, does DeepSeek seem to have disrupted? The cost efficiencies DeepSeek claims for its V3 model are striking: a total training cost of just $5.576 million, roughly 5.6 percent of the estimated $100 million spent on GPT-4. The training run was completed on 2,048 NVIDIA GPUs, roughly one-eighth of the 16,000 GPUs U.S. companies typically deploy for models of this scale. It was accomplished, moreover, on the export-compliant H800 rather than the more powerful H100, yet DeepSeek reports comparable performance.
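For readers who want to check the arithmetic behind these comparisons, the back-of-envelope calculation below reproduces the two ratios from the figures cited above. The inputs are the numbers reported by DeepSeek and widely circulated industry estimates (the $100 million GPT-4 figure and the 16,000-GPU baseline are rough estimates, not audited costs), so the output is illustrative only.

```python
# Back-of-envelope check of the cost and GPU ratios cited in the text.
# All inputs are reported or estimated figures, not verified accounting.
deepseek_v3_cost_musd = 5.576   # DeepSeek's reported V3 training cost, millions USD
gpt4_cost_musd = 100.0          # widely cited estimate for GPT-4 training cost

deepseek_gpus = 2_048           # NVIDIA H800s reported for the V3 training run
typical_us_gpus = 16_000        # GPU count commonly cited for comparable U.S. runs

cost_share = deepseek_v3_cost_musd / gpt4_cost_musd   # ~0.056, i.e. ~5.6%
gpu_ratio = typical_us_gpus / deepseek_gpus           # ~7.8, i.e. roughly 8x fewer GPUs

print(f"Training cost share: {cost_share:.1%}")       # Training cost share: 5.6%
print(f"GPU count ratio: {gpu_ratio:.1f}x")           # GPU count ratio: 7.8x
```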