DeepSeek V4 Parameter Size Expected to Reach 1.6 Trillion, 60% Higher Than Anticipated

However, the more they deny it, the more interest DeepSeek V4 attracts: this DeepGEMM update has many highlights and is very likely related to the V4 model.

This update supports FP8_FP4 hybrid operators and optimizes support for NVIDIA Blackwell, with architectural upgrades mainly focusing on Mega MoE and HyperConnection. Mega MoE has the potential to bring a significant upgrade to the MoE architecture.

According to explanations circulating online, Mega MoE has several advantages. An analysis by Gemini suggests that V4's expert count will be significantly higher than V3's 256 routed experts, possibly in the thousands. If so, this could greatly improve V4's performance while maintaining flexibility and avoiding excessive demands on computing power and memory.

More importantly, the DeepGEMM update also hints at V4's parameter count. Netizens estimate that a single MoE layer holds approximately 25.37B parameters. If the model still has 60 layers, V4 is likely to be a 1.6T-parameter model; with 48 layers, it would still be roughly a 1.25T-parameter model.
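The layer arithmetic behind these figures can be checked with a short Python sketch. It uses only the numbers quoted above (25.37B per MoE layer, 60 or 48 layers); the function and constant names are illustrative, and the assumption that every layer is an MoE layer of identical size is ours, not DeepSeek's.

```python
# Back-of-the-envelope check; every figure is a community estimate, not an official spec.
MOE_PARAMS_PER_LAYER_B = 25.37  # netizens' estimate for one MoE layer, in billions


def moe_total_trillions(num_layers: int,
                        per_layer_b: float = MOE_PARAMS_PER_LAYER_B) -> float:
    """Parameters held in MoE layers alone, in trillions.

    Assumes every layer is an identical MoE layer (a simplification).
    """
    return num_layers * per_layer_b / 1000.0


for layers in (60, 48):
    print(f"{layers} layers: ~{moe_total_trillions(layers):.2f}T in MoE layers")
```

Under these assumptions the MoE layers alone come to about 1.52T for 60 layers and about 1.22T for 48 layers. The rounded 1.6T and 1.25T figures presumably also count weights outside the experts, such as attention and embeddings, though the estimate does not spell that out.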

Compared with earlier rumors of a 1T-parameter V4, a 1.6T model would be 60% larger than expected, which makes its performance highly anticipated.
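The percentage follows directly from the two headline numbers. A minimal Python sketch, in which the helper name is illustrative and the 1T baseline is the rumor itself rather than a confirmed figure:

```python
def percent_above(estimate_t: float, baseline_t: float) -> float:
    """How far an estimate exceeds a baseline, as a percentage of the baseline."""
    return (estimate_t - baseline_t) / baseline_t * 100.0


RUMORED_T = 1.0  # earlier rumor: a 1T-parameter V4 (unconfirmed)

for estimate_t in (1.6, 1.25):
    gain = percent_above(estimate_t, RUMORED_T)
    print(f"{estimate_t}T is about {gain:.0f}% above the rumored {RUMORED_T}T")
```

By the same calculation, the more conservative 1.25T scenario would still be about 25% above the rumored size.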

Even if 1.6T is not achieved, a 1.25T parameter count would be nearly double the 671 billion parameters of the current V3, and still worth looking forward to. After all, implementing Mega MoE with thousands of experts would be a transformation and a milestone in the development of MoE-based large models.