Understanding the Gap Between DeepSeek V4 and Top US AI Models in One Chart: 8 Months Behind

Several evaluations have already measured the capabilities of DeepSeek V4. A research report by three senior researchers at the US Council on Foreign Relations found that it trails the top US large models by approximately 7 months.

Now the Center for AI Standards and Innovation (CAISI), part of the US National Institute of Standards and Technology (NIST), has also evaluated DeepSeek V4. Its conclusion is that DeepSeek V4 trails the leading US models by about 8 months, close to the earlier estimate.

In CAISI's capability assessment, DeepSeek V4 scored 800 points. The strongest model at present, GPT-5.5, scored more than 1200 points, and GPT-5.4 and Opus 4.6 both scored above 1000 points.

By this measure, DeepSeek V4's overall performance is comparable to that of GPT-5 from 8 months ago, whereas DeepSeek itself stated in its release report that the model was comparable to GPT-5.4.

However, CAISI also acknowledged that DeepSeek V4 is the strongest Chinese large AI model it has evaluated, performing well in nine tests across five areas: cybersecurity, software engineering, natural sciences, abstract reasoning, and mathematics.

More importantly, DeepSeek V4 offers better value for money. Even compared with GPT-5.4 mini, the most cost-effective US model, DeepSeek V4 costs less in 4 of 7 benchmark tests, by margins ranging from 41% to 53%.