DeepSeek Cuts API Cache-Hit Input Price to 1/10 of the Initial Price
DeepSeek, the Chinese large language model developer, has officially announced that it is cutting the input cache-hit price across its API to 1/10 of the initial price. Combined with a limited-time discount, V4-Pro cached input costs as little as 0.025 yuan per million tokens, a new low among large models worldwide. The adjustment covers both the DeepSeek-V4-Pro and V4-Flash series, and the deepest cuts apply to input cache hits.

DeepSeek-V4-Pro: Reduced from 1 yuan to 0.1 yuan per million tokens. A further limited-time discount to 25% of that price (75% off) runs until May 5th, bringing the effective price to 0.025 yuan per million tokens.
DeepSeek-V4-Flash: Reduced from 0.2 yuan to 0.02 yuan/million tokens, with no additional discount.
Cache-miss input and output prices have been reduced as well: V4-Pro costs 3 yuan per million tokens for input (cache miss) and 6 yuan for output; V4-Flash costs 1 yuan for input (cache miss) and 2 yuan for output. All of these are 1/4 of the original price.
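To make the price list concrete, here is a minimal Python sketch that estimates a bill from the figures quoted above. It is an illustration, not an official billing formula: the `Pricing` class, the `estimate_cost` function, and the model keys are hypothetical names, and it assumes that all prices are in yuan per million tokens and that the limited-time discount applies only to the V4-Pro cache-hit price, as listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in yuan per million tokens (unit assumed for miss and output)."""
    cache_hit: float
    cache_miss: float
    output: float


# Figures copied from the article; the dictionary keys are illustrative labels.
PRICES = {
    "v4-pro": Pricing(cache_hit=0.1, cache_miss=3.0, output=6.0),
    # Assumption: the promotion changes only the cache-hit price.
    "v4-pro-promo": Pricing(cache_hit=0.025, cache_miss=3.0, output=6.0),
    "v4-flash": Pricing(cache_hit=0.02, cache_miss=1.0, output=2.0),
}


def estimate_cost(pricing: Pricing, input_tokens: int,
                  output_tokens: int, hit_rate: float) -> float:
    """Return the estimated cost in yuan for one workload.

    hit_rate is the fraction of input tokens served from the cache (0.0 to 1.0).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    hit_tokens = input_tokens * hit_rate
    miss_tokens = input_tokens - hit_tokens
    total = (hit_tokens * pricing.cache_hit
             + miss_tokens * pricing.cache_miss
             + output_tokens * pricing.output)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 100M input tokens, 90% cache hits, 5M output tokens.
    for name, pricing in PRICES.items():
        cost = estimate_cost(pricing, 100_000_000, 5_000_000, hit_rate=0.9)
        print(f"{name}: {cost:.2f} yuan")
```

For this hypothetical workload, cache-miss and output tokens account for most of the bill even at a 90% hit rate, so the total depends heavily on how much of the input is actually served from the cache.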
The cut targets a major industry pain point: the cached-input price is only 1/700 that of GPT-5.5 Pro, which significantly lowers costs for enterprises with long-text and high-frequency call workloads.
For applications with high cache hit rates, such as RAG knowledge bases, intelligent customer service, and document analysis, costs can be reduced by more than 90%.
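The 90% figure can be checked against the quoted cache-hit prices with a short calculation. The sketch below is illustrative only: `reduction_pct` and the labels are hypothetical names, and it covers only the cache-hit input portion of a bill, so the saving on a full bill also depends on cache-miss and output spending.

```python
def reduction_pct(old_price: float, new_price: float) -> float:
    """Percentage by which new_price is lower than old_price."""
    return (1 - new_price / old_price) * 100


# (old, new) cache-hit prices in yuan per million tokens, as quoted in the article.
CACHE_HIT_PRICES = {
    "V4-Pro (list price)": (1.0, 0.1),
    "V4-Pro (limited-time discount)": (1.0, 0.025),
    "V4-Flash": (0.2, 0.02),
}

if __name__ == "__main__":
    for label, (old, new) in CACHE_HIT_PRICES.items():
        print(f"{label}: {reduction_pct(old, new):.1f}% lower")
```

On these figures the cache-hit price falls by 90% for both models, and by 97.5% for V4-Pro while the limited-time discount is in effect.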
On the technical side, DeepSeek-V4 uses an in-house sparse attention architecture that supports ultra-long contexts of 160k and is described as leading in long-text processing efficiency. It has so far been adapted for 8 major cloud platforms, including Huawei Cloud and Alibaba Cloud, as well as numerous intelligent computing centers.
Industry insiders believe that DeepSeek's move will reshape the industry's pricing system, accelerate the popularization of AI applications, force overseas models to lower prices, and consolidate the cost advantages of domestic large models.