DeepSeek API Input Cache Price Reduction: Only 1/10 of the Initial Price

DeepSeek-V4-Pro: Reduced from 1 yuan to 0.1 yuan per million tokens. An additional promotional discount of 75% (pay 25% of the new price) applies until May 5th, bringing the actual payment to 0.025 yuan per million tokens.

DeepSeek-V4-Flash: Reduced from 0.2 yuan to 0.02 yuan per million tokens, with no additional discount.

Cache-miss input and output prices have also been cut. V4-Pro now costs 3 yuan for cache-miss input and 6 yuan for output; V4-Flash costs 1 yuan for cache-miss input and 2 yuan for output. All four prices are one quarter of their original levels.
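To make the arithmetic concrete, the cost of a single request under these prices can be sketched in Python. This is only an illustration, not DeepSeek's billing logic: the function name request_cost, the hit_multiplier parameter, and the token counts are hypothetical. It assumes that all prices quoted above are in yuan per million tokens and that the promotional discount applies only to the V4-Pro cache-hit price, as described.

```python
from decimal import Decimal

# Prices as quoted in this article, assumed to be yuan per million tokens.
PRICES = {
    "DeepSeek-V4-Pro": {
        "hit": Decimal("0.1"), "miss": Decimal("3"), "output": Decimal("6"),
    },
    "DeepSeek-V4-Flash": {
        "hit": Decimal("0.02"), "miss": Decimal("1"), "output": Decimal("2"),
    },
}
MILLION = Decimal(1_000_000)


def request_cost(model, hit_tokens, miss_tokens, output_tokens,
                 hit_multiplier=Decimal("1")):
    """Return the cost in yuan of one request (hypothetical helper).

    hit_multiplier scales only the cache-hit price; pass Decimal("0.25")
    to model the V4-Pro promotion described in the article.
    """
    p = PRICES[model]
    total = (
        hit_tokens * p["hit"] * hit_multiplier
        + miss_tokens * p["miss"]
        + output_tokens * p["output"]
    )
    return total / MILLION


# One million cached input tokens on V4-Pro during the promotion.
pro_promo = request_cost("DeepSeek-V4-Pro", 1_000_000, 0, 0, Decimal("0.25"))

# A mixed V4-Flash request: 800k cached, 200k uncached, 100k output tokens.
flash_mixed = request_cost("DeepSeek-V4-Flash", 800_000, 200_000, 100_000)
```

With these inputs, pro_promo comes to 0.025 yuan, matching the promotional figure above, and flash_mixed comes to 0.416 yuan, of which only 0.016 yuan is attributable to the cached input.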

This price cut addresses a major industry pain point. The cache-hit input price is only 1/700 of that of GPT-5.5 Pro, which significantly reduces costs for enterprises in long-text and high-frequency call scenarios.

For applications with high cache hit rates, such as RAG knowledge bases, intelligent customer service, and document analysis, costs can be reduced by more than 90%.
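One possible reading of the 90% figure is a comparison of input cost against paying the cache-miss price for every token. The sketch below works through that reading in Python. The function names input_saving and min_hit_rate are hypothetical, output tokens are ignored, the regular (non-promotional) cache-hit prices are used, and the article does not state which baseline its figure is measured against.

```python
from fractions import Fraction


def input_saving(hit_rate, hit_price, miss_price):
    """Share of input cost saved versus paying miss_price for every token."""
    blended = hit_rate * hit_price + (1 - hit_rate) * miss_price
    return 1 - blended / miss_price


def min_hit_rate(target_saving, hit_price, miss_price):
    """Smallest cache hit rate that reaches target_saving on input cost.

    Derived from input_saving: saving = hit_rate * (1 - hit_price / miss_price).
    """
    return target_saving / (1 - hit_price / miss_price)


# Prices from the article, assumed to be yuan per million input tokens.
PRO_HIT, PRO_MISS = Fraction(1, 10), Fraction(3)
FLASH_HIT, FLASH_MISS = Fraction(2, 100), Fraction(1)

# Saving on V4-Pro input cost at a 95% cache hit rate.
pro_saving_at_95 = input_saving(Fraction(95, 100), PRO_HIT, PRO_MISS)

# Hit rates needed for a 90% saving on input cost.
pro_needed = min_hit_rate(Fraction(9, 10), PRO_HIT, PRO_MISS)
flash_needed = min_hit_rate(Fraction(9, 10), FLASH_HIT, FLASH_MISS)

print(float(pro_saving_at_95), float(pro_needed), float(flash_needed))
```

Under these assumptions, a 95% hit rate on V4-Pro saves about 91.8% of input cost, and a 90% saving requires a hit rate of roughly 93.1% on V4-Pro or 91.8% on V4-Flash. Workloads that resend a large shared prefix, such as a fixed knowledge base or system prompt, are the ones most likely to reach such rates.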

On the technical side, DeepSeek-V4 uses a self-developed sparse attention architecture that supports ultra-long contexts of 160k and gives it leading efficiency in long-text processing. It has been adapted to run on 8 major cloud platforms, including Huawei Cloud and Alibaba Cloud, as well as numerous intelligent computing centers.

Industry insiders believe that DeepSeek's move will reshape the industry's pricing system, accelerate the popularization of AI applications, force overseas models to lower prices, and consolidate the cost advantages of domestic large models.