Tech · 2mo ago

DeepSeek Quietly Modifies Interface: Free AI May Not Last Long

In the early hours of April 8th, DeepSeek quietly launched a new tiered interface with "Fast Mode" and "Expert Mode" options, alongside a "Vision Mode" that is still in limited gray-release testing. This is widely seen as preparation for the release of its next-generation V4 model, but the tiered system itself is more noteworthy: it is a mechanism for allocating computing resources on demand, which reduces unnecessary token consumption and lowers overall costs. The article also discusses the growing "compute anxiety" among large model companies, exemplified by Anthropic's changes to its Claude subscription and the industry's increasing focus on token efficiency.


Image | Screenshot of the DeepSeek web version

The division of functions is clear. Fast Mode is for everyday conversations and low-latency responses. Expert Mode is for complex reasoning and in-depth tasks; it may trigger longer inference and therefore slower responses. Vision Mode enables multimodal capabilities, including image input, though it is currently available only to a limited group of users in gray-release testing.

Observers generally believe this is a pre-launch warm-up for the next-generation V4 model. However, the tiered system itself arguably deserves more attention than a new model release. As a scheduling mechanism for "on-demand computing power," it routes simple tasks to low-cost pathways and activates high-compute inference only when necessary, which reduces wasteful token consumption and structurally lowers overall costs.
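The scheduling idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DeepSeek's implementation: in the actual interface the user picks the mode by hand, whereas this sketch picks it automatically. The function names, the keyword list, the word-count threshold, and the relative costs are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical relative compute costs per request (the article gives no figures).
FAST_COST = 1.0
EXPERT_COST = 10.0

# Hypothetical markers suggesting a request needs deeper reasoning.
COMPLEX_HINTS = ("prove", "derive", "step by step", "debug", "analyze")


@dataclass
class Route:
    mode: str    # "fast" or "expert"
    cost: float  # relative compute cost of the chosen path


def route_request(prompt: str, max_fast_words: int = 50) -> Route:
    """Send short, simple prompts to the cheap path and escalate the rest."""
    text = prompt.lower()
    looks_complex = any(hint in text for hint in COMPLEX_HINTS)
    too_long = len(text.split()) > max_fast_words
    if looks_complex or too_long:
        return Route("expert", EXPERT_COST)
    return Route("fast", FAST_COST)


def total_cost(prompts: list[str]) -> float:
    """Relative cost of serving all prompts with tiered routing."""
    return sum(route_request(p).cost for p in prompts)
```

Under these made-up costs, a workload of one simple question and one proof request costs 11 units with routing, compared with 20 units if every request went through the expensive path.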

The Compute Anxiety of Large Model Companies

About a week ago, Anthropic announced that, starting April 5th, its Claude subscription would no longer cover third-party integration tools, including OpenClaw (nicknamed “Lobster”). Users who want to keep using the model through such tools must switch to a pay-as-you-go plan that is billed separately from the subscription.

The underlying logic is easy to understand. After Nvidia CEO Jensen Huang emphasized “token economics” at the GTC conference, global tech giants turned token consumption into a performance metric. Some Chinese internet giants have even created monthly token consumption rankings, and the "token consumption theory" has gained prominence.

According to Anthropic, the subscription pricing model was originally designed based on the “normal usage intensity of individual users.” However, the usage intensity of automated agent tools like OpenClaw far exceeded expectations—some heavy users paid only $200 per month for a subscription but consumed $5,000 worth of computing resources, putting significant cost pressure on Anthropic.
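The gap described above is simple arithmetic. The short sketch below uses only the two figures quoted in the paragraph ($200 subscription, $5,000 of compute); the function name is hypothetical.

```python
def subsidy_ratio(compute_cost_usd: float, subscription_usd: float) -> float:
    """How many times the compute consumed exceeds the subscription fee."""
    return compute_cost_usd / subscription_usd


# Figures quoted in the article for a heavy user.
subscription = 200
compute = 5000

ratio = subsidy_ratio(compute, subscription)
shortfall = compute - subscription

print(f"{ratio:.0f}x")   # prints "25x"
print(f"${shortfall}")   # prints "$4800"
```

In other words, each such user consumes 25 times what they pay, leaving about $4,800 per month uncovered by the subscription fee.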

Luo Fuli, head of Xiaomi AI and a former core member of DeepSeek, unpacked this logic and argued that Anthropic had finally escaped a pitfall. In a lengthy post on the social media platform X, she argued that global compute supply is failing to keep pace with the growth in token demand created by agents. The real solution is not cheaper tokens, but the joint evolution of “higher token efficiency Agent frameworks” and “more powerful and efficient models.”

Industry data shows that as of March 2026, the daily token call volume of China’s large AI models had exceeded 140 trillion, more than a thousand times the level at the beginning of 2024.

Luo Fuli calculated that, at API prices, the actual cost of running these frameworks is dozens of times the subscription price. She believes this gap is “not a deficit, but a pitfall.”

Even more noteworthy for Chinese AI companies is that Anthropic announced on April 7th that its annualized revenue (ARR) had exceeded $30 billion, officially surpassing OpenAI’s $25 billion.

From $9 billion at the end of 2025 to $30 billion now, its annualized revenue has grown by roughly 233% in just over three months. Even so, Anthropic is still counting every dollar.
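The 233% figure follows directly from the two revenue numbers. The sketch below checks it and also derives an implied monthly compound rate; the monthly rate assumes exactly three months of steady compounding, which is a simplification and not something the article states.

```python
def growth_pct(start: float, end: float) -> float:
    """Percentage growth from start to end."""
    return (end - start) / start * 100


def monthly_compound_rate(start: float, end: float, months: float) -> float:
    """Constant monthly growth rate that turns start into end over `months`."""
    return (end / start) ** (1 / months) - 1


# ARR in billions of USD, as quoted in the article.
arr_growth = growth_pct(9, 30)
print(f"{arr_growth:.0f}%")  # prints "233%"

# Assumption: exactly three months of steady compounding.
monthly = monthly_compound_rate(9, 30, 3)
print(f"{monthly:.1%} per month")
```

Under that assumption, revenue would have had to grow by a little under 50% per month.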

In Luo Fuli’s view, the true value of Anthropic’s ban on “Lobster” lies in making the cost of inefficiency visible, which forces the entire ecosystem towards engineering discipline. Short-term pain is not a bad thing; it will push framework developers to take context management seriously, maximize prompt cache hit rates, and reduce wasteful token consumption.
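To see why prompt cache hit rates matter, consider a minimal cost model in which cached input tokens are billed at a discount. The prices here are placeholders chosen for illustration, not any provider's real rates, and the function name is hypothetical.

```python
def input_cost(tokens: int, hit_rate: float,
               price_per_m: float, cached_price_per_m: float) -> float:
    """Cost of input tokens when a fraction `hit_rate` is served from cache.

    Prices are per million tokens. Both prices are illustrative assumptions.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    blended = (1 - hit_rate) * price_per_m + hit_rate * cached_price_per_m
    return tokens / 1_000_000 * blended


# Hypothetical workload: 10 million input tokens,
# $1.00 per million uncached and $0.10 per million cached.
no_cache = input_cost(10_000_000, 0.0, 1.00, 0.10)
high_hit = input_cost(10_000_000, 0.9, 1.00, 0.10)

print(f"no cache: ${no_cache:.2f}")      # prints "no cache: $10.00"
print(f"90% hit rate: ${high_hit:.2f}")  # prints "90% hit rate: $1.90"
```

With these placeholder prices, raising the hit rate from 0% to 90% cuts the input bill by roughly 81%, which is the kind of saving an agent framework can reach by keeping its prompt prefix stable across calls.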

Releasing New Models May Not Be So Important Right Now

DeepSeek R1’s initial success was also due to its innovative architecture, which greatly reduced token costs. Although DeepSeek was the origin of low-priced tokens at the time, its intention was never to start a price war; it was later followers who turned this innovation into one.

The popularity at the beginning of 2025 also left DeepSeek repeatedly facing capacity shortages and frequent downtime.

After the first large wave of users flooded in, a DeepSeek insider told Phoenix Technology that, because resources were insufficient at the time, users experienced what looked like usage limits. Resources were later reallocated through internal optimization.

However, this internal architectural innovation is no longer sufficient to meet current token call demands.

Sinolink Securities (Guojin Securities) pointed out in a research report that the supply and demand of computing power are sending key signals: demand is expanding exponentially, while supply is held back by chip export controls and cost pressure, making it difficult to expand in step.

The free model has become an accelerator of this crisis. Large models are extremely expensive to operate, and under the free model, the expansion of platform compute capacity has always lagged behind user growth.

Since the beginning of 2026, DeepSeek has experienced at least seven large-scale service interruptions. From the evening of March 29th to the morning of March 30th, the platform once again went down globally, with both the web and app versions unavailable for approximately 12 hours until service was restored at 9:13 am the next day.

Perhaps under this pressure, DeepSeek quietly updated its chat interface on April 8th, adding “Fast Mode” and “Expert Mode” options above the input box. Industry insiders believe that the tiered design can relieve peak load by splitting traffic across compute paths, and that it paves the way for a future paid tier and usage quotas.

Not long ago, OpenAI announced the shutdown of Sora, refocusing its limited computing resources on core services. Together with DeepSeek’s tiered approach and Anthropic’s peak-hour usage limits, this reveals a reality: demand is growing far faster than infrastructure can expand.

The “Elephant in the Room” of the AI Sector

From DeepSeek’s unsustainable free model to Anthropic’s ban and Luo Fuli’s price war warning, these seemingly independent events all point to the same structural contradiction: token usage in the AI sector is expanding exponentially, and compute supply cannot keep up.

Overseas AI data center operators are aggressively buying up memory chips and raising debt on Wall Street, like a gambling game with no end.

In fact, it’s not just chips; the power crunch is compounding the problem. Electricity consumption for AI computing is growing at 46%, far exceeding the 6.1% growth rate of total electricity consumption, and the limited flexibility of power supply has become a hard constraint.

Against this backdrop, the industry is undergoing a paradigm shift from “burning money for users” to “refined operation of computing power.” Alibaba Cloud and Tencent Cloud had already started raising compute prices earlier, with the highest increase reaching 34%. But it’s not so much a price increase as it is a restoration of normal pricing after removing the discounts from the price war period.

On April 8th, Zhipu raised prices by another 10% when it released its flagship open-source model GLM-5.1, having already raised them twice before.

If the keywords in the large model industry over the past two years have been “scale” and “speed,” then the keyword has now quietly become a single word: “cost.”

Even overseas star companies like OpenAI and Anthropic are currently in a high-investment phase, with huge expenses on computing power, talent, and infrastructure. While continuing to rely on financing, they must answer a realistic question: when will this business become self-sufficient?

Therefore, the industry has begun to see a clear shift: when AI starts making money, the first step isn’t to earn more, but to lose less.

A group of players represented by OpenAI are choosing a more aggressive route: rapid product iteration, capability first, open ecosystem, while maintaining expansion through continuous financing. Another group, represented by Anthropic, is clearly more restrained, focusing on cost structure, stability, and enterprise services, and improving efficiency through engineering optimization.

The difference between the two can be summed up simply: one is “act first, talk later,” and the other is “work out the numbers before acting.”

This change will also have a direct impact on ordinary users.

First, API prices may not fall as significantly as many people expect. Although unit prices are decreasing, the pressure to control costs has not disappeared, and companies are more likely to absorb costs through structural optimization than through unlimited price cuts.

Second, free quotas and subsidies may gradually be tightened. The stage of relying on “burning money for growth” is coming to an end. When every token needs to be precisely measured, generous free strategies become unsustainable.

Third, users may also notice changes in the experience itself: model responses will be more restrained and concise, while long texts, complex reasoning, or frequent calls may be more strictly limited or subject to tiered pricing. The “shorter answers” you see are often not because the model has become “lazy,” but because the system is proactively optimizing costs.

In a sense, the moment a token is saved, the cost doesn’t disappear, but is redistributed—flowing between model manufacturers, enterprise customers, and end users.

Ultimately, AI is completing a transformation from “experiment” to “commodity.” Large models are not purely a technical matter, but a capital-intensive business. When the myth of growth fades, accounting becomes the most central, realistic, and unavoidable problem.

This is the true industry logic behind “squeezing tokens.”