Tech · 1mo ago

After 5 Months of "Red Alert," GPT Image 2 Sweeps the Leaderboards, Surpassing Google by a Significant Margin

GPT Image 2 swept the leaderboards within 12 hours of its release, taking first place in Text-to-Image, Single-Image Edit, and Multi-Image Edit; OpenAI's official statement called it a "clean sweep." The model posts a 93% win rate in blind tests against other models and is being hailed as a generational leap. It arrives after Google's Nano Banana models gained significant traction, prompting OpenAI to issue a "code red" internal memo and refocus its resources on ChatGPT.


Text-to-Image leaderboard: GPT Image 2 at 1512 points, Nano Banana 2 at 1271 points. The 241-point gap is the largest in Arena history.

“No model has ever dominated the Image Arena with such a margin,” Arena officials stated.

Across all blind tests on Image Arena, GPT Image 2 has a 93% win rate: in 93 out of every 100 paired comparisons, voters picked the OpenAI image.
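To see what a 241-point gap means head-to-head, here is a minimal sketch that assumes Arena scores follow the standard Elo convention (a logistic curve with base 10 and a 400-point scale); Arena's actual fitting procedure may include adjustments this ignores. Under that assumption, the gap implies voters would prefer GPT Image 2 over Nano Banana 2 about 80% of the time. The 93% figure is pooled across all opponents, many of which rank far lower, so the two numbers are not in conflict.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    # Standard Elo expectation: logistic curve, base 10, 400-point scale.
    # Assumption: Arena leaderboard scores can be read on this scale.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

if __name__ == "__main__":
    # Text-to-Image scores quoted above: GPT Image 2 vs Nano Banana 2
    p = expected_win_rate(1512, 1271)
    print(f"Expected head-to-head preference for GPT Image 2: {p:.1%}")
```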

“If DALL-E is cave paintings, and Images 1.0 is ancient art, then Images 2.0 is the Renaissance.”

OpenAI introduced Images 2.0 with this description at the launch event, and Altman called it a generational upgrade:

It’s like jumping from GPT-3 to GPT-5 overnight.

https://www.youtube.com/watch?v=sWkGomJ3TLI

OpenAI’s official API documentation gave Images 2.0 its highest praise.

https://developers.openai.com/api/docs/models/gpt-image-2

But the real story isn’t in the data.

Pressed for Half a Year by Google

OpenAI Finally Strikes Back

Rewind to August 2025.

Google released Nano Banana, an image generation model built into Gemini, and it went viral almost instantly.

At Google's Q3 earnings call about three months later, CEO Sundar Pichai shared the numbers: Gemini's monthly active users rose from 450 million in July to 650 million in October.

Josh Woodward, head of Google Labs, said that much of this growth came from the image generation boom driven by Nano Banana.

In November, Google released Nano Banana Pro. Its text rendering was striking: AI images could finally spell words correctly, and Google pulled ahead of OpenAI in the consumer market.

On November 18th, Google struck again: Gemini 3 launched and immediately topped LM Arena with 1501 points, the first leading model to break 1500.

In early December, Altman issued a "code red" (red alert) internal memo to the entire company.

According to The Information, Altman privately told employees that Gemini 3 could create economic headwinds for OpenAI. Yahoo Finance later reported that under the "code red," OpenAI paused work on other products, such as AI agents, and poured its resources into ChatGPT.

Later in December, OpenAI hastily released GPT Image 1.5. It took first place in the Arena but failed to catch on with consumers.

In February 2026, Google struck again: Nano Banana 2 debuted and retook the lead in the Arena.

OpenAI lost again.

That changed on April 21st, when GPT Image 2 launched and OpenAI finally staged its comeback and regained the upper hand.

AI Image Generation Will Be Redefined

What makes GPT Image 2 lead by 241 points?

The core answer lies in the architecture.

GPT Image 2 is not presented as a conventional diffusion model like Stable Diffusion.

OpenAI research lead Boyuan Chen called it a "generalist model" that was "revamped from scratch," and OpenAI's internal name for it is "the GPT of images."

During the press briefing, however, Chen declined to say publicly whether it uses a diffusion or an autoregressive architecture.

It is generally understood as an "image generation system with reasoning and planning": it plans first, then puts brush to canvas. This is the biggest difference between GPT Image 2 and earlier image models.

OpenAI gave it a new label in its official statement: the first image model with native thinking capabilities.

It thinks before painting, checks its own work afterward, searches the web when it needs information, and can produce 8 coherent images at once.

This isn’t a paintbrush; it’s a visual assistant that thinks.
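OpenAI has not disclosed how the model works internally, so the following is only a hypothetical control-flow sketch of the "think, paint, self-check" loop described above. Every function here (`plan`, `render`, `self_check`, `generate`) is an invented stub, the web-search step is omitted, and nothing about OpenAI's actual pipeline is implied.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Draft:
    prompt: str
    plan: list[str]
    revision: int = 0
    fixes: list[str] = field(default_factory=list)

def plan(prompt: str) -> list[str]:
    # Hypothetical planner: decide layout and text placement before any pixels exist.
    return [f"layout for: {prompt}", "place text elements", "choose palette"]

def render(prompt: str, steps: list[str], revision: int, fixes: list[str]) -> Draft:
    # Stand-in for the image generator; it only records what it was asked to do.
    return Draft(prompt, steps, revision, list(fixes))

def self_check(draft: Draft, required_text: str) -> list[str]:
    # Toy critic: pretends the first draft misspells the required text
    # and that any revised draft is correct.
    if draft.revision == 0:
        return [f"text {required_text!r} misspelled"]
    return []

def generate(prompt: str, required_text: str, max_rounds: int = 3) -> Draft:
    steps = plan(prompt)                          # think before painting
    fixes: list[str] = []
    draft = render(prompt, steps, 0, fixes)       # first attempt
    for revision in range(1, max_rounds + 1):
        issues = self_check(draft, required_text)  # self-check after painting
        if not issues:
            break
        fixes.extend(issues)                      # feed problems into the next attempt
        draft = render(prompt, steps, revision, fixes)
    return draft                                  # latest draft, even if rounds run out

if __name__ == "__main__":
    result = generate("a bowl of rice, name on one grain", "GPT Image 2")
    print(result.revision, result.fixes)
```

The point of the structure is that planning happens once up front, while checking and re-rendering repeat until the critic is satisfied or a round limit is hit.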

Arena's per-category breakdown shows:

Compared with the previous generation, GPT Image 2 gained 316 points in Text Rendering and 296 points each in Cartoon/Anime and Portraits; in the three categories Product, 3D, and Photorealistic, the gains fall between +247 and +277 points.

Nano Banana Pro was the first to crack text rendering, in November 2025, with about 94% accuracy at the time. GPT Image 2 pushes that to 99%.
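Going from 94% to 99% cuts the error rate from 6% to 1%, and the difference compounds on longer text. The article does not say what the accuracy figure measures; assuming, purely for illustration, that it is a per-word rate and that errors are independent, the chance of a fully correct 20-word sign rises from roughly 29% to roughly 82%.

```python
def p_all_correct(per_word_accuracy: float, n_words: int) -> float:
    # Assumes every word is rendered correctly with the same probability,
    # independently of the others (an illustrative simplification).
    return per_word_accuracy ** n_words

if __name__ == "__main__":
    for acc in (0.94, 0.99):
        print(f"{acc:.0%} per word -> {p_all_correct(acc, 20):.0%} chance a 20-word sign is error-free")
```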

One demo at OpenAI's launch event asked GPT Image 2 to draw a bowl of rice with the model's name written on just one grain.

OpenAI President Greg Brockman showcased more of its capabilities on his X account.

The first example: old photo restoration.

A faded, yellowed old family photo was turned into a high-definition color version with a single prompt.

This is what OpenAI's API documentation calls "high-fidelity image inputs": the model preserves the details of the original image, reading faded, damaged, or blurry detail accurately on the input side and re-rendering a clear version on the output side.

In the second example, Brockman reposted test images from user @doodlestein, who had GPT Image 2 draw a math explainer diagram from a single complex prompt.

He commented that even with complex prompts, GPT Image 2 can generate images in a range of styles.

@doodlestein gave GPT Image 2 the same prompt for a linear algebra explainer, and the model drew 4 completely different versions at once: all four teach eigenvectors using the Mona Lisa, but each has its own composition, color scheme, and information density.

The real value of this case isn't "being able to draw math diagrams" but addressing a major pain point in AI image generation over the past two years: one output per prompt, with little control over variation.

GPT Image 2 is the first model to make "give me 4 completely different directions from one prompt" a product-level capability.

A senior LM Arena tester commented:

The gap between GPT Image 2 and Nano Banana Pro is as big as the gap between Nano Banana Pro and DALL-E.

It’s a whole generation ahead.

Manga-style comic pages generated by GPT Image 2 in Thinking mode: starting from a simple prompt, the model keeps the characters consistent and lays out a multi-panel plot.

DALL-E Retires

Adobe and Canva Forced into a Corner

On release day, downstream tools integrated the model faster than the tech community expected.

Figma, Canva, Adobe Firefly, fal, and Hermes Agent all completed their integrations on April 21st.

The pricing also hides a twist:

Through the API, a high-quality image costs $0.21; ChatGPT Plus costs $20 per month, with image generation already included.

This price gap could drive the biggest restructuring the image generation industry sees in 2026.
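The arithmetic behind that gap, as a minimal sketch: the $0.21 and $20 figures come from the article, while treating a Plus subscription as covering any number of images ignores usage limits the article does not specify. Under that assumption, the subscription becomes the cheaper option from the 96th high-quality image of the month.

```python
import math

API_PRICE_PER_IMAGE = 0.21  # USD per high-quality image via the API (article's figure)
PLUS_MONTHLY_FEE = 20.00    # USD per month for ChatGPT Plus (article's figure)

def api_cost(images: int) -> float:
    # Pay-as-you-go cost for one month's images, rounded to cents.
    return round(images * API_PRICE_PER_IMAGE, 2)

def break_even_images() -> int:
    # Smallest monthly volume at which the API costs more than the subscription.
    # Assumption: Plus usage limits are ignored because the article does not give them.
    return math.floor(PLUS_MONTHLY_FEE / API_PRICE_PER_IMAGE) + 1

if __name__ == "__main__":
    n = break_even_images()
    print(n, api_cost(n - 1), api_cost(n))
```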

A photorealistic candid shot generated by GPT Image 2: coastline, overcast sky, vintage car, film grain. A look that once required a professional photographer shooting on location plus post-processing now costs $0.21 per API call. OpenAI researcher Gabriel Goh said photorealism is what excites him most about this model.

On May 12th, DALL-E 2 and DALL-E 3 were officially retired.

They pioneered the AI-generated imagery revolution that began in 2022. Three years later, they have been consigned to history by their own successors at OpenAI.

OpenAI mentioned in its official release statement:

Images aren’t decoration, they’re language. A good image does what a good sentence does: chooses, arranges, reveals.

This represents a shift in product philosophy.

Of course, there are dissenting voices. ZDNet discovered in its testing that GPT Image 2 couldn’t accurately replicate brand logos, and even ZDNet’s own logo was drawn crookedly.

Nano Banana 2 still has advantages in portrait realism and multi-reference consistency.

GPT Image 2 is not perfect yet, but the competitive landscape has changed.

The Rendering Era is Over

The Reasoning Era is Just Beginning

Google put reasoning into image models. OpenAI put image tools into reasoning models. The 241-point Elo gap measures the architectural difference between the two.

In its commentary, implicator.ai divides image generation into two eras.

From 2022 to 2025, it was the rendering era.

For DALL-E, Midjourney, and Stable Diffusion, the goal was to make images "look right." The model is the paintbrush, the user is the artist, and the prompt is the sketch.

GPT Image 2 ushers in the reasoning era.

The model thinks before painting, can search, can self-check, and can complete tasks. It’s not a paintbrush; it’s an assistant that can paint.

What truly matters about GPT Image 2's release is that image generation itself is moving toward "thinking."

In the short term, Black Forest Labs (Flux 2) may be in the most trouble.

Kingy AI put it bluntly: as a diffusion-first vendor, Black Forest Labs built Flux 2's entire technical pipeline on an architecture at odds with the "token-by-token" reasoning approach.

Either merge or rewrite; there is no third option.
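Kingy AI's point is easier to see in the shape of the two generation loops. The toy sketch below uses stub functions in place of trained networks (`toy_next` and `toy_denoise` are hypothetical placeholders). It only shows that an autoregressive model emits one token at a time, each conditioned on everything before it, whereas a diffusion model starts from noise and refines the whole canvas at every step. Since GPT Image 2's architecture has not been confirmed, this illustrates the two approaches in general, not either company's model.

```python
import random
from typing import Callable, List

def autoregressive_generate(num_tokens: int,
                            next_token: Callable[[List[int]], int]) -> List[int]:
    # Token-by-token: each new token is predicted from the full prefix so far.
    tokens: List[int] = []
    for _ in range(num_tokens):
        tokens.append(next_token(tokens))
    return tokens

def diffusion_generate(num_pixels: int, num_steps: int,
                       denoise: Callable[[List[float], int], List[float]],
                       seed: int = 0) -> List[float]:
    # Diffusion: start from Gaussian noise over the whole canvas,
    # then refine every position together at each denoising step.
    rng = random.Random(seed)
    canvas = [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]
    for t in reversed(range(num_steps)):
        canvas = denoise(canvas, t)
    return canvas

if __name__ == "__main__":
    # Hypothetical stand-ins for trained networks.
    toy_next = lambda prefix: (len(prefix) * 7) % 256
    toy_denoise = lambda x, t: [0.5 * v for v in x]  # halve the noise each step
    print(autoregressive_generate(5, toy_next))
    print(diffusion_generate(4, 10, toy_denoise))
```

In the first loop, each step depends on the sequence produced so far; in the second, the number of steps is fixed in advance and every step touches the entire canvas.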

In the medium term, Google may strike back next quarter. Nano Banana 3, or Imagen-Reason, won’t be long in coming.

In the long term, the impact of this will go far beyond image generation.

When AI starts to use “thinking” to produce images, videos, audio, and code, the entire paradigm of generative AI will change.

When Altman wrote "code red" in that memo last December, he probably didn't expect OpenAI to return to the top of the Arena leaderboard this way five months later.

But the true meaning of this comeback may not be that OpenAI beat Google, but that OpenAI rewrote the rules of the image generation race.

Arena.ai single-image edit leaderboard (Image Edit Arena): GPT Image 2 (medium) still leads with 1510+ points, and places two through five are all held by other OpenAI models and Google's Gemini series. https://arena.ai/leaderboard/image-edit

When will Google throw its next punch? The answer will shape the AI landscape in the second half of 2026.

Until that punch lands, no one knows how long GPT Image 2 will stay at the top of the Arena leaderboard.

References:

https://x.com/gdb/status/2048449695622586576

https://arena.ai/leaderboard/image-edit