South Korea's Naver Announces Complete Abandonment of Alibaba's Qwen Encoder

Naver said its new visual encoder is a significant improvement on its existing "VUClip" technology and performs at a level comparable to Qwen, which it described as a world-leading model.

A visual encoder is the module in a multimodal AI model that converts image and video information into a data format the model can process, and is often called the model's "optic nerve."

Earlier this year, Naver drew controversy for partially using Alibaba's Qwen 2.5 visual encoder in its HyperCLOVA X SEED 32B Sync model, which it entered in a South Korean government-led independent AI foundation model project.

On January 15, the South Korean Ministry of Science and ICT announced the results of the first round of reviews. Naver Cloud was eliminated for insufficient model originality and technological independence; NC AI was also eliminated.

At the time, Naver argued that "the visual encoder can be replaced at any time and is not an indispensable core component."

Four months later, Naver launched its new encoder. Its biggest highlight is that it was designed around Korean from the training stage, linking images directly to Korean text without passing through an intermediate translation layer.

A Naver Cloud official emphasized that when processing visual data containing Korean geography, culture, or proper nouns, the new encoder avoids the information distortion that translation can introduce.

However, Naver has not yet decided on a plan to replace the encoder in the previously open-sourced HyperCLOVA X SEED 32B Sync model.