The AI Revolution: Insights from Alibaba’s Apsara Conference 2024

September 24, 2024, 4:01 pm

36kr

The Apsara Conference 2024, held in September, was a showcase of the future. Nearly 300 companies gathered to unveil close to 1,000 new AI products. The atmosphere buzzed with innovation. Two trends emerged as clear frontrunners: multimodal capabilities and embodied intelligence.

Walking into the AI pavilion, one could feel the shift. Gone were the days when model size alone mattered. Now, the focus was on how AI could understand and process various inputs: images, video, and sound. Multimodal capabilities had become the new standard. Visitors engaged with tools that blended audio, video, and text, and these AI systems were presented as one-stop solutions, leaving behind the era of competing solely on model size.

In the frontier tech pavilion, robotics took center stage. Over 20 companies displayed their latest creations. Bipedal robots performed tricks, flipping and withstanding kicks. The crowd was captivated, yet questions lingered: Why does a robot need to be kickproof? The spectacle was impressive, but practicality loomed large.

Despite the high-tech marvels, the products generating the most buzz were those with everyday applications. For the first time, business owners from Yiwu, a major commercial hub in China, attended. They sought real-world solutions. Real-time translation tools, digital human presenters, and AI-powered product image generation caught their attention. The question on their minds was simple: “How much money can this help me make?”

Alibaba Cloud’s Tongyi Qianwen was a star attraction. Visitors could generate tai chi-themed portraits by mimicking poses displayed on a screen. This image-to-image feature was just the beginning. Alibaba showcased a full suite of multimodal tools, including text-to-image and image-to-video capabilities. One standout was a short video generation feature. Users could upload a photo and an audio clip, and within minutes, the app produced a dance video or animated emoji. The tool quickly gained popularity, attracting more than 100,000 users since its launch.

Zhipu AI drew crowds with its innovative teaching tool. Parents were eager to test its practicality. Users could point their camera at a homework problem, and the AI would not only solve it but also guide students through the process. This hands-on approach provided a learning experience rivaling that of a human teacher.

Shengshu Technology’s Vidu AI made waves with its video generation capabilities. It tackled the challenge of maintaining consistent appearances across frames, a feat even established players struggled with. Users could upload a reference image, and Vidu AI would generate a video that matched the style and appearance. The precision left attendees impressed.

VAST’s Tripo model showcased the future of 3D generation. With just a text or image input, it could create a 3D prototype in seconds. This tool integrated seamlessly with major 3D editing software, making it a game-changer for designers and engineers.

AI-generated music also took the spotlight. Yinfeng, an AI music generation platform, gained attention for its ability to create cohesive tracks. Users could input lyrics and select a genre, allowing the AI to generate music that maintained a consistent style throughout. This capability was particularly appealing in the age of short video content.

HiDream.ai attracted Yiwu merchants with its e-commerce-focused AI image generation platform. It simplified the process of creating high-quality product images, allowing users to customize backgrounds, lighting, and even model attributes. This tool put professional-level photography within reach for businesses of all sizes.

Galbot G1, a robot clerk, demonstrated its capabilities in an unmanned store scenario. However, its slow retrieval of items raised questions about its readiness for fast-paced environments. Meanwhile, Qingbao showcased lifelike humanoid robots designed for factory floors. These robots could perform quality checks and parts distribution, promising to reduce labor costs significantly.

Coocaa, a cloud TV manufacturer, found a lifeline in AI. Its AI-powered operating system allowed users to search for content using voice commands and provided personalized recommendations. This innovation helped the company adapt to the changing landscape of entertainment consumption.

Alibaba Cloud also unveiled a groundbreaking subtitling feature. By simply uploading a video file, the system could automatically generate subtitles in multiple languages. This advancement eliminated the labor-intensive process of manual transcription and translation.

Liepin introduced Doris, an AI-powered interviewer capable of conducting hundreds of interviews in a day. While some candidates felt anxious during these AI interviews, the technology showcased the potential for efficiency in initial screenings.

Motiff, a developer of user interface design tools, simplified the design process. Users could generate UI drafts in seconds by entering a single sentence. This reduced the time spent on repetitive tasks and streamlined the design workflow.

The Apsara Conference 2024 was a glimpse into a future where AI seamlessly integrates into daily life. From multimodal capabilities to practical applications, the event highlighted the transformative power of technology. As businesses seek solutions to enhance operations, the focus on tangible benefits will drive the next wave of innovation. The future is here, and it’s powered by AI.