Geely Auto, Stepfun open-source multimodal AI models for video, audio generation

Published: Feb 18, 2025 14:53
Source: gasgoo

Shanghai (Gasgoo) - On February 18, Geely Auto Group and its tech ecosystem partner Stepfun announced that they were open-sourcing two multimodal large AI models: Step-Video-T2V for video generation and Step-Audio for voice interaction.

The collaboration leveraged both companies' strengths in computing power, algorithms, and scenario-based training, significantly enhancing the AI models' performance. Stepfun stated that the initiative aims to share the latest advancements in multimodal large models with the global open-source community and contribute to its development.

Step-Video-T2V

With 30 billion parameters, Step-Video-T2V can generate high-quality 540p videos of 204 frames with high information density and consistency.

To comprehensively assess AI-generated video quality, Stepfun has also released an open-source benchmark dataset, the Step-Video-T2V-Eval. This dataset includes 128 real-world Chinese-language queries to evaluate video performance across 11 categories, such as motion, landscapes, animals, abstract concepts, surrealism, human figures, 3D animation, and cinematography.

The company said Step-Video-T2V outperforms existing open-source models in instruction adherence, motion smoothness, physical realism, and aesthetic appeal. The model excels at generating complex motion sequences, expressive human figures, visually imaginative scenes, bilingual text integration, and advanced cinematographic compositions.

The AI model's ability to accurately depict intricate movements is particularly noteworthy. Whether it's the grace of ballet, the intensity of karate, the speed of badminton, or the high-speed rotations of diving, the model demonstrates a deep understanding of physical space and motion dynamics. In one test case, it realistically portrayed the spatial relationships between a panda, a sloped surface, and a skateboard, producing physics-aware visuals—one of the most challenging aspects of AI video generation today.

Step-Audio

According to Stepfun, Step-Audio is the industry's first product-grade open-source voice interaction model. It can generate speech with diverse emotions, dialects, languages, singing styles, and personalized expressions, enabling natural, high-quality conversations across various scenarios, including film, entertainment, social interactions, and gaming.

The company added that Step-Audio has outperformed similar open-source models in five major industry-standard tests, including LLaMA Question and Web Questions. Its performance in the HSK-6 (Chinese Proficiency Test Level 6) evaluation highlights its deep understanding of the Chinese language, making it one of the most proficient open-source voice AI models for Chinese speakers.

Beyond language comprehension, Step-Audio also demonstrates high emotional intelligence, offering empathetic and thoughtful responses, much like a close friend providing guidance through life's challenges.

Additionally, it excels in rhythm and melody processing, allowing it to generate dynamic rap performances with a deep understanding of linguistic cadence and flow.

Recognizing the lack of comprehensive voice AI evaluation benchmarks, Stepfun has also introduced the StepEval-Audio-360, an open-source testing framework. This benchmark assesses voice AI models across nine key dimensions, including role-playing, logical reasoning, content generation, wordplay, creative abilities, and instruction-following.
