'Ovi,' a free and open source AI model that quickly creates short videos, has been released, and it generates video and audio simultaneously from 'text' or 'text + image'

Ovi is an AI model that can create 5-second videos using text alone or text and images. It is open source and can be used for free if you set up your own environment.
GitHub - character-ai/Ovi

You can view videos generated by Ovi at the link below.
Ovi/example_prompts at main · character-ai/Ovi · GitHub
Generated videos are 5 seconds long at a frame rate of 24 fps. The maximum base resolution is 720 x 720, and an upscaling function can produce higher-resolution videos.
The minimum GPU memory requirement is 32GB, but a model quantized to FP8 can run on 24GB of memory. The end-to-end time for generating a 121-frame, 720x720 video with 50 denoising steps is said to be less than 40 seconds.
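The figures above can be checked with a little arithmetic. The following is a minimal Python sketch, not part of Ovi itself: the function names `frame_count` and `pick_variant` and the labels "default" and "fp8" are hypothetical, and the "one initial frame plus 24 frames per second" reading is an assumption that happens to match the 121-frame figure. The memory thresholds are the 32GB and 24GB values quoted in this article.

```python
from typing import Optional

# Memory figures quoted in the article (not measured here).
FULL_MODEL_MIN_GB = 32
FP8_MODEL_MIN_GB = 24


def frame_count(seconds: int, fps: int) -> int:
    """Frames in a clip, ASSUMING one initial frame plus seconds * fps."""
    return seconds * fps + 1


def pick_variant(gpu_memory_gb: float) -> Optional[str]:
    """Return a hypothetical variant label for the given GPU memory.

    "default" and "fp8" are illustrative labels, not Ovi option names.
    Returns None when the GPU is below both quoted thresholds.
    """
    if gpu_memory_gb >= FULL_MODEL_MIN_GB:
        return "default"
    if gpu_memory_gb >= FP8_MODEL_MIN_GB:
        return "fp8"
    return None


if __name__ == "__main__":
    print("frames for 5 s at 24 fps:", frame_count(5, 24))
    for gb in (16, 24, 32):
        print(f"{gb}GB ->", pick_variant(gb))
```

Under that assumption, a 5-second clip at 24 fps comes to 121 frames, and a 24GB GPU falls into the FP8 tier while anything below 24GB is outside the stated requirements.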

You can also try it on wavespeed.ai or HuggingFace, but each site requires paid credits.
Someone who actually tried it said, 'I've been using it for about a week and it's amazing. Like other AI generation tools, it's like a slot machine: even if you give it good input, you can get a bad output, but if you give it enough time, you can get something good or usable. I've created a lot of videos from text, and from text and images, that look and sound realistic. With text only, the image quality can sometimes look like a '90s TV, but that's what makes it feel so real. Using an RTX 5090, it takes about 4 to 5 minutes to generate a 5-second video.'
Ovi was developed by 'character-ai,' which provides a service that lets users converse with AI characters. Using a proprietary audio dataset, the developers designed and pre-trained an audio branch with 5B (5 billion) parameters from scratch to generate the sound.
Going forward, they plan to fine-tune it using higher resolution data and work to generate even longer videos.