From text to images, and now to videos, AI continues to expand its capabilities. OpenAI, the creator of ChatGPT and DALL-E, has unveiled a new tool, called ‘Sora,’ that creates highly realistic videos from text prompts.
The text-to-video model, named Sora after the Japanese word for “sky”, can generate realistic videos up to 60 seconds long from a simple text prompt.
In a blog post on February 15, the company said Sora is capable of generating videos up to a minute long while maintaining visual quality and adherence to the user’s prompt. Sora excels in generating complex scenes featuring multiple characters, precise motion dynamics, and meticulous attention to detail in both the subject and background.
“The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world,” the company said. OpenAI said it intends to train the AI models to “help people solve problems that require real-world interaction.”
According to OpenAI:
- Sora can create “realistic” and “imaginative” 60-second videos from text prompts.
- Sora can incorporate multiple scenes within a single generated video. The model’s deep understanding of language allows it to accurately interpret prompts and generate compelling characters that express vibrant emotions.
- Sora builds on the research in DALL·E and GPT models, using the recaptioning technique from DALL·E 3, which involves generating highly descriptive captions for the visual training data.
Who can access Sora?
Sora will initially be available only to a select group of visual artists, designers, and filmmakers, who will provide feedback to improve the model for creative applications. It will also be made available to domain experts known as “red teamers,” who will adversarially test the model to assess potential risks.
OpenAI is also working on tools that can detect when a video was generated by Sora, and plans to embed metadata identifying a video’s origin if the model is made available for public use in the future.
Sora limitations
OpenAI has acknowledged that Sora has certain weaknesses, including difficulty accurately simulating the physics of complex scenes and understanding cause-and-effect relationships. For instance, it may fail to depict a cookie with a bite mark after a person takes a bite. The model may also mix up spatial details, such as left and right, and struggle to follow precise descriptions of events unfolding over time, such as a specific camera trajectory.
Sora is not the first tool of its kind. Google is testing Lumiere, Meta has a model called Emu, and AI startup Runway is also developing products to help filmmakers create videos. However, AI experts and analysts said the length and quality of the Sora videos went beyond what has been seen so far.
Sora wows social media users
The hyper-realistic videos generated by OpenAI’s Sora stunned social media users. Many described the results as “out of this world” and a “game changer”, expressing awe at the level of detail and accuracy achieved by the AI model. The videos went viral, prompting widespread discussion across online platforms.
Adding to the excitement, OpenAI CEO Sam Altman invited users to propose prompts for Sora, leading to the creation of realistic videos such as two golden retrievers podcasting on a mountain peak, a grandmother preparing gnocchi, and marine animals participating in a bicycle race atop the ocean.
Sora examples
This video generated by Sora shows a woman in a red dress and leather jacket confidently walking down a Tokyo street filled with animated city signage in the background.
The model can turn text prompts, such as woolly mammoths walking through snow, into original video.
Sora created an aerial video of the Amalfi Coast that showcases historic and magnificent architectural details, tiered pathways, and patios, with waves seen crashing against the rocks.
Sora can generate impressive animation. This animated scene features a close-up of a short fluffy monster beside a melting red candle.