OpenAI launches Sora AI model that can create realistic videos from text

From text to images, and now to videos, AI continues to expand its capabilities. OpenAI, the creator of ChatGPT and DALL-E, has unveiled a new tool, called ‘Sora,’ that creates highly realistic videos from text prompts.

The text-to-video model, named Sora after the Japanese word for “sky”, can generate realistic videos up to 60 seconds long from a simple text prompt.

In a blog post on February 15, the company said Sora is capable of generating videos up to a minute long while maintaining visual quality and adherence to the user’s prompt. Sora excels in generating complex scenes featuring multiple characters, precise motion dynamics, and meticulous attention to detail in both the subject and background.

“The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world,” the company said. OpenAI said it intends to train the AI models to “help people solve problems that require real-world interaction.”

According to OpenAI:

  • Sora can create “realistic” and “imaginative” 60-second videos from text prompts.
  • Sora can incorporate multiple scenes within a single generated video. The model’s deep understanding of language allows it to accurately interpret prompts and generate compelling characters that express vibrant emotions.
  • Sora builds on the research in DALL·E and GPT models, using the recaptioning technique from DALL·E 3, which involves generating highly descriptive captions for the visual training data.

Who can access Sora?

Sora will initially be available only to a select group of visual artists, designers, and filmmakers, who will provide feedback to improve the model for creative applications. It will also be made available to “red teamers,” domain experts in areas such as misinformation, hateful content, and bias, who will assess potential harms and risks.

OpenAI is also working on tools that can detect when a video has been generated by Sora, and plans to embed metadata recording the origin of each video if the model is made available for public use in the future.

Sora limitations

OpenAI has acknowledged that Sora has certain weaknesses, including challenges in accurately simulating the physics of complex scenes and understanding cause-and-effect relationships. For instance, it may fail to depict a cookie with a bite mark after a person takes a bite. Additionally, the model may mix up spatial details, such as left and right, and encounter difficulties in providing precise descriptions of events unfolding over time, such as tracking a specific camera trajectory.

This is not the first time AI-generated videos of this kind have been created. Google is testing Lumiere, Meta has a model called Emu, and AI startup Runway is also developing products to help filmmakers create videos. However, AI experts and analysts said the length and quality of the Sora videos went beyond what has been seen so far.

Sora wows social media users

The hyper-realistic videos generated by OpenAI’s Sora stunned social media users. Many described the results as “out of this world” and a “game changer”, expressing awe at the level of detail and accuracy achieved by the AI model. The videos went viral, prompting widespread discussion across online platforms.

Adding to the excitement, OpenAI CEO Sam Altman invited users to propose prompts for Sora, leading to the creation of realistic videos such as two golden retrievers podcasting on a mountain peak, a grandmother preparing gnocchi, and marine animals participating in a bicycle race atop the ocean.

Sora examples

This video generated by Sora shows a woman in a red dress and leather jacket walking confidently down a Tokyo street, with animated city signage in the background.

Prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.

The model can turn text prompts, such as woolly mammoths walking through snow, into original video.

Prompt: Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field.

Sora created an aerial video of the Amalfi Coast that showcases historic and magnificent architectural details and tiered pathways and patios, with waves seen crashing against the rocks.

Prompt: A drone camera circles around a beautiful historic church built on a rocky outcropping along the Amalfi Coast, the view showcases historic and magnificent architectural details and tiered pathways and patios, waves are seen crashing against the rocks below as the view overlooks the horizon of the coastal waters and hilly landscapes of the Amalfi Coast Italy, several distant people are seen walking and enjoying vistas on patios of the dramatic ocean views, the warm glow of the afternoon sun creates a magical and romantic feeling to the scene, the view is stunning captured with beautiful photography.

Sora can generate impressive animation. This animated scene features a close-up of a short fluffy monster beside a melting red candle.

Prompt: Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle. The art style is 3D and realistic, with a focus on lighting and texture. The mood of the painting is one of wonder and curiosity, as the monster gazes at the flame with wide eyes and open mouth. Its pose and expression convey a sense of innocence and playfulness, as if it is exploring the world around it for the first time. The use of warm colors and dramatic lighting further enhances the cozy atmosphere of the image.