OpenAI officially releases Sora: what makes its text-to-video features so strong?


At 2:00 a.m. Beijing time on December 10, Sam Altman and several OpenAI employees demonstrated Sora's features and real-world use cases via livestream. After its sample videos were released in February this year, Sora set off a boom in the global AI community, and AI companies at home and abroad have since launched their own text-to-video products. As the pioneer of the field, Sora has finally unveiled its mystery today.

Overall, Sora demonstrated a range of product features showing that it is far ahead of current text-to-video products in video-generation quality, originality of functionality, and technical sophistication.

On top of the basic text-to-video and image-to-video functions, it adds storyboards (equivalent to composing your own story shot by shot), adjusting an existing video with text, and blending videos of different scenes (equivalent to adding special effects directly to a video). The whole product design seems intended to bring video closer to the creator's self-expression and to help them realize an ideal shot-by-shot story.

Starting later on December 9, local time, users in the U.S. and most other countries can visit the official website to try Sora, which is included in the ChatGPT Plus and ChatGPT Pro subscriptions at no additional cost. Plus members can generate up to 50 priority videos at up to 720p resolution and 5 seconds in length, while Pro members get up to 500 priority videos at up to 1080p resolution and 20 seconds, with watermark removal.

Sam Altman described three main reasons for building Sora:

First, from a tooling perspective, OpenAI likes to make tools for creatives, which is important to the company's culture;

Second, from a user-interaction perspective, AI systems should not only interact through text, but should also understand and generate video to help humans use AI. This is similar to what the large-model companies in China often say: "Every time a model expands its modality, user penetration goes up."

Third, from a technical perspective, which is critical to OpenAI's AGI roadmap, AI should learn more about the laws of the world. This is exactly what is known as a "world model" that understands the laws of physics.

Changing the world with technology and promoting human creativity with products: Sora is doing both.

Beyond generating videos: storyboards, special effects, and unlimited creation

Sora's most basic features, first and foremost, are text-to-video and image-to-video.

From the main interface, users can view and manage all of their generated videos, switch between grid and list views, create folders and favorites, view bookmarks, and more. The researchers say this interface is designed to better help users create stories.

At the bottom center of the main page are Sora's text-to-video and image-to-video features.

For example, Sam Altman first entered the text prompt, "Woolly mammoth walking in the desert, shot with a wide-angle lens." He then selected the aspect ratio, resolution, duration (5-20 seconds), and the number of videos to generate (up to four) to get the result.

The resulting videos are very realistic and textured, and largely follow the input instructions. Given Sora's February samples, the quality is perhaps no surprise.


After typing in the text "Woolly mammoth walking in the desert, captured with a wide-angle lens," Sora generated four videos | Image credit: OpenAI
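The generation settings from the demo (aspect ratio, resolution, 5-20 second duration, up to four videos per run) can be sketched as a small request object. This is purely illustrative: the class and field names are assumptions, not OpenAI's actual API, but the value limits come from the livestream.

```python
from dataclasses import dataclass

# Hypothetical sketch of the generation settings shown in the demo.
# Field names are illustrative assumptions, not OpenAI's actual API;
# the numeric limits (5-20 s, up to 4 videos) are from the livestream.

ASPECT_RATIOS = ("16:9", "1:1", "9:16")   # assumed option set
RESOLUTIONS = ("480p", "720p", "1080p")   # 720p/1080p tiers are from the article

@dataclass
class SoraRequest:
    prompt: str
    aspect_ratio: str = "16:9"
    resolution: str = "720p"
    duration_s: int = 5    # demo allowed 5-20 seconds
    num_videos: int = 1    # up to four variations per run

    def validate(self) -> None:
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio}")
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"unsupported resolution: {self.resolution}")
        if not 5 <= self.duration_s <= 20:
            raise ValueError("duration must be between 5 and 20 seconds")
        if not 1 <= self.num_videos <= 4:
            raise ValueError("at most four videos per request")

req = SoraRequest(
    prompt="Woolly mammoth walking in the desert, shot with a wide-angle lens",
    num_videos=4,
)
req.validate()  # all settings are within the demoed limits
```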

But this time, Sora also released a series of exclusive, advanced product features. In Geek Park's view, these features all center on expressing video more precisely, meaning people can tell the story they want through video by storyboarding, adding effects, and so on.

First up is the storyboard, which the researchers describe as a "brand-new creative tool."

In terms of product design, it is equivalent to slicing a story (the video) into multiple story cards (video frames) along a timeline. Users only need to design and adjust each story card, and Sora automatically stitches them into a smooth story. It's much like storyboarding a film or drafting an animation: a director draws storyboards and a film takes shape; an animator drafts key frames and an animation is designed.

For example, the first story card the researcher envisioned was, "The beautiful whooping crane stands in a stream and has a yellow tail." The second was, "The crane pokes its head into the water and catches a fish." He created these two story cards (video frames) separately, with an interval of about five seconds between them. This interval matters to Sora: it gives the model room to improvise in connecting the two actions.

Eventually, he got a complete video of "the beautiful whooping crane standing in the creek, with a yellow tail. Then the crane pokes its head into the water and catches a fish."
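The storyboard concept above can be sketched as a simple data structure: cards pinned to points on a timeline, with the gaps between them left for the model to interpolate. The class and function names here are illustrative assumptions, not anything Sora exposes.

```python
from dataclasses import dataclass

# Hypothetical sketch of the storyboard idea: story cards pinned to a
# timeline, with the gaps between cards left for the model to fill in.
# Names and structure are assumptions for illustration only.

@dataclass
class StoryCard:
    start_s: float  # where on the timeline this card is pinned
    prompt: str     # what should be happening at that moment

def gaps(cards):
    """Return the intervals between consecutive cards that the model
    must interpolate to connect the actions smoothly."""
    cards = sorted(cards, key=lambda c: c.start_s)
    return [(a.start_s, b.start_s) for a, b in zip(cards, cards[1:])]

board = [
    StoryCard(0.0, "A beautiful whooping crane with a yellow tail stands in a stream"),
    StoryCard(5.0, "The crane pokes its head into the water and catches a fish"),
]
print(gaps(board))  # [(0.0, 5.0)] -- the ~5 s gap Sora uses to connect the two actions
```

The five-second gap is the key design choice the researcher highlighted: too little space between cards would leave the model no room to stage a plausible transition.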


With two story cards (video frames), Sora generates a complete story (video) | Image credit: OpenAI

What's even more striking is that on this storyboard, the creative elements are not limited to story cards: they can also be pictures and videos directly. Any picture or video can be dragged onto the storyboard and combined with a story card to build on it.

Taking video as an example, the researchers trimmed the aforementioned whooping-crane video and imported it into the storyboard, leaving gaps before and after the clip for continued creation, meaning new beginnings and endings could be generated.

What this suggests is that storyboards can be iterated on indefinitely. The 20 seconds of video Sora generates can be created, cut, and created again until it is exactly the shot the user has in mind. The process is like that of an editor or director slowly cutting out the film in their head, through continual generation, storyboard design, and footage material.

Unlike in the real world, Sora offers unlimited footage. And unlike other text-to-video products, Sora's videos can be modified and reworked. This makes the videos it generates far more likely to match the imagination and creativity in the user's mind.

This seems to be at the heart of Sora's approach to this product: to maximize the possibility of generating a video that matches the idea the user has in mind.

This makes it easier to understand Sora's other features, such as directly modifying a video through text, seamlessly blending two different videos, and changing a video's style, all of which amount to adding "special effects" directly to the video. With a typical text-to-video product, by contrast, users may need to repeatedly adjust the prompt and regenerate the video.


By adjusting the text, users can directly adjust the video | Image credit: OpenAI


Sora can merge two separate videos into one seamless clip | Image credit: OpenAI

Overall, besides being unsurprisingly good at generating videos, Sora brings more exclusive video-creation features that amount to adding storyboarding, editing, and special effects to videos. This means everyone has the chance to create the expression they really want and get closer to being a director.

"If you go into Sora with the expectation that all you have to do is click a button to generate a movie, then I think your expectations are misplaced," the OpenAI researcher said.

Sora, he said, is a tool that allows people to be in multiple places at once, to try multiple ideas, to try things that were completely impossible before, "and we actually think of it as a super-special extension for creators."

No separate charge for the public yet, and still reliant on the underlying model's capabilities

Though a pioneer of the text-to-video track, Sora is among the last to actually launch. The OpenAI research team said that in order to deploy Sora widely, it needed to find ways to make the model faster and cheaper, and it has done a lot of work to that end.

During the livestream, OpenAI announced Sora Turbo, a new accelerated version of the original Sora model. It has all of the capabilities OpenAI described earlier this year in its world-simulation technical report, plus the ability to generate video from text, animate images, and blend videos. This is the technology behind the features of the Sora product.

Video is far more expensive to run inference on than text, yet OpenAI isn't charging separately for Sora this time around: both ChatGPT Plus members ($20/month) and ChatGPT Pro members ($200/month) get access to Sora.

The former entitlement includes up to 50 priority videos at up to 720p resolution and 5 seconds in duration, while the latter includes up to 500 priority videos, unlimited regular videos, up to 1080p resolution, 20 seconds in duration, and watermark-free downloads.
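The two tiers can be laid out as a small lookup table for easy comparison. The values come from the article; the dictionary layout and key names are just an illustrative way to organize them.

```python
# The subscription entitlements described above, as a small lookup table.
# Values are from the article; the dict layout and key names are
# illustrative only.

TIERS = {
    "ChatGPT Plus": {
        "price_usd_per_month": 20,
        "priority_videos": 50,
        "max_resolution": "720p",
        "max_duration_s": 5,
        "watermark_free_download": False,
    },
    "ChatGPT Pro": {
        "price_usd_per_month": 200,
        "priority_videos": 500,
        "max_resolution": "1080p",
        "max_duration_s": 20,
        "watermark_free_download": True,
    },
}

# Pro offers 10x the priority videos and 4x the clip length of Plus
ratio = TIERS["ChatGPT Pro"]["priority_videos"] / TIERS["ChatGPT Plus"]["priority_videos"]
print(ratio)  # 10.0
```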

Sora's significance to OpenAI doesn't stop there. The team found that the video model exhibits a number of interesting new capabilities when trained at scale, allowing Sora to simulate aspects of real-world people, animals, and environments. "Our results suggest that extending the video generation model is a promising path to building a generalized simulator of the physical world."

Perhaps that's why making Sora available to the general public as soon as possible, and using the resulting data to better train world models, matters so much to OpenAI's ultimate AGI ambition.

And along the way to iterating its technology, it advances human creativity as well.

"This version of Sora makes mistakes, it's not perfect, but it's gotten to the point where we think it's going to be very useful in enhancing human creativity. We can't wait to see what the world will do with it," said the OpenAI team that built it.
