Another breakthrough in domestic video generation! From cinematic short films to ocean-going family calls, AI brings the far ends of the earth within arm's reach!

Things are heating up: AI can now generate martial-arts scenes, even the kind where a man fights a tiger!

Recently, an AI short film called "Wind in the Pines" premiered at WAIC, the annual AI event, attracting a great deal of attention. The film reinterprets the classic story "Wu Song Fights the Tiger" in a modern wasteland style.

The protagonist's movements are smooth and powerful, the tiger's fur rises and falls with each motion, and even details like flying dust and the flapping corners of a coat are clearly visible. This is not the product of repeated post-production fixes but a single-pass generation, which is astonishing. AI video generation technology is advancing rapidly and gradually moving into the practical stage of professional film and television production.

Wind in the Pines was co-created by the China Film Directors Center and the China Telecom Artificial Intelligence Research Institute (TeleAI), using none other than TeleAI's VAST video generation large model.

TeleAI, led by Prof. Li Xuelong, CTO and Chief Scientist of China Telecom Group, was officially inaugurated at last July's WAIC. Under his leadership, the TeleAI team has built a large-model system that includes VAST, China's first full-modality, full-size, fully domestically developed "triple-full" large model, and has driven the innovation and application of related technologies.

In terms of its premise, Wind in the Pines is imaginative in its own right, but what is most striking is how AI technology turns wild imaginings into realistic images.

Providing the technical backbone, TeleAI's VAST video generation large model was released last December, took the top spot on VBench, the authoritative video generation benchmark, and has continued to iterate and upgrade since.

From basic image generation to complex action, camera control and character-consistency optimization, its capabilities keep expanding, giving it real potential to excel in professional creative scenarios.

 

01. Video generation bids farewell to the "blind box" lottery: a good AI can both act and shoot

To truly understand the breakthrough achieved by Wind in the Pines, we first need to ask what kind of AI tool film and television production actually requires.

High resolution, smooth motion and realistic detail are only baseline technical abilities, far from enough to support a genuinely meaningful film or television work.

To be useful in a real production pipeline, it is even more critical that AI understand the director's creative intent, keep up with the pace of the narrative, master the language of the camera and set the emotional tone, becoming truly integrated into the expressive system of audiovisual language.

In other words, AI should not only be able to draw but also shoot and act like a filmmaker, collaborating on characterization, scene blocking and narrative advancement to become a "creative partner" with audiovisual expressiveness.

In Wind in the Pines, TeleAI's VAST video generation large model demonstrates strong visual presentation and narrative control.

At the film's opening, as the modern "Wu Song" rides a motorcycle through the desert, AI renders every detail: the engine's roar merges with the wind and sand into a striking soundscape, the motorcycle traces a perfect arc as it jumps over obstacles, and the wheels kick up fine waves of sand, all with natural, realistic light and shadow.


In the intense tiger-fight scene, AI simulates each strand of tiger fur swinging with the action, and the muscle lines in the pounce show remarkable dynamic detail; the protagonist and the tiger trade blows with real impact, the action forceful and free of visible seams. Images that once required months of polishing by top effects teams are now rendered with movie-level realism by AI.


TeleAI's VAST video generation model has already won over a group of professionals with its cinematic quality. The team behind Wind in the Pines has deep experience in the film and television industry, and is understood to have given the model high marks after using it, underscoring the significant breakthrough TeleAI has achieved in video generation.

 

02. How do you build a cinematic video generation model? The three core technologies behind it

So what key supports are needed behind the scenes to build such a video generation model? From the footage of Wind in the Pines, three core technologies stand out clearly.

The first is motion migration technology. In the tension-filled fight scenes of Wind in the Pines, there are none of the common problems such as model clipping or distortion.

Motion migration lets the producer upload a first frame and a reference action clip; AI then makes the character in the first frame perform exactly the same movements as in the reference video. This overcomes long-standing problems in AI-generated video, such as hard-to-control action rhythm and stiff facial performances, making character movement more natural and expressions more vivid.

Mainstream motion migration solutions in the industry are based on skeletal binding, but TeleAI took it a step further, upgrading from 2D to 3D skeletal-point binding. This gives movements more spatial depth and layering, and even allows natural control of animals or cartoon characters.
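The idea of skeleton-based motion migration can be sketched with a toy forward-kinematics routine: per-joint 3D rotations extracted from the reference clip are replayed on the target character's skeleton, whose bone lengths may differ. This is a minimal illustration of the general technique, not TeleAI's implementation; the array shapes and the single shared rest direction are simplifying assumptions.

```python
import numpy as np

def retarget_motion(ref_rotations, target_bone_lengths, parents):
    """Replay reference-clip joint rotations on a target skeleton.

    ref_rotations: (T, J, 3, 3) local rotation matrices per frame/joint,
                   assumed extracted from the reference video.
    target_bone_lengths: (J,) bone lengths of the target character.
    parents: parent joint index per joint (-1 for the root).
    Returns (T, J, 3) world-space joint positions via forward kinematics.
    """
    T, J = ref_rotations.shape[:2]
    positions = np.zeros((T, J, 3))
    rest_dir = np.array([0.0, 1.0, 0.0])  # toy rest pose: bones point "up"
    for t in range(T):
        world_rot = [None] * J
        for j in range(J):
            if parents[j] < 0:                 # root joint stays at origin
                world_rot[j] = ref_rotations[t, j]
                continue
            p = parents[j]
            # Compose parent world rotation with this joint's local rotation.
            world_rot[j] = world_rot[p] @ ref_rotations[t, j]
            # Offset child along the (rotated) bone of the *target* length.
            offset = world_rot[p] @ (rest_dir * target_bone_lengths[j])
            positions[t, j] = positions[t, p] + offset
    return positions

# Two-joint chain, identity rotations: child sits one bone length above root.
rots = np.tile(np.eye(3), (2, 2, 1, 1))       # (T=2, J=2) frames/joints
pos = retarget_motion(rots, np.array([0.0, 1.0]), [-1, 0])
```

Because only rotations are copied while bone lengths come from the target, the same reference clip can drive characters with different proportions, which is the core appeal of skeletal retargeting.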

Another core technology is controllable three-dimensional camera operation, which gives AI the ability to accurately understand and use the "language of the camera". In just a few seconds of footage, for example, Wind in the Pines presents complex camera work spanning multiple angles, rapid cuts and smooth transitions between telephoto, low-angle and close-up shots. Shots that once required a professional director of photography and a full crew are now accurately realized by AI.


This is not "blind luck" achieved by piling up prompt words. With controllable 3D camera technology, TeleAI deeply fuses 3D reconstruction with video generation, giving the model the ability to perceive spatial structure, and then finely controls the shot through the camera's physical parameters, such as intrinsics and extrinsics. AI no longer merely understands content; it is gradually learning how to shoot, giving it a genuine director's point of view.
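The intrinsic/extrinsic parameters mentioned above are standard pinhole-camera machinery: intrinsics map the camera frame to pixels, extrinsics place the camera in the world, and animating the extrinsics over time is what produces pans, tilts and dollies. The sketch below shows the standard projection math; the focal length and principal point are illustrative values, not anything published by TeleAI.

```python
import numpy as np

def project_points(points_world, K, R, t):
    """Project 3D world points to pixel coordinates (pinhole model).

    K: 3x3 intrinsic matrix (focal lengths and principal point).
    R, t: extrinsics mapping world coordinates into the camera frame.
    """
    cam = (R @ points_world.T).T + t     # world -> camera frame
    pix = (K @ cam.T).T                  # camera frame -> image plane
    return pix[:, :2] / pix[:, 2:3]      # perspective divide

# Illustrative camera: 1000 px focal length, principal point (640, 360),
# sitting at the origin and looking down the +Z axis.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)

# A point on the optical axis lands exactly on the principal point.
uv = project_points(np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0]]), K, R, t)
```

Conditioning a generator on (K, R, t) trajectories, rather than on prose like "slow dolly in", is what makes camera moves repeatable and precisely controllable.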

Being able to act and shoot is just the beginning. Many industry insiders note that one of the biggest problems with AI video today is poor consistency: the same character's appearance, clothing and bearing often differ across shots, quickly breaking the illusion.

But the protagonist of Wind in the Pines maintains a consistent appearance throughout, thanks to character consistency technology behind the scenes. TeleAI's VAST video generation large model uses a staged generation approach: it first produces intermediate data such as storyboard shots and depth information, then generates the fine-grained picture on top of them.


This generation process greatly improves character consistency and narrative controllability. It closely mirrors the logic of the film industry, which uses storyboards to frame the picture and then renders it with computer graphics, leaving the AI no room to improvise.

These upgrades to underlying capabilities are the stepping stone for AI video to truly enter the industrial film and television pipeline. What's more, through its collaboration with professional directors, TeleAI has gathered a wealth of feedback from frontline practice: for example, demands for finer control of actors' expressions, emotions and performance rhythm. These professional needs, which developers had not originally anticipated, are gradually being turned into new directions for technical development. With these advanced technologies, TeleAI brings unprecedented convenience and advantages to film and television production.

 

03. Video generation + communication unlocks new scenarios: "imagined" images enable ocean-going video calls

While using video generation technology to empower the film and television industry, TeleAI is also actively exploring broader application scenarios for it. After all, the essence of video generation is not limited to film creation; it is a way of reconstructing visual information.

From a general AI perspective, "the essence of intelligence is compression" has become an industry consensus. Whether it is a language, image or video model, the core task is to extract patterns and regularities from massive raw data and encode them efficiently and compactly into model parameters, so that the model learns to represent infinite possibilities with finite parameters.

True intelligence, however, lies not only in compression but also in "restoration": a highly intelligent system must be able to accurately reconstruct the original content from limited information, and even make reasonable completions and predictions of what comes next.

China Telecom's Artificial Intelligence Research Institute (TeleAI) has deeply combined the VAST video generation model with AI Flow, one of its key technologies under development, to propose a new communication technology: Generative Intelligent Transmission (GIT), which trades "computation" for "bandwidth".

AI Flow is the convergence of three key technologies: artificial intelligence, communication and networking. Through a layered network architecture built on connectivity and interaction, it enables the delivery and emergence of intelligence.

Under Prof. Li Xuelong's leadership, TeleAI has built a strategic research layout of "One Governance + Three Intelligences", spanning the AI Flow technology system (including generative intelligent communication) as well as AI governance, intelligent optoelectronics (including embodied intelligence) and intelligent agents.

The breakthrough of AI Flow is expected to solve a long-standing problem in communication services: how to efficiently transmit high-quality video and multimedia content under extremely limited bandwidth.

This bottleneck has plagued the communications industry for years: traditional video communication relies on high-bandwidth, highly stable networks. As soon as the network weakens, the picture freezes like a slideshow and audio and video fall out of sync.

Ordinary users like you and me run into similar problems all the time. In extremely crowded places such as concerts and exhibitions, network congestion often means video calls fail to connect, live streams lag, and even basic uploading and downloading become difficult.

Not to mention that in extreme conditions, on high-speed rail, underground, on airplanes, at sea or in remote mountains, video communication is practically a luxury. Behind these problems lie the bandwidth, stability and transmission-efficiency bottlenecks of existing communication technology.

At this year's WAIC, TeleAI demonstrated a flagship use case of AI Flow-based generative intelligent transmission, ocean communications, cracking exactly this bottleneck.

Ship-to-shore communication has long been a major challenge in global maritime technology. Because shipboard satellite links are slow and signals weak, crews could only text their families over WeChat to report they were safe; video calls, short-video apps and streaming sites were out of reach.

Generative Intelligent Transmission technology makes ocean-going video calls simple and efficient rather than a luxury. It not only connects sea and land, but also lets crews escape the monotony of life at sea and enjoy leisure time as colorful as on shore.

Traditional video compression transmits the entire video as-is. Generative Intelligent Transmission takes a smarter approach: TeleAI's multimodal large model extracts the most critical audio and video features, compresses and encodes them, and sends only those to the receiver.

On the receiving end, TeleAI's locally deployed multimodal generative large model then "imagines" the complete picture from that information, much like restoring a painting from a sketch.

This compressed transmission method shows clear advantages over traditional schemes (e.g. H.264 + 5G LDPC). In bandwidth-starved satellite scenarios, it cuts bandwidth demand by a full one to two orders of magnitude: the video data volume can drop to as little as 1% of the original, and even to one-thousandth in scenes where the background changes little.
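The claimed reductions are easy to sanity-check with arithmetic. Assuming an illustrative 1 Mbps baseline for a standard-definition call stream (the article does not give a baseline figure), two orders of magnitude of compression leaves about 10 kbps, and a one-thousandth ratio leaves about 1 kbps, both well within the reach of a weak satellite link:

```python
def required_bandwidth(baseline_bps, reduction_factor):
    """Bandwidth needed after compressing by the given factor."""
    return baseline_bps / reduction_factor

baseline = 1_000_000  # ~1 Mbps: an illustrative SD video-call stream

one_percent = required_bandwidth(baseline, 100)        # two orders of magnitude
one_thousandth = required_bandwidth(baseline, 1000)    # near-static backgrounds
print(one_percent, one_thousandth)                     # 10000.0 1000.0 (bps)
```

This is the sense in which the scheme trades computation for bandwidth: the receiver spends GPU cycles regenerating detail so that only kilobits per second need to cross the satellite link.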

With the multimodal large model on the shipboard server restoring audio and video locally, picture and sound quality can be kept at a "subjectively lossless" level, giving ocean-going seafarers a clear, smooth video call experience even at very low bandwidth.

The technology is also very hardware-friendly to deploy: a vessel only needs a local server equipped with four consumer-grade graphics cards to provide stable Wi-Fi video calling for the crew. This lightweight deployment lays a realistic foundation for future scaling.

Generative Intelligent Transmission is not only suited to ocean communications; it is a complete solution adaptable to different communication environments. The system can intelligently select video decoding models of different sizes according to the bandwidth, compute and other resources available in each scenario.

In most call scenarios, for example, a smooth and clear communication experience can be achieved with just 480p resolution and a small model, achieving the optimal synergy between compute and bandwidth.

What this technology demonstrates is not a single-point breakthrough but systematic, convergent innovation. Without the growing maturity of VAST video generation, the video signal simply could not be compressed to this extent; without the foundation of the AI Flow intelligent transmission network, the refined picture could not be delivered to the user's eyes.

In the future, communication will no longer be mere signal carriage but a process of understanding and reconstruction, encoding not bits but meaning itself. Generative Intelligent Transmission has been validated in ocean communications, and the technology will soon be extended further: dialing into a high-definition video conference from an airplane cabin without signal problems disrupting work, or watching a great match while camping in the wild without missing a moment.

This is a "two-way rush" between AI and communications, and it provides a solid technical base and practical model for building high-quality, low-cost multimedia communication infrastructure for the future.
