Midjourney Releases First Video Generation Model V1: Supports Up to 21 Seconds, $10 Per Month


Early this morning, AI image generation leader Midjourney released its first AI video generation model, V1. V1 lets users upload Midjourney-generated or external images for video generation, offers a choice between manual and auto-generated motion prompts, and supports both a high-motion setting with faster camera movement and a low-motion setting where the camera stays relatively static.

Judging from its output, V1 keeps a protagonist's actions coherent and smooth even as the background changes, and even conjured-up monsters and sci-fi imagery move smoothly and naturally.

In Midjourney's image generation interface, users can click the "Animate Image" option to generate up to 21 seconds of video. V1 is currently available to all Midjourney subscribers, with subscriptions starting at $10 (about RMB 71.9) per month. Each video generation deducts credits from a preset monthly allowance, a pay-per-use mechanism, and Midjourney is testing an unlimited "Relax Mode" for subscribers on the $60/month plan. Perplexity AI designer Phi Hoang commented on X: "It exceeded all my expectations."
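Midjourney has not published a programmatic API, so the following is only a minimal sketch of the credit mechanism described above; the class name, method names, and all credit figures are illustrative assumptions.

```python
# Hypothetical model of a preset monthly credit allowance with
# pay-per-use deduction, as described above. Names and numbers are
# illustrative assumptions, not Midjourney's implementation.

class CreditLedger:
    def __init__(self, monthly_credits: int):
        self.balance = monthly_credits  # preset monthly allowance

    def charge(self, cost: int) -> int:
        """Deduct credits for one generation job; return the new balance."""
        if cost > self.balance:
            raise RuntimeError("Monthly credits exhausted")
        self.balance -= cost
        return self.balance

# Per Midjourney's blog (detailed below), a video job costs roughly
# 8x an image job; the unit cost and allowance here are made up.
IMAGE_JOB_COST = 1
VIDEO_JOB_COST = 8 * IMAGE_JOB_COST

ledger = CreditLedger(monthly_credits=200)
print(ledger.charge(VIDEO_JOB_COST))  # 192 credits left after one video job
```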


▲ Phi Hoang's comment on X

The release of V1 also marks a major shift for Midjourney, from image generation toward full multimedia content creation. Compared with established players on the video generation track, however, V1's feature set is incomplete: it generates video only, with no corresponding audio, so soundtracks must be added manually in post-production with a separate tool, and its videos do not support editing timelines, scene transitions, or continuity between clips.

Experience it here: https://www.midjourney.com/explore?tab=top_month

01. Generating 20 seconds of fluid motion in one go, with fast generation speed

As soon as V1 was released, users' creative enthusiasm was ignited, and many uploaded their own results to social platforms. In summary, V1's strengths include generating long, smooth motion, supporting both vertical and horizontal formats, and, according to general user feedback, very fast generation speed.

Prompt: A double exposure portrait of a majestic lion's side profile, set against a backdrop of trees and a sunset sky. The image has a black background, with a photorealistic, hyperrealistic, and cinematic lighting style, created using Octane Render.


In the generated video, the double exposure of the lion with the trees and the setting sun looks realistic, but the "cinematic lighting" called for in the prompt does not show while the lion is moving.

The video below is 17 seconds long and shows the main character going from standing on a rooftop, to leaping, to flipping his body, to flying through the air, all in one continuous shot with no distortion.

Some users have enthusiastically compared the video generation results of V1 and Runway.


With the same prompt, the butterflies in the V1-generated video stayed completely still, while Runway rendered the overall scene more harmoniously.

In another comparison, Runway renders the lake water better, while V1 handles character movement more smoothly; in a subsequent animation comparison, V1's output has more of a blockbuster feel, while Runway's looks like an unpolished game screen.

One user animated old photos with V1, commenting that the visual effects are amazing and the motion natural, though the results fall short of Veo 3.

02. Up to 21 seconds per video, at a per-second cost equivalent to one still image

Users can generate a new image in Midjourney and then click the "Animate" button to set the image in motion.


The specific settings include an "Auto" option, which generates the motion prompt automatically, and a "Manual" option, which lets users describe to the system exactly how they want the image to move.

From a creative standpoint, V1 offers two options: high motion and low motion. Low motion suits ambient scenes in which the camera stays essentially still and the subject moves slowly, such as a character blinking or a breeze blowing; the drawback is that sometimes the subject ends up not moving at all.

High motion is good for scenes where you want the subject, the camera, and everything else to move. The downside is that all this motion can sometimes go strangely wrong.
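Midjourney exposes these choices only through its web interface; purely as an illustration, the option space described above might be modeled like this (all field names are assumptions):

```python
# Hypothetical representation of V1's animation settings as described
# above. Field names and values are illustrative assumptions; Midjourney
# only exposes these options in its web UI.
from dataclasses import dataclass
from typing import Literal, Optional

@dataclass
class AnimateSettings:
    prompt_mode: Literal["auto", "manual"]  # auto-generated vs. user-written motion prompt
    motion: Literal["low", "high"]          # low: near-static camera; high: camera and subject move
    motion_prompt: Optional[str] = None     # only needed in manual mode

# Low motion: an ambient scene such as a character blinking in a breeze.
ambient = AnimateSettings(prompt_mode="auto", motion="low")

# High motion: the user describes how subject and camera should move.
action = AnimateSettings(
    prompt_mode="manual",
    motion="high",
    motion_prompt="the character leaps off the rooftop and flips mid-air",
)
```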

Users can extend a video they are satisfied with by about 4 seconds at a time, up to 4 times in total, bringing the maximum length to 21 seconds.
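Taken together with the 5-second initial clips mentioned in Midjourney's pricing details below, the headline's 21-second maximum is just this sum:

```python
# The 21-second ceiling: a 5-second initial clip (per Midjourney's
# blog, see below) plus four extensions of about 4 seconds each.
INITIAL_CLIP_S = 5
EXTENSION_S = 4
MAX_EXTENSIONS = 4

print(INITIAL_CLIP_S + MAX_EXTENSIONS * EXTENSION_S)  # 21 seconds
```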

Midjourney also allows users to upload an external image: drag it into the prompt bar, mark it as the "start frame", and then enter a motion prompt describing how it should move.

At launch, Midjourney is offering V1 on the web only. Its blog notes that a video job costs roughly eight times as much as an image job and produces four 5-second videos per job; since each job yields 20 seconds of content, that works out to roughly the cost of one still image per second of video.
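The per-second cost claim is easy to sanity-check, treating one image job as the unit of cost; the grid-of-four detail below is how Midjourney image jobs normally work, not something stated in this article:

```python
# Checking the pricing claim: a video job costs about 8 image jobs and
# yields four 5-second clips, i.e. 20 seconds of footage per job.
VIDEO_JOB_COST = 8         # in units of one image job
SECONDS_PER_JOB = 4 * 5    # four clips x 5 seconds each

cost_per_second_in_jobs = VIDEO_JOB_COST / SECONDS_PER_JOB  # 0.4 image jobs/s

# A Midjourney image job normally returns a grid of four images, so per
# single image this is about 1.6 images' worth of cost per second of
# video, the same order of magnitude as "one image per second".
print(cost_per_second_in_jobs, cost_per_second_in_jobs * 4)  # 0.4 1.6
```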

Midjourney will also be testing a video "Relax Mode" for Pro and higher-tier subscribers.

03. The goal is open-world models capable of real-time simulation

Midjourney sees this V1 release as the first step in its exploration of building an open-world model capable of real-time simulation.

Their goal, simply put, is an AI system that generates images in real time: users can command it to move through 3D space, with the environment and characters moving along accordingly, and can interact with every object.

To get there, the image model supplies the visuals, the video model makes those images move, the 3D model lets the viewer move through space, and a real-time model is needed to run all of these processes fast enough.
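Midjourney has published no such API, but as a purely hypothetical sketch, the four components described above could compose roughly like this:

```python
# Hypothetical sketch of the four-stage system described above:
# image model -> video model -> 3D model -> real-time loop.
# Every class and method here is an illustrative assumption.

class Frame: ...  # a single generated image
class Clip: ...   # a short animated sequence

class ImageModel:
    def render(self, prompt: str) -> Frame:  # provides the visuals
        return Frame()

class VideoModel:
    def animate(self, frame: Frame) -> Clip:  # makes the image move
        return Clip()

class SpatialModel:
    def reproject(self, clip: Clip, pose: tuple) -> Clip:  # movement through 3D space
        return clip

def realtime_loop(prompt: str, camera_poses: list) -> Clip:
    """The real-time model's role: run the chain fast enough to follow user input."""
    clip = VideoModel().animate(ImageModel().render(prompt))
    for pose in camera_poses:  # the user steering through the scene
        clip = SpatialModel().reproject(clip, pose)
    return clip
```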

Over the next year, Midjourney plans to build and release these models individually, then gradually integrate them into a unified system.

04. Conclusion: Midjourney breaks into the video generation track

The curtain has risen on the video generation model competition: in May, Google released Veo 3, achieving synchronized audio and video; in June, ByteDance launched its Doubao video generation model Seedance 1.0 Pro; yesterday, MiniMax updated Hailuo 02, breaking the global record for video model quality per cost; and today, Midjourney's first video model made its debut.

Midjourney's products have previously emphasized accessibility and ease of use, but as it moves toward a more sophisticated simulation framework, the company describes V1 as a "technical stepping stone" to more complex systems.

Behind these ambitious goals, Midjourney is also facing a serious legal challenge from two of the world's largest entertainment studios, Disney and Universal Pictures, which allege that Midjourney used copyrighted characters to train its models without authorization and continues to allow users to generate derivative content. The lawsuit casts a layer of uncertainty over its future.
