A review of the OpenAI launch series: the 12 Days of OpenAI, from tools to AGI

OpenAI's 12 consecutive days of year-end launch events have finally come to an end. Tuning in to the livestream every day was like opening a box of chocolates: you never knew what flavor was next.

Across the first 11 days, most of the announcements were fairly modest; only three products had some genuinely interesting "flavor".

To summarize, the updates that count as heavyweight are the official o1 release, Sora, and Canvas, all concentrated in the first four days.

Among them, the official o1 is a genuinely big improvement, Sora adds a number of product modes for editing AI-generated videos, and Canvas can be seen as OpenAI's first product attempt at an AI workbench.

Next, the relatively interesting tier: the deep partnership with Apple, the video calling feature, and reinforcement fine-tuning for o1-mini.

Reinforcement fine-tuning of o1-mini has great potential in professional fields, where fine-tuning yields significant gains. The video call feature is the long-teased "Her" experience officially launched. And the deep cooperation with Apple is a big deal for OpenAI, stabilizing its position in the AI industry.

The remaining minor product updates mostly leave people wondering: "This was worth a launch event?"

These include the "Projects" feature, o1 image input, the official opening of the 4o Advanced Voice API, the ChatGPT Search upgrade, and the ability to phone ChatGPT. They are all relatively small updates that hardly differentiate OpenAI from its competitors.

On the last day, OpenAI finally dropped the bombshell: o3. In one fell swoop it dispelled suspicions that AI development had hit a bottleneck, with benchmark performance pointing straight at AGI.

We made a table sorting this roller coaster of twelve release days by the importance of each product.

Below, we go into a little more detail about the core points of these updates.

Important Product Updates

o1 Official Version (Day 1)

In terms of capability, o1 does show a relatively big improvement over the preview version. It improves over o1-preview by 50% on International Mathematical Olympiad qualifying questions (AIME 2024) and on competitive programming tests (Codeforces), and it cuts the rate of major errors on complex problems by 34%.

It also adjusts its processing time to the difficulty of the question, cutting user waiting time by more than 50%.

What's more, o1 now supports multimodal input, which makes its usefulness skyrocket: doctors can use it to analyze medical images, engineers can have it help read drawings, and designers can get creative suggestions from it.

But it's also quite expensive: only $200-per-month ChatGPT Pro subscribers get unlimited usage, while regular $20 subscribers are limited to 20 uses per day.

As the day-one debut, o1 really makes a statement.

Sora (Day 3)

After 10 months of waiting, Sora finally arrived.

But this isn't a model upgrade so much as a product polish. The official Sora can generate videos up to 20 seconds long at up to 1080p, and the output quality isn't much different from the preview shown back in February.

Product-wise, though, OpenAI did put some thought in. Storyboard is the most innovative feature of this release and Sora's most ambitious attempt: it gives users a timeline interface similar to professional video editing software, on which they can place multiple scene cards, string prompts together, and let the system automatically handle transitions between scenes.

In addition, OpenAI provides three specialized tools: Remix swaps out elements in a video, Blend mixes two videos together, and Loop auto-completes footage into an infinitely looping clip.

The product side is decent, but the un-upgraded model is less than impressive. In post-release testing, Sora flopped frequently: movement, interaction, and physics were often a mess, and people and ghostly figures appeared out of nowhere.

OpenAI is also stingy with access: a $20 Plus subscriber gets 50 generations per month, and only Pro users, at $200 per month, get unlimited "slow" generation.

Sora is finally here, but it's pretty disappointing.

Canvas (Day 4)

In a word, Canvas is the AI version of Google Docs created by OpenAI.

That's because Canvas has evolved into a complete workbench combining intelligent writing, code collaboration, and AI agents. It shows OpenAI's product ambitions beyond the chatbot.

As a writing assistant, it can provide editorial suggestions.

For programming, Canvas uses a built-in WebAssembly Python runtime to create a virtually latency-free coding environment, and it demonstrates an ability to understand the intent of your code.

Like recent updates to Cursor and Devin, it ships with the ability to customize AI agents, which can perform a range of actions, such as helping you send Christmas letters to your friends.

These three dimensions of Canvas do not operate in isolation; in practice they tend to work in concert, and this seamless integration makes Canvas the prototype of a versatile AI-driven creation studio.

But purely as a front-end presentation it's not as good as Claude's Artifacts, and for programming it's not as capable as Cursor, so the integration is where it shines.

General product updates

o1-mini Reinforcement Fine-Tuning (Day 2)

This would qualify as a heavyweight release if its utility weren't so narrow.

It changes the old fine-tuning logic of simply adding specialized data: instead, models with reasoning ability are fine-tuned via reinforcement learning, inducing deeper thinking when they face complex problems.

Now it takes only "a few dozen," or even 12, examples for a model to effectively learn domain-specific reasoning. According to OpenAI's research data, the reinforcement fine-tuned o1-mini scores a 24% higher test pass rate than the standard o1, and a full 82% higher than o1-mini without reinforcement fine-tuning.

Unfortunately, only o1-mini can be fine-tuned this way, and the applications are all complex domain tasks such as medicine, law, finance, and insurance. Generalization is poor.
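For a sense of what "a few dozen examples" means in practice, here is a hypothetical sketch of the kind of dataset reinforcement fine-tuning works from: each entry pairs a domain prompt with a reference answer that an automated grader can score against. The field names, prompts, and file name below are purely illustrative, not OpenAI's official schema.

```python
import json

# Illustrative reinforcement fine-tuning examples (NOT the official schema):
# each pairs a domain prompt with a reference answer a grader can score.
examples = [
    {
        "messages": [{"role": "user",
                      "content": "Given these case notes, name the most likely diagnosis."}],
        "reference_answer": "CONDITION_A",  # placeholder label
    },
    {
        "messages": [{"role": "user",
                      "content": "Classify the legal risk of this contract clause."}],
        "reference_answer": "high",  # placeholder label
    },
]

# Training data for fine-tuning is conventionally shipped as JSONL:
# one JSON object per line.
with open("rft_train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

The point of the grader-scored reference answer is what separates this from ordinary supervised fine-tuning: the model is rewarded for reasoning its way to a verifiably correct result, not for imitating the wording of the example.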

Advanced Voice Mode with Video (Day 6)

It's another long-promised feature finally served. Back on May 13, during the GPT-4o demo, OpenAI staff showed 4o on a video call seeing the contents of a phone screen in real time, chatting, and answering questions based on the live camera feed.

This time it's simply that demo shipped for real, with no upgrades. But the feature itself is still very important.

But because the promise took a bit too long to deliver, Microsoft's Copilot Vision, launched two days earlier, and Google's still-brewing Project Astra have followed suit. OpenAI's lead is being eroded.

Cooperation with Apple (Day 5, Day 11)

The ChatGPT and Apple Intelligence integration reads more like the official confirmation of a deep partnership: whatever Apple can't handle itself gets ceded to OpenAI.

The integration consists of three main aspects. The first is synergy with Siri: when Siri determines that a task may require ChatGPT's assistance, it can hand the task over to ChatGPT for processing;

Second, enhancements to the writing tools: users can now use ChatGPT to write documents from scratch, as well as refine and summarize them;

Third, the iPhone 16's Camera Control, which uses visual intelligence to give the user a deeper understanding of whatever the camera is pointed at.

The later, eleventh-day Mac integration gives ChatGPT access to more Mac tool calls.

The only thing I don't understand is why these two couldn't be announced on the same day instead of being split across two.

Capability patches and minor feature updates (Day 7, 8, 9, 10)

The few remaining updates are scraps at best; a single sentence each says it all.

"Projects" feature: lets users create specific projects, upload related files, set custom instructions, and keep all conversations related to a project in one place. It's basically the same as Claude's Projects.

ChatGPT Search upgrade: search within a conversation, with multimodal output supported. Perplexity's Pro mode has offered this for a long time.

Phoning 4o: US users can now call ChatGPT on the phone and talk to 4o. A rather elder-friendly move, like a small gesture of warmth for people without smartphones.

o1 image input and the 4o Advanced Voice API officially open: these could have been covered in one closing sentence on o1's release day.
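As a sketch of what the newly opened o1 image input looks like from the API side: the chat completions endpoint accepts a content array that mixes text and image parts. The helper function and image URL below are illustrative; only the payload shape follows the published chat message format.

```python
# Minimal sketch of an o1 image-input request payload (assumes the `openai`
# Python package and an API key when actually sending it; the URL here is a
# placeholder).
def build_image_request(question: str, image_url: str) -> dict:
    """Assemble a chat-completions payload mixing a text part and an image part."""
    return {
        "model": "o1",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }

payload = build_image_request("Describe this engineering drawing.",
                              "https://example.com/drawing.png")
# With a key configured, the payload would be sent roughly like:
#   from openai import OpenAI
#   resp = OpenAI().chat.completions.create(**payload)
```

This is the same multimodal input the article credits for o1's usefulness to doctors and engineers; opening it via the API just extends that from the ChatGPT app to third-party software.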

By this point, the event had clearly slipped into filler.

The Final Bomb

o3 (Day 12)

If it weren't for the o3 finale on the last day, I'd really think OpenAI held 12 straight days of launches purely to muddy the waters.

Because during this period, Google released Gemini 2.0 Flash, super fast and super strong; Project Astra, which looks like a real agent; Veo 2, which crushes Sora; and Gemini 2.0 Flash Thinking, their answer to o1. With just a few announcements and a couple of videos, everything OpenAI released in the first 11 days was swept off the table.

But on Day 12, OpenAI got its vigor back, using o3 to prove to the industry: the scaling law is not dead, and OpenAI is still king.

o3 is the successor to o1. Just three months after o1's September release, this new version significantly outperforms o1 across a range of benchmarks, including coding, math, and the ARC-AGI test.

Look at a few of the numbers:

Codeforces rating: 2727, equivalent to roughly 175th place among human competitive programmers worldwide, better than 99% of them.

PhD-level science questions (GPQA): 87.7%, versus roughly 70% for human PhDs.

FrontierMath, the toughest frontier math benchmark: 25.2%, where no other model has exceeded 2%. Math genius Terence Tao has said the test could hold out against AI for years.

ARC-AGI, billed as a yardstick of progress toward AGI: 87.5%, versus o1's 25%.

Most notable is this last test, ARC-AGI, which measures a model's ability to adapt to novel tasks. For comparison, scores on ARC-AGI-1 improved only from 0% with GPT-3 in 2020 to 5% with GPT-4o in 2024. The jump means the model is not reciting by rote but actually solving problems.

Although it performed well on ARC-AGI, this does not mean o3 has reached the AGI level: it still fails at some very simple tasks, a fundamental difference from human intelligence.

Either way, this proves that OpenAI's paradigm shift toward enhanced reasoning has succeeded. AI development shows no signs of slowing down, and the scaling law still works.

Those fears of AI stagnation were swept away by OpenAI's year-end Christmas gift.

Granted, o3 costs up to $20 per task in its low-compute mode, and high-compute runs may reach $3,000 per task, so using it is all but impossible at this stage. But compute costs will come down, and the scaling law will carry on.

Two top models in three months: on the last of these 12 days, OpenAI gave us another taste of the breakneck pace of late 2022 to early 2023, when AI leapt from ChatGPT to GPT-4.

Perhaps as Noam Brown, an OpenAI scientist who previously worked on the development of o1, said in an interview, "In 2024, OpenAI is experimenting, while 2025 is the year of full speed ahead."

OpenAI's 12-day launch series was a tumultuous ride that ended on a perfect note, planting hope for AI in 2025.

This article was written by Hao Boyang; source: Tencent Technology. Original title: "A review of the OpenAI launch series: from tools to AGI, OpenAI's 12-day evolution."
