NVIDIA open-sources ultra-powerful Nemotron model: beats GPT-4o, second only to OpenAI's o1
Global AI leader Nvidia has open-sourced its ultra-powerful model, Llama-3.1-Nemotron-70B-Instruct.
According to the test data, this model has beaten more than 140 open- and closed-source models, including GPT-4o, GPT-4 Turbo, Gemma-2, Gemini-1.5, and Claude 3.5 Sonnet, and is second only to OpenAI's latest o1 model.
Nemotron is built on Llama-3.1-70B, which in itself is nothing new. What is new is the hybrid training method, in which Bradley-Terry and Regression approaches are used together to train the reward model.
It's worth noting that NVIDIA also open-sourced Nemotron's training dataset, which matters for anyone developing models of the same type or building beyond Nemotron, since the dataset is the key to the hybrid training approach.

Some netizens say NVIDIA is keen to keep open-sourcing powerful models: on the one hand, it has ample money to fund its researchers' work; on the other, the main purpose is still to sell GPUs and to cultivate the developer ecosystem. Meta, relying on its social-media empire, likewise has no worries about commercialization or funding.
The ones who should worry most are the large-model startups: they cannot compete with these giants on money, let alone on commercial deployment or name recognition. Many small companies may be crushed by the giants and soon run into problems such as their funding drying up.

It's great to see competition in the AI space moving the industry forward at an astonishing pace.

This is a major open-source release.

Get two 4090s for the new model and have a blast.

The models are free, but the hardware to run them isn't.

I've been testing this model, and as an advanced AI user here's what I've learned from using it: it seems a bit smarter than Claude 3 and ChatGPT when it comes to business writing. It still makes some mistakes, but it is indeed smarter than the regular Llama 3.1 70B Instruct.

NVIDIA could do this at 1000x lower cost. If NVIDIA really intends to keep doing this, no one will be able to compete.

Innovative hybrid training method
When training a large model, the reward model plays a vital role in ensuring that the model accurately understands and follows the user's prompts and correctly carries out real-world tasks such as translation, text generation, and Q&A. It works mainly by scoring the model's outputs, guiding the model to generate higher-quality answers.
Currently, the two mainstream reward-modeling methods are Bradley-Terry and Regression. Bradley-Terry-style reward modeling is rooted in ranking theory from statistics: it maximizes the reward gap between the chosen response and the rejected response. This approach emphasizes which response a user would choose for a given prompt, providing direct, preference-based feedback to the model.
Regression, on the other hand, draws on rating scales from psychology and trains models by predicting the score of a response to a particular prompt. This approach lets the model assess response quality in finer detail, but may not be as intuitive as preference-based approaches.
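To make the contrast concrete, here is a minimal PyTorch sketch of the two objectives. The scalar reward inputs and function names are illustrative assumptions, not NVIDIA's implementation:

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Maximize the reward gap between chosen and rejected responses:
    # minimizing -log(sigmoid(gap)) pushes r_chosen above r_rejected.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

def regression_loss(r_pred: torch.Tensor, score_label: torch.Tensor) -> torch.Tensor:
    # Predict the annotator's numeric rating for a single response.
    return F.mse_loss(r_pred, score_label)
```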

However, both methods have obvious drawbacks: Bradley-Terry requires the user to choose one of two responses, while Regression-style models require rating data, meaning the user must score each response for the model to improve. NVIDIA resolved this dilemma by simply using the best of both approaches together.
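One plausible way to combine the two signals is to add a regression term on each response's rating to a Bradley-Terry term on the pair, using the annotated preference strength as a margin. The sketch below is an assumption about how such a hybrid objective could look, not NVIDIA's published code:

```python
import torch
import torch.nn.functional as F

def hybrid_loss(r_chosen, r_rejected, score_chosen, score_rejected,
                strength, bt_weight=1.0):
    # Regression term: fit each response's scalar reward to its rating.
    reg = F.mse_loss(r_chosen, score_chosen) + F.mse_loss(r_rejected, score_rejected)
    # Bradley-Terry term with a margin scaled by preference strength
    # (an assumption): strongly preferred pairs must be separated more.
    bt = -F.logsigmoid(r_chosen - r_rejected - strength).mean()
    return reg + bt_weight * bt
```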
The first step was to develop HELPSTEER2-PREFERENCE, a dataset containing both ratings and preference annotations; the researchers added preference annotations on top of the existing HELPSTEER2 dataset.
These preference annotations capture not only which of the two responses the user prefers, but also a rating of how strong that preference is. To ensure the quality and interpretability of the data, annotators were also asked to provide written justifications for their preferences.
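A single preference record could then look roughly like the following; the field names and the -3 to 3 strength scale shown here are illustrative assumptions, so consult the published dataset for the actual schema:

```python
# Illustrative shape of one HELPSTEER2-PREFERENCE annotation (fields assumed)
preference_record = {
    "prompt": "Summarize this contract in plain English.",
    "response_a": "...",
    "response_b": "...",
    "preference": 2,  # sign = which response wins, magnitude = strength
    "justification": "Response B covers every clause and is easier to follow.",
}
```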
In training this novel hybrid method, the researchers used the AdamW optimizer, which improves the stability and efficiency of training through weight decay and gradient clipping.
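As a rough illustration of that setup, here is a minimal AdamW training step with gradient clipping in PyTorch; the model, data, and hyperparameter values are placeholders rather than NVIDIA's actual configuration:

```python
import torch

model = torch.nn.Linear(4096, 1)  # stand-in for a reward head on top of the LLM
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-6, weight_decay=0.01)

def train_step(features, labels):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(features).squeeze(-1), labels)
    loss.backward()
    # Gradient clipping keeps updates bounded and stabilizes training.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```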
To further improve performance, ExPO was used to extrapolate the model's weights. Training can also be made to weight response pairs with large quality differences more heavily, improving the model's ability to discriminate.
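ExPO, as described in the literature, extrapolates past an aligned checkpoint along the direction leading away from the base (e.g., SFT) weights. A minimal sketch, with `alpha` as an assumed extrapolation coefficient:

```python
def expo_extrapolate(sft_state, aligned_state, alpha=0.3):
    # w_expo = w_aligned + alpha * (w_aligned - w_sft): step past the
    # aligned checkpoint along the alignment direction. alpha is a
    # tunable coefficient chosen here arbitrarily.
    return {name: aligned_state[name] + alpha * (aligned_state[name] - sft_state[name])
            for name in aligned_state}
```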
In addition, the researchers conducted an extensive hyperparameter search to find the optimal learning rate and KL penalty term. These hyperparameters are crucial, as they directly affect the model's convergence speed and final performance.
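A grid search over those two hyperparameters might be sketched as follows; the value ranges and the `train_and_evaluate` stub are assumptions for illustration:

```python
from itertools import product

def train_and_evaluate(lr, kl_penalty):
    # Placeholder: real code would run a training job and return a
    # validation score; this dummy formula only keeps the sketch runnable.
    return -abs(lr - 3e-6) - abs(kl_penalty - 0.05)

learning_rates = [1e-6, 3e-6, 1e-5]  # assumed search range
kl_penalties = [0.01, 0.05, 0.1]     # assumed search range

best_lr, best_kl = max(product(learning_rates, kl_penalties),
                       key=lambda p: train_and_evaluate(*p))
print(f"best lr={best_lr}, kl_penalty={best_kl}")
```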
HELPSTEER2-PREFERENCE dataset
To build a dataset diverse enough for the new hybrid training method, each pair of responses was evaluated by 3-5 annotators during data annotation. These annotators were required to rate each response on multiple dimensions, including helpfulness, correctness, coherence, complexity, and verbosity.
To better understand the reasons behind each preference, annotators also had to provide a short textual explanation of why one response was chosen as the better answer. This approach not only enhances the transparency of the data, but also provides rich contextual information for subsequent analysis.
The researchers also applied rigorous data preprocessing to ensure quality. For example, they identified the three most similar preference annotations in each task, then took the average of those three and rounded to the nearest whole number as the task's overall preference score.
At the same time, to exclude samples where annotators disagreed sharply, the researchers filtered out tasks whose annotations differed from one another by more than a set range. Together, these measures effectively improved the reliability and consistency of the data.
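Put together, the aggregation and filtering steps might look like this; the `max_spread` threshold and the rating scale are assumptions, since the article does not give exact values:

```python
from itertools import combinations
from statistics import mean

def aggregate_preferences(annotations, max_spread=2):
    # Pick the triple of annotations that agree most closely with each other.
    best_triple = min(combinations(annotations, 3), key=lambda t: max(t) - min(t))
    # Filtering step: drop the task if even the closest triple disagrees too much.
    if max(best_triple) - min(best_triple) > max_spread:
        return None
    # Average the three annotations and round to the nearest whole number.
    return round(mean(best_triple))

# Example: five annotators score the preference on an assumed -3..3 scale.
print(aggregate_preferences([2, 2, 3, -1, 2]))  # -> 2
```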

According to the test data, the model trained with the HELPSTEER2-PREFERENCE dataset performs very strongly, reaching a score of 94.1 on the RewardBench evaluation and outperforming almost all other models of the same period.
Source: AIGC Open Community. Original title: "Beats GPT-4o, second only to o1! NVIDIA open-sources the ultra-powerful Nemotron model".