At All Things Open 2024, the Open Source Initiative (OSI) officially released version 1.0 of the Open Source AI Definition (OSAID), the world's first open source AI standard.
OSAID will serve as the basis for determining whether an AI system qualifies as "open source AI." It provides a unified guide for community-driven public assessments and a framework to help AI developers and users judge whether an AI system, including its code, model, and data information, is open source.
The release of OSAID is an important milestone for the open source field and has been widely supported by the global open source community. The definition was co-designed by more than 25 organizations, including Microsoft, Google, Amazon, Meta, Intel, Samsung, the Mozilla Foundation, the Linux Foundation, and the Apache Software Foundation.
"The co-creation process for this open source AI definition was rigorous, thorough, and fair," said Carlo Piana, chair of the OSI board of directors, who is convinced the definition meets the criteria of the open source philosophy and the four fundamental freedoms.
According to OSAID, open source AI systems should grant users the following freedoms:
1. Use the system for any purpose without requesting permission.
2. Study how the system works and examine its components.
3. Modify the system for any purpose, including changing its output.
4. Share the system with others for their use for any purpose (whether modified or not).
The above applies to both fully functional systems and stand-alone elements of the system.
For an AI model to be considered open source, it must provide enough information to allow anyone to "substantially" reconstruct the model. The model must also disclose any important details about its training data, including where the data came from, how it was processed, and how it was accessed or licensed.
OSAID also lists the usage rights that developers should have when using open source AI, such as being able to use and modify the model for any purpose without having to obtain permission from others.
Stefano Maffulli, executive director of OSI, said the main purpose of developing an official definition of open source AI is to get policymakers and AI developers on the same page.
"Regulators are already looking at this area. We deliberately reached out to stakeholders and communities on all sides, and even to organizations that regularly interact with regulators, to get early feedback."
"Open source AI is an AI model that allows you to fully understand how it is built, which means you have access to all the components, such as the complete code used for training and data filtering. Most importantly, you should be able to build on top of it."
OSI does not intend to pressure developers into adhering to the OSAID, but it does plan to flag models that are described as "open source" despite not meeting the definition. "We hope that when someone tries to misuse the term, the AI community will say, 'We don't recognize this as open source,' and correct it," Maffulli said.
Meta: I object.
Currently, many startups and large tech companies, Meta in particular, describe their AI model releases as "open source," but few meet OSAID's standards. Researchers have found that many "open source" models are open source in name only: the data needed to actually train them is kept secret, and the computing power needed to run them is beyond the reach of many developers.
For example, Meta requires platforms with more than 700 million monthly active users to obtain a special license to use its Llama model, and Maffulli has publicly criticized Meta for calling its model "open source". Google and Microsoft, in discussions with OSI, have agreed to stop calling models that are not fully open source "open source", but Meta has not done so.
In addition, Stability AI, which has long touted its models as "open source," requires an enterprise license for companies with annual revenues of more than $1 million, while French AI startup Mistral's license prohibits commercial use of certain models and outputs.
Meta naturally disagrees with this assessment. Although the company was involved in drafting the definition, it took issue with OSAID's wording. A Meta spokesperson said that Llama's licensing terms and accompanying acceptable use policy provide protection against harmful applications. Meta also said its approach to sharing model details is "prudent" at a time when California's AI-related regulations are still evolving.
"We share the position of our OSI partners in many ways, but we, along with the rest of the industry, disagree with their new definition. We believe there is no single definition of open source AI, because past open source definitions cannot capture the complexity of today's rapidly evolving AI models. We make Llama free and openly available, and keep it safe through our license and usage policies. Regardless of the technical definition, we will continue to work with OSI and other industry groups to make free AI more accessible."
Analysts suggest that Meta's reluctance to disclose training data is likely tied to the way its own models, and most AI models, are developed.
AI companies collect large amounts of data, such as images, audio and video, from social media and websites and train models on this "publicly available data". In today's competitive marketplace, the methods used to collect and optimize datasets are seen as a competitive advantage, and companies often use this as a reason for refusing to disclose them.
But the details of the training data could also expose developers to legal risk. Authors and publishers claim that Meta used copyrighted books for training. Artists have also filed a lawsuit against Stability AI, alleging that it used their work without acknowledging it, likening their behavior to theft.
OSAID could therefore pose a problem for companies trying to defend against such lawsuits, especially if plaintiffs and judges find the definition reasonable enough to cite in court.