Concepts | What Is a Large Model?
I. Introduction
ChatGPT, OpenAI, large models, prompt engineering, tokens, hallucinations: AI buzzwords like these washed over everyone's minds again and again in 2023, a year at once ordinary and miraculous. They left one part of the population resigned to lying flat, while another part grew anxious, afraid of losing at the starting line in this miraculous year of AI.
If you happen to be interested in this field but don't quite understand the specialized vocabulary, we recommend bookmarking this article and pulling it out when you have time.
This article uses comparisons with the human brain and vivid analogies, so that even complete beginners can come away with an intuitive understanding of large models.
II. Basic concepts of large models
2.1 What we mean when we say "large model"
Large models and large language models are two related concepts in the field of artificial intelligence.
Large model: refers to the large, complex models used in machine learning, which process and analyze massive amounts of data for a variety of tasks such as image recognition and natural language processing.
Large language model (LLM): a type of large model that specializes in processing and understanding natural language, for tasks such as text generation and language translation. LLMs master the rules and structure of language by learning from large amounts of text data. In a nutshell, large language models are the application of large models to language processing.
The "large" in a large model refers to its scale in two respects:
- Number of model parameters: more parameters mean a more complex model structure that can capture richer features of the data, allowing the model to handle more complex tasks and make more accurate predictions.
- Amount of training data: the model needs enough data to learn sufficient knowledge and patterns and to avoid overfitting.
So the "large" in large models means a huge demand for data and computational resources.
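To make "number of parameters" concrete, here is a back-of-the-envelope count for a tiny fully connected network (the layer sizes are made up purely for illustration):

```python
def dense_params(sizes):
    """Parameter count of a fully connected network: each layer has
    (inputs x outputs) weights plus one bias per output."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

# A toy network: 784 inputs -> 128 hidden units -> 10 outputs
print(dense_params([784, 128, 10]))  # 101770 parameters

# GPT-3, by comparison, has about 175 billion parameters.
```

Even this toy network has over a hundred thousand parameters; scaling that up by six orders of magnitude is what makes the data and compute demands so large.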
Training a generative AI like ChatGPT requires at least 10,000 NVIDIA A100 GPUs. A single card currently sells for 60,000 to 70,000 yuan (a V100 runs about 80,000 yuan per card); in other words, the investment in computing power alone reaches at least six or seven hundred million yuan.
2.2 Using the human brain to understand large models
A large model is made up of three layers: the algorithm (model structure), the model parameters (their number and values), and the training data. To better understand large models, we can map these three layers onto the human brain:
- Algorithm (model structure): imagine this as the brain's basic way of working, its "instruction manual". Just as we learn to walk or talk by following certain basic rules, the algorithm tells the large model how to process and understand information at a fundamental level.
- Model parameters: these can be compared to the life experiences and memories that make you unique. For example, when you learn to ride a bicycle, your brain remembers the "settings" (parameters) for keeping your balance. In a large model, the parameters are the "lessons" it learns from looking at large amounts of data, and they help it make decisions.
- Training data: this is like a person learning new things through everything they see, hear, and feel. Say you have traveled to many countries; your brain understands the world through those travel experiences. For a large model, the training data is the information it learns from, and that information helps the model "experience" the world.
In this way, we can think of the large model as an "electronic brain" that uses **observations (training data)**, **memory (model parameters)**, and **basic rules (algorithm)** to understand and predict the world, just as a person learns and grows through life experience.
III. Fundamentals of Large Models
3.1 How Large Models Work
When a large language model answers a human question, the process can be described in the following easy-to-understand steps:
- Receiving the question: first, the large model receives a question, just as the human brain hears a question through the ears. In this step, the model "reads" the text of the question and begins to understand what is being asked. This is analogous to the brain receiving information through hearing or sight and then starting to process it.
- Understanding the question: next, the large model analyzes the intent and keywords of the question, just as the human brain interprets a question it hears using known language rules and vocabulary. Analogously, the brain understands the intent of a question based on prior experience and knowledge.
- Retrieving information: once the question is understood, the large model searches its "memory" for relevant information, much like the human brain searching its memories for an answer. The large model's "memory" is built from the vast amount of data it learned during training. This is analogous to the brain rummaging through its memories for information relevant to the question.
- Organizing the answer: once the relevant information is found, the large model constructs the answer, organizing what it found into a coherent piece of text. This is like the brain gathering the pieces of an answer and assembling them into complete sentences ready to be spoken, much as it organizes ideas into fluent language when preparing a speech or writing an essay.
- Optimizing the answer: before the answer is delivered, the large model checks and refines it to make sure it is accurate and appropriate. This is similar to going over an answer in your head before saying or writing it, adjusting the wording to make it more precise, like double-checking a report or an important email for errors before you send it.
- Providing the answer: finally, the large model outputs a response, just as a person eventually speaks or writes their answer. The response is based on the model's understanding of the question, the information it retrieved, and how well it organized that information. It is like answering a question in conversation or on a test, where your brain turns all the prepared information into verbal output.
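Under the hood, all of these steps reduce to one mechanism: repeatedly predicting the next token. Here is a minimal sketch using a hand-written toy "model"; real LLMs learn billions of such probabilities from data rather than using a lookup table:

```python
import random

# A toy "model": for each word, the probabilities of the next word.
# Real LLMs encode this knowledge in learned parameters, not a table.
bigram = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the":     {"cat": 0.5, "dog": 0.5},
    "a":       {"cat": 0.5, "dog": 0.5},
    "cat":     {"sleeps": 0.7, "<end>": 0.3},
    "dog":     {"barks": 0.7, "<end>": 0.3},
    "sleeps":  {"<end>": 1.0},
    "barks":   {"<end>": 1.0},
}

def generate(seed=0):
    """Generate text by repeatedly sampling the next token until <end>."""
    rng = random.Random(seed)
    token, out = "<start>", []
    while True:
        nxt = bigram[token]
        # sample the next token according to the model's probabilities
        token = rng.choices(list(nxt), weights=list(nxt.values()))[0]
        if token == "<end>":
            return " ".join(out)
        out.append(token)

print(generate())
```

The "understand, retrieve, organize" steps above are a human-friendly way of describing what this single predict-the-next-token loop achieves at enormous scale.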
3.2 Where do the capabilities of a large model come from?
Imagine that training a large language model is like teaching a child language and knowledge. We can understand the process in a few simple steps, comparing each one to the way the human brain learns.
1. Data collection
First, just as a child learns from books, conversations, and television, we need to provide the large language model with a huge amount of text. This material comes from online articles, books, news, and so on, and covers a wide variety of topics.
Analogy to the human brain: this is like giving a child verbal stimulation through all kinds of books and environments, exposing them to rich information and knowledge.
2. Data pre-processing
We then need to organize this material to make sure it is clean and useful. This may include removing duplicates, fixing errors, and so on.
Analogy to the human brain: this is like teaching a child to distinguish useful information from noise, for example which words and sentences matter and which are just background chatter.
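A minimal sketch of what "removing duplicates and cleaning up" can look like in practice (real training pipelines do far more, such as near-duplicate detection and quality filtering):

```python
def preprocess(docs):
    """A minimal cleaning pass: normalize whitespace, drop empty
    documents, and remove exact duplicates while preserving order."""
    seen, cleaned = set(), []
    for doc in docs:
        doc = " ".join(doc.split())  # collapse runs of whitespace
        if doc and doc not in seen:
            seen.add(doc)
            cleaned.append(doc)
    return cleaned

raw = ["The cat sat.", "  The cat   sat. ", "", "Dogs bark."]
print(preprocess(raw))  # ['The cat sat.', 'Dogs bark.']
```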
3. Model training
Next, the large language model learns from this data through training, which can itself be divided into three steps:
- Unsupervised learning
In unsupervised learning, the large model is like a child exploring the world without explicit instructions. It works out the relationships and patterns among words, phrases, and sentences by looking at large amounts of text, rather than being told the exact meaning of each word or sentence directly.
Analogy to the human brain: it is as if a child learns how objects interact by playing with toys and observing their surroundings on their own, without an adult guiding them at every step.
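To give a taste of what "finding patterns without labels" means: even simple co-occurrence counting over raw text reveals structure, with no "correct answers" supplied. This is the spirit of unsupervised learning, vastly simplified:

```python
from collections import Counter

corpus = "the cat sat on the mat the dog sat on the rug".split()

# Count which word tends to follow which. No labels were given;
# the regularities emerge from the raw text itself.
pairs = Counter(zip(corpus, corpus[1:]))
print(pairs.most_common(3))
```

Patterns such as "sat" being followed by "on" surface purely from observation, the way a child notices regularities in the world without being taught them.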
- Supervised learning
In supervised learning, the large model is trained as if guided by a teacher. The model is given a large number of question-answer pairs, and its task is to learn patterns for finding the correct answer from the question. The model learns by comparing its answers to the correct ones, constantly adjusting itself to minimize its errors.
Analogy to the human brain: this is the equivalent of a child doing homework, where a teacher or parent tells them which answers are right and which are wrong and helps them understand the reasons behind the correct ones.
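The "compare to the correct answer and adjust" loop can be sketched in a few lines. This toy example learns the rule y = 2x from labeled pairs by nudging a single parameter; real models run the same kind of loop over billions of parameters:

```python
# Labeled training data: (question, correct answer) pairs for y = 2x.
data = [(1, 2), (2, 4), (3, 6)]

w = 0.0    # the model's single parameter, initially wrong
lr = 0.05  # learning rate: how big each adjustment is

for _ in range(200):
    for x, y in data:
        pred = w * x
        error = pred - y      # compare the model's answer to the truth
        w -= lr * error * x   # adjust the parameter to shrink the error

print(round(w, 3))  # converges close to 2.0
```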
- Reinforcement learning
Reinforcement learning is more like the reward mechanism used when training a pet or teaching a child. The model learns through trial and error, receiving a reward whenever it makes a good decision and a penalty, or a smaller reward, when it makes a bad one. This encourages the model to explore on its own and find the best path to its goal.
Analogy to the human brain: just like a child learning to ride a bicycle, who gets praise or rewards from their parents when they manage to keep their balance and ride successfully. This positive feedback encourages them to keep practicing and improving.
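The trial-and-error idea can be sketched with a classic toy problem (this is not how LLMs are actually fine-tuned, but it shows the core loop of acting, receiving a reward, and updating):

```python
import random

rng = random.Random(42)
true_reward = {"A": 0.2, "B": 0.8}   # hidden payoff rates, unknown to the learner
value = {"A": 0.0, "B": 0.0}         # the learner's running estimates
counts = {"A": 0, "B": 0}

for _ in range(2000):
    # explore a random action 10% of the time, otherwise exploit the best estimate
    if rng.random() < 0.1:
        action = rng.choice(["A", "B"])
    else:
        action = max(value, key=value.get)
    reward = 1 if rng.random() < true_reward[action] else 0
    counts[action] += 1
    # update the running-average estimate for the chosen action
    value[action] += (reward - value[action]) / counts[action]

print(max(value, key=value.get))  # the learner discovers that "B" pays off more
```

Through nothing but rewards, the learner comes to prefer the better action, just as praise steers the child toward balance on the bicycle.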
4. Iterative training
The large language model needs to practice with these materials over and over, improving a little each time, until it can fluently "understand" and generate text.
Analogy to the human brain: just as a child needs to keep practicing speaking and reading, comprehension and memory deepen through repetition and practice.
Note that iterative training is not a standalone stage: the unsupervised learning, supervised learning, and reinforcement learning described above each involve their own iterative training loops.
5. Fine-tuning
Sometimes the model does not perform well enough on a specific task. That is when we fine-tune it on a specialized dataset, like giving a child extra tutoring in their weak subjects.
Analogy to the human brain: this amounts to providing extra practice and guidance on a child's learning difficulties, helping them progress in a specific area.
6. Applications (deployment)
Finally, the trained and fine-tuned large language model is ready to demonstrate its capabilities on a variety of tasks, such as answering questions, writing, or translating.
Analogy to the human brain: it is like a child who has learned enough language and knowledge to do well on a school test or to communicate effectively in everyday life.
Through these analogies, we can see that the training process of a large language model bears a striking resemblance to human learning. Both require large amounts of material, constant practice, learning from mistakes, and targeted instruction and fine-tuning to reach a good outcome.
3.3 Are large models always right?
Large models sometimes produce inaccurate output, a phenomenon known in the jargon as hallucination.
To better understand this, consider an everyday scenario: a child faces a teacher's question about homework they have not finished. The child searches their pool of experience for possible excuses, which may include:
- I forgot to write it.
- Yesterday I was so engrossed in helping my grandmother cross the street that I delayed writing my homework.
- My homework was eaten by my cat.
- There was a fire in my house. My homework got burned.
The child then picks an answer based on probability and gives it to the teacher, for example: "My cat ate my homework."
To the teacher, this is in effect a hallucination. The possibility does exist, but judged against common human knowledge, the odds are that it is false.
This example reflects how large models work when processing information. When a large model is confronted with a question it does not fully understand, or lacks the data to support an accurate answer, it tries to provide the answer that seems most plausible.
This does not mean the large model is intentionally "lying". Rather, it is making its best guess based on the information it has learned. However, if the training data is full of errors, biases, or inaccuracies, or if the model tries to make judgments from incomplete information, it may produce misleading or inaccurate output.
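The child's excuse-picking can be sketched as probabilistic sampling. Note what the probabilities measure: how plausible each answer sounds, not whether it is true (a toy illustration, not the decoding algorithm of any specific model):

```python
import random

# Each candidate answer carries the probability the "model" assigns it.
# Nothing in these numbers says anything about truth.
excuses = {
    "I forgot to write it.": 0.4,
    "I was helping my grandmother and ran out of time.": 0.3,
    "My cat ate my homework.": 0.2,
    "There was a fire and my homework burned.": 0.1,
}

rng = random.Random(7)
answer = rng.choices(list(excuses), weights=list(excuses.values()))[0]
print(answer)  # a fluent, plausible answer, not necessarily a true one
```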
This situation reminds us that while big models are powerful tools that can provide useful insights and information, we should also critically assess their outputs and be aware of their possible limitations and biases.
3.4 What are the limitations of large models?
While large language models have made significant progress, they still face a number of limitations. These are discussed below in several categories, each compared in layman's terms with the way the human brain works.
1. Depth of understanding and context
- Limitations of large models: large language models can struggle with complex contexts or deeper meanings. They can match patterns and generate grammatically correct sentences, but sometimes cannot fully understand complex human emotions, humor, or metaphor.
- Human brain comparison: imagine a child just beginning to learn language. They can copy adult words, but may not yet fully understand the complex emotional exchanges or puns between adults. A child's comprehension grows with experience.
2. Data bias and impartiality
- Limitations of large models: large language models learn from the data they are trained on. If that data is biased, the model may reflect those biases, leading to unfair or skewed outputs.
- Human brain comparison: this is like a person raised only in one specific social or cultural environment, whose views may, consciously or unconsciously, reflect the biases of the society around them.
3. Transparency and interpretability
- Limitations of large models: large language models are like a "black box"; their decision-making process is difficult to trace and explain. It may not be clear why the model generated a particular answer.
- Human brain comparison: it is like being asked why we have a certain hunch; we sometimes have a hard time explaining it. Our brains weigh countless factors when making a decision, but the process is not always fully clear or explainable.
4. Resource consumption
- Limitations of large models: training large language models requires vast amounts of computational resources and electricity, which is both an environmental and an economic burden.
- Human brain comparison: this can be likened to a student preparing for an exam, which takes a great deal of time and energy to study and revise. The human brain does not run on electricity, but the time and energy spent studying are enormous.
5. Security and Privacy
- Limitations of large models: large language models may inadvertently disclose sensitive information from their training data, or be used to generate harmful content.
- Human brain comparison: it is as if we might accidentally reveal someone's secret while sharing a story, or spread inaccurate information when we do not know the full picture.
IV. How to make better use of large models
To make better use of large models, there is one concept we cannot avoid: the prompt. What is a prompt?
If you compare a large model to a person, prompts are the language you use to communicate with that person.
If you compare a large model to a computer, prompts are what we would call its programming language (like Java or Python).
It is fair to say that in the age of AI, you can use large models well without understanding the algorithms or the underlying principles, but you absolutely cannot do without understanding prompts, because they are your only way of communicating with the model.
Because prompts are so important, a specialized discipline called prompt engineering has arisen. It focuses on crafting and optimizing the prompts fed to AI models in order to guide them toward more accurate, relevant, or creative outputs.
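One common prompt-engineering pattern is to wrap the user's raw question in a template that fixes the role, the constraints, and the output format. A minimal sketch (the template wording is illustrative, not taken from any particular framework):

```python
def build_prompt(question, audience="a complete beginner"):
    """Wrap a raw question in a reusable prompt template that fixes
    the model's role, length limit, and style."""
    return (
        f"You are a patient teacher explaining AI to {audience}.\n"
        f"Answer the question below in at most three sentences,\n"
        f"using one everyday analogy.\n\n"
        f"Question: {question}"
    )

prompt = build_prompt("What is a large language model?")
print(prompt)
```

The same question asked bare and asked through a template like this can produce very different answers, which is exactly the leverage prompt engineering aims for.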
V. Summary
This article has explored the core concepts of AI large models, using comparisons with the human brain to vividly explain how large models work, the complexity of their training process, and the limitations they face.
Most importantly: in the age of AI, it is critical to master the "prompts" used to communicate with large models.
Author: Big Saint, AI super-individual. Link: https://juejin.cn/post/7331021519965978663. Source: Juejin. The copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please credit the source.