How does ChatGPT work?

ChatGPT dominates the media, and many users who have tried the service are amazed at its capabilities. ChatGPT is great at producing language that reads as if a human wrote it. This article won't give you any prompts or tips on how to use ChatGPT, but it will give you insight into the artificial intelligence behind it and why the texts sound so human. OpenAI explains a bit about how they trained ChatGPT, but in this article I'll lift the hood and show you how ChatGPT works.

Predict word for word

ChatGPT works with prompts: questions or commands that you enter as a user. These prompts are the starting point for ChatGPT to complete texts, reply, or perform tasks, such as writing texts or programming code.

ChatGPT has no understanding of the texts it writes. It can produce total nonsense with the same confidence as it produces correct texts. This has to do with the way it is made: the model predicts the text word for word. For example, given the text 'The dog goes into the basket to…', it will probably predict the word 'lie'.
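To make this concrete, here is a toy sketch in Python. It is not ChatGPT's real model: the probabilities are made up, and a real model considers the whole preceding text rather than just the last word. But it shows the basic idea of predicting a text one word at a time.

```python
# A toy "language model" with made-up probabilities, to illustrate
# word-for-word prediction. Not ChatGPT's actual model.

# Hypothetical table: given the last word, how likely is each next word?
NEXT_WORD_PROBS = {
    "the": {"basket": 0.6, "dog": 0.4},
    "dog": {"goes": 0.7, "sleeps": 0.3},
    "goes": {"into": 0.8, "away": 0.2},
    "into": {"the": 0.9, "a": 0.1},
    "basket": {"to": 0.7, "again": 0.3},
    "to": {"lie": 0.8, "sleep": 0.2},
}

def generate(prompt_words, n_words=6):
    words = list(prompt_words)
    for _ in range(n_words):
        options = NEXT_WORD_PROBS.get(words[-1])
        if not options:
            break
        # Always pick the most probable next word (greedy decoding).
        words.append(max(options, key=options.get))
    return " ".join(words)

print(generate(["the", "dog"]))  # the dog goes into the basket to lie
```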
Actually, ChatGPT doesn't even predict word by word, but token by token. First, the text is cut into chunks called tokens. These aren't necessarily whole words: they can be word fragments, numbers, or punctuation. Long words are built by connecting several tokens.
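You can see this splitting yourself with OpenAI's tiktoken library (pip install tiktoken), which implements the tokenizers used by OpenAI's models. A small sketch:

```python
# Show how a piece of text is split into tokens rather than words.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by newer OpenAI models

tokens = enc.encode("Unbelievable tokenization!")
print(tokens)                             # a list of integer token ids
print([enc.decode([t]) for t in tokens])  # the text chunk behind each id
```

A long or unusual word typically comes back as several chunks, while a common short word is often a single token.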

The unique thing about ChatGPT is that it preserves context. The model has a temporary memory and remembers the answers given during the conversation. ChatGPT uses this memory to predict the next word. If ChatGPT knows from the conversation that the dog is a puppy that isn't house-trained yet, the next word is more likely to be "pee".
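In practice this "memory" is simply the conversation history being sent along with every new prompt. A minimal sketch with the OpenAI Python library (the model name and messages are just illustrative, and you need an API key):

```python
# The model itself is stateless: the full conversation so far is sent
# along with each new question, which is what gives it "memory".
from openai import OpenAI

client = OpenAI()  # expects the OPENAI_API_KEY environment variable

messages = [
    {"role": "user", "content": "My dog is a puppy that isn't house-trained yet."},
    {"role": "assistant", "content": "Good luck with the training!"},
    # The earlier turns above give the model context for this question:
    {"role": "user", "content": "The dog walks to the corner of the room. What happens next?"},
]

response = client.chat.completions.create(model="gpt-4", messages=messages)
print(response.choices[0].message.content)
```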

The predecessor of ChatGPT

ChatGPT was originally based on GPT-3, its predecessor, which had been in beta since the summer of 2020. ChatGPT was built with the knowledge OpenAI gained from GPT-3. That model was trained on large amounts of internet text. The capacity of models like GPT-3 is expressed in the number of parameters: the more parameters a model has, the more complex the patterns it can find in data. GPT-3 has 175 billion parameters.

In general, the more parameters a model has, the more data is needed to train it. According to the creators, the GPT-3 model was trained on about 45 TB of text data from multiple sources, including Wikipedia and books.

About 3% of the data on which GPT-3 was trained comes from Wikipedia, about 16% from books, and the rest was scraped from the internet.

Today ChatGPT is based on GPT-4, about which less is known regarding the training data and the number of parameters. It is even more powerful than GPT-3.

Human-sounding answers

ChatGPT's answers to many prompts sound surprisingly human. It is sometimes difficult to tell whether a human or a machine wrote a text. I have more than a year of experience with GPT-3 and have written both texts and code with ChatGPT as an assistant. I was surprised how human the results sounded, so I wondered: how did they do that?
ChatGPT sounds human because human knowledge has been added to the model. The model is trained in roughly three steps that make extensive use of human knowledge. For example, people add labels to data or assess the results of a model. These people are called labelers.

Training in three steps

OpenAI trains ChatGPT in three steps, starting from a pre-trained model; in the case of ChatGPT, that is GPT-4. In the first step, prompts are entered and labelers judge the results: in short, they indicate what output fits the prompt. With this human knowledge, a first version of the new model is trained.
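You can picture the data from this first step as pairs of a prompt and a human-approved answer, which are then used to fine-tune the pre-trained model. A purely illustrative sketch (the field names are my own, not OpenAI's actual schema):

```python
# Illustrative (prompt, demonstration) pairs for the first training step.
# Field names and contents are assumptions for the sake of the example.
demonstrations = [
    {
        "prompt": "Explain to a six-year-old why the sky is blue.",
        "demonstration": "Sunlight is made of many colors. The air scatters "
                         "blue light the most, so the sky looks blue.",
    },
    {
        "prompt": "Write a polite email declining a meeting.",
        "demonstration": "Thank you for the invitation. Unfortunately I "
                         "can't make it this time.",
    },
]

# Each pair becomes a training example: the model is tuned to produce
# the demonstration when it is given the prompt.
for example in demonstrations:
    print(example["prompt"], "->", example["demonstration"])
```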

In the next step, human preferences are captured so they can later be simulated. A list of prompts is selected, and for each prompt the labeler is presented with multiple outputs of the new model, anywhere from 4 to 9. The labelers rank the outputs from best to worst. In this way you build up new knowledge about human preferences, which is used to train the next version of the model.
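A single ranking contains a lot of information: every ranking of K outputs can be turned into K*(K-1)/2 pairwise "this one is better than that one" comparisons, which is how OpenAI describes training the preference (reward) model in their InstructGPT paper. A small sketch with placeholder outputs:

```python
# Turn one labeler ranking into pairwise comparisons for preference training.
from itertools import combinations

# Outputs for one prompt, already ranked best-to-worst by a labeler.
ranked_outputs = ["best answer", "decent answer", "weak answer", "worst answer"]

# combinations() preserves order, so the first item of each pair is the
# one the labeler preferred.
for better, worse in combinations(ranked_outputs, 2):
    print(f"prefer: {better!r}  over: {worse!r}")

# 4 ranked outputs -> 6 comparisons; 9 ranked outputs would give 36.
```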

In the final step, the model is optimized through continued training using Proximal Policy Optimization (PPO). PPO is a reinforcement learning method, a way for a computer to learn how to perform a task. Just as you learn new things by trying, the computer tries different things and learns from its mistakes. PPO is a particular way of doing this that makes the learning faster and more stable.
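The heart of PPO is its clipped objective, which rewards improvements but prevents any single update from changing the model too drastically. A minimal sketch of that formula in NumPy (the numbers are made up; a real training loop around it is much larger):

```python
# The standard PPO clipped objective from the PPO paper, on toy numbers.
import numpy as np

def ppo_clipped_objective(new_probs, old_probs, advantages, eps=0.2):
    """Average clipped PPO objective over a batch of actions (tokens)."""
    ratio = new_probs / old_probs            # how much the policy changed
    clipped = np.clip(ratio, 1 - eps, 1 + eps)
    # Take the more pessimistic of the two terms, so one update can't
    # move the policy too far away from the old one.
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))

new_p = np.array([0.30, 0.10, 0.60])  # probabilities under the updated model
old_p = np.array([0.25, 0.20, 0.55])  # probabilities under the previous model
adv   = np.array([1.0, -0.5, 0.3])    # how much better than expected each choice was

print(ppo_clipped_objective(new_p, old_p, adv))
```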

Think of it like playing a game where you have to jump over obstacles. At first you may jump too far or not far enough, but by trying again and learning from each mistake, you gradually figure out exactly when and how high to jump.