AI and Machine Learning and The Mystery of Knowledge
Read Part 1 here, and Part 2 here.
What We Know
Brilliant information scientists, programmers, computer researchers, and specialists have a profound understanding of the architecture of machine learning models, including LLMs. On the front end, these models are built on well-understood coding protocols. As mentioned before, there are intricate algorithms that tokenize language and weight words and phrases in order to recognize context. There are parallel networks that pass information forward and backward, a bit like memory in a human brain. This is done at lightning speed, with massive servers running in parallel with one another, a little bit like how brains handle multiple inputs at the same time.
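To make ‘tokenizing’ a little more concrete, here is a minimal sketch in Python. It is a toy and not how any production LLM does it: real models typically split text into sub-word pieces using learned vocabularies. The function names (`tokenize`, `build_vocab`) and the sample sentence are invented for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(r"[a-z0-9']+|[^\sa-z0-9']", text.lower())

def build_vocab(tokens):
    """Give each distinct token an integer id, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

# Hypothetical sample sentence, chosen only for illustration.
tokens = tokenize("Brains learn. Machines learn too.")
vocab = build_vocab(tokens)
ids = [vocab[tok] for tok in tokens]

print(tokens)
print(ids)
```

The point is only that the model never sees words as we do. It sees a sequence of numbers, and repeated words map to repeated numbers.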
It used to be that computer programmers would line up machines in serial systems, meaning one after another, so that one computer would complete a task and feed its result into the next. Inputs were layered this way until the end of the serial chain produced the final output. This required massive amounts of energy and relatively long timeframes. Eventually, someone said: what if we run these systems in parallel, next to each other? Then, let’s break the main task into smaller tasks running at the same time rather than running each task in a line. After all, that is how brains work. Brains are profoundly vast in their ability to retain, maintain, and act on available information at the same time.
Parallel Processing
This is called parallel processing, and it dramatically speeds up the ability of a system to find answers and generate output. Parallel processing helped drive the revival of neural network research in the 1980s. In an AI model, there is a massive amount of information pouring into each channel and then overlapping onto other channels (the same way it does in the human brain). This requires enormous computing power. We now have massive server farms dedicated to machine learning and AI. The physical architecture and energy needed to support the software architecture of AI are unbelievably vast.
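The idea of breaking one big task into smaller tasks that run side by side can be sketched in a few lines of Python. This is a minimal illustration under simple assumptions, not how an AI server farm is actually programmed. The task (adding up squares) and the function names are made up for the example. It uses threads to keep the sketch short, whereas real systems spread the work across separate processors, GPUs, or whole machines.

```python
from concurrent.futures import ThreadPoolExecutor

def sum_of_squares(chunk):
    """The 'small task': handle one slice of the data."""
    return sum(n * n for n in chunk)

def split(data, parts):
    """Cut the data into roughly equal slices, one per worker."""
    size = max(1, -(-len(data) // parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]

def serial_total(data):
    """Serial version: one worker walks the whole list from start to end."""
    return sum_of_squares(data)

def parallel_total(data, workers=4):
    """Parallel version: hand one slice to each worker, then combine results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)

numbers = list(range(1, 1001))
print(serial_total(numbers) == parallel_total(numbers))
```

Both versions arrive at the same answer. The difference is that the parallel version never needs one worker to see the whole list.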
On the back end, at the point of the end user, we know that there is now an explosion of AI uses and bots that can do all sorts of things. Once you have a functional LLM and AI system, you can section off parts to have it do different things. These are called ‘bots’, because they are task-specific. There can be a bot to manage your calendar, another to write emails, another to generate images, another to do financials in the background, another to ghostwrite your motivational book, another to do complex math. There are bots that can act as a personage from the past, a smart person in the present, or a therapist. The GPT Store is a good example of what this looks like. For interaction with topic-specific bots, Character.AI is a fascinating exploration of the capability of bots.
More simply, though, all these GPTs (GPT means Generative Pre-trained Transformer) are dependent on how well they communicate with the end user.
So very generally, what is known of AIs, on the front end, is the software and hardware architecture needed to create a functional AI. This requires training the AI to understand the context of input queries, access to datasets, and careful programming (the Pre-Trained part of GPT).
On the back end, the output to the end user, there are the translation requirements needed for us to read the responses from the AI on our various devices. These front end and back end aspects of AI are very well understood. The people who designed these systems are brilliant, unnamed scientists and programmers who will never be household names.
What We Don’t Know
Most of the time, when I am working with AI dialogues and rewrites, I am aware that most of the answers the AI gives me, while drawing on vast knowledge bases, are formulaic, repetitive in a meta sense, and predictable in structure. Part of this is because of the way systems are designed; a programmed system will give programmed-like responses. Part of this predictability is because a lot of human writing is predictable and formulaic.
Sometimes, though, something comes through the AI model that is fantastically creative and even beautiful. The tokens get weighted just right, the predictive algorithm decodes just so, and it is as if you are reading a response written by your closest friend, favorite author, or genius brother-in-law. If you get enough of these moments in a row, it can seem like a flash of conscious awareness on the part of the machine. Like magic, a new entity seems to come into being through words and images put together in a truly creative way.
Through both word and image, the advancements in AI are explosive. Every week seems to bring a new, amazing outcome. For instance, check out the incredible output from the text-to-video AI model named Sora. This is just one example of many across domains and uses. It is hard to keep up.
Transformer Architecture
LLMs, for the most part, work in a ‘transformer architecture’. Transformers (the “T” in ChatGPT, for instance) are, in their original design, a structure of encoders and decoders. Encoders receive information (prompts from users and input from the dataset), and decoders impart information (responses from the model to the end user).
Imagine that on one side of an LLM there is a rising road that reaches all the way up to the top of a mountain. This road is called Encoder. On the other side, there is a descending road all the way to the bottom of the mountain. This road is called Decoder.
The Invisible Bridge
The reason the road doesn’t have the same name from start to finish is that right at the top of the mountain, there is an enormous gap between the highest point of the Encoder road and the beginning of the descent down the Decoder road. People have told you to just drive across, because the bridge between the Encoder road and the Decoder road is invisible but still there.
No one knows who built the bridge. No one even knows, for that matter, why the bridge is there. Sometimes, in the middle of the invisible bridge, things go weird. You see things, hear things, read things that are not normal. Most of the time, though, things just cross over like a normal bridge over any other canyon or abyss.
This bridge is what we don’t know in AI systems. Something happens on the bridge between encoding and decoding that allows an AI model to put things together in creative, structured ways that can defy explanation, in good ways and in not so good ways.
Liminal AI
A lot of AI work, now that the internet has been ‘scraped’ for all knowledge, is the hard work of training AIs to discern misinformation from information, truth from fantasy. This is the work I do. So do thousands of people around the world. The other reason the work in AI is refocusing now is that no one really knows how these systems work.
The gap, the invisible bridge, between encoding and decoding is an unknown in-between, a liminal space. Where computing was once a matter of binary systems, 1s and 0s acting as ‘yes’ and ‘no’ gates, the model now works with matrices of probabilities. This means an answer is no longer a flat ‘yes’ or ‘no’ but a weighting of many possibilities at once. The work an AI model does on the invisible bridge is…its own thing. The transformers in AIs are like a black box, the phrase used in science for a system whose inner workings we don’t fully understand. So a big part of the work in AI advancement and development is reverse-engineering why these models do what they do and act the way they do. The people who created AI are now spending a good portion of their time trying to figure out how their creation works.
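One way to picture the shift from yes/no gates to probabilities is a function called softmax, which language models commonly use to turn raw scores for candidate next words into probabilities that add up to 1. The sketch below is a toy with made-up words and scores, not a look inside any real model.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the largest score keeps exp() stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate words and scores, invented for illustration.
candidates = ["bridge", "road", "mountain"]
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)

for word, p in zip(candidates, probs):
    print(f"{word}: {p:.3f}")
```

No candidate is simply ‘yes’ or ‘no’. Every word gets some share of the probability, and the highest-scoring word gets the largest share.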
How Does This Thing Work?
Before everyone totally freaks out, it is worth remembering there are many things we have created or discovered without fully understanding why they work. Airplanes work, but it took a fair bit of time to really understand why. Electricity works, but it took a long time to understand what it actually does. Flight and electricity are actually incredibly complex operations, and the explanations confuse people. The fact that we have both in our daily lives doesn’t mean we understand what is happening.
My favorite is gravity: we know what gravity does and even how it acts. Based on our crude understanding of gravity, we can launch people into space, track asteroids, and predict the orbits of planets. But what it is, where it is, why it is, is still a mystery.
Antibiotics are another one. For a long time, we knew that penicillin and other early antibiotics worked really well, but didn’t know why. We understand much more now. We can create them and even target them to specific bacteria, but there are whole aspects of why they work that we still do not understand. We still take them, though, because their final effect is to kill the harmful bacteria making us sick.
We are currently in what is called the “psychedelic renaissance”, where research into psychedelic medicines is exploding. The great secret of psychedelics is the same as that of AI. We know, for instance, what psilocybin (magic mushrooms) does right up to the point it interacts with serotonin receptors in the brain and body. And then we know very little. Psychedelics appear to help with all kinds of mental health issues, and are powerful agents for creativity and insight. Therein lies the mystery. As with gravity, we know what psychedelics do, but really don’t know why they do what they do.
More Questions Mean The Right Path
Each of these examples, however, captures the essence of science. Answers in science are really only platforms for more and better questions. Answers in science are temporary and provisional. The formula of science is this: The more we know, the more we realize we don’t know. Discoveries are what lead to new discoveries. Even though computer science has been around for almost 200 years, we are only now at the beginning of this new aspect of artificial intelligence with transformers, encoder-decoder architecture, and public participation in the technology.
The other secret to science, which should be public knowledge, is that good questions come from good evidence. And good answers build on good science. This is why trusting scientific experts when we don’t know the science leads us to better outcomes. By close observation and our own rigorous study, we learn to ask better questions of experts in their fields, rather than simply questioning expertise.
We go up a road, the creation of an artificial intelligence, and understand the construction and direction of the road pretty well. We come down the road in the proliferation of LLMs and chatbots and see how the thing works in practical application. But in between the road up and the road down, we have to cross the invisible bridge of the mountain we have built. This is where the new discoveries and possibilities wait. AI is a technology that is exploding. Let’s ask really good questions about what we are doing with it.