Is Artificial Intelligence going to overthrow humanity?

Alexei Ramotar
19 min read · Nov 16, 2023


Can AI create or innovate based on “emotions” and opinions? Or is it just pattern matching? Source: https://denungeherrholm.smugmug.com/ (Kim Diaz Holm)

Introduction

“AI drone ‘kills’ human operator in simulation” — Sky News

“AI poses ‘Risk of Extinction’, Industry Leaders Warn” — The New York Times

“Runaway AI is an extinction risk, experts warn” — Wired

“The only way to deal with the threat from AI? Shut it down” — Time Magazine

Artificial Intelligence (AI) is all the rage now and apparently a doomsday point is rapidly approaching where AI will destroy its creators. “Mitigating the risk of extinction from AI should be a global priority” says the Center for AI Safety.

How real are all these warnings? That’s what we’ll help you decide.

Whether you are a technology professional, enthusiast, hobbyist or even a layperson, the hype about AI is pervasive. In this article we will attempt to give as simple an explanation of AI as possible, with the aim of demystifying the technology, the jargon and the media frenzy. While we have an opinion on the abilities of AI, we will not be promoting our views. Instead, once you have read this and the veil of mystery has been lifted, you will be able to decide for yourself.

You’ll be able to say whether AI is really similar to human or non-human animal intelligence; whether humanity should fear that AI will gain sentience and send out Terminators to mass-murder us; whether it will be benevolent like Data from Star Trek, a sentient AI dedicated to helping his crewmates while simultaneously pursuing personal goals; or whether it will remain a tool like J.A.R.V.I.S. in Iron Man, used to enhance Tony Stark’s own intelligence. We hope that on completing this article the reader will be able to draw their own conclusions.

What is Intelligence?

Before we start we should define the term intelligence. Most of us use this word, or one of its many synonyms, with a general understanding of what it means, but we may very well differ on the specifics. Some people may think that intelligence is measured by the ability to do well in standardized exams; we often see articles in the local newspapers praising the meritorious dedication of young people who do well at NGSA or CXC/CSEC. Others may think that intelligence is the ability to perform original academic research, requiring inventiveness and a deep understanding of a field, as is usual in PhD programs. Are chess grandmasters, speedcubers (people who solve Rubik’s cubes very quickly), or math savants who solve complicated equations without a calculator all intelligent? What about polyglots: is the ability to learn multiple languages fluently a function of intelligence? Can a person’s intelligence be assigned a number, as IQ tests attempt? Is someone’s personal wealth a function of their intelligence?

The fact is that many people who are considered intelligent are of average intelligence, or even nincompoops, in areas outside their specialties. A funny, yet illuminating, illustration of this is the case of Werner Heisenberg (a Nobel Prize winner in physics and an early pioneer of quantum mechanics) during his PhD examination. His specialty, quantum mechanics, requires a great deal of mathematical knowledge, and he aced this part of the exam. However, on questions about astronomy he was not particularly brilliant and in fact did poorly. On experimental physics he was just awful: he did not know how to use some of the equipment normally used by experimentalists at the time, nor how to measure the sensitivities of others, and he could not even explain the principle of how a battery works. Would anyone dare call Heisenberg unintelligent? Definitely not, yet there were areas even within physics itself where his knowledge was average, if not completely absent.

The above anecdote illustrates that even within a single person, a genius no less, knowledge and intelligence vary across different aspects of the environment in which they live. What does this have to do with Artificial Intelligence? While many proponents of a possible AI apocalypse work in the field, their ability to predict the future should not be overstated.

Finally, there is an additional issue: much of the fear in the headlines is generated by people who have a vested interest in AI. Is it not strange that many of those who endorse halting AI development (OpenAI, Microsoft, Elon Musk, etc.) are simultaneously increasing their investment in it? One has to question their motivations.

Demystifying AI

Before one can judge whether the headlines are prophetic or ludicrous, one should have a reasonably good understanding of the topic. What follows is a bird’s-eye view, but it is sufficiently detailed that the reader should be able to draw their own conclusions, based on facts and not ignorance.

To look at the state of AI we will give the reader a brief, but descriptive, explanation of two of the most lauded algorithms currently available. Specifically we will be looking at the following:

  1. ChatGPT — OpenAI/Microsoft
  2. AlphaZero — DeepMind/Alphabet (Google)

Another major player in the AI sector is IBM, with its Watson products. However, these products work as assistants to the user and do not purport to one day become an Artificial General Intelligence. Instead, Watson simply assists the user by speeding up processes that might otherwise take additional hours to complete, aiming to automate much of the routine work a user faces in their day-to-day job. Now let’s move on.

ChatGPT

ChatGPT 4.0 was released in March 2023. GPT stands for Generative Pre-trained Transformer; thankfully, computer scientists usually use naming conventions that give a good indication of the algorithm used in development. The free version currently publicly available (https://chat.openai.com/) is based on the GPT-3.5 architecture. While the differences between the two are not within the scope of this article, here are some of the highlights:

  1. GPT-4 is trained on a larger dataset than GPT-3.5; simply put, it has access to more information. Remember this: the dataset, how it was obtained, and the fact that OpenAI (an oxymoron of a name) and Microsoft want to profit from it is a major issue, especially given how uptight Microsoft gets about pirated software.
  2. GPT-3.5 was trained with parameters in the range of 175 billion or more, while for GPT-4 the figure is reportedly in excess of 100 trillion[RTT]. Parameters are values that control how the algorithm operates. For example, to prevent the algorithm from spitting out the exact same response each time it is asked the same or a very similar question, there is a parameter that controls the frequency of repetitive answers. Below we see the manifestation of such a parameter. [OAI1]
  3. The larger set of parameters also allows GPT-4 to be multimodal; that is, it can accept images as well as text as input. GPT-3.5 and older versions only accept text input.
  4. These additional parameters also allow it to output longer, more coherent responses. We will show below how hit-or-miss a naive algorithm can be.
Figure 1: An example of a parameter that prevents frequent repetition of answers to the same question. Note also the inconsistency: ChatGPT first says it is based on the GPT-3.5 architecture, then on GPT-3. Source: https://chat.openai.com

Large Language Models

The Generative Pre-trained Transformer, the algorithm behind ChatGPT, belongs to a family of algorithms called Large Language Models[OpenAI2023]. This family is probabilistic in nature; in fact, many of the top-performing AI algorithms are probabilistic. The reason is that we live in a world of imperfect data, and many ordinary human decisions rely on “intuition”, experience or even crossed fingers; these are all probabilistic models that we internalize. Thus it should not be a surprise that AI uses probabilities as well.

At its core, a Large Language Model uses probabilities to guess text given an input text. We will demonstrate this below. These probabilistic algorithms come in two types:

  1. Statistical Language Models: these can be as simple as Markov Chains, which will be demonstrated below, or more complicated, like Hidden Markov Models, which can incorporate various grammatical and linguistic rules.
  2. Neural Language Models: these are an attempt by computer scientists to imitate the neurons that make up a brain. Neural networks were first used in computer vision to categorize simple images. A neural network builds a black box of probabilistic functions that best represents the input data: a web of computations arranged in layers and applied over many iterations, where each computational node is called a neuron.

ChatGPT uses Neural Networks; we will describe how they work a little later. First, however, to get a good grasp of the fundamentals, we will look at the use of Markov Chains to create a relatively coherent sentence.

Markov Chains

Markov Chains are a very simple algorithm to understand. They can be made even more powerful with data cleaning and standardization, as well as additional heuristics such as parts of speech or different n-gram sizes (explained soon). For our example, however, we will use the most basic Markov Chain, whose simplicity makes it easy for the reader to follow.

In our example we will use the novel Moby Dick as our input. The following are the steps:

  1. Ingest the input and do any data cleaning if desirable. For example you can make everything lower case, remove punctuation, chapter names, entire chapters etc.
  2. Choose your n-gram. An n-gram is the sequence of words you want your algorithm to use. Say we have the sentence “the quick brown fox jumps over the lazy dog”. A 1-gram would build a vocabulary with each word in an array: [“the”, “quick”, “brown”, “fox”, “jumps”, “over”, “the”, “lazy”, “dog”]. A 2-gram would be [“the quick”, “quick brown”, “brown fox”, “fox jumps”, “jumps over”, “over the”, “the lazy”, “lazy dog”]. A 9-gram would be [“the quick brown fox jumps over the lazy dog”]. What becomes obvious if you implement this algorithm is that as n increases you get more coherent sentences. To keep things simple, we will use a 1-gram with no data cleaning.
  3. Next we go through each 1-gram, record the word that follows, and increment a counter, creating a dictionary of sorts. The counter gives the number of times that word follows your 1-gram word. For example, in Moby Dick the dictionary for the word “Sir” would look like: {Sir: [[Thomas, 1], [Martin, 2], [sailor, 1], [Clifford, 4], [William, 1]]}. This just states that after the word “Sir”, Thomas appeared once, Martin twice, Clifford four times, and so on.
  4. Now we can compute the probability that each word follows another. From the dictionary above, there are a total of 9 non-unique words that follow “Sir”, so the probability of each word appearing after “Sir” is: Thomas 1/9, Martin 2/9, sailor 1/9, Clifford 4/9, William 1/9.
  5. Now we can run the Markov Chain. If we start with the word “Sir” we choose the next word at random, weighted by these probabilities. Since Clifford has a 4-in-9 chance of being chosen, it is the most likely word to follow. We then check Clifford’s dictionary, whose distribution happens to be {Constable: 1/2, thinks: 1/2}, and again randomly pick the next word, which with 1-in-2 odds could be “thinks”. We then check the dictionary for “thinks” … and continue this process, thereby generating text. This exact process generated the sentence “Sir Clifford Constable has no lightning-rods?” You can find the code at the following link: https://github.com/helsaint. A minimal sketch of the same idea follows this list.
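To make this concrete, here is a minimal sketch in Python of the 1-gram, no-data-cleaning version described above. It is our own illustrative toy rather than the code in the linked repository, and the filename mobydick.txt is a stand-in for wherever you keep the text of the novel. Keeping duplicate followers in a list is equivalent to the counting dictionary of step 3, since sampling from the list is automatically frequency-weighted.

```python
import random

# Step 3: for every word, record the words observed to follow it.
# Keeping duplicates in the list makes random.choice frequency-weighted,
# which is equivalent to storing counts and sampling by probability (step 4).
def build_chain(text):
    words = text.split()  # step 2: a 1-gram vocabulary, no cleaning (step 1)
    chain = {}
    for current, following in zip(words, words[1:]):
        chain.setdefault(current, []).append(following)
    return chain

# Step 5: walk the chain, sampling each next word from the distribution
# of words that followed the current word in the source text.
def generate(chain, start, length=8):
    word, output = start, [start]
    for _ in range(length):
        followers = chain.get(word)
        if not followers:  # dead end: the word never appeared mid-text
            break
        word = random.choice(followers)
        output.append(word)
    return " ".join(output)

# mobydick.txt is a placeholder path for the full text of Moby Dick.
with open("mobydick.txt") as f:
    chain = build_chain(f.read())
print(generate(chain, "Sir"))
```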

This is of course a very simple implementation of a Markov Chain. One can be more creative, use a larger n and train on many more books than just Moby Dick. Even then, however, Markov Chains still perform much more poorly at generating good text than Neural Networks.

Neural Networks

Because GPT is a Neural Network based algorithm, we will quickly describe how a neural network works. We will use the famous example, pioneered by Yann LeCun, that allowed computers to “recognize” handwritten digits. While this is not the same as the text generation seen in GPT, the idea is transferable.

Figure 2: A 28x28 pixel image is fed in as input, passes through several layers where activation functions are applied, and produces an output: the network’s interpretation of the input image. Source: https://mxnet.apache.org/

We will give a brief description of what is happening in this neural network and then how something similar can be applied in text generation.

  1. A 28x28 image is flattened so that it becomes a 784x1 vector. An image in a computer is basically a matrix of numbers where each value represents a colour. It’s like drawing, but instead of lines we use a series of dots to “draw” the image.
  2. The flattened image is called the input layer and is fed to the first layer of the network. The circles are the metaphorical neurons of Neural Networks, where an activation function is applied.
  3. The activation function, such as the rectified linear unit (ReLU), is essentially used to decide whether any “useful” information exists in a pixel. Each pixel is sent through multiple neurons in the same layer, and each neuron combines multiple pixels to create a new layer of values. This mixing of information allows the algorithm to find relationships between pixels that it can translate into a value at the output (see the sketch after this list).
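For readers who want to see the moving parts, below is a minimal sketch of such a network in Python with NumPy. The weights here are random, purely to show the shapes and the flow of steps 1 to 3; in the real network they are learned from thousands of labelled examples via backpropagation, which we omit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Untrained toy weights: in practice these are learned, not random.
W1 = rng.normal(0, 0.1, (128, 784))  # hidden layer: 784 pixels -> 128 neurons
b1 = np.zeros(128)
W2 = rng.normal(0, 0.1, (10, 128))   # output layer: 128 neurons -> 10 digits
b2 = np.zeros(10)

def relu(x):
    return np.maximum(0, x)  # the activation function from step 3

def forward(image_28x28):
    x = image_28x28.reshape(784)   # step 1: flatten the 28x28 image
    h = relu(W1 @ x + b1)          # steps 2-3: mix pixels, apply activation
    logits = W2 @ h + b2
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()         # probabilities for digits 0..9

fake_digit = rng.random((28, 28))  # stand-in for a real handwritten digit
print(forward(fake_digit).round(3))
```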

While the exact same thing does not happen in GPT, something similar does. The input layer could consist of words, characters (the words split into individual characters), n-grams, etc. We can also include parts of speech, topic class, author information, date published, country of publication, popularity, and so on. Once the input is sent, the algorithm predicts the best possible text to generate.

A major strength of ChatGPT is the volume of data used to train it. GPT-3 was trained on 570 GB of data[OAI1]. There is no official information on the size of GPT-4’s dataset, but it is estimated to be in the region of 100,000 GB. The data is gathered from a variety of places, including data scraped from the internet (Wikipedia, Reddit, newspapers, manuals, websites, etc.), licensed databases, and data created by trainers contracted by OpenAI. We saw above how an incredibly simple algorithm like a Markov Chain, with no data cleaning and a 1-gram, can create a coherent sentence from a single book. Now imagine what can be done with a huge dataset, a much better algorithm and a better implementation.

The datasets and how GPT was trained have come under some scrutiny as well. Because the data is scraped from many sites on the internet, GPT-3 in its early trials was pretty racist, misogynistic, violent and sadistic. The system needed to “learn” not to allow this kind of language, so OpenAI, through a San Francisco based contractor, employed at least 200 Kenyans[QZFN] to train the system to recognize toxic language and block GPT-3 from being an a-hole. For this work, a necessary job if ChatGPT was to become widely used and not just a novelty corrupted by junk data, the workers were paid USD 2/hour and immediately fired upon completion. So we see that people are used to manually curate the system; this process is somewhat euphemistically called “fine-tuning”.

There is also the question of how OpenAI obtained some of the datasets used in ChatGPT. A group of authors is currently suing OpenAI, among them Sarah Silverman, an American comedian and author. She alleges that ChatGPT ingested her book “The Bedwetter” without her permission, basing this belief on the fact that ChatGPT is uncannily good at summarizing parts of the book in great detail[APSS]. One good thing that could come of this case going to trial is that it may shed some light on the inner workings of GPT-4.

Yes, ChatGPT is impressive; however, it is impressive because it has been able to leverage the work of people from all over the world: the people who fill Reddit, Quora, Wikipedia, GitHub, etc. with useful content.

Is it intelligence just to regurgitate information and rearrange the words, while humans intervene to “fine-tune” the issues? Yes, GPT-4 performs well on standardized exams, but that just proves that, in general, standardized exams are based on rote learning. Can GPT-4 and its successors show real innovation? All animals, human and non-human, are known to innovate; can ChatGPT?

AlphaZero

Now let us briefly describe another, perhaps more impressive, algorithm: DeepMind’s AlphaZero. AlphaZero is an upgrade of AlphaGo, created by DeepMind, a UK-based company that is now a subsidiary of Alphabet (Google). AlphaGo was designed to play the Chinese game Go, considered one of the toughest and most complex board games in the world. To give an idea of the complexity: Go has 10^360 possible moves, whereas chess has 10^40.

AlphaZero is an improvement on AlphaGo in two major ways:

  1. AlphaGo was developed using supervised learning: the designers fed it previous games labelled good or bad. When AlphaGo plays, it can therefore often recognize game states and make plays accordingly, much like human players who memorize opening, middlegame and endgame moves in chess. As it plays more opponents it learns more moves, recognizing good moves versus bad ones by storing plays that result in wins along with the game states that led to them. But that initial input is necessary. AlphaZero does not need it. All it requires is the rules of the game. It then trains by playing against itself, starting with random moves that follow the rules. As each game completes, it “learns” which moves lead to a win and which to a loss.
  2. This ability to teach itself allows AlphaZero to be used in multiple games. AlphaGo can only play Go, while AlphaZero can play Go, chess and shogi. The ability to train itself also allows AlphaZero to be used in a wide variety of situations: after all, it only needs the rules of the situation as input, and it can then start training itself. If you have watched the movie War Games (1983) you have seen a stylized example of this kind of learning: WOPR is given the rules of tic-tac-toe and, by repeatedly playing against itself, quickly learns that the game (if played intelligently) always ends in a draw.

Let us now give a quick and dirty, but hopefully useful, description of how AlphaZero works. It primarily relies on two other algorithms:

  1. The first is used to search for the best possible moves given a state of the game: Monte Carlo Tree Search (MCTS). Given that Monte Carlo is famous for its casinos, you will not be remiss in guessing that this, too, is a probabilistic algorithm.
  2. The second is called Deep Q Networks. It’s a Neural Network algorithm similar to that described above. Its main job is to simulate different outcomes given the current state of the game and rate which simulations are best.

To build a good understanding, let us look at how AlphaZero’s heuristics would apply to Tic-Tac-Toe. The game has 45 possible moves, which can produce over 255,000 playable games[MCF]. Everyone is familiar with the game, so it should be easy to form a mental picture of how AlphaZero would approach it.

Before MCTS, classical game algorithms were deterministic in nature: they played an orthodox game, and in simple games a relatively skilled human opponent could beat them by playing in an unorthodox style. The program would never learn new “strategies” without its developers manually inputting them.

Monte Carlo Tree Search fixes this because it allows the discovery of new strategies by investigating all avenues of play. For a simple game like Tic-Tac-Toe a deterministic algorithm could be used; after all, there are only about 255,000 possible games. But for chess, Go and shogi the number of possible games exceeds the number of atoms in the universe. The Earth would be swallowed by the Sun before a deterministic algorithm finished working through them all.

Figure 3: Decision tree for a game of Tic-Tac-Toe revealing one possible end result. Dotted lines represent other game states at their respective levels, omitted for brevity.

Thus there is a need for something different, and MCTS is that something different. Here we go (a code sketch follows the list):

  1. Each game has a state, as shown in Figure 3; a state just records the positions of the pieces on the board. These states are stored in a tree-like structure with the root at the top. In our case the root is the initial state, before the first X is marked and the game starts.
  2. From each state a number of plays can be made: you can put an X in a corner, on a side, in the middle, etc. This is where the Neural Network is used. It simulates a set of games from the current state and ranks them, attempting to simulate as many levels as possible and increasing the depth of the tree. If a winning game state is found, it stops and returns this information. If, however, after n iterations no end-game state has been found, the states are ranked by their positions and the information is relayed up the tree.
  3. Each game state and its ranking are then stored in what is called a Value Network. This is where the learning occurs. As a side note: when playing a real game, it is the Value Network that is used to choose moves.
  4. The ranking of each branch is then used to create a probability function that the algorithm uses to choose which branch to explore, and therefore its next play. This gives the algorithm the opportunity to sometimes investigate branches that don’t initially seem advantageous.
  5. Because games can be long, and positions that don’t currently seem advantageous can become so through future moves, an algorithm that only follows the (probabilistically) best routes will never discover useful stratagems. For example, in chess the Bryntse Gambit has White sacrifice their queen early in the game to later gain a positional advantage against the black king, exposing it to attack. Since the loss of a queen, valued at 9 points, is considered very disadvantageous, an algorithm that only looks for the best next play would never discover these types of openings. By bringing probabilities into the choice, such stratagems can be discovered.
  6. After each game the algorithm saves information about the game and its outcome. In future games, when it encounters a state it has seen before, it checks what the outcomes of subsequent plays were, and either continues down that path or, if it was unsuccessful, chooses another.
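To tie the six steps together, here is a minimal, self-contained sketch of the MCTS loop playing Tic-Tac-Toe in Python. Two simplifications should be flagged: it uses random play-outs and the classic UCB1 exploration formula where AlphaZero would consult its trained neural network, and it stores the win statistics in the tree itself rather than in a separate value network. Even this toy converges on the orthodox opening (take the centre) after a few thousand iterations.

```python
import math
import random

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def legal_moves(board):
    return [i for i, v in enumerate(board) if v is None]

class Node:
    """One game state in the tree of Figure 3."""
    def __init__(self, board, to_play, parent=None):
        self.board, self.to_play, self.parent = board, to_play, parent
        self.children = {}               # move index -> child Node
        self.visits, self.wins = 0, 0.0  # wins from the mover's perspective

def ucb(child):
    # Steps 4-5: win rate plus an exploration bonus, so branches that look
    # bad (queen sacrifices!) still get the occasional visit.
    return (child.wins / child.visits
            + math.sqrt(2 * math.log(child.parent.visits) / child.visits))

def search(root, iterations=3000):
    for _ in range(iterations):
        node = root
        # Selection: descend while every move of this state has been tried.
        while node.children and len(node.children) == len(legal_moves(node.board)):
            node = max(node.children.values(), key=ucb)
        # Expansion: add one untried move, unless the game is already over.
        untried = [m for m in legal_moves(node.board) if m not in node.children]
        if winner(node.board) is None and untried:
            move = random.choice(untried)
            board = list(node.board)
            board[move] = node.to_play
            node.children[move] = Node(board, "O" if node.to_play == "X" else "X", node)
            node = node.children[move]
        # Simulation: random play-out to the end of the game. AlphaZero asks
        # its trained network how good the position is instead of playing out.
        board, player = list(node.board), node.to_play
        while winner(board) is None and legal_moves(board):
            board[random.choice(legal_moves(board))] = player
            player = "O" if player == "X" else "X"
        result = winner(board)  # "X", "O" or None for a draw
        # Backpropagation: relay the outcome up the tree (step 6's "learning").
        while node is not None:
            node.visits += 1
            if result is None:
                node.wins += 0.5  # a draw counts as half a win
            elif node.parent is not None and node.parent.to_play == result:
                node.wins += 1.0  # the player who moved into this node won
            node = node.parent

root = Node([None] * 9, "X")
search(root)
best = max(root.children.items(), key=lambda kv: kv[1].visits)[0]
print("After training, X opens at square", best)  # usually the centre (4)
```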

AlphaZero’s use of MCTS is incredibly clever: it embraces randomness, which in turn can be used to find new pathways to victory.

Is AI Intelligent?

We have looked at two of the most powerful and lauded algorithms in AI. One “learns” by rote, the other by trial and error. Is this intelligence? The algorithm in ChatGPT is very impressive as a human-interaction interface, but its learning is entirely dependent on the data it has ingested. What is impressive is its ability to interpret user input and generally produce good responses. Nevertheless, are we saying that the ability to correctly recall data is the sign of intelligence? What happens when innovation is required? History is replete with scientists having to ignore previous “facts” and become “radical” by pursuing new avenues of discovery. Newton and many other great minds believed in an aether, which was overthrown by Einstein’s spacetime. Einstein, while not denying quantum mechanics, had deep reservations about Bohr’s Copenhagen interpretation of quantum mechanics, currently the most widely accepted one. Would ChatGPT be “brave” enough, or “have the ego”, to say “the data I have is not correct, let me suggest something else”? Rote learning is known to shun such behaviour.

In spite of the fantastic work done by the engineers who built ChatGPT, this development has created a new job: the Prompt Engineer. A prompt engineer designs input queries for ChatGPT so that it delivers the desired response. This includes creating constraints, ensuring clarity, and providing structure in the request. Example of a bad prompt: “Describe ChatGPT.” Example of a good prompt: “Give an overview of ChatGPT, describing its characteristics and functions.” A sketch of the difference in code is shown below.
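As a hedged illustration of what this looks like in practice, the snippet below sends both prompts through the openai Python package (the 1.x client). The wording of the structured prompt is our own example, not an official recipe; the model name and API calls are those the package documents.

```python
from openai import OpenAI  # pip install openai (1.x client)

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

def ask(prompt: str) -> str:
    # Same model and settings for both prompts; only the wording differs.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

vague = ask("Describe ChatGPT.")
structured = ask(
    "Give an overview of ChatGPT, describing its characteristics and "
    "functions. Structure the answer as three short paragraphs aimed at "
    "a non-technical reader."
)
print(structured)
```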

Now you may say: but what about AlphaZero and its next iterations? It is built on a probabilistic approach and has the ability to leverage randomness and examine areas that don’t currently seem promising. While AlphaZero’s approach is the most promising of the current algorithms, it is still bounded by rules. As primarily a game-playing algorithm it requires a set of rules as input. Those rules aren’t self-determined; they are determined by people. What if the rules are wrong? They obviously can’t be all-encompassing.

In human and non-human animals, innovation is rife. Non-human animals are known to be very innovative when their environment changes: they develop new foraging techniques and new mating calls, use tools, etc.[TRS1] Could a computer algorithm do the same? Animals innovate because they need to in order to survive. Would an algorithm be able to mimic this behaviour? Would it then gain “consciousness” and act on its own behalf for self-preservation?

The fact is that AI does pose a threat, but that threat is an economic one: a threat of the haves and the have-nots. The hysteria around AI is really fear of open-source AI. What the corporations want is to lock it up behind IP rules, as they did with HIV drugs before generics were introduced; they want to keep their advantage. According to Luke Sernau, a senior engineer at Alphabet: “The uncomfortable truth is, we aren’t positioned to win this arms race and neither is OpenAI. While we’ve been squabbling, a third faction has been quietly eating our lunch … I’m talking, of course, about open source. Plainly put, they are lapping us.”[SMA1] They want a moat to profit from. So what’s the best way to get one? Have the government protect the corporations’ interests by force.

Bibliography

RTT: Jurgen Rudolph, Samson Tan, Shannon Tan, Chat GPT: Bullshit spewer or the end of traditional assessments in higher education?, 2023

OAI1: Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah et al, Language Models are Few-Shot Learners, 2020

OpenAI2023: OpenAI, GPT-4 Technical Report, 2023

QZFN: Faustine Ngila, OpenAI underpaid 200 Kenyans to perfect ChatGPT — then sacked them, 2023

APSS: Matt O’Brien, Sarah Silverman and novelists sue ChatGPT-maker OpenAI for ingesting their books, 2023

MCF: Michael C. Fu, Monte Carlo Tree Search: A Tutorial, 2018

TRS1: Simon M. Reader, Julie Morand-Ferron and Emma Flynn, Animal and human innovation: novel problems and novel solutions, 2016

SMA1: Dylan Patel and Afzal Ahmed, Google “We Have No Moat, And Neither Does OpenAI”, 2023
