A Brief History of AI

The term artificial intelligence (AI) wasn’t coined until the mid-20th century, but the idea of building intelligent systems has fascinated human beings for thousands of years. It’s a concept that predates the industrial revolution and the invention of the first digital computer. It’s also been a frequent source of inspiration for creators of fantasy and science fiction.

Since it was formally defined in the 1950s, the field of AI has seen many ups and downs, excitements and disappointments, breakthroughs and stalemates. In the last few decades AI has made significant advances, but it has also failed to fulfill many of its early promises – so far, at least.

What we have seen is the creation of AI systems that have changed the way we think about human intelligence and given us a new-found appreciation for its sophistication and complexity. In this article we will outline a brief history of AI, consider its key achievements, and also look at some of the lessons that have been learnt along the way.

We will cover the major events in AI's development throughout history, with a particular focus on the last 70 years, which - as the busy timeline suggests - were rather intense in terms of AI breakthroughs.

Early History

From ancient mythologies to 19th century science fiction.

Engineers, artists, and scientists have always played with the idea of creating a non-human form of intelligence. Ancient Greek mythology describes a giant bronze robot called Talos, who guarded the island of Crete by throwing stones at pirates and invaders. More than two thousand years later, we find the same theme in Mary Shelley’s novel, Frankenstein, where a young scientist sets out to create his own sentient being.

In 1950, just as AI is being formalized as a new scientific field, Isaac Asimov writes I, Robot, in which he discusses the peaceful coexistence of humans and autonomous robots with the help of his Three Laws of Robotics.¹

In cinema, we have been introduced to intelligent autonomous systems like HAL 9000 in Stanley Kubrick’s 2001: A Space Odyssey and J.A.R.V.I.S. (Just A Rather Very Intelligent System)², the AI personal assistant of Iron Man in the Marvel series of Avengers movies.

Creators of literature and film have always turned to contemporary scientific advances as fuel for their ideas. We imagine all sorts of fantasies for ourselves about the possibility for artificially intelligent entities – but these fantasies always have something to say about the current state of scientific progress and what we think might be genuinely possible.

We like to root our dreams in some form of reality in order to make them plausible. For that reason, let’s briefly turn to the history of AI purely as a science by looking at engineering, logic, mathematics and computer science.

Almost every civilization has tried to build human-like self-operating machines at some point in its history, however crude some of those attempts might have been. In Ancient China, Yan Shi is said to have presented a life-sized automaton to King Mu, while the Greek engineer Heron of Alexandria used early forms of pneumatics and hydraulics to build his mechanical “men”.

In the year 1206, at the height of the Islamic Golden Age, Ismail al-Jazari built a musical robot and authored The Book of Knowledge of Ingenious Mechanical Devices. The book describes 100 mechanical devices, along with instructions on how to use them.³

In the late 18th century, The Turk was presented as an autonomous chess-playing machine that could beat any human player. It was a con – a skilled human chess player was always hidden inside it⁴, making all of the moves – but the illusion succeeded because it was built on the exciting idea that a machine might be able to play a logical game against us. And win.

In philosophy, scholars across Greece, India, and China each arrived independently at the formalization of logical systems and deductive reasoning. Aristotle is typically considered the first logician in Western culture, and over the following centuries his ideas were developed by many others, including Euclid, father of geometry, and Al-Khwarizmi, whose name is the source of the word “algorithm”, which is the cornerstone of computer science.

By the 17th century, various machines were being developed in an effort to automate the calculation of large numbers in a quicker, more reliable way than humans could manage. In 1837, Charles Babbage designed the Analytical Engine⁵, the first general-purpose computer, which contained many of the conceptual elements still seen in today’s computers. Its program, data, and arithmetic operations were all recorded on punch cards. A short time later, Ada Lovelace published the first computer algorithm – basically the first software – for this hypothetical machine.

However, it wasn’t until the first half of the 20th century that enough progress was made in the fields of formal logic and computer science to make the first digital computers possible in the 1940s. It was this crucial development that paved the way for AI to emerge as a new discipline in the 1950s.

The Foundation of AI

The inception of the field in the middle of the 20th century.

With the invention of the digital computer in the mid 20th century, scientists started to ask whether computers would be able to go beyond simply performing computations and executing hard-coded instructions.

In other words, could this machine demonstrate intelligence? Could it think like a human? This question opened up a range of philosophical inquiries because the definition of human intelligence might include difficult concepts, like consciousness.

Alan Turing, the founder of modern computer science, proposed a method to measure machine intelligence. In his seminal work, Computing Machinery and Intelligence, Turing reformulated the question of "Can a machine think?" to ask whether a machine could exhibit human intelligence through its behavior and imitate the ability to think.

He went on to propose a test for machines, called the imitation game, that would decide whether or not a machine is intelligent. This became known as the Turing test, and in its original format it considered a machine to be intelligent if it was able to have a text-based conversation in a way indistinguishable from a human.

So, what does the Turing test look like? Imagine for a moment that you’re having an online conversation via textual messages. You can ask questions and evaluate the responses. If you are interacting with a machine, but you can’t tell it’s a machine from the responses it gives, we can say that it has passed Turing’s test and is intelligent.

There has been a lot of support, but also some criticism, for Turing’s idea. Despite concerns that the test might not be adequate for assessing machine intelligence, it is still widely used today in new formulations.

Turing’s main concern was that machines would be limited by memory and would never have enough to pass his test. He predicted that a computer with a storage capacity of about 10⁹ bits (roughly 125MB) would be able to play the imitation game convincingly, something he thought in 1950 would be possible by the year 2000. However, the memory in machines exceeded that capacity many years ago, and human-like performances in Turing’s test are still extremely rare.

Even then, some scientists dispute whether the instances where machines appear to have “passed” can really be considered successful imitations of human intelligence. The limitations of Turing’s test have shed light on the difficulties of measuring and comparing the complexity of human intelligence with the abilities of machines.

In the years following Turing’s test, the fields of machine intelligence and thinking machines started to take form. The term “artificial intelligence” was first used by John McCarthy at the 1956 Dartmouth Summer Research Project on Artificial Intelligence, also known simply as the Dartmouth workshop. This was a gathering of some of the most eminent scientists, engineers, mathematicians, and psychologists of the day. Over the course of two months they combined expertise and insight from across their various academic disciplines into a single field of study, which McCarthy then duly named.

The Dartmouth workshop is considered to be the founding event in the history of AI. Many concepts that are still in use today, such as “neural networks” and “natural language processing”, were debated at that conference.

The First Years of AI

Chess, checkers, and the enthusiasm of the 1950s and 1960s.

The intellectual formation of AI as its own field of study soon started to produce significant advances.

Many of the early computer programs that could play games, solve puzzles, prove mathematical theorems, and perform artificial reasoning were developed in the years immediately following the Dartmouth workshop. The concepts of machine learning and artificial neural networks, two of the main pillars of current AI systems, were also formalized around this time.

Chess playing ability has always had a close association with the idea of intelligence. In 1950, Claude Shannon, the founder of information theory,⁶ described the first computer program that was able to play chess. Shannon outlined two approaches the computer could take in order to win a game:

The machine examines the board and exhausts every possible move.
The machine follows a more intelligent strategy and only considers a set number of key moves.
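The contrast between these two approaches can be sketched in a few lines of Python. Chess is far too large for a short example, so this sketch uses a toy stand-in that is not part of Shannon's work: a single pile of stones from which each player removes one to three, with the player who takes the last stone winning. The function names, the toy game, and the "prefer larger moves" ranking are all assumptions made purely for illustration.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # stones a player may remove per turn (toy game, assumed)


@lru_cache(maxsize=None)
def exhaustive_value(stones: int) -> int:
    """First approach: examine every possible continuation.

    Returns +1 if the player to move can force a win, otherwise -1.
    """
    if stones == 0:
        return -1  # the opponent took the last stone, so the mover has lost
    # A position is as good as the best reply position is bad for the opponent.
    return max(-exhaustive_value(stones - m) for m in MOVES if m <= stones)


def selective_value(stones: int, depth: int, width: int) -> int:
    """Second approach: look only `depth` moves ahead at `width` candidates.

    Returns +1 (win found), -1 (loss found) or 0 (no verdict within the limits).
    """
    if stones == 0:
        return -1
    if depth == 0:
        return 0  # search horizon reached without a result
    legal = [m for m in MOVES if m <= stones]
    # Hypothetical ranking of "key" moves: try the largest removals first.
    candidates = sorted(legal, reverse=True)[:width]
    return max(-selective_value(stones - m, depth - 1, width) for m in candidates)


if __name__ == "__main__":
    print(exhaustive_value(12))                  # full search of a 12-stone pile
    print(selective_value(12, depth=2, width=2)) # limited search of the same pile
```

The exhaustive version always reaches a definite verdict, but its cost grows with every extra move it has to consider. The selective version stays cheap by construction, at the price of sometimes returning "no verdict", which is why it depends on a good way of choosing which moves are worth examining.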

Shannon was also known for the invention of Theseus, a mechanical mouse that was able to explore a maze, solve it, and learn the way out.

Around the same time, Arthur Samuel, an American engineer at IBM, wrote the first computer program that was able to play checkers by learning and adapting its strategy. It was good enough to challenge amateur players and was a great success for Samuel and his employer. His self-learning program used a concept that is now known as “reinforcement learning”. Samuel is also known for coining the term “machine learning”.

Another breakthrough came in 1956 when Allen Newell, Herbert Simon, and Cliff Shaw wrote a computer program called Logic Theorist. This was the first program capable of performing automated reasoning to simulate the way humans think to solve problems.

Logic Theorist was part of what is known as “symbolic AI”, which was the dominant branch of AI in the 1950s, 1960s and 1970s. Modern approaches to AI are based on learning from experience. Symbolic AI is different in that it’s a knowledge-based approach that aims to imitate human intelligence by following a set of pre-coded, symbolic rules of reasoning. The Logic Theorist was able to prove many of the mathematical theorems in Bertrand Russell and Alfred North Whitehead’s Principia Mathematica, a landmark work on formal logic. Its creators even tried to list it as a co-author of a paper presenting one of its proofs.

The late 1950s and early 1960s also saw the development of artificial neural networks in learning and pattern recognition. Perhaps the best example of the work done in this area is Frank Rosenblatt’s “Perceptron” algorithm, which was able to recognize images and learn the difference between geometric shapes.
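To give a feel for how a perceptron learns, here is a minimal sketch of the classic perceptron update rule in plain Python. It is not Rosenblatt's original system, which worked on images: as a toy stand-in, it learns the logical AND of two inputs. The function names, the learning rate, and the number of training passes are assumptions chosen for illustration.

```python
def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum of the inputs exceeds zero, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Learn weights and a bias from (inputs, target) pairs."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # The error is 0 when the guess is right, +1 or -1 when it is wrong.
            error = target - predict(weights, bias, inputs)
            # Nudge each weight in the direction that reduces the error.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Toy training set (assumed): output 1 only when both inputs are 1.
AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

if __name__ == "__main__":
    weights, bias = train_perceptron(AND_GATE)
    for inputs, target in AND_GATE:
        print(inputs, "->", predict(weights, bias, inputs), "expected", target)
```

Nothing about the AND function is written into the program. The weights start at zero and are adjusted only when the perceptron makes a mistake, which is the sense in which it learns from examples.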

These exciting inventions and breakthroughs in the middle of the 20th century created huge enthusiasm for AI as an emerging field of study. At the time, these early achievements were so impressive they seemed scarcely believable. Shortly afterwards, AI laboratories started to appear all over the world thanks to generous funding from governments and corporations.

The AI Winter

Inflated expectations and stagnation of the 1970s and 1980s.

The early achievements of AI in the 1950s and 1960s resulted in a wave of overconfidence and unrealistic expectations. Prominent figures in the field of AI were understandably excited by what they’d already seen and expected similarly impressive breakthroughs to continue at the same rate.

The temptation to exaggerate proved irresistible. For example, in the early 1970s, several pioneers in the field of AI predicted that machines would achieve the general intelligence of an average human performing everyday tasks within 10 years. It's now 50 years since those predictions were made and they are still very far from being met.

The gap between those ambitious predictions and the underwhelming achievements of the 1970s and 1980s resulted in disappointment and a lack of interest. Major sources of funding started to dry up, causing a steep decline in research activities, both scientifically and commercially. The field of AI began to stagnate and would not properly revive until the 1990s. This period is known as the AI winter, a term borrowed from the ongoing Cold War and the idea of a nuclear winter.

Why were expectations so unreasonably high when it came to the potential of AI? First of all, the early success of AI with toy problems gave engineers the wrong impression of its capabilities. It was generally thought that scaling AI up to tackle more general problems was simply a matter of provisioning larger memory and greater computational resources. Too much focus was placed on the potential of AI, and not enough thought was given to the underappreciated complexity of many problems.

Even then, most experts tended to underestimate the amount of hardware, memory, data and computational power their systems actually needed. They knew they were short of these resources, but they had little idea just how far short they were. Moreover, early AI was still too focused on mimicking human reasoning through rules-based approaches, when it should have been analyzing the requirements for specific tasks and learning to complete them through the experience of trial and error.

The AI winter was briefly interrupted in the early 1980s by a short boom, due to the rise of commercial "expert systems". These systems are AI computer programs that try to mimic human expertise in specific areas, instead of trying to achieve a more general level of intelligence. This was another form of symbolic AI that used domain-specific knowledge and rules-based reasoning instead of learning through experience.
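The rules-based reasoning behind an expert system can be illustrated with a very small sketch. The example below uses forward chaining, one common way of applying "if these facts hold, conclude this" rules until nothing new can be derived. The rules and fact names are invented for illustration and are not taken from any real system.

```python
def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new conclusions appear.

    `rules` is a list of (conditions, conclusion) pairs, where `conditions`
    is a set of facts that must all be known for the rule to fire.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and conditions <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rule base for configuring a computer order.
RULES = [
    (frozenset({"has_graphics_card"}), "needs_large_power_supply"),
    (frozenset({"needs_large_power_supply", "small_case"}), "configuration_invalid"),
]

if __name__ == "__main__":
    order = {"has_graphics_card", "small_case"}
    print(sorted(forward_chain(order, RULES)))
```

All of the "knowledge" here lives in the hand-written rule list. The program never improves with use, and it can only reach conclusions that a human expert has anticipated, which reflects both the appeal and the limits of the approach.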

Expert systems became enormously popular and were deployed at scale for all kinds of tasks, from daily business activities and medical diagnosis to computer hardware configuration and natural language processing. The success of expert systems reawakened interest in the field of AI and resulted in renewed investment from governments.

Japan’s Fifth Generation Computer Systems was one such initiative to come out of this period. Its goal was to build powerful computer systems capable of finally realizing the early promise of the 1950s and 1960s, and it continued until 1990. Unfortunately it largely failed in its ambitions, and another period of hype and unrealistic expectations ended in disappointment.

The Rebirth

The resurgence of machine learning and neural networks in the late 1980s and 1990s.

The limitations of classical AI, with its rules-based approaches and symbolic reasoning, led to a shift in focus in the late 1980s. A better understanding of human intelligence and the complexity involved in decision making encouraged AI experts to look more closely at learning-based and probabilistic approaches.

Humans acquire intelligence by following instructions and performing deductive reasoning, but they also learn through experience and the repetition of trial and error. Why shouldn’t the same be true for intelligent machines? This is the question AI engineers started to ask themselves, and soon machines were no longer being limited to formal reasoning. Instead, they began to be equipped with the capabilities needed to learn via examples.

Machine learning, a term already coined by Arthur Samuel decades earlier, was back in a big way. This new paradigm, beginning in the late 1980s, would use statistics and probabilities to enable machines to learn from available data and adapt by building on previous experiences. Soon, AI was connecting with other mature and rigorous disciplines like decision theory, statistics, control theory and optimization.

This led to major advances in the fields of speech recognition, natural language processing, robotics, and computer vision. It was a fresh approach that yielded a new wave of impressive results, as well as a better theoretical understanding of some core concepts in AI.

Just as in the 1950s and 1960s, individual achievements and innovations started to build momentum – and more progress. Soon there was renewed interest in artificial neural networks, a group of machine learning models inspired by biological neurons and first introduced in the mid 20th century. This was due to the reinvention of one of the key algorithms used to train such networks – the “backpropagation” algorithm.

Following this, a new model of artificial neural networks called convolutional neural networks was developed by Yann LeCun.⁷ This model showed great success in optical character recognition (the automatic identification of letters and numbers in images of typed or handwritten texts) and was deployed in several industrial applications – mainly automatic mail sorting by postal services.

Checkmate by a Chess AI

Remember Arthur Samuel’s algorithm that was capable of challenging amateur players at checkers in the 1950s?

Well, to give some idea of the scale of AI’s progress in the 20th century, it is worth noting here that in 1997 world chess champion Garry Kasparov was defeated by IBM’s Deep Blue computer. This was a major milestone in the development of AI, and it was achieved with classical symbolic AI rather than the “deep learning” that emerged in the 21st century. The similarity between the machine’s name and that term is purely coincidental.

Garry Kasparov, then world chess champion, struggling against and eventually losing to IBM’s machine, Deep Blue.

The Revolution

Big data and deep learning of the 21st century.

The creation of the World Wide Web and the development of the telecommunications sector facilitated the transmission and storage of data at scale during the 2000s. These developments gave neural networks and deep learning algorithms the fuel they needed to start making significant advances: big data.

The current hype surrounding AI is largely attributable to the unprecedented advances in deep learning that have been made possible by big data, and machines are now outperforming humans at complex tasks with increasing regularity.

Deep learning really started to take off in the early 2010s, thanks to the work of people like Yoshua Bengio, Geoffrey Hinton, and Yann LeCun.⁸ An abundance of data, advancements in learning algorithms, and increases in computational power have given pioneers like them the opportunity to make truly mind-blowing achievements in speech recognition, natural language processing, visual recognition, and reinforcement learning.

Perhaps the best example is the AlphaGo computer program, developed by the AI company DeepMind. Go is a vastly more complex game than chess, yet this program learned to play it from records of human games and from playing against itself, and in 2016 it beat world champion Lee Sedol. Going from Arthur Samuel’s basic checkers algorithm to Deep Blue defeating Kasparov took 40 years. From that point to AlphaGo emerging victorious took roughly half that time.

The relatively short history of AI has valuable lessons to teach us about controlling hype and managing unrealistic expectations. We need to channel our enthusiasm for this fascinating technology in ways that will produce continuous advances, rather than developmental winters and summers.

And we will do this by coming to a fuller appreciation of both the complexity of human intelligence and the limitations of AI.

¹ These Three Laws of Robotics were introduced by the writer Isaac Asimov. They are intended to protect humans from robots. The first law, for instance, is “A robot may not injure a human being or, through inaction, allow a human being to come to harm”. If you’re interested, you can read the two other laws on the dedicated Wikipedia page.
² J.A.R.V.I.S. is a sci-fi version of a general AI, one which can work at eye level with humans. We will introduce the concepts of narrow and general AI in more detail in AI - two letters many meanings.
³ Several examples are described and illustrated in this blog.
⁴ Want to see what it looked like? Have a look at a copper engraving showing what the Turk looked like here.
⁵ You may find a picture of a part of the Analytical Engine here.
⁶ Information theory is the scientific study of the quantification, storage, and communication of information. Read more about it here on Wikipedia. A famous application of information theory is data compression: formats like ZIP, MP3 and JPEG were introduced to store as much information as possible while requiring the least memory possible.
⁷ Yann LeCun is a researcher in AI, well known for developing convolutional neural networks. He is the head of AI at Facebook.
⁸ The Association for Computing Machinery (ACM) named Yoshua Bengio, Geoffrey Hinton, and Yann LeCun recipients of the 2018 ACM A.M. Turing Award (recognized as a Nobel prize equivalent for computer science). Read more about it here.