A brief pre-history of classical AI

To talk about Reasoning, it's important to understand how we got here. This article covers what I call the pre-history of Classical AI: those parts of the story that happened before the invention of modern computers (pre-1950s) but that are crucial to understanding why we believe AI is possible.

This is part 2 in a series on Reasoning.

Like most things, the very beginnings of classical AI are rooted in philosophy and start in the ancient world (the Greeks, Indians, and Chinese all had early forms of logic). But, as I'm not a masochist, we start in more contemporary times with two big ideas that lay the foundation for modern AI:

  1. Mechanizing intelligence — Is there a trustworthy, automatic process that takes in some information and spits out new information? Trustworthy, in this context, means that the information produced is supported by the information taken in. Automatic means that it works without human intervention. Of course, simply discovering new information may seem like a very limited definition of intelligence, but it reflects AI's mathematical origins, where these techniques were first applied to topics like automatic theorem proving.
  2. Computability — Is it possible to implement this automatic process on a computer?

Propositional Logic

The development of logic was humanity's first great attempt at mechanizing intelligence, and the basis for modern logic lies with George Boole, Charles Peirce, and Gottlob Frege. George Boole, not being satisfied with laying the basis for modern computers (hello, Boolean Algebra), also nearly scuttled my fledgling graduate career (I flunked my PHIL 650 Intro to Logic course) by inventing Propositional Logic. Propositional logic represents the current state of the world as a set of facts or propositions. These propositions can be combined using boolean operators to form formulas. For example: the cat is to the left of the dog, or Whiskers is a cat and Fido is a dog and Whiskers is sitting to the left of Fido.

In addition to the current state, propositional logic also has knowledge about the world, stored in the form of if-then rules: If Whiskers is a cat then Whiskers has four feet and Whiskers has a tail, or if Whiskers is a cat and Fido is a dog then Fido is afraid of Whiskers.

New facts are derived from currently known facts using a very important deductive tool called Modus Ponens, which says that when you have a rule of the form If p then q and the fact p is true, then the fact q is also true.

Modus Ponens might seem trivial but it guarantees that when knowledge is represented in the form of trustworthy rules, new information generated from these rules using Modus Ponens is also trustworthy.
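
Here is a minimal sketch of Modus Ponens as a single step in Python. The fact and rule names are made up for this illustration; a real logic engine would use a richer representation.

```python
# Modus Ponens: given a rule "if p then q" and the fact p, conclude q.
def modus_ponens(rule, facts):
    p, q = rule                      # rule is a (premise, conclusion) pair
    return {q} if p in facts else set()

rule = ("whiskers_is_a_cat", "whiskers_has_a_tail")
print(modus_ponens(rule, {"whiskers_is_a_cat"}))  # {'whiskers_has_a_tail'}
```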

The mechanization process, called rule-matching, applies rules to facts. As an example, consider a situation where there is only one rule: If the cat is to the left of the dog and the dog is to the left of the human then the cat is to the left of the human. Now imagine that there are two facts you know to be true in this situation: There is a cat to the left of the dog and There is a dog to the left of the human. The rule-matching process will match these two pieces of information to the rule and create a third fact: There is a cat to the left of the human.

In our example, there is only one rule that matches, and it matches exactly once. But imagine having one additional fact: The human is to the left of the lamp, and one additional rule: If the cat is to the left of the human and the human is to the left of the lamp then the cat is to the left of the lamp. Now, once that first rule has matched and the fact the cat is to the left of the human is added, our new rule will match and add a fourth fact: There is a cat to the left of the lamp.

Applying rules to the current situation creates new information. This new information triggers another round of matching (with a possibly different set of rules) that creates additional new information. This repetitive process is how propositional logic automates the discovery of new information.
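
Extending the single Modus Ponens step above into a loop gives a minimal forward-chaining sketch. Again, the proposition names and rule list here are illustrative, not a real engine:

```python
# Propositions are plain strings; a rule is (set_of_premises, conclusion).
facts = {"cat_leftof_dog", "dog_leftof_human", "human_leftof_lamp"}
rules = [
    ({"cat_leftof_dog", "dog_leftof_human"}, "cat_leftof_human"),
    ({"cat_leftof_human", "human_leftof_lamp"}, "cat_leftof_lamp"),
]

# Keep matching rules against the current facts until nothing new is added.
changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:  # Modus Ponens
            facts.add(conclusion)
            changed = True

print(facts)  # now also contains cat_leftof_human and cat_leftof_lamp
```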

Predicate (First-order) Logic

Now we shouldn't let George hog all the credit for the near untimely demise of my Ph.D. Charles Peirce and Gottlob Frege are just as important to this story because they invented Predicate or First-order Logic. Take the cat-leftof-dog-leftof-human example. That is not just true for cats, dogs, and humans. It's true for any three things. With propositional logic you have to write rules for every set of three you could think of… dog to the left of cat to the left of human, human to the left of the cat to the left of the dog, cat-cat-dog, cat-human-capybara, human-tomato-birkenstocks… you get the idea. There are workarounds to represent information more succinctly but, for the most part, representing facts this way is very tedious and not very useful.

With predicate logic, you can use variables to represent facts. Instead of writing: If the cat is to the left of the dog and the dog is to the left of the human, then the cat is to the left of the human you can write: For all A, B, and C: If A is to the left of B and B is to the left of C, then A is to the left of C, which covers a wide range of scenarios involving animals, humans, and inanimate objects, all sitting patiently next to each other. The "For all" is called a quantifier and allows predicate logic to represent variables that can stand in for all objects. There is one additional quantifier — "There exists" — that allows predicate logic to represent variables that stand in for at least one thing. For example: There exists an x such that x is a person AND x is the president of the United States.

Predicate (and propositional) logic can be written in a shorter, easier-to-read form: LeftOf(A,B) & LeftOf(B,C) → LeftOf(A,C). Here, LeftOf is called a predicate, and A, B, and C are variables that stand in for objects. Predicate logic uses the same vocabulary as propositional logic — propositions, the boolean operators, an inference rule (usually Modus Ponens) — with the addition of variables and the two quantifiers. These additions make predicate logic much more powerful at representing rules but come at the cost of making it more complicated to run (due to the need to match variables).
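
To make the matching-with-variables idea concrete, here is a very small illustrative sketch (the conventions, such as uppercase strings for variables, are mine and not a standard library) that matches the LeftOf rule against a couple of facts by binding variables:

```python
# Facts are tuples; in this sketch, uppercase strings are variables.
facts = {("LeftOf", "cat", "dog"), ("LeftOf", "dog", "human")}

rule_premises = [("LeftOf", "A", "B"), ("LeftOf", "B", "C")]
rule_conclusion = ("LeftOf", "A", "C")

def match(pattern, fact, bindings):
    """Unify one premise pattern with one fact, extending the variable bindings."""
    if pattern[0] != fact[0]:
        return None
    new = dict(bindings)
    for p, f in zip(pattern[1:], fact[1:]):
        if p.isupper():                      # variable: bind it (consistently)
            if new.setdefault(p, f) != f:
                return None
        elif p != f:                         # constant: must match exactly
            return None
    return new

# Try every pair of facts against the two premises and instantiate the conclusion.
for f1 in facts:
    for f2 in facts:
        b = match(rule_premises[0], f1, {})
        b = match(rule_premises[1], f2, b) if b is not None else None
        if b is not None:
            print(tuple(b.get(t, t) for t in rule_conclusion))  # ('LeftOf', 'cat', 'human')
```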

There is also second-order predicate logic, where you can even use a variable for LeftOf, because what applies to LeftOf also applies to other similar predicates. So, instead of LeftOf(A,B) & LeftOf(B,C) → LeftOf(A,C), we can write D(A,B) & D(B,C) → D(A,C), where D could be left of, right of, above, below, in front of, behind, and so on. Second-order logics make rule-matching much harder and I've never seen one used in the wild, but I flunked my Intro to Logic course so what do I know.

From propositional and predicate logic, we can distill three requirements for mechanizing intelligence:

  1. A way to represent your knowledge (e.g., rules)
  2. A way to represent the current situation (propositions or predicates)
  3. A way to automatically keep applying knowledge (from 1) to the current situation (2) to generate/infer/deduce trustworthy new information. (Modus Ponens and rule-matching)

These requirements are not particularly onerous. Neural networks satisfy most of these conditions: the weight matrix represents knowledge, the input is the current situation, and the forward pass is the way to apply knowledge to discover new information. The only place where neural networks fall short is that they do not keep applying the process to new information (recurrent networks being the exception). Neural networks are obviously not logical systems. A real logical system has stricter requirements and definitions, which we will encounter later. Our current purpose is to introduce the idea of mechanizing intelligence, i.e., identifying a process where intelligence (narrowly defined as discovering new information) is automatic.

A digression into soundness and completeness

What makes propositional and predicate (and other) logics interesting is not simply what you can do with them but what you can prove about them, particularly three properties called soundness, completeness, and decidability. If you don't already know about these concepts, pay close attention, because they are probably the coolest thing you will read today:

A logical system is said to be sound if any information derived from that system is true. In a sound logical system, if you start off with a set of facts and run the logic machine, everything it produces genuinely follows from those facts, i.e., you can trust the machine to produce only correct information.

Completeness is the other side of the coin. A logical system (initial facts + process) is said to be complete if all information that is true in that system can be derived from the starting set of facts using the logical process, i.e., if there is new information that logically follows from what the system knows, the system will eventually find it.

Together, soundness and completeness make a logical system dependable: you can be sure that everything that comes out of it is valid, and that it will eventually find anything that is logically valid. To be truly useful, logical systems also have to be decidable. That is, there needs to be an effective method (a method that runs in a finite, though possibly very long, amount of time) for determining whether any given statement can be derived in that logical system.
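
For the notation-inclined, the two properties can be stated compactly (a restatement, not a proof), writing Γ ⊢ φ for "the system can derive φ from the starting facts Γ" and Γ ⊨ φ for "φ logically follows from Γ":

```latex
% Soundness: everything the system derives really does follow.
\text{Soundness:}    \quad \Gamma \vdash \varphi \;\Longrightarrow\; \Gamma \models \varphi
% Completeness: everything that really follows is (eventually) derivable.
\text{Completeness:} \quad \Gamma \models \varphi \;\Longrightarrow\; \Gamma \vdash \varphi
```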

Propositional logic is sound, complete, and decidable. First-order predicate logic is sound and complete but, for non-trivial systems, not decidable: there is no general procedure guaranteed to terminate with a yes-or-no answer for every statement. (P.S. You can have a 5-credit graduate-level course that is just about proving the above statements.)

Together, soundness, completeness, and decidability confer respectability on our choice of mechanistic process because they give us theoretical assurances about its effectiveness.

Anyway, there are three takeaways here for anyone still reading.

  1. Boole, Peirce, and Frege laid the basis for logic, reasoning, and artificial intelligence
  2. If you are in computer science, take the Intro to Logic course offered by the Math department not the Philosophy department. You’ll thank me later.
  3. Suck it, Boole, Peirce, and Frege. I got my Ph.D.

One final note about logic. Logic can trip anyone up. Bertrand Russell spent an entire lifetime trying to build a logical foundation for mathematics. He wanted to start from a few assumptions (four or five) and show that all of mathematics can be derived from these assumptions using the machinery of logic. His attempt was ultimately futile, just like my attempt to pass my Intro to Logic course. He did manage to accomplish some other stuff while he was failing at finding a logical foundation of mathematics: he wrote extensively on a broad range of subjects, created entire new disciplines, revolutionized the understanding of mathematics and logic, and won a Nobel Prize. What a slacker. Jokes aside, Russell is one of the titans of philosophy and liberalism of the 20th century. He saw through the veneer of Bolshevism, made exceptions to his pacifist beliefs with the rise of Hitler, chaired the India League, and went to jail at age 89 for taking part in an anti-nuclear demonstration. Truly, someone worth learning from.

The Church-Turing Thesis

Logical Systems (propositional and predicate logic) revolutionized our understanding of how a mechanistic process can produce new information. The Church-Turing thesis convinces us that it is possible to build a machine that actually implements such processes.

Alonzo Church, an American mathematician, invented a system of mathematical logic called the lambda calculus as a framework for solving math problems. The expressive power of the lambda calculus allows it to represent a whole variety of logics. Alan Turing (he of Turing Test fame) is, of course, the inventor of the Turing machine, a machine for computation. Church and Turing realized that anything computable in the lambda calculus can be computed by a Turing machine and vice versa. And since the lambda calculus covers a wide range of processes, including inference in first-order logic, the Church-Turing thesis allows for the possibility of computational machines that can demonstrate inference and intelligence.
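
To get a feel for how "everything is a function" works in the lambda calculus, here is a tiny illustrative sketch, written in Python rather than Church's notation (and certainly not anything Church or Turing actually wrote), showing Church-encoded booleans:

```python
# Church-encoded booleans: data represented purely as functions.
TRUE  = lambda x: lambda y: x            # "true" selects its first argument
FALSE = lambda x: lambda y: y            # "false" selects its second argument
AND   = lambda p: lambda q: p(q)(p)      # if p is true, the answer is q; else p
NOT   = lambda p: lambda x: lambda y: p(y)(x)

# Convert back to an ordinary Python bool just to inspect results.
to_bool = lambda p: p(True)(False)

print(to_bool(AND(TRUE)(FALSE)))  # False
print(to_bool(NOT(FALSE)))        # True
```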

The Church-Turing thesis is also the reason why we refer to our brains as computers. It's not because our brains and computers are similar in the way they compute. We refer to brains as computers because:

  1. to the best of our knowledge, the brain is implementing a computable function,
  2. a Turing machine can implement any computable function, and
  3. computers are Turing machines.

Ergo the brains-as-computers analogy.

We don't know what that computable function is or how it is implemented. But this distinction between a computable function and its implementation is reflected in the divide between Classical AI and probabilistic counterparts like neural networks. Classical AI has a good story about what this computable function looks like but is less convincing about its implementation. Neural networks have a really good story about what implementations can look like but are less convincing about exactly what computable function they are implementing.

Finally, it is quite possible that there are many different but equivalent such computable functions and many different ways to implement them. The history of AI is the story of trying to find one such function and implementation (while the history of Cognitive Science is of trying to identify the specific computable function and implementation that underlies human intelligence).

What’s next in this series?

I originally intended this article to be a brief history of classical AI but I soon realized that I’d have to break it up into two articles — this one that addresses the pre-history and the next one in the series that will focus on developments since the 1950s.

Previous articles in the series:

  1. Six easy and not-so-easy AI pieces
  2. Part 1: What is Reasoning?

Disclaimer: The views expressed in this article are my own and do not necessarily represent the views of my employer.

© 2025 Unmesh Kurup
