Infomation Theory and Optimal Intelligence Modeling

Information Theory and Optimal Intelligence Modeling

by Martin Turon

turon@alumni.eecs.berkeley.edu

Abstract. Artificial Intelligence is a wide field of study whose breadth ranges from genetic annealing algorithms to pure philosophy. This paper focuses on the problem of programming a machine to directly translate human language text commands into a logic process the user would expect. The translation from human to machine language will always have some measurable level of uncertainty as guaranteed by Godel's Incompleteness of Mathematics theorem. The uncertainty of the translation can be made smaller by adding complexity to and increasing the effectiveness of the logic that matches the human language input to the computer meaning and action. Although limitations in mathematical systems outlined by Godel may prove it is impossible to ever solve all the problems associated with human language processing, there is no proof that a particular problem of human language processing cannot be solved by a universal machine of high enough complexity and as such this paper distinguishes a set of optimal techniques for programming a computer to effectively enact the communication process of human language.

1.0 Introduction

For all the findings and theoretical advances to come out of Artificial Intelligence research since its inception, very few of the worlds computer systems actually employ the known techniques. Most computers today do not even attempt to accept human language as input. It has been over 40 years since Alan Turing defined the Imitation Game as a way to test the intelligence of a universal machine and pitifully few computers can understand "Please copy that file I was just editing to this floppy." The goal of this paper is to clarify a set of optimal techniques for programming a computer to effectively enact the communication process of human language.

1.1 Know your limits.

The difficulty of designing a machine to pass the Turing Test is well known. In fact, limits in logic systems prove it is impossible to ever design a machine that could pass perfectly. The proof comes in the form of a paradox -- if a machine exists that understands all human language, the design of that machine will define a way for humans to amend their language so the machine will no longer understand. Such a paradox was used by Godel when he first proved mathematics itself is incomplete.

A major question for Computer Science has been whether P=NP? P is the set of problems with a deterministic solution, that are solvable by a Turing machine or describable by a computable number. NP is the set of problems that can be solved by a nondeterministic brain in polynomial (finite) time. Clearly the Turing Test is an extremely difficult example of an NP problem, and we have no way to a guarantee that a computer will ever pass or never pass. Upon closer examination, the question of whether P=NP? seems closely related to Godel's Theorem. Any Turing machine that seems to solve an NP problem fairly well, defines a new NP problem it will never be able to solve. Such conclusions lead one to believe that computers fundamentally lack the ability to handle any situation the universe might present them and therefore aren't adaptable enough mentally to be considered to have true intelligence.

How hard are NP problems anyway? Characteristic NP complete problems take O(2^n) time if attempted to be solved with brute force. Given enough complexity, any NP problem is likely to have a reasonable deterministic solution, but never a perfect one. A brute force attempt to solve a problem of NP difficulty such as human language understanding would down the internet in an attempt to analyze as much human language as possible, and would not perform well for eons.

The exponential nature of NP problems perhaps forgives the perplexing inadequacy of today's computers and the predominantly theoretical research in AI that leaves so much as an implementation exercise for multiple man year engineering teams. The large AI software projects that do exist don't actually learn, but are painstakingly taught by their programmers.

The experiences of Doug Lenat may pay testament to this. His Automatic Mathematician (AM) learned addition, multiplication, and even prime numbers, but always stopped working after a few hours. AM learned by modifying itself, and would invariably modify something important out of existence.

1.2 What is intelligence?

The main role of the human brain is to construct a representation of our environment that approaches absolute truth as closely as possible. Having a consistent and correct representation of the world around us maximizes our ability to survive, and therefore evolved naturally. An intelligent mind is able to conceive of logical explanations for the various phenomenon it encounters and can even output information in hopes to push the chaos towards states that are in the mind's best survival interest. Logical explanations of random information can be characterized as Turing machines and therefore modeled on a computer. Turing machines have limitations however due to Godel's theorem. Any machine can be described by a computable number. Any computable number defines a new definable number which is uncomputable by the machine the computable number describes. Turing insisted that a sufficiently complex computable number may one day be able to mimic or imitate a human conversationalist and could thus be considered intelligent. Crockett equates the Turing test for intelligence to finding a solution to the frame problem, a puzzle of NP difficulty.

Threshold.

I propose a measure of intelligence as the ability to generate the minimum computable number that defines a truth function for a given set of input. Such a definition would allow true intelligence machines to compete with computable numbers in a gamut of intelligence contests. The machine that can create a computable number that is most often true for the given input wins. If the input is simple enough that an absolute truth function can be found, then the machine that outputs the smallest computable number (the most elegant solution) wins.

Alan Turing proposed a test called the Imitation Game to answer this question. It turns out that the Turing problem is of NP difficulty and that computers could ultimately guise themselves really well as human conversationalist, but due to Godel's paradox, could never fully describe all human speakers. The problem goes both ways, humans could pass a reverse test, though they could probably get quite good is they so choose. Godel proved that any one system of mathematics or logical solution to a problem cannot solve all possible problems.

The goal of the Artificial Intelligence research community is to model the brain's ability to build world representations as closely as possible.

1.3 Does the incompleteness of mathematics define true intelligence?

Lucas claimed that intelligence would not be limited by Godel's Theorem and declined the notion that a computable number could ever be considered intelligent. I will call Lucas' definition that of true intelligence and note that true intelligence cannot be described logically. Solving the frame problem or passing the Turing Test, though a sign of high intelligence, is not proof of true intelligence. I believe intelligence is a fuzzy concept that can be measured only relative to a given context. In a given situation, a computable number may perform more intelligently than a true intelligence that lacks experience. This is the current situation today with chess computers. All humans possess true intelligence in the Lucas definition, but few can beat the computable numbers that specialize in chess play.

Would a deterministic, synchronous, finite-state machine that grew up in a world where everyone thought the world was flat suddenly set sail to prove otherwise? There is something about intuition that seems very fundamental in life as a spirit. Computers have what is called derivative intentionality. The programming they use so heavily depends on the purposes of the programmers, they are blocked from significant learning by the epistemologically complex frame problem. Computable numbers are like a captured train of thought unable to conceive an original thought.

I will not go into how to architect a true intelligence machine here. Godel proved a true intelligence could not be made from a synchronous finite state automaton. A true intelligence therefore would have to be an asynchronous parallel architecture. The belief network that represented the knowledge would have to update all the truth nodes instantaneously when one node changed - similar to the way chemical equilibrium happens. Needless to say, such systems would be vastly more difficult to engineer than today's computer because logic and math alone could not be used to guarantee their proper operation.

As for the philosophical questions and sociological impact of intelligent machines, such issues are usually left in the hands of science fiction writers.

Thankfully, the question is forever held open by paradox as was exemplified by Godel. It turns out that this central question of Artificial Intelligence is equivalent to asking whether our system of mathematics thinks.

Careful study of some of the modern mathematics that surrounds the study of information reveals many fundamental differences in what is possible by human minds and computers. Such a question is both central to Artificial Intelligence and a good reason to start thinking about how long it would actually take to create a Turing Machine that could pass the Turing Test.

For the rest of this paper, I will outline how a minimum computable number could be calculated that would best perform on a Turing Test -- the classical problem of human language understanding. The hope is that such a study will reveal general principles of information theory as they apply to human language, and will provide an architecture for a useful enhancement to current human-machine interfaces.

1.4 Information Theory

Returning to the task at hand, programming a computer to understand human language, let us study English from the eyes of information theory. An observed sentence is either true or false, but not both. A sentence is atomic if only one fact is conveyed. If future sentence contradicts a past sentence, both need to be reassessed for validity. The observer decides whether a sentence is true or false based on their internal belief network. The energy value of a particular statement depends on how much information it takes to represent it in the observer's mind and decreases each time it is repeated. A mind absorbs and assesses truth for all information it experiences and eventually compresses the internal representation of experiences with a certain amount of loss. The result of the fact function can be represented in various levels of complexity:

  Simplest           1 bit (hard true/false)   

  Continuous         Single probability infinite real 
                            between 0 and 1 (0.82347...) 

                                    /false   /contradiction
  Two Dimensional    Evidence Space |       ` 
                                    | 
                                    |________/true 
                                no info

observed information = unit of information = computable number = single state of mind

Such machines can be described with a sequence of logical statements that act on an stream of input information and outputs a decision. The logical statements that describe the machine can itself be characterized as a sequence of 0's and 1's that describe a computable number. Specifically, the computable number defines a mechanism to write a function that will create a infinite set of bits.

1.5 The Ideal Human Language Agent

An agent is a software program which interacts with a user in a human-like way. Specifically, an ideal agent would embody the following traits:

Autonomous utility. An agent independently performs tasks which are difficult or tedious for a human.
Social Ability. The agent interacts with the user and other agents using plain English as much as possible.
Awareness. The agent maintains a relativistic context of its conversation with the user by remembering past interactions, and adapts its behavior to the user's preferences.
Initiative. The agent acts on its perception of the user's goals and tries to complete tasks without having to be asked.

1.6 Generalized Agent Architecture

The design for the Belief Processor can be broken down into the following modules:

Pattern Matcher. This matches the text string typed by the user with a pattern from the database. The pattern is a regular expression which is described as a character string. The pattern matcher could eventually be implemented with a generalized pattern matching generator and then it could be described in terms of a BNF grammar.
A simple interpreter. The interpreter implements the simplest possible computer, namely a finite state machine. The machine is also described by the database. In each of its states, it can make one of several possible transitions to other states. Each transition has a pattern associated with it. If that pattern matches the input, the FSM
1. goes to the state specified by the transition and
2. optionally executes a command.
Code for the various commands. In this case, there are very few commands, only
1. Starting up
2. Stopping the program
3. Printing some text
4. Setting a new user name and
5. Saving a message.
The databases. These contain the patterns that the agent should recognize, messages it should output and a description of its finite state machine.

2.0 Meaning Extraction

The first step for any intelligent agent is to extract some kind of meaning from its input. This involves identifying the relevant stuff within a user command or utterance and converting it to a standard data structure that the agent can use and pass around. When the realm of an agents interaction is limited, many programming tricks and optimizations can be employed to make it seem as though an agent understands certain phrases and commands. When the scope of the agent's tasks is broadened however, the flaws of such architectures becomes readily apparent. Fortunately, the more complex solution that solves the flaws of a simpler system is often based on the simpler technique. Below is a overview of modern meaning extraction techniques from the simplest and most specific to the most complex and general.

2.1 Syntax

Syntactic investigation of a given language has as its goal the construction of a grammer that can be viewed as a device of some sort for producing or extracting meaning from the sentences of the language under analysis.

2.1.1 Regular Expressions

Regular expressions are quite powerful, and matching them is fast. Some very complex agents use this scheme and perform extremely well. Using the special characters . which matches anything, * which is an operator which matches repeats of a pattern to its left any number of times, \| which allows OR-ing of patterns, and \(and \) which allow grouping, a question about directions to Berkeley can be stated as the regular expression: .*how.*\get to\.*Berkeley.*\?

The property to note about regular expressions is that because they are static, in order to use them as conditions in a rule-based system quite a bit of redundancy is involved. For example, a two condition help agent that tells the user how to rename or delete files could use the following regular expressions:

    .*how.*\(remove\|delete\|blowaway\|nuke\).*file.*unix.*\?\  | 
    .*unix.*how.*\(remove\|delete\|blow away\|nuke\).*file.*\?\ |
    .*what.*command.*to*\(remove\|delete\|blow away\|nuke\).*file.*unix.*\?
        { // Action: respond with help on rm command } 

    .*how.*\(rename\|move\).*file.*unix.*\?   |
    .*unix.*how.*\( rename\|move\).*file.*\?\ | 
    .*what.*command.*to*\( rename\|move\).*file.*unix.*\?
        { // Action: respond with help on mv command }

2.1.2 Chomsky [(,F] grammars

The pattern matcher could be implemented quite efficiently using a finite state language to recognize possible user command strings. Such a solution would be potentially limiting however due to the known theorem: Every finite state language is a terminal language, but there are terminal languages which are not finite state languages. Instead, the Belief Processor will interpret command strings with a parser generated out of a grammatical description of possible requests. For example, one entry in the pattern database could be:

     (: #DirectionsQuestion# (NP) { 
            // 1) Process the NP sematic tree to figure out 
	    //    where user wants to go. 
            // 2) Search the knowledge database, 
	    //    and construct an answer. [2.3] 
     } F: DirectionsQuestion --> How do I get to NP | 
                                    Can you tell me how to get to NP

2.2 Semantics

The semantic phase is when the actual meaning of a sentence is extracted. First we have to know what kinds of things "meanings" are. There is no easy or uncontroversial answer to this, and indeed the question has vexed linguists and psychologists for a century and philosophers for millennia.

2.2.1 Semantic Trees

The main job of a grammar is to establish a systemized means of constructing semantic trees that categorize the individual words of a sentence and relate them to one another.

2.2.2 Grammar Design

The above example demonstrates some of the complexity in grammar design. The grammar above is ambiguous, and therefore of limited use in meaning extraction. If we however modify the grammar rules, we can create a grammar that is unambiguous and will always build a semantic tree that suits our purposes.

2.2.3 Extracting Meaning from Semantic Trees

Once a user request has been parsed, the next step is to analyze the semantics in order to extract what in particular the user needs. In the example above, this involves converting the semantic tree of the NP argument to DirectionsQuestion into a set of database queries that will isolate an answer to the user's question. The database queries will be sent through a belief network to find the destination the user is most likely is referring to, and then an answer will be generated and output as to how to get the desired destination.

2.3 Pragmatics

Pragmatics is concerned with the relation of utterance to the contexts in which they are uttered, including the identity of the speaker and hearer, their interests and intentions, the state of their knowledge, their physical surroundings, the business at hand, and so on. Whatever the literal meaning of a sentence might be, it is frequently the case that it acquires additional meaning, and sometimes a wholly different meaning, by virtue of the context in which it is produced.

2.3.1 Belief-Optimized Semantic Trees

     NP3 --> NPa Prep NPb { 
         if (NPa.builtFromRule(NP3)) { 
	    // check belief network for more likely of two options: 
            if (x contains z) { 
	         NPa = NPab; 
		 NPab = this; 
		 top = NPa; // rearrange tree. 
            } 
	    if (y contains z) { 
	        // continue with assumption. 
	    } 
        } 
    }

3.0 Acting on Knowledge

An agent built with the simple Meaning Extraction and Response Generation techniques outlined above is already quite useful. Given enough knowledge it can provides a good mechanism for interfacing with a variety of useful data, but it is very limited because it has no memory. If asked "how do I do X" the agent responds appropriately with "use command Y", but if then asked "what options does it take?", it will fail to make sense of the question because it has no memory of what the last question was.

3.1 Rule-Based Knowledge

A basic rule-based production system has three major components:

a database of working facts called working memory.
a set of production rules called a rulebase that operate on those facts
a production system interpreter that decides which rule to use in a given state.

3.2 Flexibility

The condition/action structure of rules is both simple and powerful. Using one construct, many different mental processes can be modeled through software:

3.2.1 Knowledge Acquisition

Immediately after meaning extraction, the most common action is to store the new knowledge in the agent's database.

3.2.2 Response Generation

A very common action type is the response. The agent builds a report or an English sentence for the user as an intelligent response to there request. An example of this type of action would be very appropriate after meaning extraction as shown in [2.3.2].

3.2.3 Middleware

The final type of action is a specific program that performs a task the user could likely do themselves, but would rather have the agent do for them. This program could be any specific action be it a query to a legacy database, copying or converting a file, or sending a mail message. The point is, that the realm of possible actions potentially extends to any and all software that has ever been written.

3.3 Modularity

Most computer programs would fail miserably if one were to randomly remove a function or two. In contrast, the set of rules in a rulebase are independent, self-contained chunks of information, any one of which can be altered or replaced without disabling or requiring modification to other rules. Systems built around such rulebases enjoy a robustness and modularity that is unobtainable in many other software architectures.

3.4 Plausibility

asdf

3.4.1 Bayes Rule

A portion of the design of the Belief Processor uses Bayes Rule to make deductions based on the assumed validity of available knowledge. All knowledge stored by the agent has an associated probability as to whether or not that particular piece of information should be relied on as fact. Bayes Rule provides a mechanism to make deductions about newly introduced knowledge based on past beliefs.

                     P(E|Hi)*P(Hi)
     P(Hi|E) = ---------------------------
                     P(E|Hk)*P(Hk)

3.4.2 Belief Network

There is a good article on java at http:///... Please find me any good java articles in the news today. The system could use Bayes Rule in its action to decide whether articles fall within a certain class based on keywords it contains.

4.0 Evolving Action Code

WAEra sdf

5.0 Conclusion

References

N. Chomsky. Syntactic Structures. The MIT Press, 1957.
J. Canny. Data Structures. Class Notes, 1992.
L. Crockett. The Turing Test and the Frame Problem. Ablex Publishing, 1994.
E. Dougherty, C. Giardina. Mathematical Methods for AI. Prentice Hall, 1988.
D. Gillies. Artificial Intelligence and Scientific Method. Oxford University Press, 1996.
A. Hodges. Alan Turing: The Enigma. Simon & Schuster, 1983.
D. Hofstadter. Godel, Escher, Bach. Vintage Books, 1979.
H. Lieberman. Intelligent Interface Agents. MIT Media Lab.
W. Makovoz. Applications of Intelligent Agents. Lecture Notes, 1996
M. Minsky. The Society of Mind. Simon & Schuster, 1985.
S. Pinker. The Language Instinct. William Morrow and Company, 1994.
D. Popovic, V. Bhatkar. Methods and Tools for Applied AI. Marcel Dekker, 1994.
M. Sharples, D. Hogg, C. Hutchinson, S. Torrance, and D. Young.
Computers and Thought. A Practical Introduction to Artificial Intelligence. The MIT Press, 1989.
M. Wooldrigde, N. Jennings. Intelligent Agents -- A Survey.