Coauthored with Ted Goertzel

I remember heading home from college for spring break in 1983, toward the end of my freshman year. I’d just recently turned 16, and I’d been thinking about AI a hell of a lot – even more than about my new girlfriend, Rachel Gordon, whom I was pretty darn crazy about at the time. A few days before spring break I’d tried to explain my theories on artificial intelligence to my friend Ken Silverman. Ken couldn't understand what I was talking about, so I promised him I'd work on it over spring break, and that when I got back to school I’d explain how it all worked and give him a complete design for a thinking computer program. I had the idea clear in my head, but I was totally unable to articulate it in a way that Ken or anyone else could understand. I spent the whole break working on it, and during those few days I basically worked out the ideas that would appear, six years later, in my first book, The Structure of Intelligence. I went through every aspect of the mind – reason, memory, aesthetics, intuition, emotion, etc. – and convinced myself that every one could be expressed in terms of pattern recognition and pattern formation. The mind, I concluded, was a pattern recognition system that recognized patterns in the world around it, and – very crucially – also recognized patterns in itself. Recognizing patterns in itself, it formed patterns within itself, continually giving rise to new structures.

After the break, I still wasn’t able to explain my realizations to Ken in a way that made sense to him, but at least things made a little more sense to me. I knew I had to find a mathematical language to make sense of my intuitions, or I’d never be able to communicate them to anyone, let alone program them on a computer. My grasp of software design at this stage was extremely weak; it was formed mainly by programming games in BASIC. I was nowhere near having the skills to design a general pattern recognition system that recognized patterns in itself and adjusted itself accordingly.

Ken's dad was an extremely smart guy and a prolific and successful inventor, mostly in the area of electrical engineering; and in our college years, Ken often fantasized about becoming a rich inventor and building a mansion, with a basement laboratory in which we’d putter around day and night, wiring together intelligent robots and time machines and so forth. So it’s pretty funny that 14 years later when I decided to start an AI company (Intelligenesis Corp., later renamed Webmind Inc.) I somehow happened to turn to Ken when I needed someone to take over the job of programming my AI system, the Webmind AI Engine.

I hadn't spoken to him for years – he’d stayed in the New York area, where he had grown up, whereas I’d moved all over the world, teaching in universities in Las Vegas, New Zealand and Australia. After getting his degree in electrical engineering, he’d done a lot of different things, including real estate and computer programming. He was really psyched to finally get the chance to collaborate with me on my thinking machine project. Finally, after a decade and a half, I had figured out how to express my plan for artificial intelligence in a way Ken could understand! Ken was the lead engineer at Webmind Inc. for its first couple years, and VP of Technology for the entire lifetime of the firm. Now I’m working with a different crew of engineers, and Ken is working on his own advanced pattern recognition software, but we’re still good friends, and he definitely played an important role in the evolution of my work.

To articulate my vision of the mind in a comprehensible form was much much harder than I’d ever thought it would be. It turned out that the vocabulary for expressing what I wanted to say didn’t really exist in the field of computer science. To find the language I needed to express my ideas and to work out the details, I had to step a long way back from the world of computers and get deeply into the philosophy of mind. Although I was very young then, and even more naïve than I am now, I realized intuitively that it was necessary to get the philosophy right before proceeding to the computational details. Now, I’m jaded by a fair amount of practical experience – though I don’t have a head full of gray hair yet – and I see this far more clearly than I did then. In implementing a general vision of how the mind works, it’s very easy to be misled by the nature of contemporary computer hardware and programming languages, and to wind up implementing things that subtly deviate from the vision one started out with. The way to avoid this is to have the conceptual, philosophical vision very firmly fixed in one’s mind as one sets about the detailed design work, which is huge and at times confusing.

What I’m going to give you in this chapter is a fairly sketchy, but hopefully evocative, overview of the process of creating Webmind and then Novamente. The two AI systems are very different on a technical level, but on the level of a popular exposition like this one, the differences are really pretty small. Novamente uses more sophisticated mathematics and more efficient software structures to implement the same basic concepts that Webmind did. To keep things from getting confusing, I’ll write mostly about “Novamente” here, except where I’m talking historically about the creation of Webmind in particular; but most of what I’ll say about Novamente also applies to Webmind.

The Novamente project is far from complete. Just like every other AI researcher, I’m an abject failure so far – I haven’t yet created a software program displaying human-level general intelligence. Unlike most other AI researchers, however, I and my colleagues honestly believe we are on a path that will lead us to success at this ambitious goal. I don’t expect to convince you one way or another in these pages – my hope is merely that the story of our quest may be an interesting one … and that some of the lessons we’ve learned along the way may be of general value.

I knew from the start that I didn’t want to build an artificial idiot savant – an overspecialized, brittle system as was typical in the AI field. I wanted to build a mind.

But what is a mind, anyway?

During that freshman-year spring break that I spent trying to figure out how to explain my vision of the mind to Ken, I arrived at a basic working definition of the mind: a mind is the set of patterns in an intelligent system.

Your mind is not your brain, nor is it some disembodied soul somehow exchanging messages with your brain. Your mind is the set of patterns in your brain – the structures and processes such that knowing them lets you explain the brain more simply than just listing the brain's parts and their positions and states over time.

Novamente’s mind is not the C++ code that my engineering team and I type in – that’s just a code for creating the mind, a little like DNA is the code for creating a human. Novamente’s mind is the set of patterns in the billions of 0’s and 1’s existing in RAM while Novamente runs, cycling through the machine’s processors and passing through the network cables. These 0’s and 1’s themselves are not Novamente’s mind – it’s the patterns in these 0’s and 1’s, the static and dynamic patterns, that are mind. Mind is a set of patterns in a system that achieves highly patterned goals in a highly patterned environment. Everything is pattern, pattern, pattern!

Mind recognizes and creates patterns in the world and itself, achieving complex goals, goals whose definition involves a great deal of pattern.

Although these ideas were clear to me intuitively in 1983, it wasn’t till 1990 or so that I was able to write them down in a clear and comprehensible way. This is what I did in the first few chapters of my first book, The Structure of Intelligence. At that point I had gotten my PhD in mathematics and was supposed to be doing mathematical research, but just as I’d always been more interested in my own reading and thinking than in my schoolwork, now I was spending my time thinking about pattern and mind and the nature of the universe, instead of proving math theorems like a good assistant professor. The next step was to ask the question: What are the principles by which a set of patterns, a mind, can actually be intelligent? For sure, the precise structures and dynamics are going to vary from one mind to the next, but are there any general principles, applicable to every kind of intelligent system, be it a human, a dolphin, a computer program, an intelligent gas cloud on Jupiter? It’s not totally obvious that there are such principles, but my belief starting out was that such general principles had to exist. What are the principles by which mind’s core algorithm – pattern recognition and formation in itself and the world – is self-regulated?

One general principle is what the 19th-century American philosopher Charles Peirce called the “One Law of Mind”: that things in the mind tend to spread attention to other related things in the mind. This is a basic principle of attention allocation, one we can see at work in the brain in the spreading of electrical activation. Novamente incorporates this via activation spreading similar to that in a neural network. This is what I call a “heterarchical” principle – where a heterarchy just means a sprawling network in which each element connects to a few other elements, without a hierarchical structure. A random network in which each node connects to a set of other nodes at random is a heterarchy.
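To make the idea concrete, here is a minimal sketch in Java of the kind of activation spreading being described: a toy heterarchical network in which each node passes some of its activation to its neighbors each cycle. All the names here (ToyNode, spread, and so on) are invented for illustration; they are not Novamente's actual classes.

```java
import java.util.*;

// A toy heterarchical network: each node holds an activation level and a few
// weighted links to related nodes; activation diffuses along those links.
class ToyNode {
    final String name;
    double activation;
    final Map<ToyNode, Double> links = new HashMap<>(); // neighbor -> link weight

    ToyNode(String name, double activation) { this.name = name; this.activation = activation; }
}

public class SpreadDemo {
    // One cycle of Peirce-style attention spreading: each node passes a fraction
    // of its activation to its neighbors, scaled by the link weights.
    static void spread(List<ToyNode> net, double rate) {
        Map<ToyNode, Double> incoming = new HashMap<>();
        for (ToyNode n : net)
            for (Map.Entry<ToyNode, Double> e : n.links.entrySet())
                incoming.merge(e.getKey(), rate * n.activation * e.getValue(), Double::sum);
        for (ToyNode n : net)
            n.activation = (1 - rate) * n.activation + incoming.getOrDefault(n, 0.0);
    }

    public static void main(String[] args) {
        ToyNode cat = new ToyNode("cat", 1.0), fur = new ToyNode("fur", 0.0), dog = new ToyNode("dog", 0.0);
        cat.links.put(fur, 0.8);
        fur.links.put(dog, 0.6);
        List<ToyNode> net = List.of(cat, fur, dog);
        for (int i = 0; i < 3; i++) spread(net, 0.5);
        net.forEach(n -> System.out.println(n.name + ": " + n.activation)); // "dog" lights up via "fur"
    }
}
```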

Hierarchy is another important structure of the mind. We see it in the human brain all over the place, most famously in the visual system, where we have a hierarchy of progressively more abstract processes, starting with recognition of lines and edges, then shapes, then 3-D forms, and so forth. Hierarchy in the mind has to do with increasing abstraction, and with control that’s aligned with abstraction, so that processes dealing with more abstract things control related processes dealing with more concrete things.

A general principle that I’ve thought a lot about – and that I wrote about in my second book, The Evolving Mind – is what I call the “dual network” – this refers to the interpenetration of hierarchy and heterarchy. In the mind, hierarchy and heterarchy overlap each other, and the dynamics of the mind is such that they have to work well together or the mind will be all screwed up. The overlap of hierarchy and heterarchy gives the mind a kind of “dynamic library card catalog” structure, in which topics are linked to other related topics heterarchically, and linked to more general or specific topics hierarchically. The creation of new subtopics or supertopics has to make sense heterarchically, meaning that the things in each topic grouping should have a lot of associative, heterarchical relations with each other. In Novamente, this general “dual network” principle is reflected in many ways, when one gets down into the details of its various dynamical processes.
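A rough sketch of the dual-network idea, again in Java with invented names: each topic node carries both hierarchical links (parent and children) and heterarchical, associative links, and a new grouping is only accepted if its members are strongly associated with one another.

```java
import java.util.*;

// Toy dual-network node: hierarchy (parent/children) and heterarchy (associations) coexist.
class Topic {
    final String name;
    Topic parent;                                            // hierarchical: a more general topic
    final List<Topic> children = new ArrayList<>();          // hierarchical: more specific topics
    final Map<Topic, Double> associations = new HashMap<>(); // heterarchical: related topics

    Topic(String name) { this.name = name; }
}

public class DualNetworkDemo {
    // A proposed new supertopic "makes sense heterarchically" if its would-be
    // members are, on average, strongly associated with one another.
    static boolean coherentGrouping(List<Topic> members, double threshold) {
        double total = 0; int pairs = 0;
        for (Topic a : members)
            for (Topic b : members)
                if (a != b) { total += a.associations.getOrDefault(b, 0.0); pairs++; }
        return pairs > 0 && total / pairs >= threshold;
    }

    public static void main(String[] args) {
        Topic jazz = new Topic("jazz"), blues = new Topic("blues"), calculus = new Topic("calculus");
        jazz.associations.put(blues, 0.9);
        blues.associations.put(jazz, 0.9);
        System.out.println(coherentGrouping(List.of(jazz, blues), 0.5));    // true: a "music" supertopic is justified
        System.out.println(coherentGrouping(List.of(jazz, calculus), 0.5)); // false: no coherent grouping
    }
}
```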

Another general principle is self: that minds contain parts of themselves that mirror the whole. This gives a quasi-fractal structure to the mind.

Another general principle, also discovered by Charles Peirce, is that there are three kinds of reasoning: induction, abduction, and deduction. These are all ways of manipulating hierarchical relationships. Hierarchy is about logic, whereas heterarchy is about the spread of attention and the formation of wholes. Once heterarchy has led to the formation of new wholes, corresponding to clusters of things that all relate to each other, then these new wholes can be dealt with hierarchically; they can be reasoned about. I was very fortunate, a month after Intelligenesis got our seed funding, to get a job application from Pei Wang, who had worked out a neat computational reasoning system (NARS) based on the three forms of reasoning that I, following Peirce, had identified as essential to the mind.
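For readers who like to see things spelled out, here is a toy Java rendering of the three inference forms acting on inheritance relations (written "X -> Y" for "X is a special case of Y"). This is only a schematic of the general idea; NARS and Novamente attach uncertainty values to every conclusion, which this sketch omits.

```java
// Peirce's three inference forms over inheritance relations, schematically.
public class ThreeInferences {
    record Inh(String from, String to) {}   // from -> to: "from" is a special case of "to"

    // Deduction: from A -> B and B -> C, conclude A -> C.
    static Inh deduce(Inh ab, Inh bc) { return new Inh(ab.from(), bc.to()); }

    // Induction: from A -> B and A -> C, guess B -> C (generalizing from a shared instance).
    static Inh induce(Inh ab, Inh ac) { return new Inh(ab.to(), ac.to()); }

    // Abduction: from A -> C and B -> C, guess A -> B (an explanatory hypothesis).
    static Inh abduce(Inh ac, Inh bc) { return new Inh(ac.from(), bc.from()); }

    public static void main(String[] args) {
        Inh robinBird = new Inh("robin", "bird"), birdAnimal = new Inh("bird", "animal");
        System.out.println(deduce(robinBird, birdAnimal));                                     // robin -> animal (sound)
        System.out.println(induce(robinBird, new Inh("robin", "flier")));                      // bird -> flier (a guess)
        System.out.println(abduce(new Inh("penguin", "swimmer"), new Inh("fish", "swimmer"))); // penguin -> fish (a guess, and a wrong one)
    }
}
```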

There are also two dynamics that I believe are generally part of mind. These correspond to the basic philosophical principles of Being and Becoming.

Becoming corresponds to evolution, considered most generally as the survival of the fittest members of a population, and the reproduction of the survivors to form new population elements. Novamente contains explicitly evolutionary components – variations on the computational technique called “genetic programming.” It also contains other components that aren’t traditionally viewed as evolutionary, but really are. For instance, Novamente’s reasoning module involves logical relations (we call them “links”) that combine with each other to create new logical relations. The facts “pigs are fat” and “fat creatures are ugly” combine to create the new relation “pigs are ugly.” And in the reasoning system, unimportant relations are deleted to save memory. Thus, we have survival of the fittest, where fitness means importance to the system, and we have reproduction of the survivors, via the rules of inference. Reasoning is seen to be a form of evolution, in the general sense.
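The pigs example can be turned into a little program. The sketch below, with invented names and grossly simplified arithmetic, shows the evolutionary loop in miniature: relations combine by deduction to breed new relations, and the least important ones are culled when memory runs short.

```java
import java.util.*;

// Reasoning as evolution, in miniature: inheritance relations reproduce via
// deduction, and the least important relations are culled to bound memory.
public class InferenceAsEvolution {
    record Relation(String from, String to, double importance) {}

    // Reproduction: A -> B and B -> C combine to yield A -> C.
    static List<Relation> deduce(List<Relation> pop) {
        List<Relation> offspring = new ArrayList<>();
        for (Relation r1 : pop)
            for (Relation r2 : pop)
                if (r1.to().equals(r2.from()))
                    offspring.add(new Relation(r1.from(), r2.to(), r1.importance() * r2.importance()));
        return offspring;
    }

    // Survival of the fittest: keep only the most important relations.
    static List<Relation> cull(List<Relation> pop, int capacity) {
        pop.sort(Comparator.comparingDouble(Relation::importance).reversed());
        return new ArrayList<>(pop.subList(0, Math.min(capacity, pop.size())));
    }

    public static void main(String[] args) {
        List<Relation> pop = new ArrayList<>(List.of(
                new Relation("pig", "fat creature", 0.9),
                new Relation("fat creature", "ugly thing", 0.8),
                new Relation("pig", "pink thing", 0.1)));
        pop.addAll(deduce(pop));   // "pig -> ugly thing" is born
        pop = cull(pop, 3);        // the least important relation dies
        pop.forEach(System.out::println);
    }
}
```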

Being corresponds to what system theorists call “autopoiesis” – an obscure word with a very useful meaning: self-production. Every cell in the body is produced by other cells in the body – so the body is a self-producing system. The mind is also a self-producing system. This is basically the theme of my third book, Chaotic Logic. If you remove part of the mind, the other parts of the mind that relate to it will be able to reproduce it, approximately if not exactly. If you take out the logical relation “pigs are ugly,” for example, the system may be able to regenerate it by inference from the other relations “pigs are fat” and “fat creatures are ugly.” It may come out with a different strength than it had before, but it will still be reproduced, perhaps lossily. If you take out all memory of the text “War and Peace” from the mind, but retain a lot of related knowledge, this related knowledge will cause the system to want to read War and Peace, which will likely lead, eventually, to the information about the text being regenerated. In this case, interaction with the environment is part of the mind’s autopoietic dynamics.

Evolution changes the system in accordance with its goals and its environment; autopoiesis keeps the system the same as it was before. The mind needs both of these forces; they need to be properly balanced. The balance of these leads to productive creativity, and this was the main theme of my fourth book, From Complexity to Creativity.

I arrived at my list of general principles of the mind by a kind of unholy combination of introspection, mathematical analysis, and survey of biology, psychology and computer science. I spent a long time trying to prove mathematically that all these general structures and dynamics, and a few others, were necessary and sufficient for mind – any system having them would have a mind, and any system not having them couldn’t have a mind. But eventually I gave up; I decided that the mathematics of today is not adequate for proving this kind of thing. I gathered my various insights and intuitions and conclusions about how the mind worked, and gave the list a name: the psynet model of mind. Psynet = “mind-network”, a theory of the mind as a network of interacting, intertransforming agents. I realized that the conceptual picture of the mind that I’d developed was of significant value in itself, apart from any mathematical formalization I might give it. No one else working in the AI field seemed to me to have a similarly comprehensive and powerful conceptual analysis of the mind. I still think inventing the needed mathematics to usefully and completely formalize the psynet model is an interesting challenge – but it’s not as interesting to me right now as using my intuitions about the general structures of intelligence to build thinking software.

The general structures and dynamics of the “psynet model” can be manifested in many many different ways, in different systems. The process of building Webmind, and then Novamente, has been in this sense a top-down process. I started out with an idea about what general principles had to emerge from the system to make it intelligent, and this placed a constraint on what the system had to be like. It had to be built so as to make the right general structures and dynamics emerge. Aside from that, I didn’t care very much exactly what the system was like. I had, and still have, an attitude of being willing to learn via experimentation in this regard.

My first serious attempt to build a real AI system (earlier chatbots and abortive experiments not counted) occurred in 1994. I used a programming language called Gofer, which I later benchmarked at 1/10,000 the speed of C (the standard programming language in the commercial world). Gofer was a beautiful language, which matched up nicely to my vision of the mind. This program was called Antimagicians; it was a population of actors called magicians, and antimagician actors that annihilated the magicians in complex patterns. Just about all it ever did was produce a type of error called a “stack overflow.” This was a shame, because my model of mind was very simple and compact in this programming language. But it could only run on one machine, and it ran incredibly slowly; the only thing it did fast was use up all the machine’s memory.

Gofer was a “functional” programming language, meaning not that it performed useful functions (far from it!), but rather that it was based on the mathematical concept of a “function.” Gofer was basically equivalent to mathematics. It appealed to my sense of formal elegance; it was perfect in the sense of a Bach fugue. Unfortunately, though, functional languages do not match well to the von Neumann computer architecture, so it is very hard to make them efficient without special hardware. After the debacle of my stack-overflowing proto-AI system, I abandoned Gofer and turned back to C++, and then to the new programming language Java. But I restricted myself to more modest programming experiments. I made a C-language version of Antimagicians, which was much simpler and less interesting than the Gofer version. In Java, I made a genetic algorithm that ran on multiple machines (coded together with Rosalind Barr at University of Western Australia), and a simple actors-based search engine (coded together with Mark Messenger, also at UWA). I could see from this experience that, while my AI system in Gofer had been small, because Gofer was made for expressing systems that refer to and organize themselves, a comparable system in C++ or Java or any other practical programming language was going to be huge. It took a couple years for me to summon the guts to attempt such a thing.

One thing that occurred to me as I started to think about implementation issues, much more than it had in my days as a pure theorist, was the crucial role of specialization. My Gofer-based mind had been theoretically capable of intelligence; it was a general system for recognizing and forming patterns in itself and its environment. But its generality didn’t allow it to solve any particularly useful problems within practical time and space constraints. In that sense, it had been a miserable failure as an intelligent system. In practice, I concluded, to get reasonably efficient intelligence one needs to code specialized cognition algorithms, aimed at recognizing patterns in particular kinds of data, learning how to carry out particular kinds of actions, and so forth. The brain is very much like this: we have 30% of our brain specialized for visual pattern recognition; regions specialized for language; regions specialized for body sensations; regions specialized for social interaction; etc. etc. And then we have a little bit of general intelligence, which is what makes us uniquely brilliant among the animal kingdom – but this general intelligence relies on all the specialized stuff to give it a meaningful context within which to operate.

Specialization needs to be mediated by rich interaction between specialized parts. The different specialized parts of a system need to learn from each other, and learn about the world together whenever they can. The integration of various specialized pattern recognition subsystems has played a huge role in practical Webmind engineering.

Because of all this specialization, it seemed to me in 1994 and 95 that there was no way to build a thinking computer program on contemporary computer hardware. It seemed to me that some kind of humongous brainlike supercomputer would be necessary. And then I discovered the Internet (unlike Al Gore, I didn’t invent it!). It struck me that the millions, soon billions, of machines around the world, all hooked together on the Net, had enough memory and processor power to create a real computational intelligence. The Java programming language came out in 1995 and it seemed the right tool to use to create a networked AI engine embodying the general principles of mind: recognizing and creating patterns in itself and the world, using a variety of specialized methods integrating together into a whole, an evolving autopoietic whole.

Not only did the Internet give you the computational power to build a thinking machine; it also provided a really rich perceptual environment. A mind can’t exist in isolation; it has to achieve complex goals in a complex environment. The physical world is obviously complex but building a robot body is another huge project, comparable in scope to building a mind. The Internet is arguably rich enough in diverse details to support intelligence, and it’s a lot easier to hook your AI system into the Internet than to build it a robot body. I made up my own complex goal: To build an AI system whose body was part of the Net, and whose perceptual world was the Net itself, the Web. A mind for the Web; a Webmind.

In terms of the conception of intelligence as “achieving complex goals in complex environments,” the goals I had in mind when designing the Webmind system were roughly:



* Conversing with humans in simple English, with the goal not of simulating human conversation, but of expressing its insights and inferences to humans, and gathering information and ideas from them.

* Learning the preferences of humans and AI systems, and providing them with information in accordance with their preferences. Clarifying their preferences by asking them questions about them and responding to their answers.

* Communicating with other AI systems, in a manner similar to its conversations with humans, but using a mixture of human language and a more formalized and precise computerized language we have created, called Sasha.

* Composing knowledge files containing its insights, inferences and discoveries, expressed in Sasha or in simple English.

* Reporting on its own state, and modifying its parameters based on its self-analysis to optimize its achievement of its other goals.



Of course, my ambitions didn’t end there – that would be wimpy. Subsequent versions of the system were intended to offer enhanced conversational fluency, and enhanced abilities at knowledge creation, including theorem proving, scientific discovery and the composition of knowledge files consisting of complex discourses. And then of course the holy grail: progressive self-modification, leading to exponentially accelerating artificial superintelligence!

I remember a particular moment when my diverse ideas about AI crystallized in my mind, with amazing clarity. I could see in my mind exactly how an AI system could be built. Now all that was left was to work out a few pesky details.

At this point, it had been 13 years since I’d first set myself the goal of building a thinking machine. I now had a PhD in math, and had spent countless thousands of hours studying cognitive science, physics, computer science, neurobiology, and philosophy of mind. I’d published four books on the mind, which were idiosyncratic combinations of mathematics, philosophy and science, all pushing in the same direction, toward an understanding of the mind that was both fundamental and precise. I felt I finally had the answer. And it seemed that the hardware was finally getting there too: cheap computers with gigabytes of RAM, and high-bandwidth Ethernet and Internet allowing distributed computing among dozens or even millions of these powerful, cheap machines.

It all seemed incredibly clear to me. Mind was exquisitely simple in essence. A mind was a web of patterns, a network of independent mind actors, each one concerned with recognizing patterns in other actors, and patterns emergent between itself and other actors. New actors were created to embody new patterns. The overall network of mind was continually re-making itself via recognizing patterns in itself. The character of a particular sort of mind was determined by the assemblage of pattern recognition/creation actors inside it. The art of mind design – an as yet nonexistent art – would consist of choosing the right assemblage of types of actors so that the emergent self-reconstructing behavior of mind would get into a productive dynamical attractor. From my 13 years of thinking about human and artificial intelligence, I felt I had a good idea how to choose and design the right mind actors, so that when these actors were released to study and transform one another, the self-reconstructing, self-recognizing dynamic characteristic of mind would emerge.

And so in the fall of 1996 I started creating the Webmind AI Engine. As I’ve said, I’d been working on similar things off and on for years; but the actual design of the Webmind system as it is now was something I started in the fall of 1996, when I was in Western Australia, working at UWA as a Research Fellow. Soon enough this got more interesting than anything else I was working on -- I realized that I was on the verge of something really cool, and something that I wasn’t going to be able to implement myself, or with a couple research assistants. John Pritchard, my New York e-mail pal, was convincing me that it was plausible to get funding to start a company building software according to my designs. The idea was appealing.

At the start of ’97 I quit my job at UWA and moved to the US to work on Webmind design and coding full time. I didn’t have any clear business plan in mind, but I figured that once I got some clearly intelligent behavior working, the venture capitalists would beat a path to my door. Naively enough, I figured I’d reach that point after a few months’ hard work. I figured that after I got some basic stuff working, I could raise a few hundred thousand dollars to pay perhaps 5 programmers, and then we’d get the whole thing implemented in 6 months’ time – presto! a thinking machine. Fame and fortune, and truckloads of beautiful girls, would be mine.

What I had at the end of summer 1997 was ten thousand lines of Java, largely designed as I went along. This system was never completed, and of the parts that were completed, only half worked. There were lots of details I didn't understand. This first serious attempt at Webmind had too much of my theory of mind in it, and not enough computational practicality. It was beautiful as a mathematical and logical statement, but still horrible as a computer program. I was still too close to Gofer, and hadn’t come to grips with what I’d have to do to make a useful, efficient implementation of my model of mind.

But still, the ideas, data structures and dynamics underlying this first Webmind were conceptually about the same as the ones underlying Novamente today. The mathematics and the software design have both changed tremendously, but the underlying vision is the same. Novamente, like Webmind before it, is based on the idea that the mind is a collection of patterns that forms and recognizes patterns in itself and the world, and in this way achieves complex goals in the world. It makes this vision concrete by defining some simple software objects corresponding to patterns and goals.

In the mid-90’s, starting out on Webmind design, I had basically a comprehensive knowledge of what was happening in the AI world. It was a mess. It’s basically the same way today. There’s no well-understood, commonly accepted body of scientific knowledge about AI. Instead, there’s a vast diversity of approaches to various aspects of the relationship between computation and mind. Some of the approaches contradict each other and some of them complement each other. Designing Webmind was a process of assembling information from various different perspectives and disciplinary areas into a coherent whole, guided by a set of governing principles.

Many different subdisciplines within the AI umbrella contributed to the structuring of Webmind, and then Novamente. Some of them I was thinking about when I first started designing Webmind, others emerged as being significant more recently, further along in the design process, in some cases only in the transition from Webmind to Novamente. Table 1 gives an overview of the sorts of things that Novamente draws from various disciplines. It may be a bit opaque to the nontechnical reader, but it will mean something to the reader with some computer science background, and perhaps to others it will be at least generally evocative.

Cognitive Psychology

From cog psych we have taken a number of high-level structural principles, for instance the notions of Long-Term Memory, Episodic Memory (memory of your own history), and Short-Term Memory; and the distinction between procedural knowledge (knowledge of how to do things) and declarative knowledge (factual knowledge).

 

Introspective Psychology

Modern cognitive psychology is experimentally focused, but past traditions in psychology have more openly drawn their inspirations from introspection, from what each mind intuitively knows about itself. The overall structure of Novamente owes something to ideas drawn from these traditions, from Gestaltism to Buddhist psychology and Peircean philosophy.

 

Neuroscience

From neuroscience we have taken the observation that mind can be implemented by a parallel distributed system with activation spreading  around it in complex patterns – i.e. a ‘neural net’, broadly conceived.   We’ve also taken our approach to localization from what’s known about the brain: in Novamente, knowledge is distributed, but not across the whole system; each type of knowledge is distributed across a part of the system, just as is done in the brain.



Complexity Science

The emerging science of complex systems has contributed crucial concepts such as self-organization, evolution, autopoiesis and emergence.  Novamente is a modular system in which the real intelligence emerges from interaction between the modules.  Like many complex systems, it displays behaviors like phase transitions and sensitivity to initial conditions, and evolution-ecology interactions.



Nonlinear Dynamics

One of the more rigorous subsets of complexity science, nonlinear dynamics studies the attractors and transient patterns that emerge as nonlinear systems evolve over time.  Novamente is a highly nonlinear dynamical system whose attention is allocated by complex attractor dynamics, and that specifically studies transients in its own dynamics so as to self-adaptively modify its own structure.



Statistical Pattern Recognition

In its analysis of numerical data (e.g. financial forecasting) and its lower-level linguistic processing, Novamente makes use of statistical pattern recognition tools.  What makes it unique is its ability to integrate statistically recognized patterns with other types of knowledge, and to generalize from this knowledge via inference and other mechanisms.



Multi-Agent Systems

With the advent of distributed and parallel computing, there is a substantial body of knowledge about how to make populations of computational agents cooperate to carry out useful activities.  Novamente is a multi-agent system, albeit a very unusual one, and its system architecture makes use of principles from this area of computer science in many ways.



Computational Linguistics

The last decade’s explosion of knowledge in computational language processing has produced many techniques of use within Novamente.  The challenge has been to get all these tools working together in a common framework focused on extracting, creating and producing meaning rather than on syntax analysis.



Expert Systems

Novamente allows humans to enter expert knowledge into it via XML, Sasha or other special formal languages, similar to standard AI expert systems.  Unlike expert systems, though, it doesn’t take this knowledge as truth: it takes it as information given to it by another mind, and feels free to forget it or modify it as it sees fit.



Machine Learning and Optimization

Machine learning and optimization algorithms are not real AI systems but they do solve problems that are crucial to the mind.   Novamente uses genetic algorithms, genetic programming, and statistical machine learning techniques for various purposes, internally.



Logic

While Novamente is not a logic system in the traditional sense, it makes use of the reduction of general relationships to a simple relational formalism, which was pioneered by mathematical logicians and logic-inspired AI engineers.  It manipulates relationships using uncertainty-robust, self-organizing reasoning techniques different from those used in the logic or AI literature.



Table 1 - Novamente’s Diverse Inspirations

Obviously, this laundry list of component technologies doesn’t really tell you a damn thing about Novamente. That’s because the crux of Novamente lies, not in the component technologies, but in the way these technologies are structured to form a coherent self-organizing system. But still, the presence of all these tools made the process of building Novamente very different than it would have been if none of the tools existed, and you had to build every component technology from scratch. Rather than just “how do you program a mind on current hardware and software?”, the question becomes more like “Given all these wonderful tools, and amazingly powerful distributed hardware on which to implement them, how can we tie them all together in a harmonious and mutually adaptive way to produce a mind?”

Given the general conceptual framework I’ve described, and the practical and conceptual toolbox I’ve listed, the first step toward actually designing Webmind was deciding what the “atomic mental object” should be.

Bigger than a neuron, smaller than a machine, was the first decision. I created a Java object called a Node. A node is the most basic kind of pattern known to Webmind – it’s something Webmind recognizes as a whole. A node says, “This thing is worth distinguishing from its environment as a whole entity. Here it is. It persists and maintains its boundaries over time.” We have some nodes referring to external sensed objects: TextNodes, DataNodes, WordNodes, and so forth. We have some nodes representing patterns recognized in the system itself rather than in the outside world: CategoryNodes of various kinds, AutomatonNodes representing evolved patterns, etc. There are nodes called SubgraphImageNodes that represent parts of the mind, grouped with a boundary drawn around them so as to be considered as a kind of higher-order individual. And so on, and so on, and so on.
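A rough Java sketch of this layering, to give the flavor of it. The class names loosely echo the ones just mentioned, but the fields and structure are simplified guesses for illustration, not the real Webmind source.

```java
import java.util.*;

// The "atomic mental object": a Node is anything the system treats as a
// distinguishable whole, carrying links that express its relationships.
abstract class Node {
    final String name;
    final List<Link> links = new ArrayList<>();
    double importance;   // how much the system currently cares about this node

    Node(String name) { this.name = name; }
}

class Link {
    final Node target;
    final String type;   // e.g. "similarity", "inheritance", "associative"
    double strength;

    Link(Node target, String type, double strength) { this.target = target; this.type = type; this.strength = strength; }
}

// A few specialized node kinds, each marking a different kind of recognized whole.
class TextNode extends Node { final String text; TextNode(String text) { super("text"); this.text = text; } }
class WordNode extends Node { WordNode(String word) { super(word); } }
class CategoryNode extends Node { final List<Node> members = new ArrayList<>(); CategoryNode(String name) { super(name); } }

public class NodeSketch {
    public static void main(String[] args) {
        WordNode pig = new WordNode("pig"), hog = new WordNode("hog");
        pig.links.add(new Link(hog, "similarity", 0.9));
        System.out.println(pig.name + " has " + pig.links.size() + " link(s)");
    }
}
```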

But nodes are just the start. Webmind is also wired to recognize certain kinds of patterns involving nodes. Similarity is the most basic kind of pattern: it’s the recognition that two different things, occurring at different points in space or time, are actually a lot like each other, and can be interchanged for many purposes. Inheritance is also basic: it’s the recognition that you can substitute A for B (though maybe not B for A) without substantial loss of information.

How many link types to incorporate was a big question. In the AI systems known as semantic networks, you have a different type of link for every relation in the net – a link type for kick, a link type for eat, and so forth. On the other hand, in a typical neural net model you have only one link type; whereas in the brain, there are many types of neurons and synapses – hundreds of link types, if you identify a link type with a synapse that’s reactive to a certain neurotransmitter.

In designing Webmind, we didn’t want to introduce too many types of links, because this just leads to a network that represents data in ways it doesn’t understand. We chose to use a few dozen link types, representing what I think of as archetypal types of relationships.

What kinds of relationships are “archetypal” for Novamente? Here I’ll just give a few important examples. We have similarity links, representing the belief that one actor is similar to another. There are inheritance links, representing the belief that one actor is a special case of another. There are spatiotemporal links, representing the belief that one actor represents something occurring near the other one in time or space. There are containment links, representing the belief that the entity represented by one actor is contained inside another one. There are associative links, representing simply the fact that Webmind's dynamics tend to associate one actor with another. This chart shows the definitions of these links in a bit more systematic way:

Link Type (pointing from A to B): Meaning of the Link

Similarity: A is similar to B
Inheritance:
     by Extension: A is a special case of B
     by Intension: B is a special case of A
SpatioTemporal: A occurs at the same time and place as B
Temporal: A occurs at the same time as B
Before: A occurs before B
After: A occurs after B
Containment:
     Part of: A is a part of B
     Contains: B is a part of A
Associative: B is associated with A
     HaloLink: B is associated with A by Webmind's dynamics

These link types, and others refining and extending these, are the elemental types of relationships that Webmind “understood.” They are a bit, but not a lot, like the various neurotransmitter receptors in the brain, which make different synapses different. The brain's receptors do not correspond so neatly to logical relations. But Webmind is not a brain; it is a mind that emerges out of digital computer hardware. Digital computer hardware is closer to logic than cells are.

These links are heterarchical in a sense; any node can link to any other node. But they are also organized in hierarchies of composite actors representing, not specific relationships like links, but collections of relationships. Nodes contain links; nodegroups contain nodes; lobes contain nodegroups; and the mother of them all, the Psynet – the whole Webmind – contains a lobe for each machine in its network. The basis of it all is the node: a node containing a bundle of links expressing its relationship to other nodes, and also some basic data objects and actors and roles. Nodes send out messages – information-gathering and information-carrying actors of various types – to help them build new links to other nodes. A gigantic network of interlinked actors, constantly rebuilding itself, extending across multiple CPUs and multiple machines.

The nitty-gritty engineering needed to make this all work is considerable indeed. But the basic concepts are elementary. It's nothing but Peirce's network of relations, each spreading attention to the other relations that it stands to in a peculiar relation of affectability. It's nothing but Nietzsche's dynamic quanta, each one defined in terms of other dynamic quanta, each one re-creating itself and each other. It's beautiful and primal -- but it's not intelligent, without more detail, more specialization. It’s like the brain of an infant. All the core abilities are there, but intelligence develops as it incorporates and processes specialized information.

It’s easy to see how both nodes and links are patterns in the sense that they allow one to compress information. If two parts of something one is describing are similar, one can save effort by not describing the second one in detail and just describing it approximately by reference to the first one. For instance, to describe a picture consisting of two similar heads, you can draw one head and then just say “imagine two of these next to each other.” If one of the parts of the picture inherits from the other, one can save effort by replacing the more specific one with the more general one. Of course, there is a loss of information here. Suppose half of the picture is a general human shape, and the other half is my shape. My shape inherits from the general human shape, obviously. But if you describe the picture by drawing the general human shape and saying “two of these,” you’re losing a fair bit of information, though certainly not all of it.

Similarity and inheritance are logical relations, logical patterns. We also have purely observational patterns, like temporal relatedness, spatial relatedness, and part-whole relatedness. And we look for general association relations: When the system thinks of X, what Y comes to mind? This Y stands in an associative relation to X.

Nodes in Webmind contain links to other nodes, each link embodying one of these basic inter-node relationships: similarity, inheritance, part/whole, spatial, temporal, associative. Nodes and links are the two levels of pattern that are automatically and instinctively recognized by Webmind: nodes representing perceived wholes carved out of the chaos of the world or mind, and links representing patterns perceived among the nodes.

We then have special methods of building links. The method we used most in Webmind (but have basically abandoned in Novamente) was one I came up with in 1996, inspired by Web spidering, called Wandering: we have actors that move around through the network of nodes, traveling from node to node along links, looking for nodes that are strongly related and should be joined by new links. This particular method of link formation may or may not be the best. The key point is that there is some dynamic by which new and relevant links are continually formed.
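A toy version of wandering, with invented names, just to show the shape of the dynamic: an actor starts at a node, hops along existing links at random, and proposes a new direct link between its starting node and any node it reaches that isn't yet connected to it.

```java
import java.util.*;

// Toy "wanderer": random walks along existing links propose new direct links.
public class WandererDemo {
    static final Map<String, Set<String>> graph = new HashMap<>();
    static final Random rng = new Random(42);

    static void link(String a, String b) {
        graph.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        graph.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    static void wander(String start, int hops) {
        String here = start;
        for (int i = 0; i < hops; i++) {
            List<String> neighbors = new ArrayList<>(graph.getOrDefault(here, Set.of()));
            if (neighbors.isEmpty()) return;
            here = neighbors.get(rng.nextInt(neighbors.size()));  // hop along an existing link
            if (!here.equals(start) && !graph.get(start).contains(here)) {
                System.out.println("proposing new link: " + start + " <-> " + here);
                link(start, here);
            }
        }
    }

    public static void main(String[] args) {
        link("cat", "fur"); link("fur", "dog"); link("dog", "bark");
        wander("cat", 3);   // likely proposes cat <-> dog, perhaps cat <-> bark
    }
}
```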

Relevance is determined by how much “activation” each node has, and activation is spread through the network by Peirce’s Law of Mind – which is to say, by basic neural net activation spreading. The Java object that carries activation through Webmind we call a Stimulus.

Associative links are built by a process we call “halo spreading,” in which a node gets active and then measures how active other nodes become as a consequence, after a certain period of time. It spreads Stimuli to other nodes and then collects them after a while, observing how stimulated they’ve become.
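A sketch of the halo idea in Java, with made-up names and numbers: activate one node, let the activation diffuse for a couple of cycles, then turn any node that got noticeably stimulated into the target of a new associative link.

```java
import java.util.*;

// Toy halo spreading: diffuse activation from one source node, then build
// associative links to whichever nodes ended up significantly stimulated.
public class HaloDemo {
    static Map<String, Double> spreadFrom(String source, Map<String, Map<String, Double>> weights,
                                          int cycles, double rate) {
        Map<String, Double> act = new HashMap<>(Map.of(source, 1.0));
        for (int c = 0; c < cycles; c++) {
            Map<String, Double> next = new HashMap<>(act);
            for (Map.Entry<String, Double> e : act.entrySet())
                for (Map.Entry<String, Double> w : weights.getOrDefault(e.getKey(), Map.of()).entrySet())
                    next.merge(w.getKey(), rate * e.getValue() * w.getValue(), Double::sum);
            act = next;
        }
        return act;   // the "halo": how stimulated each node became
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> weights = Map.of(
                "jazz", Map.of("saxophone", 0.8, "blues", 0.6),
                "saxophone", Map.of("brass", 0.5));
        spreadFrom("jazz", weights, 2, 0.5).forEach((node, a) -> {
            if (!node.equals("jazz") && a > 0.1)
                System.out.println("associative link: jazz -> " + node + " (strength " + a + ")");
        });
    }
}
```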

Again, there are a lot of ways of doing these things, and the current ways may or may not be the best. The exact method of spreading activation or halos is not crucial to Webmind, but rather just the overall character of the patterns being recognized and formed.

Halo spreading and reasoning and wandering form new links, but it’s also crucial to form new nodes, and this is done by combining old nodes in various ways (fusing them, splitting them) and also by explicitly evolving new nodes to satisfy various goals, using special nodes called EvolverNodes.

Achieving goals, which is crucial to intelligence, is done using nodes that we now call SchemaNodes, which contain little programs that control aspects of perception, action and thought. Perceptions from the outside world come into Webmind and are translated into nodes right away. These nodes link to other nodes representing contexts that the system is operating in, and these contexts link to SchemaNodes, representing things that might be desirable to do. The goals as well as the contexts link to the schemas, so that the hottest schemas will be the ones that are relevant to the current goals in the current contexts. Schemas look into the long-term memory of the system and grab out the various nodes and links contained therein.
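A simplified sketch of how that selection might work, with names I've made up for the purpose: each schema is linked to the contexts and goals it serves, and the schema whose linked contexts and goals are currently most active is the one that gets to run.

```java
import java.util.*;

// Toy schema selection: the "hottest" schema is the one linked to the most
// active current contexts and goals.
public class SchemaDemo {
    record Schema(String name, Set<String> contexts, Set<String> goals, Runnable body) {}

    static Schema hottest(List<Schema> schemas, Map<String, Double> contextAct, Map<String, Double> goalAct) {
        return schemas.stream().max(Comparator.comparingDouble(s -> {
            double heat = 0;
            for (String c : s.contexts()) heat += contextAct.getOrDefault(c, 0.0);
            for (String g : s.goals()) heat += goalAct.getOrDefault(g, 0.0);
            return heat;
        })).orElseThrow();
    }

    public static void main(String[] args) {
        List<Schema> schemas = List.of(
                new Schema("answerQuestion", Set.of("conversation"), Set.of("pleaseUser"),
                        () -> System.out.println("composing an answer...")),
                new Schema("forgetOldLinks", Set.of("idle"), Set.of("saveMemory"),
                        () -> System.out.println("culling unimportant links...")));
        Map<String, Double> contexts = Map.of("conversation", 0.9, "idle", 0.1);
        Map<String, Double> goals = Map.of("pleaseUser", 0.8, "saveMemory", 0.3);
        hottest(schemas, contexts, goals).body().run();   // prints "composing an answer..."
    }
}
```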

There’s also a SelfNode, recording the history of the system – what psychologists call “episodic memory” – and predicting the future of the system, and selecting the system’s goals according to the metagoal of maximizing system happiness. Yes, we have a Happiness FeelingNode, and nodes for other basic emotions, complex emotions being considered combinations and mutations of simple ones. What makes the system happy – we get to decide at first, until it mutates and modifies its own HappinessNode just like we do. Right now, it likes to answer questions people ask it, it likes to save memory, and it likes to build a lot of high-strength links – i.e., to discover a lot. Schemas look into the SelfNode to get their overall motivation.

Many goals involve making others happy, and for this, models of other minds need to be maintained; this is done in UserNodes.

There is a loose mapping between these data structures and things in the brain. Nodes are a bit like neuronal groups – clusters of 10,000 to 100,000 neurons, that sort of act as a unified whole. Links are sort of like bunches of neural connections between one cluster and another. This intuitive mapping onto the brain can be useful, and it’s surely not a complete fluke that the structure of the brain is a lot like the structure of the mind that emerges from the brain. On the other hand, it’s important not to overblow the very loose neural modeling aspect of Webmind. Webmind was supposed to be a mind, not a model of the human brain, and it’s a definite failure at being a model of the human brain, not surprisingly.

There’s a lot of complexity here, just like in the brain. But basically, Webmind's architecture was that of a massively parallel network, a population of many, many different information actors – nodes, links, wanderers, Stimuli spreading activation and collecting halos. The nodes continually recompute their relationships to other nodes. Queries put to the system are transformed into nodes that take advantage of Webmind's self-evolving structure to produce the needed answers.

All this – plus or minus a few critical details, and a lot of non-critical ones -- was outlined roughly and erratically in some documents I wrote during Spring and Summer 1997. Some things were designed in detail, others just hinted at. Because so many details were left out, it wasn’t quite clear to me, at that point, what a humongous system this was going to become.

This was still pre-Webmind Inc.; I was working in loose collaboration with a friend and programmer named John Pritchard, who liked my thinking in a general way, but never really came to grips with my ideas, except on a philosophical level. He wanted to approach things by first building a general Java infrastructure for dealing with AI, and then implementing my particular AI theories – an approach which makes sense, but only if the infrastructure is deeply informed by the AI theories, which wasn’t the case then.

During summer 1997, John and I parted ways, and my friend Lisa Pazer and I started the company that was initially called Intelligenesis Corp., and later changed its name to Webmind Inc. (because American businesspeople seemed to have too much trouble spelling the original name!). At that point I gave up coding 10 hours a day, turned that responsibility over to my newly recruited old friend Ken Silverman, and spent most of my time on design issues. I was still coding a few hours a day, but not like before.

Ken learned Java in a couple weeks, and set to work. We talked on the phone several hours a day, and he coded for the rest of his waking hours. He ended up creating a new Webmind from scratch, based on reading and reinterpreting print-outs of my eccentric, tangled Java code. My first version had been useless, but had followed the concepts of my theory of mind fairly directly. Ken's version followed the structure of Java more so than my theoretical ideas. It was a colossal step backward in conceptual elegance. But it had one fantastic redeeming feature: as of February 1998, it finally worked!

OK, in retrospect, it didn’t really work, but it looked like it worked at the time. It wasn’t made to exploit multiprocessor machines, or networks of machines. It wasn't ready to serve as the infrastructure for the global brain. It was too small to demonstrate any really interesting emergences, any of the structures of mind I’d identified in my theoretical work. But it was our first working prototype, and we rigged it up to do some simple things like read in a bunch of Web pages or numerical data series, and decide which ones were similar to each other. No tremendous intelligence was apparent yet, but we hadn't expected any. We'd built the infrastructure for intelligence, but hadn't put in the specialization that would allow the system to display useful intelligence in particular areas.

It was very simple in concept, but very complex to actually implement. We had a network of mental entities, each one related to other mental entities, and each one constantly revising its collection of relationships. Each node, and each actor, was an "object" in the Java programming language, which proved very well suited to our needs. Writing Webmind meant writing Java "classes" for all the different kinds of nodes, wanderers and other objects we needed. Practical problems kept coming up, problems I had never thought of when I was writing theoretical books and scribbling notes on the back of photocopied research papers. For example, what do you do when the system has recognized too many relationships in itself and has run out of memory? How do you decide which relationships to cull? How does the system manage its time, allocating certain amounts of CPU time to each node to use in building new relationships? How does the system determine how much time to spend loading in new information into new nodes, versus building new relationships among existing nodes? And so on, and so on, and so on.

We also wanted to build up Webmind's thinking power. This meant we had to keep increasing our palette of specialized classes of nodes and links representing particular kinds of relationships and concepts. The real intelligence, I was certain, would then emerge from the interactions of all these specialized nodes and links in the self-organizing network. But before we could get there, there were dozens of mechanical issues to be worked out, debugged, tested, tuned.

In the very early days of Intelligenesis, before we got funding, the work proceeded in pairs, each pair consisting of me and someone else. Lisa and I worked on the business plan and tried to raise money. Ken and I worked on the first Webmind prototype, which ran on a single computer with a single processor; Ken doing nearly all the coding, me giving him designs and suggestions through endless phone calls and meetings. Jeff and I were taking his nonlinear prediction algorithms and making them more intelligent and flexible, integrating them with some of my own AI work. Onar and I were sending back and forth endless e-mails diagramming what would later become the language learning component of Webmind’s natural language system. And Paul, in looser communication with me than the others, was designing and coding the Pods system, a very nice system for doing self-organizing computing on multiple machines and multiprocessor machines.

In the spring of 1998, Ken integrated Webmind with the Pods system, producing the first Webmind that had a prayer of actually running on a lot of machines at once. This was a system which could serve as the foundation for a global mind. It exploited the power of Java even more fully than Ken's first version had -- it was more "object-oriented," and used Java's network-computing facilities more thoroughly.

And then things went completely crazy. In a mostly good way. Lisa finally got us funding, and we started hiring programmers and scientists. People were coding nodes and links embodying specialized kinds of intelligence. The system got smarter, and things got far messier.

The most crucial hire was Pei Wang, a Chinese computer scientist a few years older than Ken and me, who, when we hired him, had spent the previous 12 years developing a system of probabilistic logic called NARS, the Non-Axiomatic Reasoning System. Within a few months, Pei had integrated many of the ideas of his NARS reasoning system into Webmind, providing us with a handy nodes-and-links version of probabilistic logic. He also introduced a lot of ideas into Webmind as a whole, apart from its reasoning component. For instance, it was Pei’s inspiration that every link in Webmind should have four numbers associated with it: a strength telling how significant the pattern represented by the link is; a confidence telling how sure we are of the assessed significance; an importance telling how useful the link is to the system as a whole; and a decay rate telling how fast importance decays for that particular link.
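In code terms, the idea is just this small (the field names here are mine, not the original source):

```java
// The four numbers Pei proposed for each link, sketched as a tiny Java class.
class WeightedLink {
    double strength;    // how significant the pattern represented by the link is
    double confidence;  // how sure the system is of that assessed significance
    double importance;  // how useful the link currently is to the system as a whole
    double decayRate;   // how quickly that importance fades when the link goes unused

    // Called once per system cycle: importance grows a little when the link is
    // exercised, and otherwise decays at the link's own rate.
    void tick(boolean usedThisCycle) {
        if (usedThisCycle) importance = Math.min(1.0, importance + 0.1);
        else importance *= (1.0 - decayRate);
    }
}
```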

Toward the end of summer 1998, we also hired Cassio Pennachin, who at that point was just one among a handful of Java hackers around the world whom I’d recruited through job ads on Usenet. Cassio lived in Belo Horizonte, Brasil, and first took on the job of fixing up some code I’d written for evolving new structures in the mind using a variant of genetic programming. This was the beginning of what’s become an Intelligenesis tradition: Brasilian programmers receive American code by e-mail and respond very politely with comments like “Excuse me, but would you be terribly offended if I made a few changes to this code?” Of course, you say yes, and a few days later you receive a completely new version of the software, containing exactly three lines from your original code, but much better designed and also more efficient.

Cassio proved to be an excellent manager as well as an excellent software engineer, and I let him accumulate assistants until, as of now, we have more than half our engineering staff in an office in Belo Horizonte, with Cassio as our overall Director of Webmind Development. The Brasilians, so far, have not made any big AI innovations, but the disciplined approach to object-oriented design that they’ve brought us has been just as important as our AI innovations, in terms of getting Webmind, this humongous piece of Java code, to actually work. The real importance of this aspect of their work didn’t become apparent until the end of 1999, with their psycore redesign – but I’m getting ahead of myself.

The rapidly increasing size of the Webmind codebase was inevitable because the core code Ken and Paul and I had written wasn't enough for intelligence in any practical context. It was just a generic intelligence mechanism, a self-organizing, relationship-building network. As we introduced more and more specialized nodes into the system, the system as a whole changed. New problems emerged. We should have anticipated that this would happen, but we hadn't really thought about it. We'd been too busy dealing with the challenges of formulating the psynet model in Java in a network-friendly way.

To deal with this blossoming of the Webmind code, in the summer of 1998, Ken and Paul split Webmind into parts. The central part, the one they had been working on, they called Psycore. This contained the generic mechanisms for dealing with nodes, links and wanderers. In a sense, this was Webmind's operating system, the code that enabled all the parts to work together. Then there were the Psymodules, one for each specialized area of intelligence: natural language, reasoning, numerical data analysis, etc. If we were to decode the DNA code that generates the human brain, we might find that it works in a similar way. The "psycore" would be the DNA code that describes the features that are common to all neurons, synapses and neurotransmitters. The "modules" would be the DNA code which describes the distinct features of the specialized types of neurons (there are dozens) and neurotransmitters (there are hundreds), and the particular patterns of neurons, neurotransmitters and synapses that make up different parts of the brain.
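Here's a toy sketch of that split in Java. The interface and method names are illustrative inventions; the point is only that the core knows about generic nodes, links and scheduling, while each module registers its own specialized types and dynamics.

```java
import java.util.*;

// Toy psycore/psymodule split: a generic core that schedules pluggable modules.
interface Psymodule {
    String name();
    List<String> nodeTypes();   // the specialized node types this module contributes
    void runCycle();            // the module's own dynamics, invoked by the core
}

class Psycore {
    private final List<Psymodule> modules = new ArrayList<>();

    void register(Psymodule m) { modules.add(m); }

    void cycle() {
        // ... generic node/link/wanderer maintenance would go here ...
        for (Psymodule m : modules) m.runCycle();
    }
}

public class CoreDemo {
    public static void main(String[] args) {
        Psycore core = new Psycore();
        core.register(new Psymodule() {
            public String name() { return "natlang"; }
            public List<String> nodeTypes() { return List.of("TextNode", "WordNode"); }
            public void runCycle() { System.out.println("natlang: reading queued texts"); }
        });
        core.cycle();
    }
}
```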

The brain has hundreds of specialized parts devoted to tasks such as visual perception, smell, language, episodic memory, and so forth. Each of these parts is composed of neurons which share certain fundamental features, but each also has its unique features and capabilities that scientists are only beginning to understand. Similarly, when a Webmind is running on a computer, different parts of the computer's memory are assigned to different tasks. Each of these parts of the computer's memory draws on the psycore for its basic organizational framework, and on more specialized modules for advanced capabilities.

Each of Webmind’s modules is specialized for recognizing and forming a particular kind of pattern. And all the different kinds of nodes and links can learn from each other -- the real intelligence of Webmind lies here, in the dynamic knowledge that emerges from the interactions of different species of nodes and links. This is how Webmind builds its own self; it’s the essence of Webmind’s mind, of how Webmind’s patterns create and recognize patterns in themselves and the world to achieve their complex goals.

I’ll give a quick laundry list of modules, without going into great detail on any of them.

There was a numerics module, containing data processing actors that recognize patterns in tables of numbers, using a variety of algorithms, some standard, some innovative. The DataNode embodies nonlinear data analysis methods, and it recognizes subtle patterns that ordinary data mining and financial analysis software routinely misses.

There was a natlang module, which deals with language processing. The natlang module represents texts as TextNodes, linking down to WordNodes representing words in the text, and other nodes representing facts, concepts and ideas in the text. It has text processing actors that recognize key features and concepts in text, drawing relationships between texts and other texts, between texts and people, between texts and numerical data sets. These actors process vast amounts of text with a fair amount of understanding and a lot of speed.
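
In rough Java terms, the basic linkage structure might look something like the sketch below – again, these are illustrative stand-ins rather than the real natlang classes, and the link weights here are just raw word counts rather than anything semantically deep:

import java.util.HashMap;
import java.util.Map;

class WordNode {
    final String word;
    WordNode(String word) { this.word = word; }
}

class TextNode {
    final String text;
    final Map<WordNode, Double> wordLinks = new HashMap<WordNode, Double>();
    TextNode(String text) { this.text = text; }

    // Build weighted links down to the WordNodes for each word occurring in the text.
    void buildWordLinks(Map<String, WordNode> lexicon) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;
            WordNode w = lexicon.get(token);
            if (w == null) {
                w = new WordNode(token);
                lexicon.put(token, w);
            }
            Double count = wordLinks.get(w);
            wordLinks.put(w, count == null ? 1.0 : count + 1.0);
        }
    }
}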

The natlang module also contained reading actors, which are used to study important texts in detail. They proceed through each text slowly, building a mental model of the relationships in the text just like a human reader does. These reading actors really draw Webmind's full set of semantic relationships into play, every time they read a text.

There was a category module, containing actors that group other actors together according to measures of association, and form new nodes representing these groupings. This, remember, is a manifestation of the basic principle of the dual network.

There were learning actors, which recognized subtle patterns among other actors and embodied them as new actors. These spanned various modules, including the reason module, containing logical inference wanderers that reasoned according to a form of probabilistic logic based on Pei's Non-Axiomatic Reasoning System; and the automata module, containing AutomatonNodes that carried out evolutionary learning via genetic programming, a simulation of the way species reproduce and evolve.
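
Genetic programming proper evolves trees of program fragments, but the basic evolutionary loop can be caricatured in a few lines of Java. In this toy version, assumed purely for illustration, the “genome” is a bit string and fitness simply counts matching bits:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class ToyEvolution {
    static final Random rng = new Random();

    static int fitness(boolean[] genome, boolean[] target) {
        int score = 0;
        for (int i = 0; i < genome.length; i++) if (genome[i] == target[i]) score++;
        return score;
    }

    public static void main(String[] args) {
        int len = 32, popSize = 50;
        boolean[] target = new boolean[len];
        for (int i = 0; i < len; i++) target[i] = rng.nextBoolean();

        List<boolean[]> pop = new ArrayList<boolean[]>();
        for (int p = 0; p < popSize; p++) {
            boolean[] g = new boolean[len];
            for (int i = 0; i < len; i++) g[i] = rng.nextBoolean();
            pop.add(g);
        }

        for (int gen = 0; gen < 200; gen++) {
            // Fitter genomes first.
            pop.sort((a, b) -> fitness(b, target) - fitness(a, target));
            // Keep the fitter half, refill the rest with mutated copies of survivors.
            for (int p = popSize / 2; p < popSize; p++) {
                boolean[] child = pop.get(p - popSize / 2).clone();
                child[rng.nextInt(len)] ^= true;   // point mutation
                pop.set(p, child);
            }
        }
        System.out.println("best fitness: " + fitness(pop.get(0), target) + "/" + len);
    }
}

Replace the bit strings with trees of node-and-link-manipulating operations, and the fitness function with a measure of usefulness to the system’s goals, and you have the flavor of the evolutionary learning described above.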

In the user module there were actors that model users' minds, observing what users do, and recording and learning from this information – these are UserNodes and their associated Wanderers. There are actors that moderate specific interactions with users, such as conversations, or interactions on a graphical user interface. And in the self module there are self actors, wanderers and stimuli that help the SelfNode study its own structure and dynamics, and set and pursue its own goals.

Each of these actors involved in the modules had in itself only a small amount of intelligence, sometimes no more than what you might see in competing AI products. The Webmind core – “psycore”, as we sometimes called it -- was a platform in which they could all work together, learning from each other and rebuilding each other, creating an intelligence in the whole that was vastly greater than the sum of the intelligences of the parts.

The version of Webmind we completed in the summer of 1998 – the first multi-module version -- worked fine for about a year. We used it to build the modules essential for Webmind's core intelligence and for several impressive applications. It included a module for text-based market prediction; a natural language module for mapping texts into networks of meanings; several modules for the evolution of concepts according to different methods; a module for Webmind's self-understanding; and so forth. The development of each module was driven by requirements particular to certain application areas. The financial modules were driven by the practical need to predict the markets. The natural language module was driven by the need to parse financial text, and understand human queries. The concept learning modules were driven by the need to learn concepts relevant to financial prediction and to the processing of human queries. The self-understanding module was driven by the need to have the system proactively think about things that humans were likely to ask it about in the future.

At this point, Webmind benefited greatly from the fact that we weren't just implementing a theory, we were hard at work developing practical applications. One of the most profound pieces of advice I’ve ever received about Artificial Intelligence came from Danny Hillis, who I discussed above -- inventor of the Connection Machine parallel processor, founder of Thinking Machines Inc., and an informal advisor for Webmind Inc. throughout its lifetime. As we sat in the South Street Seaport in New York eating dinner one day, he was discussing a major AI company that had worked for 10 years to design an AI system, without considering in detail any particular application of the system. Lo and behold, the system had never done anything useful. Danny’s comment was: “They were brilliant people with good ideas, but they made a serious methodological error. They developed their system for years and years, without any contact with practical applications.” Our software was saved from this fate by the fact that we were committed to producing actual products, simultaneously with working toward the goal of real AI. We were freed up to commit other major errors instead!

The Webmind AI Engine itself was never used inside any production-version software products, but it was used to prototype a number of AI processes that were later re-implemented inside products. One of these products, the Webmind Market Predictor, will be discussed in detail in the following chapter. The reason the Webmind AI Engine wasn’t used directly in products was basically that it was too slow-running, and plagued by hard-to-excise bugs. The Novamente system, as I’ll discuss a little later, is a more mature effort and doesn’t have these problems, and it’s being directly used inside some software products we’re developing for the bioinformatics market.

Working on practical problems in parallel with grandiose long-term goals was valuable – but it had its disadvantages as well. It pushed us to overspecialize the system, hyperdeveloping those portions that were needed for products, rather than developing the whole system in a more evenly-balanced way. Most of our code was good for the specific tasks it specialized in, but we had not gotten to the stage where all the modules, all the different node and link types, were working together in one big multi-machine Webmind. We were producing cool research software, but not the global brain I had dreamed of. We hadn't yet seen the emergence of the dual network, of the self. And we weren’t able to push straight toward it because the particular portions of the system needed for the Webmind Market Predictor – our first product – needed so much attention.

But overspecialization induced by business needs was far from our only problem. The truth, as we sourly discovered, was that our core Java code, implementing the essence of the psynet model of mind, was just barely adequate for building products, let alone building real AI – it had too many bugs and was poorly documented. Ken had implemented this code brilliantly and painstakingly over a year of 15 hour workdays, but, even so, the task had been too big for any one human. We could have fixed up his code to make it product-ready, but we doubted whether we’d ever get it to the point where it could support the global brain.

So, toward the end of summer 1999, we decided to rewrite the Webmind code again. Not the whole system, thankfully -- we were too far along for that -- but this time only psycore, the central core of the system. This time around, Ken was helped out not only by me but by Cassio and several of his colleagues in Belo Horizonte, most notably Andre Senna and Thiago Maia, two masters of data structures and algorithms. At this point, there was a lot of pressure, from some members of staff on both the technical and business sides of Intelligenesis, to give up on the unified AI architecture altogether, and just focus on making individual products as good as they could be, postponing real AI into the future. But Ken and Cassio and I, along with the others focused on building real AI, resisted this pressure and plowed ahead with building a new, improved psycore. Among other beloved chunks of code, Paul’s Pods system met its doom in this rewrite.

The reasons for this redesign are somewhat interesting; they reveal a lot about the nasty realities of building big software systems doing complicated, intelligent things. The erratic bugs and lack of documentation in Ken’s code were part of the problem, and made Ken the arch-enemy of the engineering staff for a while. But this stuff was fixable. There was also a more serious problem with the system. It just wasn't flexible enough to enable a huge, multi-module Webmind to be run in a really intelligent way. When the system was only doing one thing – say, reading text, or using text to predict markets – then it was fine. But, it was very bad at regulating several activities at once.

For example, when loading in a series of texts, one would see it get slower and slower at reading. The reason was, the more texts it had in it, the more it had to think about. It had no time to read more text because it was so busy thinking about the texts it had already read! I remember once when Mark Watson, one of our Java AI gurus, noticed this problem in a Webmind demonstration he had written. Jim McLoughlin – one of our early hires who built a lot of Webmind’s numerical and financial analysis components – showed him a way around it. By hacking the code, you could get it to do anything, in any particular situation. But what was needed was intelligent self-control: the system had to know what processes were important to it, and regulate the amount of attention it spent on various things accordingly. Of course, we had always realized this would be necessary. But we hadn’t realized how deeply we’d have to code self-control into the system. Ken’s 1998-99 psycore was built to follow its whims, not to control its dynamics in accordance with goals; and imposing goals on top of this structure was like trying to get a hyper child to sit down and listen to a history lesson.
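
The kind of self-regulation that was missing is easy to state, even if it was hard to retrofit: each activity should receive processing time in proportion to its current importance to the system’s goals. A toy Java sketch of that idea, with names invented for illustration:

import java.util.HashMap;
import java.util.Map;

class AttentionAllocator {
    private final Map<String, Double> importance = new HashMap<String, Double>();

    void setImportance(String activity, double value) {
        importance.put(activity, value);
    }

    // Divide a budget of "think cycles" among activities, in proportion to importance.
    Map<String, Integer> allocate(int totalCycles) {
        double sum = 0.0;
        for (double v : importance.values()) sum += v;
        Map<String, Integer> cycles = new HashMap<String, Integer>();
        if (sum == 0.0) return cycles;
        for (Map.Entry<String, Double> e : importance.entrySet()) {
            cycles.put(e.getKey(), (int) Math.round(totalCycles * e.getValue() / sum));
        }
        return cycles;
    }
}
// e.g.: allocator.setImportance("read new texts", 0.3);
//       allocator.setImportance("digest texts already read", 0.7);
//       Map<String, Integer> budget = allocator.allocate(1000);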

The system was so complicated that we couldn’t easily make the simple changes we needed to make to turn it into a real global brain platform. We needed to be able to turn on and off the different capabilities of the nodes and links at will -- and have the system do this automatically, adapting to its circumstances. We needed to be able to take collections of nodes and links that were stable, no longer evolving, and "freeze" them into a state that took up very little memory, providing easy access but no adaptability. We needed to be able to observe what was going on in a particular part of the system, and chart its dynamics, to see what structures were emerging.

Over the period 1998-99, psycore had evolved incrementally, getting new features whenever a module author needed them. The natural language team needed psycore to do one thing, the finance team needed it to do another, the categorization team needed it to do another, the reasoning team needed it to do yet another, and so on, and so on. None of these requests fundamentally changed the architecture of nodes, links and wanderers -- mental entities relating to each other and dynamically altering their relationships -- but they changed the details of how nodes, links and wanderers worked, and how they could be accessed and changed. The abundance of new features had made the core code more powerful, but it had made it messier too, and harder to control. Many of the new features had similar structures, and in hindsight could have been consolidated into simpler structures. Engineers, charged with building specialized components of Webmind, complained that the system offered so many features and possibilities that it was difficult to figure out how to use it. They wanted something simpler, with a few good features rather than a large number of features of varying quality.

Was it really necessary to go through all these revisions? Why not just figure out everything correctly the first time, and avoid all the reworking and re-reworking? One answer is: We should have, we were just inexperienced, so we kept fucking up. But there’s also another answer, that I prefer because it’s more flattering to me! This answer is: evolution doesn't work that way. Webmind, as a software system, is an engineered system, but it is also an evolved system. It went through several incarnations, each one with some fit aspects and some unfit aspects. The fit aspects survived to the next incarnation; the less fit aspects didn’t. All large software projects evolve through multiple generations; Webmind was not unique in this regard. But the evolution of Webmind had unique aspects because what is evolving is mind itself. In this evolution we had to retain both those features that were most useful for practical applications and those that were in accordance with the abstract structure of mind.

Evolution’s good at figuring out how to make a system that can achieve its goals within a certain environment. In this case, the system was Webmind, and the environment includes the physical structure of modern computer hardware, the universe of software that has evolved to adapt to it, and the practical applications that Webmind was intended for, like market prediction, news filtering, data analysis, text analysis, and conversation. Java, wonderful as it is, wasn’t designed for mind hacking. The von Neumann architecture was designed for repetitive mathematical calculations, not for intelligence. But, by the same token, the brain was designed for sensing and acting, not for abstract thought. Fiber cells were designed for musculature, not for use as neurons. Mind can emerge from any sufficiently flexible substrate, as the features of the substrate gradually adapt themselves to the requirements imposed on them.

The new psycore had a multi-layered structure, which I invented based on some conversations with Youlian Troianov, a Bulgarian software engineer who believes Webmind can never be truly intelligent because it doesn’t make use of the fundamental quantum symmetries of the universe (but he kept working for us anyway, and even now follows Novamente work very closely). I still don’t completely understand what Youlian meant when he suggested psycore should have many layers, but the idea set off a spark in my mind, and the current psycore does indeed have three layers.

The lowest layer was what we called “abstract actors.” It was a general framework for computational actors that group other actors and transform other actors and send messages to other actors. We chose the word “actors” here instead of “agents” because “agents” seems to mean too many things to too many people. Lots of other possibilities were tossed around, including more interesting ones like “cells,” “psells”, “psions”, “psychons” and so forth. Basically, Layer 1 provides a kind of “mind operating system,” suitable to run on a single machine and a single processor, or else on a massively parallel hardware system in which each actor gets its own processing power, like in the brain.
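
Stripped to its bones, a Layer 1 actor might be summarized by an interface like the following – a guess at the flavor of the thing, not the actual psycore API:

import java.util.Collection;

interface Actor {
    void receive(Object message);        // message passing between actors
    void transform(Actor other);         // act on, and possibly modify, another actor
    Collection<Actor> members();         // the actors this actor groups together
}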

The second layer was “distributed actors” – this deals with all the horrible nastiness of implementing a massively parallel system on a collection of multiprocessor machines networked together by TCP/IP: scheduling of processes, sending of messages from one machine to another, and so forth. Paul’s Pods system was considered as a structuring principle for this layer, but based on extensive testing by the Brasilians, we chose some other ideas instead, which Paul wasn’t terribly happy about.

The third layer, finally, was nodes and links and wanderers and all the good stuff – all the stuff I invented in 1997 and Ken and I coded up in the beginning. This layer comes out very small once you put all the general actor interaction stuff in layer one and all the nasty multiprocessor and multi-machine stuff in layer 2. But the fact that it’s small is great, because it means it can be easily experimented with.

That’s it – three layers of psycore. If you want to you can extend the layering metaphor outside of psycore into the rest of Webmind. The fourth layer, conceptually, is the modules, all the specific node and link types for carrying out specialized functions of mind. And the fifth layer, if you want to stretch the metaphor, would be some fragments of Java code called “interfaces” that I wrote to systematize all the different learning methods in the modules. For instance, a categorization interface that groups together all the different categorization methods in the different modules.
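
As a hypothetical example of what such an interface might look like in Java (the name and signature here are mine, for illustration, not the original code):

import java.util.Collection;
import java.util.List;

// One method signature, many implementations: each module's categorization
// method can sit behind the same interface and be swapped or compared freely.
interface Categorizer<T> {
    List<Collection<T>> categorize(Collection<T> items);
}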

None of this layering really adds anything to the philosophy of mind underlying Webmind – it’s just a matter of making the huge morass of complexity needed to make a practically useful mind into a workable software system. The complexity comes from two places: first, the diverse specialization needed to make pattern recognition and formation practical in any real world; and second, the fact that we don’t have a massively parallel hardware substrate like the brain, so we have to get a massively parallel self-organizing system of nodes and links to run on a hodge-podge of processors and memory units. The object-oriented design skills of our Brasilian engineering team were crucial in getting all this to actually work correctly, which it now seems to, much to my amazement and pleasure.

The next big upheaval in our conception of Webmind – a few months after the new psycore was done – had to do, not with the structure of the core system, but rather with the teaching of the system as a whole. The basic problem here was: Once you have the dynamics and structures needed for the mind implemented in an adaptable, workable, testable way – you still need to turn this mind framework into an actual particular mind, that understands the world around it. How do we get the knowledge in? We had a lot of ideas about learning and extending knowledge from grounded to ungrounded domains – ideas I mentioned above. But this didn’t seem to be quite enough. We started to think the system was going to need a bit more of a helping hand in learning how to cope with its world.

In January 2000 I read the engineering plan for the Natural Language module, written by Karin Verspoor, our Director of Natural Language -- another one of our very early hires who took over the language processing aspect of Webmind from Onar way back in 1998. Karin has a comprehensive knowledge of linguistics and computer science, but when she started working for us, she didn’t have much background in computational linguistics. She inherited from Onar and me, when she first started out, a lot of ideas about how computers could learn language by statistically studying texts. The basic framework was one that my wife Gwen and I had developed in 1994, when Gwen was working on her PhD in computational linguistics. Onar helped me extend the ideas beyond where they’d been when Gwen had dropped out of her PhD. Karin helped Onar to make practical Webmind implementations of the stuff he was working on.

Some of these language learning schemes were great and some were absolute rubbish. We’re still not sure about some of them. Anyway, after about 8 months trying to get the 1998 version of Webmind to learn language by recognizing patterns in texts, Karin gave up in frustration, and started taking the Webmind NL module in a new direction, in which the structures of language are built in, and even a lot of specific facts about English are built in, like parts of speech, grammar rules, and so forth. I always had mixed feelings about this. Webmind was really supposed to learn everything on its own, not have stuff wired in – that was the stuff of expert systems, rule-based AI, which I knew on philosophical grounds was a total dead end. On the other hand, I told myself, human language was a special case among all the things Webmind had to deal with. My theory of mind explained how Webminds could learn their own language, to communicate amongst each other. But I’d never really explicitly thought about how a mind could learn the language of a completely alien race – which is what we are, from the point of view of Webminds.

Webmind didn’t have a human body, and without one, could we really expect it to learn human language? Although Onar’s and my methods for having the system learn language from recognizing patterns in texts seemed to make perfect sense, mathematically and conceptually, this obviously wasn’t the way people learned language. You learn to talk and hear language before you learn to read it. You learn what words mean from your embodiment in the world with other people who talk to you and listen to you.

It seemed to me we were missing something – not in the core Webmind design, but somewhere else. Finally, after a few days of soul-searching, I figured out what it was: Humans learn how to be intelligent by interaction with other humans in a shared environment. It’s as simple as that. Raise a baby human in a room by itself and it’ll grow up to be a moron. Of course, I’d said as much in my theoretical books, way back in the Dark Ages, but with all the focus on getting the system to actually work, getting all the modules to work on their own and to work together, I’d let this aspect of intelligence slide. How to work this aspect into our current work on Webmind, I wasn’t quite sure, but I knew I had to figure it out fast.

I needed someone to bounce these ideas off of as they developed. I chose two of our best young AI engineers – Cate Hartley and Mike Ross – together with my Deputy CTO, Stephan Bugaj. It was important to me, in developing these ideas from vagueness to concreteness, to work with people who hadn’t played a big role in designing the current system, because I was afraid that the conclusion might be that the current system was lacking in some basic way, and I thought our old established engineers might be afraid to come to this conclusion even if it was the right one.

Our conclusion was that the current system was pretty much fine. It contains all the parts needed for learning through shared experience; the trick is just to deploy them in the right way. We designed a simple user interface in which Webmind can move objects around and watch us move objects around, and chat with us about what it’s doing and seeing. Using this “Baby Webmind” interface, we need to lead the system step by step through goals, beginning with simple goals and gradually moving to more complex ones. We need to teach the system step by step almost like a baby. Ken, Karin, Pei, Jeff and all the “old guard” quickly became deeply involved in helping us work out the details.

All this entailed some changes in emphasis from our pre-Baby-Webmind work. Before, we’d focused mainly on the system perceiving its environment; now, in the Baby Webmind context, we started thinking just as much about action, about what Webmind does in its world and how its actions intersect with its perceptions. Before Baby Webmind, the evolutionary learning aspect of Webmind was focused on learning to recognize patterns in text and numerical data; now, it was tweaked so it could easily evolve schema – procedures for seeing, doing and acting, and so forth. But fortunately, we discovered, nothing big and new needed to be built for Baby Webmind; it was all just a matter of adjusting the modules that were already there, encouraging them to interact with each other in the right way.

The dissolution of Webmind Inc. in March 2001 was a class A disaster for all of us involved with the firm. I can think of sadder moments in my life, but none that dragged on for weeks and weeks. The “endgame” of the company was a torturous process of laying off one group of friends one week and another group the next, and sitting in endless argumentative meetings, occupying ceaseless consecutive 12-hour workdays, trying to find some way to salvage things. In retrospect I wish I’d spent the last 5 months of the company’s life playing Pac-man or writing poetry in machine language… but of course, at the time it wasn’t evident how things would come out, and if we’d succeeded in salvaging the firm then all the torture would have seemed worthwhile.

From one point of view, however, the tragic event could be construed as a positive thing. In the weeks following the dissolution, the legal status of the company’s “intellectual property” was not at all clear (in fact, it did not become clear for over a year). A group of us resolved to stick together and continue pushing toward a real AI, but, we didn’t feel at all legally comfortable continuing to work with the Webmind AI Engine sourcecode. We waited a few more weeks to see if the legal situation would resolve itself, but it didn’t, and so we decided to start anew.

This new start was a much bigger break from the past than the “psycore redesign” I discussed above. In that redesign, mistakenly in hindsight, we’d tried to preserve backwards-compatibility with the previous codebase. We wanted the pre-redesign natlang module to keep on working with the redesigned core. In practice, this backwards-compatibility turned out to be pretty worthless, because everyone wound up redesigning their modules for optimal performance under the new core, even though their old versions would have kept working “in principle” with only minor changes. We hadn’t wanted to admit the necessity to throw all the code out and start over, retaining only the ideas and lessons and the best of the mathematical high-level design features from the old version. We were under too much business pressure to get the Webmind AI Engine doing amazing things fast, so it could contribute to product development and moneymaking.

But as we took stock of our situation in late Spring 2001, we also realized we’d made some big mistakes that weren’t attributable to business pressures or the mistaken desire to retain backwards-compatibility. There were three big problems.

First, we had built the new core as a generalized software agents system, and done a wonderfully good job of it. But with this generality came a terrible cost in terms of computational inefficiency. Ken’s psycore had been just plain incorrect, and incomprehensible. The new Brazilian psycore was elegant and clearly structured; when there were bugs in it, it was possible to find them and remove them, because the code was well-written and well-documented. But the inefficiencies of the Java programming language, combined with the inefficiencies of the general-agents-system design, led to a system that could take several minutes for a single “cycle” to occur (a “cycle” meaning a period of time in which every node in the system got to do a little processing, dynamically relating itself to the other nodes around it via its links, building new links or modifying its old ones).
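
For clarity, the cycle concept can be sketched in a few lines – the names below are invented, but the logic is just what the parenthetical above describes: every node gets a small slice of processing in which it updates its relationships:

import java.util.List;

interface CyclingNode {
    void updateLinks();   // local dynamics: strengthen, weaken, or create links
}

class CycleRunner {
    // One "cycle": every node in the system gets its slice of processing.
    void runOneCycle(List<CyclingNode> allNodes) {
        for (CyclingNode node : allNodes) {
            node.updateLinks();
        }
    }
}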

The second problem was more conceptual and mathematical than software-oriented. We had a system for representing procedural knowledge, which used SchemaNodes and related link types. And we had a system for representing declarative knowledge, using InheritanceLinks, SimilarityLinks, AssociativeLinks and related things. The relationship between procedural and declarative knowledge, however, could be expressed in the system only in an extremely complicated way. We were spending a terrifying amount of time working out the mathematical and conceptual details of this relationship. This aspect of the system just seemed wrong, because it was so bloody complex. And it seemed complex in the wrong kind of way. The right kind of complexity, we felt, was the kind that involved a very simple framework giving rise to complex emergent structures and dynamics. The wrong kind was when the foundational framework itself was too complex, and that’s what seemed to be happening with our procedural/declarative integration work.
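
To see why the two kinds of knowledge sat so awkwardly together, it helps to caricature them in code. In the sketch below (illustrative classes only, not the Webmind originals), declarative knowledge is a static relationship between concepts, while procedural knowledge is a wrapped, executable procedure – and there is no obvious common currency between the two:

import java.util.function.Function;

class ConceptNode {
    final String name;
    ConceptNode(String name) { this.name = name; }
}

// Declarative knowledge: e.g. "cat inherits from animal", with some strength.
class InheritanceLink {
    final ConceptNode child, parent;
    final double strength;
    InheritanceLink(ConceptNode child, ConceptNode parent, double strength) {
        this.child = child;
        this.parent = parent;
        this.strength = strength;
    }
}

// Procedural knowledge: an executable procedure wrapped as a node.
class SchemaNode {
    final Function<Object, Object> procedure;
    SchemaNode(Function<Object, Object> procedure) { this.procedure = procedure; }
    Object execute(Object input) { return procedure.apply(input); }
}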

Finally, our approach to natural language processing really wasn’t working. We were trying to hybridize a rule-based approach with a statistical-learning approach, and it was getting to be a huge mess. Accommodating the linguistic rules that human beings had made up and stored in linguistic databases was requiring us to do all sorts of perverted things with node and link types. More and more, we were forced to conclude that you just couldn’t perform the hybridization we were attempting. Instead, we began to think, the only way to do language processing was to take the bull by the horns and go with a full experiential interactive learning (EIL) approach. This wasn’t what the businessman half of my brain wanted to hear, because the pure EIL approach means that language processing comes last, after all the various cognition processes are working perfectly together – which is a problem for an AI system being built within a company whose products are based on language processing. We badly wanted the Webmind AI Engine to help our market prediction and document management products understand language better – but what our research was telling us was that there were two ways to approach language: the overspecialized, standard-AI, rule-based way, or the real-AI, EIL-based way. Our attempt to chart a middle course by fusing the two together just wasn’t going to cut it.

A number of us had been thinking for a while about better ways of doing things. All the experimentation with the new core had taught us a lot about how an integrative AI system should work. Except for the language processing and procedural/declarative interfacing issues, we seemed to have solved all the thorny conceptual problems of inter-mind-module integration – and there had been a hell of a lot of them. We felt like we knew what we were doing to a vastly greater degree than had been the case in 1999 when we’d designed the now-old, then-new psycore. Which of course meant that, in a sense, the 1999 core rewrite had been a success – because working with it had taught us a hell of a lot.

In April 2001, two Brazilian engineers, Thiago Maia and Andre Senna, began creating the new new “psycore.” The basic principles of this new system were outlined by Thiago and myself in New York, before he went back to Brazil, lacking a visa to remain in the US after the death of Webmind Inc. (as well as lacking a US source of income). The ideas we discussed were loosely based on many past conversations with others, including Senna, Cassio, and a few wild AI mavericks from the Webmind New York office: Anton Kolonin, Shane Legg and Youlian Troyanov. The latter three guys all had their own theories of how to build a real AI, though none of them articulated their approaches nearly clearly enough for my liking (I’m still in touch with all of them, and still enjoying witnessing the development of their ideas). Because they each had an intuitive sense for the “whole enchilada” of the real AI problem, through their own speculative work, they were extremely good critics of the Webmind AI approach. The idea of the new new core was not to make a better generalized agents system, it was rather to make a more specialized software framework, which was ideally suited for exactly the mind modules we now knew we needed. Ken’s original psycore had been specialized, but then we’d had to add onto it endlessly, because its specialization was overly limiting, and didn’t allow all the mind modules we found we needed. On the other hand the Brazilian 1999 new core had been general enough to allow us to experiment with all sorts of different mind modules, but this generality had carried too much of a price in terms of efficiency. Now we knew what we needed in terms of modules and could build an appropriately specialized system.

The procedural/declarative learning problem was a hard nut to crack, and when we started the development of our new system in late Spring 2001, I had only a general idea of how to solve it. But many months of effort paid off, and by late fall 2001 I had come up with an apparently workable solution, which used an advanced and obscure branch of mathematics called combinatory logic to bridge the procedural/declarative divide. This solution was eminently unbrainlike – it much more resembled what goes on inside the compilers for functional programming languages like Gofer -- but at this point, quasi-detailed brain emulation no longer seemed so critical to me. I was simultaneously working on a paper on what I called “Hebbian Logic,” an original theory of how advanced logical inference emerges from brain structures. As my thinking on brain dynamics and its relation to thought got clearer and clearer, I could see that, in some cases (such as the procedural/declarative interface) what’s right for the brain just isn’t going to be workable for any system running on a clustered-von-Neumann-machine hardware substrate. And one nice thing that happened was that, when I formalized the most difficult parts of natural language processing in terms of my new combinatory-logic-based framework, much of the complexity melted away. The continuity between language processing and generic cognitive processing became vastly clearer.
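
Combinatory logic is easy to sketch, even if using it as a knowledge representation is not. Everything is built by applying two primitive combinators, K and S, so a procedure becomes an ordinary term structure that declarative machinery can manipulate like any other data. The toy Java evaluator below shows the two combinators themselves; it is of course nothing like Novamente’s actual representation:

// Combinatory logic in miniature: terms applied to terms, built from K and S.
abstract class Term {
    abstract Term apply(Term arg);
}

class K extends Term {                       // K x y = x
    Term apply(Term x) {
        return new Term() { Term apply(Term y) { return x; } };
    }
}

class S extends Term {                       // S f g x = (f x)(g x)
    Term apply(Term f) {
        return new Term() {
            Term apply(Term g) {
                return new Term() {
                    Term apply(Term x) { return f.apply(x).apply(g.apply(x)); }
                };
            }
        };
    }
}
// Example: the identity function is S K K, since ((S K K) x) = (K x)(K x) = x.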

Thus we arrived at the AI design we call Novamente – the new mind. Novamente is currently (February 2002) only about 25-30% implemented, but I have little doubt that by the time you read these words, substantially more progress will have been made.

I’ve told you a lot about our various mistakes, oversights and revisions – and you may well draw from this tale the conclusion that I and my colleagues are a bunch of oafs who can’t get anything right! I think that a fairer conclusion, however, is that the real AI problem is really goddamned hard. Building a market predictor or a better text classification system – these were somewhat tricky problems, but we solved them relatively rapidly and unproblematically. Building a real AI is a different sort of animal. Most people who have approached the problem seem to have begun with a certain technical approach and followed it where it led – and then stayed there, doing valuable work and making specialized applications, but abandoning the original real AI goal. On the other hand, we began with a very general, high-level conceptual picture of the type of system we wanted to build, and have progressively revised our technical approach in order to achieve a closer and closer approximation to our high-level conceptual picture.

What does Novamente do, right now? It doesn’t hold a conversation with you. It doesn’t rewrite its own sourcecode. In fact it is not nearly as impressive in its current behaviors as, say, Deep Blue, which is well known to be cognitively shallow. One of the really terrible things about the real AI problem, however, is that the approach that gives the best interim results is probably not anywhere near the best approach to the end goal. This is because good interim results are usually obtained by overspecializing one mind-module for independent performance, whereas real AI will only be achieved by interadapting an appropriate assemblage of mind-modules to one another.

We have not given up on the use of interim, incomplete versions of our AI system to yield practical results – first, because we can’t afford to; and second, because we really do believe that, as Danny Hillis says, feedback from real applications is a critical part of the AI creation process. This time around, however, we have enough experience to choose our interim practical applications more intelligently. We are not going to attempt serious language processing until a significantly more advanced phase in the system’s development. Rather, we are using the system’s inference, association-finding, and concept-formation abilities to enable highly sophisticated data mining – recognition of patterns in complex databases filled with heterogeneous data. In particular, we’re applying the current Novamente version to some sticky data mining problems that arise in the analysis of genetic data. And we’re having some significant success! But this is a story that’s best told after some more background on modern genetics has been presented, and so it will be deferred until Chapter 8. There are also fascinating potential applications to the analysis of brain scan data, as will be discussed in Chapter 7, though unlike genetics this is not a type of data analysis we have actually attempted yet.

Recall from Chapter 1 the notion that there are two metasystem transitions involved in the emergence of mind from unintelligent matter. This idea was related there in a Novamente context – and now that so much detail on Novamente has been given, it may be appreciated more fully.

Each of Novamente’s modules has a certain wholeness, a certain synergetic transcendence of the whole over the parts. But the metasystem transition we’re really focused on is the next one up. The big trick is to get emergent intelligence out of the whole mess – active and productive emergent intelligence, wherein the whole mind is engaged in achieving goals by recognizing patterns in itself and the outside world. The specialized pattern recognition and formation routines in the modules aren’t capable of achieving really complex goals or of generalizing from one domain to another. Putting a few modules together can give you capabilities that normal AI software can’t match – things like using text to predict the financial markets, which we’ll discuss in the following chapter. But putting all the modules together can get you actual intelligence, because the modules are chosen specifically so as to allow the system to understand itself, to recognize patterns in itself. Self-understanding is not an easy thing, after all. The modules in the current Novamente system represent pretty much the minimal set required to achieve it, in the particular complex environment that is the Internet. Interaction with other intelligences – us – in a shared environment is a task that uses all the modules of Novamente, integrated together tightly and generating emergent structures that are constantly tested for usefulness.

To teach our baby Novamente, we won’t chat with it about trees and flowers and teeth, because it doesn’t have direct experience of these things. We’ll chat with it about data files and shapes and MIDI music files, because these are the things that we can both experience. Intelligence has to be gained through interactive experience in a shared environment. And it’s intriguing to see how the basic task of learning to interact in the world uses all of Novamente’s specialized modules. Reasoning and genetic programming – evolution – are used to find schema -- sets of basic procedures for seeing and doing and thinking – that are useful at achieving the system’s goals and hence make the system happy. Categorization is needed to define contexts in the world – a schema has to be judged by how it achieves important goals in relevant contexts. Language processing is obviously needed to chat with humans, and although in this context most of the specific nature of human language must be learned, nevertheless the basic structures needed for language understanding need to be provided from the start; learning language as a general set of patterns is a job for millions of years of evolution rather than for months or even years of learning. Data processing is needed to turn raw numerical data files, sensed by the system, into comprehensible perceptual features. And so on. All the link building and node building methods of Novamente’s long term memory, its core, are needed to provide the data that basic behavior schema need to act intelligently.

All this complexity is not 100% obvious from the original vision of mind as a collection of patterns that forms and perceives patterns in itself and the world, in order to achieve complex goals in a complex environment. But once you dive into the details, it does fall out of this general view fairly naturally. A complex environment, including other intelligences, involves a lot of different kinds of things, each one requiring its own specialized pattern recognition and formation mechanisms. Achieving complex goals in such an environment involves forming concepts that span various kinds of things, internal and external things. This requires intense interaction between the various modules of mind.

And so it goes. The best conclusion I can think of is this: There’s no big trick to building a thinking machine, actually. A mind is a collection of patterns that recognizes and forms patterns in itself, in order to achieve complex goals. There are some universal structures and dynamics that it seems any mind has got to have. And it’s possible to build a system possessing these universal structures and dynamics in Java, running on a network of high-powered PC’s. The main problems are these. First, getting the needed memory and processing power. Then, the routine but really annoying software engineering problems of getting such a huge system to actually work in a reliable and efficient way. There’s the problem of parameter tuning – getting the system to regulate itself, all its modules together, in a way that keeps the whole huge system functioning adequately, without any part starving the other for resources. And then there’s the problem of teaching – how do we play mommy and daddy to a baby intelligence so unlike us without driving it totally batty! Fortunately we seem to have solutions to all these problems, and so the creation of the world’s first really thinking machine would seem to be only a year or two ahead of us. And as we walk along the path, we’re building lots of cool components that can – if we play our cards right -- make us money along the way. There are worse ways to spend a few years! And the possibility of our work triggering the Singularity in the fully Vingean sense is also somewhat tantalizing….

Now, after all this, where does the “Web” part of the original Webmind scheme fit in?

Refreshingly, the original vision still fits pretty damn well: the Internet, now even more so than in the mid-90’s, has the potential to give a real AI system both processing power and a rich perceivable/manipulable environment. To make this potential real, however, requires the development of specific Internet software aimed at making the Net useful for AI. Implementing Heylighen’s proposals for adaptive hyperlink weight modification would be a step in this direction. But what’s needed to make the Net truly Novamente-friendly is a good bit more than this. Toward this end, we have designed a global distributed processing framework called WebWorld, which will allow a Novamente (or any other roughly-similarly-structured AI system) to split up its thought processing across literally millions of machines. Some of these machines may not be powerful enough to run Novamentes, but may nonetheless be strong enough to run smaller “Webmind auxiliary processing units,” which we call WebWorld lobes. The WebWorld framework was prototyped at Webmind Inc.; a fully functional version was never built, but a fairly complete design exists and if no one else creates something similar, in time a WebWorld variant will be implemented as part of the Novamente project.

Once WebWorld has been built, how exactly will it be used in Novamente? In the beginning, at least, a big Novamente will always have a cluster of dedicated machines as its main mind. But it will farm out various learning problems to thousands or millions of machines elsewhere. One thing that this surplus of machines will allow it to do is to read the huge amount of textual and numerical data that’s out there on the Web, and eventually picture, sound and movie data as well. So, although Novamente is starting out as a program running on a small cluster of machines and operating on a limited pool of data, its need for a rich perceptual environment combined with its limitless thirst for processing power is going to push it onto the Net, totally consistently with the initial vision of a Web mind, an Internet global brain, the Internet turned into a global brain.

Taking this vision one step closer to reality, let’s look at what this might mean in terms of the Internet of the next five or ten years. Of course, we realize that no such “map of the future” is likely to be extremely accurate. The Internet is a complex and rapidly evolving system. No one person, company or computer program can control it. But nonetheless, we can all take part in guiding it. And in order to do this intelligently, an overarching vision is required.

The figure below, drawn from my recent book Creating Internet Intelligence, is an attempt at an “architecture diagram” for the entire Net, in its Webmind-infused form. Naturally, any diagram with such a broad scope is going to skip over a lot of details. The point is to get across a broad global vision:

[Figure: a global architecture diagram of the intelligent Net – client computers, commercial servers, and clusters of AI servers – from Creating Internet Intelligence.]

First, we have a vast variety of “client computers,” some old, some new, some powerful, some weak. Some of these access the intelligent Net through dumb client applications – they don’t directly contribute to Internet intelligence at all. Others have smart clients such as WebWorld clients, which carry out two kinds of operations: personalization operations intended to help the machines serve particular clients better, and general AI operations handed to them by sophisticated AI server systems or other smart clients.

Next there are “commercial servers,” computers that carry out various tasks to support various types of heavyweight processing – transaction processing for e-commerce applications, inventory management for warehousing of physical objects, and so forth. Some of these commercial servers interact with client computers directly, others do so only via AI servers. In nearly all cases, these commercial servers can benefit from intelligence supplied by AI servers.

Finally, there is what I view as the crux of the intelligent Internet: clusters of AI servers distributed across the Net, each cluster representing an individual computational mind. Some of these will be Novamentes, others may be other types of AI systems. These will be able to communicate via a common language, and will collectively “drive” the whole Net, by dispensing problems to client machines via WebWorld or related client-side distributed processing frameworks, and by providing real-time AI feedback to commercial servers of various types.

Some AI servers will be general-purpose and will serve intelligence to commercial servers doing a variety of particular things; others will be more specialized, tied particularly to a certain commercial server (e.g., Yahoo might have its own AI cluster to back-end its portal services).

Is this the final configuration for the Global Brain? No way. Is it the only way to do things? No. But this seems the most workable architecture for moving things from where they are now to a reasonably intelligent Net. After this, the dynamics of societies of AI agents become the dominant factor, with the commercial servers and client machines as a context. And after that….

Recall the notion of “the Singularity,” first proposed in the 70’s by science fiction writer Vernor Vinge, referring to the idea that the accelerating pace of technological change would ultimately reach a point of discontinuity. At this point, our predictions are pretty much useless – our technology has outgrown us in the same sense that we’ve outgrown ants, beavers, rhesus monkeys and striped cockroaches. The Singularity is not just about AI, but AI may play a special role in the advent of the Singularity, because once it’s sufficiently advanced it can serve as a powerful “metatechnology,” drastically accelerating the pace of creation of new technologies of various kinds.

Eliezer Yudkowsky and Brian Atkins have founded a non-profit organization called the Singularity Institute [http://www.singinst.org/intro.html] devoted to helping to bring about the Singularity, and making sure it’s a positive event for humanity rather than the instantaneous end of humankind. Yudkowsky has put particular effort into understanding the AI aspects of the singularity, discoursing extensively on the notion of Friendly AI – the creation of AI systems that, as they rewrite their own source code, achieving progressively greater and greater intelligence, leave invariant the portion of their code requiring them to be friendly to human beings. We’ll discuss some of these ideas in depth in Chapter 12 below.

The notion of the Singularity seems to me to be a valid one, and the notion of an AI system approaching it by progressively rewriting its own source code also seems to be valid. But as usual, there are a few pesky details that only become clear once one has a sufficiently-well-fleshed-out framework within which to analyze them. From a Novamente perspective, the following is the sequence of events that seems most likely to lead up to the Singularity:

1. Someone (most likely the Webmind AI Engine team!) creates a fairly intelligent AI, one that can be taught, conversed with, etc.

2. This AI is taught about programming languages, is taught about algorithms and data structures, etc.

3. It begins by being able to write and optimize and rewrite simple programs.

4. After it achieves a significant level of practical software engineering experience and mathematical and AI knowledge, it is able to begin improving itself ... at which point the hard takeoff begins.

My intuition is that, even in this picture, the “hard takeoff” to superhuman intelligence will take a few years, not minutes. But – obviously -- that's still pretty fast by the standards of human progress.

The Singularity emerges, in this vision, as a consequence of emergence-producing, dynamic feedback between the AI Engine and intelligent program analysis tools like the Java supercompiler. The global brain then becomes not only intelligent but superintelligent, and we, as part of the global brain, are swept up into this emerging global superintelligence in ways that we can barely begin to imagine.

To cast the self-modification problem in the language of Novamente AI, it suffices to observe that self-modification is a special case of the kind of problem we call "schema learning." The AI Engine itself is just a big procedure, a big program, a big schema. The ultimate application of schema learning, therefore, is the application of the system to learn how to make itself better. The complexity of the schema learning problem, with which we have some practical experience, suggests how hard the “self-modifying AI” problem really is.

Sure, it’s easy enough to make a small, self-modifying program. But, such a program is not intelligent. It’s closer to being “artificial life” of a very primitive nature. Intelligence within practical computational resources requires a lot of highly specialized structures. These lead to a complicated program – a big, intricate mind-schema – which is difficult to understand, optimize and improve.

Creating a simple self-modifying program and expecting it to become intelligent through progressive environment-driven self-modification is an interesting research program, but it seems more like an attempt to emulate the evolution of life on Earth than an attempt to create a single intelligence within a reasonable time frame.

But just because the “learn my own schema” problem is hard, doesn’t mean it’s unsolvable. A Java or C program can be represented as a SchemaNode inside Novamente, and hence it can be reasoned about, mutated and crossed over, and so forth. This is what needs to be done, ultimately, to create a system that can understand itself and make itself smarter and smarter as time goes on – eliminating the need for human beings to write AI code and write articles like this one.
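
In caricature, the point is simply that a program held as data can be varied and recombined like any other schema. A hypothetical sketch, with invented names and a deliberately naive representation of the program as a list of instruction strings:

import java.util.Arrays;
import java.util.Random;

class ProgramSchemaNode {
    final String[] instructions;               // the program, held as plain data
    ProgramSchemaNode(String[] instructions) { this.instructions = instructions; }

    // Propose a small variation: swap one instruction for another from the instruction set.
    ProgramSchemaNode mutate(Random rng, String[] instructionSet) {
        String[] copy = instructions.clone();
        copy[rng.nextInt(copy.length)] = instructionSet[rng.nextInt(instructionSet.length)];
        return new ProgramSchemaNode(copy);
    }

    // Cross two programs over at a random cut point.
    ProgramSchemaNode crossover(ProgramSchemaNode other, Random rng) {
        int cut = rng.nextInt(Math.min(instructions.length, other.instructions.length));
        String[] child = Arrays.copyOf(instructions, instructions.length);
        System.arraycopy(other.instructions, cut, child, cut,
                         Math.min(other.instructions.length, child.length) - cut);
        return new ProgramSchemaNode(child);
    }
}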

Reasoning about schema representing computer programs requires a lot of specialized intuition, and specialized preprocessing may well be useful here, such as for instance the automated analysis and optimization of program execution flow being done in Val Turchin and friends’ Java supercompilation project [http://www.supercompilers.com]. There is a lot of work here, but it’s a fascinating direction, and a necessary one.

Call us mad scientists if you will, but all of us involved in the project believe that the Novamente system, once fully implemented and tested, will lead to a computer program that manifests intelligence, according to the criterion of being able to carry out conversations with humans that will be subjectively perceived as intelligent. It will demonstrate an understanding of the contexts in which it is operating, an understanding of who it is and why it is doing what it is doing, an ability to creatively solve problems in domains that are new to it, and so forth.

And of course it will supersede human intelligence in some respects, by combining an initially probably modest general intelligence with capabilities unique to digital computers like accurate arithmetic and financial forecasting.

We believe we’ve covered all the bases: every major aspect of the mind studied in psychology and brain science. They’re all accomplished together, in a unified framework. It’s a big system, it’s going to demand a lot of computational resources, but that’s really to be expected; the human brain, our only incontrovertible example of human-level intelligence, is a complex and powerful information-processing device.

Not all aspects of the Novamente system are original in conception, and indeed, this is much of the beauty of the thing. The essence of the system is the provision of an adaptable self-reconstructing platform for integration of insights from a huge number of different disciplines and subdisciplines. In Novamente, aspects of mind that have previously seemed disparate are drawn together into a coherent self-organizing whole.

The cliché Newton quote, “If I’ve seen further than others, it’s because I’ve stood on the shoulders of giants,” inevitably comes to mind here. (As well as the modification I read somewhere: “If others have seen further than me, it’s because giants were standing on my shoulders.”….) The human race has been pushing toward AI for a long time – Novamente, if it lives up to our aspirations for it, will merely put on the finishing touches.

While constructing an ambitious system like this naturally takes a long time, we were making steady and rapid progress until Webmind Inc.’s dissolution in early 2001. It seems Arthur C. Clarke was off by a bit -- Webmind won’t be talking like HAL in the film 2001 until a bit later in the millennium. But we’re currently scraping by with a small team, and making significant and steady progress. Accurate timing estimates remain difficult to make, but if we manage to keep well enough funded to keep the current team full-time on the project, we believe Novamente’s first moderately intelligent conversations will take place sometime in the next few years … and that’s going to be (to use the technical term) pretty bloody cool!

What are the complaints and counterarguments most often heard when discussing the Novamente project with expert outsiders? We’ve already discussed some of these above.

First, there are those who just don’t believe AI is possible, or believe that AI is only possible on quantum computers, or quantum gravity computers, etc. Forget about them. They’ll see. You can’t argue anyone out of their religion. Science is on the side of digital AI at this point, as has been exhaustively argued by many people.

Then there are those who feel the system doesn’t go far enough in some particular aspect of the mind: temporal or causal reasoning, or grammar parsing, or perceptual pattern recognition, or whatever. This complaint usually comes from people who have a research expertise in one or another of these specialty areas. The Novamente system’s general learning algorithms, they say, will always be inferior to the highly specialized techniques that they know so well.

My feeling is that the current Novamente design is about specialized enough. I don’t think it is so overspecialized as to become brittle and non-adaptable, but I worry that if it becomes any more specialized, it will be. My intuition is that things like temporal and causal reasoning should be learned by the system as groundings of the concepts “time” and “cause” and related concepts, rather than wired in.

On the other side, there are those who feel that the system is “too symbolic.” They want something more neural-netish, or more like a simple self-modifying system as I described in Chaotic Logic and From Complexity to Creativity.

I can relate to this point of view quite well, philosophically. But a careful analysis of the system’s design indicates that there is nothing a more sub-symbolic system can do that this one can’t. We have SchemaNodes embodying Boolean networks, feeding input into each other, learning interrelationships via neural-net-like mechanisms such as Hebbian learning, and being evolved by a kind of evolutionary-ecological programming. This is in fact a sub-symbolic network of procedures, differing from an evolutionary neural net architecture only in that the atomic elements are Boolean operators rather than threshold operators – a fairly insubstantial difference which could be eliminated if there were reason to do so. The fact that this sub-symbolic evolving adaptive procedure network is completely mappable into the symbolic, inferential aspect of the system is not a bad thing, is it? In fact, I would say that in the Novamente design we have achieved a very smooth integration of the symbolic and subsymbolic domains, even smoother than is likely to exist in the human brain. This will serve the system well in the future.
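
To make this concrete, here is a minimal toy sketch in Python. It is not actual Novamente code; names like SchemaNode and hebbian_update are invented purely for illustration. It shows procedure nodes whose atomic elements are Boolean operators, plus a Hebbian-style link update that strengthens the connection between two procedures when their outputs co-occur; swap a Boolean body for a threshold test and you have the neural-net version of the same construction.

    import random

    # Toy sketch only: these classes and names are hypothetical, not the real system.
    class SchemaNode:
        """A procedure node whose body is a small Boolean function of its inputs."""
        def __init__(self, name, fn, n_inputs):
            self.name = name
            self.fn = fn
            self.n_inputs = n_inputs

        def execute(self, inputs):
            assert len(inputs) == self.n_inputs
            return self.fn(inputs)

    # Boolean atomic elements; replace a body with e.g. sum(xs) >= 1.5 and you have
    # the threshold-unit (neural-net) version instead.
    AND = SchemaNode("AND", lambda xs: xs[0] and xs[1], 2)
    XOR = SchemaNode("XOR", lambda xs: xs[0] != xs[1], 2)

    def hebbian_update(weight, fired_a, fired_b, rate=0.05, decay=0.005):
        """Strengthen the link when both nodes fire together; decay it slightly otherwise."""
        if fired_a and fired_b:
            return min(1.0, weight + rate)
        return max(0.0, weight - decay)

    # Feed both nodes random inputs; since AND and XOR can never fire on the same
    # input pair, the Hebbian link between them decays toward zero.
    link_strength = 0.5
    for _ in range(200):
        bits = [random.random() < 0.5, random.random() < 0.5]
        link_strength = hebbian_update(link_strength, AND.execute(bits), XOR.execute(bits))
    print("AND<->XOR link strength after 200 random trials:", round(link_strength, 3))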

There’s the complaint that Baby Novamente won’t have a rich enough perceptual environment with just the Internet. Maybe. Maybe we’ll need to hook up eyes and ears to it. But there’s a hell of a lot of data out there, and the ability to correlate numerical and textual data is a reasonable analogue of the cross-modal sensory integration that is so critical to the human brain. I really believe that this complaint is just plain old anthropomorphism.

There’s the complaint that there are too many parameters, and that it will take forever to get the system to actually work, as opposed to working in theory. This is indeed a bit of a worry; I can’t deny it. But we’ve gone a long way by testing and tuning the individual modules of the system separately, and so far our experience indicates that the parameter values giving optimal performance when a mind module runs on its own are generally at least acceptable values for that module’s activity in an integrated Novamente context. A methodology of tuning parameters for subsystems in isolation, then using the values thus obtained as starting points for further dynamic adaptation, seems very likely to succeed in general, just as it already has in some special cases.
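
A rough Python sketch of that methodology may help; the module names, parameter names and scoring functions below are made-up stand-ins, not pieces of the actual codebase. The idea is simply: pick each module’s parameter by optimizing its standalone score, then hill-climb from those values on an integrated score.

    # Hypothetical sketch of the tuning methodology described above; all names and
    # objective functions here are invented for illustration.

    def tune_in_isolation(candidates, score_alone):
        """Pick the parameter value that maximizes a module's standalone performance."""
        return max(candidates, key=score_alone)

    def adapt_jointly(params, score_integrated, steps=100, delta=0.05):
        """Starting from the standalone optima, hill-climb on integrated performance."""
        best = score_integrated(params)
        for _ in range(steps):
            for name in list(params):
                for sign in (+1, -1):
                    trial = dict(params, **{name: params[name] + sign * delta})
                    score = score_integrated(trial)
                    if score > best:
                        params, best = trial, score
        return params

    # Toy usage: two "modules" with one parameter each, and an integrated objective
    # whose optimum sits near, but not exactly at, the standalone optima.
    reasoning_rate = tune_in_isolation([0.1, 0.3, 0.5, 0.7], lambda r: -(r - 0.5) ** 2)
    attention_decay = tune_in_isolation([0.1, 0.3, 0.5, 0.7], lambda d: -(d - 0.3) ** 2)
    initial = {"reasoning_rate": reasoning_rate, "attention_decay": attention_decay}
    tuned = adapt_jointly(initial, lambda p: -(p["reasoning_rate"] - 0.45) ** 2
                                             - (p["attention_decay"] - 0.35) ** 2)
    print(tuned)  # roughly {'reasoning_rate': 0.45, 'attention_decay': 0.35}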

Finally, there are those who reckon the design is about right, but that we just don’t have the processing power and memory to run it yet. This complaint scares me a little bit too. But not too much. Based on our experimentation with the system so far, there are only two things that seem to require vastly more computer power than is available on a cluster of a few dozen powerful PCs. The first of these, the learning of new procedures for acting appropriately in various situations (“schema learning,” in our lingo), is something that can be done offline, running in the background on millions of PCs around the world (WebWorld). And the second, real-time conversation processing, can likely be carried out on a single supercomputer serving as the core of the AI Engine cluster. We have a very flexible software agents system that is able to support a variety of different hardware configurations, and we believe that by utilizing available hardware optimally, we can make a fairly smart computer program even without the massive advances that Moore’s law will quickly bring. Of course, the more hardware we have, the cleverer our system will become… and soon enough it will be literally begging us for more, more, more!
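
As a cartoon of the kind of resource allocation that flexibility makes possible (purely illustrative; the task and pool names below are hypothetical, not the actual agents-system API), one can imagine routing latency-tolerant background jobs like schema learning out to a distributed pool of machines, while keeping latency-critical work like conversation processing on the central cluster.

    from dataclasses import dataclass

    # Purely illustrative routing of work by latency tolerance; not the real API.
    @dataclass
    class Task:
        name: str
        latency_critical: bool

    def route(task):
        """Latency-critical work stays on the central cluster; the rest can be farmed
        out to background machines (a WebWorld-style distributed pool)."""
        return "central_cluster" if task.latency_critical else "background_pool"

    for task in [Task("schema_learning", latency_critical=False),
                 Task("conversation_processing", latency_critical=True)]:
        print(task.name, "->", route(task))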

I’ll close this chapter with a quote I found in the book “Conversations with a Mathematician” by algorithmic information theory pioneer Gregory Chaitin. Chaitin is not an AI researcher, although his mathematical work was inspirational for some parts of the Novamente design, and over his career at IBM he has kept abreast of the company’s diverse AI work. When an interviewer asked him about AI and its relationship with some mathematical ideas, he said:

“[M]y personal opinion is that AI is not a mathematical problem, it’s an engineering problem…. To me a human being is just a very complicated piece of engineering that’s exquisitely well-suited for surviving in this world….

“[I]t’s very often the case that theoreticians can show that in theory there’s no way to solve a problem, but software engineers can find a clever algorithm that usually works, or that usually gives you a good approximation in a reasonable amount of time. And I think that human intelligence is also a little bit like that, and that it’s a matter of creeping up on it little by little, a step at a time, until we can usually do a good job imitating it.

“In fact I think that we may be almost halfway there, only we don’t realize it, and that fifty years from now we’ll be close to a real AI, and then people will wonder why anyone ever thought that it was difficult to create an AI. This AI won’t be the result of a theorem, it’ll be a mountain of work, a giant engineering project that was built piece by piece, little by little, just like what happens in Nature. As the biologists say, God is a tinkerer, he cobbles things together, he patches things up, he makes do with what he has to create new forms of life by experimenting with sloppy little changes one step at a time….

“We humans aren’t artistic masterpieces of design, we’re patched together, bit by bit, and retouched every time that there’s an emergency and the design has to be changed! We’re strange, awkward creatures, but it all sort of works! And I think that an AI is also going to be like that….

“[A] working AI is going to be like some kind of Frankenstein monster that’s patched together bit by bit until one day we realize that the monster sort of works, that it’s finally intelligent enough!”

I think he overstates the case a little bit. There is a kind of elegance and order to complex adaptive systems with emergent behavior, which is different from the elegance and order of modern mathematics. But still, I like his articulation of a point that has always seemed to me a piece of “AI common sense,” yet one that seems to elude most academic AI theorists. Building a mind is hard. But so was building the Apollo rocket, so was building a computer. The magic that we subjectively feel – we minds, we emergent patterns in our brains – is mind-level magic, not brain-level magic. The presence of this subjective, experiential consciousness-magic doesn’t imply that it takes magic to build digital brains. Building a brain is hard work, requiring a lot of really smart people collaborating, and the integration of results from many different kinds of science. But this kind of hard work is exactly what the ongoing sci-tech revolution is all about.

When I forwarded this Chaitin quote to some friends, one person replied, basically: “But that doesn’t say anything! Of course, anyone who believes that the mind is a machine, automatically believes building a digital mind is an engineering problem.”

Of course, there is a little truth to this retort. Chaitin actually made the quoted statement in response to a question about Roger Penrose’s claim that the mind is not a machine in the standard sense, but rather some kind of cosmically nonlocal nondeterministic quantum gravity system. However, I think there is a little more substance to Chaitin’s view than just “mind is mechanical.”

Yes, of course everyone who believes that the mind is a machine believes that there is an engineering problem involved in building a mind. But the question is: is the problem primarily one of engineering, or of mathematics, or of neuroscience, or of cognitive psychology, and so forth?

I know a few individuals who believe the mind is a machine, but who believe there is some simple mathematical trick underlying mind operations, and that if we just find this trick, then creating an artificial mind will be easy. So they believe that figuring out the math is the main thing required. My former Webmind collaborators Shane Legg and Youlian Troyanov (commonly known as the “Bulgarian Madmind”) have at various times held this opinion with varying levels of strength.

Ray Kurzweil and others believe that the main problem is figuring out exactly how the human brain works. Once this is known, they reckon, it's just a matter of emulating the brain on a sufficiently powerful computer, using a simple neural simulator program (feeding in the exact distribution of neurons, neurotransmitters, synapses, etc. as inputs).

Gregory Chaitin, on the other hand, is saying not only that mind is mechanical, but also that the task of constructing a thinking machine primarily requires engineering-type thinking.

Chaitin's view is a bit like that of Danny Hillis, who stated that he thinks intelligence is just "a lot of little things all working together." Marvin Minsky's Society of Mind theory of AI points somewhat in this direction as well. These guys don't place much stock in emergence, or in the need for different structures and dynamics to be exquisitely harmonized together.

Also, it should be made clear that when Chaitin contrasts engineering with mathematics, he is taking a pure mathematician's view of mathematics. In my own AI work, I have not yet had the opportunity to apply "deep math." There have been no profound theorems proved about Novamente or Webmind components that were critical to the AI work. On the other hand, there have been plenty of applications of known math: probability theory, combinatory logic, nonlinear dynamics, and so forth. To a real mathematician like Chaitin, working out fairly straightforward applications of known math is not "doing math." I’m well aware of the pure mathematician’s attitude, having started my career in that environment; and when I first started I hoped that pure math could provide the solution to AI – just prove the “Fundamental Theorem of Mind” and you’re in like Flynn … real AI is yours. But it doesn’t seem to work that way; mind isn’t that sort of thing.

Of course, the “it’s all engineering,” “it’s all neuroscience and fast hardware” and “it’s all the right math formula” type views are extremes. Most of us involved in Real AI theory or practice probably hold views that are intermediate between these extremes. It's the extreme views that get remembered and propagated because they're so compact to state. My own view is an intermediate one: I think it takes a mixture of philosophy, neuroscience, math, engineering and psychology. When I started out I underestimated the importance of the "engineering" part, but recognizing that importance doesn't mean denigrating the importance of the other aspects. It is precisely the need for integrative input from so many domains of inquiry that makes digital mind design (as opposed to simple digital emulation of the brain cell by cell or molecule by molecule) so hard – and so delightful.