Ninth Speaker: Douglas Hofstadter


Questions and Summary

by Alexandra Horowitz and Adrian Robert

Brief, and sometimes over-pithy, summary of chs. 4 and 5, Fluid Concepts and Creative Analogies
Alexandra Horowitz
Cognitive Science Department, UCSD
ahorowit@cogsci.ucsd.edu


In chapter 4, Hofstadter et al. quickly review the technological and ideological work that preceded their creation of the analogy-forming "Copycat" program. They are motivated both by their confidence that analogies are a core creation of the human cognition we are interested in and by the failure of previous attempts to capture this cognitive accomplishment. A few important definitions and delineations channel their approach. First, no one in cognitive science at UCSD will be alarmed to hear their stout insistence that perception is importantly influenced by -- and influences -- other, "higher-level," cognitive processes. Further, they want to distinguish both low and high ("semantic": where "concepts" come in) levels of perception, for while they will claim that many analogy programs to date have neither, Copycat will rely on the separation thereof. High-level perception is characterized by its flexibility, and thus can be seen as an important part of the creative process: part of deciding what is relevant in the input, and of ordering the information being considered. A mistake that most AI researchers have made, the authors claim, is that of assuming that perception and cognition are separable at all: that you can consider one without the other, and then later go back and fill in the missing half.

A second important definition made in this chapter is that of analogy itself. To Hofstadter, an analogy succeeds when it gets to the "essence" of two situations or objects. The authors go on to divvy up the processes of analogy into two (non-temporally separate) parts: the representation, and the mapping. Specifically, salient attributes of the situations at hand must be chosen for a kind of working-memory representation; next, mapping across corresponding elements of the situations occurs. Ta-da, a lovely analogy. This distinction informs the crux of their complaints about other computational models of analogy. Most make _explicit_ the representation, by hand-coding in exactly the bits of knowledge that are helpful to forming an analogy. Thus only the mapping (for H, the less interesting -- and certainly easier -- element) is left to the computer. A typical program might, e.g., consist of predicate logical structures describing "Iran-Contra" that have _exactly the same_ shape as those describing "Watergate"; all the program must do is draw the lines between the corresponding shape-parts. The representations are rigid; as a result, such programs can only do exactly as much as their programmers code into them, even if to the human eye there are other appealing and obvious connections to be drawn. As another example, the success of an earlier program in "discovering" Kepler's third law of planetary motion anew clearly seems to lie in the fact that the program was provided with just the facts about planets and periods of revolution that are constituents of the principle, with no extra, irrelevant facts intervening. It is Hofstadter's strong contention that it is, instead, the ability to filter through the range of facts and conceptual frameworks available, to the _relevant_ ones, that gives a discovery such as Kepler's its power.

some idle questions: -- One of the reasons H rejects previous models of analogy is that the representations are built in ready-to-cook -- i.e., the relevant information is provided. A counter might be leveled that such programs could in theory take large representations instead, in which the program must search through the attribute shapes, and that this would solve H's problem. In fact, many programmers have gone ahead and provided their programs with dummy information that they must learn to ignore; this seems to show the ability to match shapes of predicates, even over a non-trivial representation space. Is this computationally plausible? If so, would it be a satisfactory model of analogy?

-- Copycat ignores low-level perception. Can it, thus, amount to anything substantively better than what other analogy-making programs do (i.e., ignore all perception)?

_____________________________________________________________________________

Chapter 5, as told by Alix (for comparison, see Chapter 5, as told by Adrian)

Mitchell and Hofstadter created Copycat, a microdomain analogy maker (or, as occasionally caveated, a "fluid concept" maker), in response to the sorts of issues raised in chapter 4. A brief survey of its superficial output is in order to get the gist of the program: if the string of letters "abc" is changed to "abd", how would you need to change the string "ijk" in "the same way"? M&H are interested in the fact that we quick-simile-makers will easily come up with "ijl", and, if pushed, will maybe come up with other possibilities (which we will likely judge to be less satisfactory, in some nebulous way). Given strong intuitions about how this process takes place in humans, M&H start to characterize the types of considerations that go into our making of the analogy. It is these very complex and well-formulated intuitions that give Copycat its interest; they, too, might be the source of its downfall, in considering its applicability outside the tiny microdomain of strings of letters.

A central concept in M&H's formulation is that there are pressures on the analogy-making system -- be it human or computer -- that make us see features of the input that _stand out_ to us. In "abc", it is clear that these are successive alphabetic letters. The "succession" property, in particular, is what allows us to start to connect its form to that of "ijk". M&H will give Copycat a finite number of similar such properties, as well as relationships and other attributes, to look out for. Very generally, the strength of a property will vary according to what other concepts are nearby, what has been activated so far, and the abstractness of the property (consider, for example, that what we see as the "nicest" analogies often are correlated in very deep ways -- not just on the mapping of letter to letter). Interestingly, many possible attributes of strings-of-letters are _not_ given to the system to consider as possibly relevant to the formation of analogies: Copycat will never form the analogy "a:e as l:p" (where the correlation is the number of letters that separate the objects). This would not be a difficult analogy for a human, after a moment's thought.
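To make concrete what kind of attribute is being left out, here is a minimal Python sketch of the "number of letters apart" relation. It is purely illustrative and not anything from Copycat; the function names letter_gap and same_gap are made up for this example.

```python
def letter_gap(pair: str) -> int:
    """Alphabet steps from the first letter to the second in a pair like 'a:e'."""
    first, second = pair.split(":")
    return ord(second) - ord(first)


def same_gap(pair_a: str, pair_b: str) -> bool:
    """True when both pairs are separated by the same number of alphabet steps."""
    return letter_gap(pair_a) == letter_gap(pair_b)
```

Here letter_gap("a:e") and letter_gap("l:p") both come out as 4, so same_gap("a:e", "l:p") holds. The point is only that the relation is trivial to state once someone has decided it is relevant; that decision is exactly what Copycat is never given the chance to make.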

The architecture of the program is threefold: (1) there are "concepts" -- nodes that represent a kind of core idea, with a halo of more context-dependent ideas surrounding it -- in the "Slipnet" memory-site. Concepts are: a, b, left, sameness, etc. Links between concepts are variable in strength and duration, and are what bring concept-halos to overlap. A good feature of this architecture is that it accommodates closeness of -- and thus co-activation of -- types of concepts that might make for good analogies but that are meaningfully opposite: e.g., allowing the simile "abc:abd as xyz:wyz", where the concepts "successor" and "predecessor", and "leftmost" and "rightmost" are linked. (2) In the "Workspace," objects are considered by little "codelets", from (3), the "Coderack" space, who either scout out the possible future object trails or construct or destroy concepts or bonds. Activation is reciprocal between the workspace and slipnet: just the sort of communication that might mirror a perceptual-conceptual communication in humans.

A number of important features emerge from the Copycat set-up. The codelets are chosen, on any given run, randomly. But over time the behavior of the system as a whole starts to look deterministic, because it gauges its place by measures of pressures, "temperature" sensitivity to goodness and to amount of randomness, and so on. Thus the system has feedback between levels, a (in some sense) non-deterministic course of decision-making, an appreciation of deep concepts over more superficial ones, and the emergence of "themes" in analogy-forming: i.e., it shows fluidity, just as a human mind does.
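One way to picture how a "temperature" could trade randomness against goodness is the rough Python sketch below. The formula (raising each strength to the power 100 / temperature) is an assumption chosen for illustration, not Copycat's actual computation, and the names choose, "deep" and "shallow" are hypothetical.

```python
import random
from typing import Dict


def choose(strengths: Dict[str, float], temperature: float, rng: random.Random) -> str:
    """Pick one option at random, weighted by strength.

    temperature is assumed to lie in (0, 100]. A low temperature sharpens the
    weights so the strongest option nearly always wins; a high temperature
    leaves more room for weaker options.
    """
    exponent = 100.0 / temperature  # assumed formula, for illustration only
    names = list(strengths)
    weights = [strengths[name] ** exponent for name in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

With strengths {"deep": 0.9, "shallow": 0.3}, a temperature of 100 picks "deep" about three times in four, while a temperature of 1 picks it essentially every time. Each single choice is still random, but the run as a whole looks more and more deterministic as the temperature falls.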

there are infinitely more details. but! on to Questions!

-- Although H starts with definite intuitive notions of how we form analogies, in the end he sidesteps the comparison of Copycat's data to human data. Would such a comparison hold up?

-- In formulating "concepts" as he did, H chooses a kind of prototype-based model of concepts. Would a model like his work with some other equally-viable notion of concept (say, exemplar-based)? If not, the model is perhaps overly committed to the notion of concept chosen, and thus the conceptual form is imbued with a kind of untested power.

-- How is the delineation of the features (symmetry, alphabetically-first, etc) that the program may attend to _not_ giving it exactly the pre-determined knowledge -- the "Representation" -- that undermines the other analogy programs? There are an infinite number of possible conceptual predicates; isn't the most mysterious question how we sort through all of those and come up with something coherent?

-- One of the reasons this program is of such interest, besides being a success in its own tiny world, is if -- as alluded to several times -- it is extendible beyond that microdomain. Have any tests been done to see if the strategy scales up (e.g., to large numbers of features)?

-- The method with which Copycat deals with a "snag" (as when presented with the example "abc:abd as xyz:???", for the program does not see "a" as following "z" in its helpful way) is portrayed as something of an epiphany; is it substantially different than other strategy-changes?




Here is a little bit more on the Hofstadter readings, overlapping somewhat with what Alix sent but hopefully still useful! The last chapter is not really covered, so people will have to read that for themselves. :-)

Adrian Robert
Cognitive Science Department, UCSD
arobert@cogsci.ucsd.edu
----------------------

1) The Approach

Copycat is actually only one of a family of similar models that have been built by Hofstadter's group (henceforth H) over the years, but it is one of the most extensively developed ones and good for presenting their methods. H's approach is shaped by two beliefs: first, that pattern-finding is at the core of intelligence and that analogy-making is central to pattern-finding, and, second, that cognition and perception are inseparably intertwined.

The first belief determines *what* they model: SeekWhence extrapolates number sequences, Jumbo and Numbo solve anagrams and number puzzles by finding and combining pieces based on known patterns, Copycat solves letter sequence analogies, and Tabletop solves tabletop utensil arrangement analogies.

The second belief determines how they model it. How you cognize about something is determined by how you represent it, and representation is founded ultimately in perception. H argues that the essence of human problem solving often -- if not always -- comes down to finding the (a) right way of looking at the problem, as opposed to, say, doing some long combinatoric manipulation on a given initial way of looking at it. This view will sound familiar to followers of Ed Hutchins's work, and in fact it was also the view of the Gestalt psychologists, particularly Max Wertheimer. However, while Ed focuses on transformations of representation that occur outside individual minds, and Wertheimer studied mainly one-step, "aha" transformations, H likes to look at temporally-extended representation-building, of the sort that occurs when solving, for example, anagrams or letter sequence analogies.

H identifies in these cases a kind of accretive process, in which representations are gradually built up in a quasi-hierarchical fashion, and which is plausibly employed by humans habitually in all sorts of more serious domains (such as hacking together linguistic theories). For example, in solving an anagram, we might first start by noticing combinations of letters that form common suffixes, like 'ion' or 'ed' and then holding these in mind while we check if the other letters form a stem. This can be hierarchical as well -- we might add 'ot' to 'ion' and look for stems, then if this doesn't work, break it up but still hold the part 'ion' for further research.

Another example is a typical "Copycat problem". Given the pair "abc->abd", what does "wxy" map onto? Here, one way to go would be to 'see' "abc" as an ascending sequence and then "abd" as a sequence skipping one at the end (giving solution "wxz"). If you had "xyz" to start with instead of "wxy", you would be foiled here; but, holding onto this idea of sequence, you might build a further representation of "abc" as a sequence 'away from one end of the alphabet', and so answer "wyz".
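The two readings in this example can be written down as a small Python sketch. To be clear about what is assumed: the fixed "try one rule, then fall back to the other" order below is hard-wired for illustration, whereas the whole point of Copycat is that such a re-reading is built up by the program rather than listed in advance. All function names are hypothetical.

```python
import string
from typing import Optional

ALPHABET = string.ascii_lowercase


def successor(letter: str) -> Optional[str]:
    """Next letter, or None for 'z' (the alphabet is not treated as circular)."""
    i = ALPHABET.index(letter)
    return ALPHABET[i + 1] if i + 1 < len(ALPHABET) else None


def predecessor(letter: str) -> Optional[str]:
    """Previous letter, or None for 'a'."""
    i = ALPHABET.index(letter)
    return ALPHABET[i - 1] if i > 0 else None


def replace_last_with_successor(s: str) -> Optional[str]:
    """First reading of abc->abd: bump the rightmost letter forward."""
    nxt = successor(s[-1])
    return None if nxt is None else s[:-1] + nxt


def replace_first_with_predecessor(s: str) -> Optional[str]:
    """Mirrored reading: bump the leftmost letter backward."""
    prev = predecessor(s[0])
    return None if prev is None else prev + s[1:]


def solve(target: str) -> Optional[str]:
    """Apply the first reading; if it hits a snag, try the mirrored one."""
    return replace_last_with_successor(target) or replace_first_with_predecessor(target)
```

With these definitions solve("wxy") gives "wxz", and solve("xyz") gives "wyz" because the first reading fails on "z", which has no successor.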

So, introspectively, you have this process of gradually building up a way of looking at something in some kind of goal directed manner. H's goal in modeling is to try to make something like this and, by being forced to work out details to get it to work, gain a better idea of what might be going on beneath this introspective view.

This leads to the concept of using an artificial "microdomain" rather than some part of a real domain: much of the action seems to lie in the hierarchical building up of representations, so you want to be able to have structures at several levels in your model, and you want each level to be sufficiently rich to give play for flexibility in building combinations.

2) The Architecture

Copycat works by building up perceptual/conceptual representations in a kind of working memory or 'current view' space called the "workspace". The initial workspace contains the raw input, and the representation is built by the action of operators called "codelets". Codelets are short recognition-action pieces of code that evaluate a particular perceptual grouping and possibly implement it by binding the elements together. For example, a codelet might check whether two letters form an alphabetic sequence and if so bind them into a new unit. Typically binding occurs in several stages: early on, things are provisionally bound and their subelements can still participate in other, possibly conflicting combinations; later, they are strongly bound and the subelements are parts of this group only. The set of codelets that can potentially be activated at any given time sit on the "coderack" and are probabilistically chosen for execution based on their urgency weightings. Certain codelets, if they succeed in their groupings, will cause other ones to be placed on the rack. Usually, for example, a codelet producing a weak binding will generate a codelet that could produce a stronger binding of the same type. Importantly, a bound set of elements acquires new interactive properties that change how codelets can evaluate and operate on it, and also, a bound set can be broken down again under certain conditions.
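The coderack mechanics described here can be sketched in a few lines of Python. This is a toy under stated assumptions, not the actual Copycat code: there is only one kind of bond (alphabetic successor), the urgency values 1.0 and 5.0 are arbitrary, the weak/strong binding stages are collapsed into a "scout" that posts a "builder", and every class and function name is invented for the example.

```python
import random


class Workspace:
    """Current view of the input: the letters plus any bonds built so far."""

    def __init__(self, letters):
        self.letters = letters
        self.bonds = []  # (i, i + 1) index pairs bound as successor pairs


class Coderack:
    """Pool of waiting codelets, each stored with an urgency weight."""

    def __init__(self, rng):
        self.rng = rng
        self.waiting = []  # list of (urgency, codelet) pairs

    def post(self, urgency, codelet):
        self.waiting.append((urgency, codelet))

    def run_one(self, workspace):
        # Probabilistic choice: higher urgency is more likely, never guaranteed.
        weights = [urgency for urgency, _ in self.waiting]
        index = self.rng.choices(range(len(self.waiting)), weights=weights, k=1)[0]
        _, codelet = self.waiting.pop(index)
        codelet(workspace, self)


def make_scout(i):
    """Codelet that checks whether letters i and i + 1 are alphabetic neighbours."""
    def scout(workspace, rack):
        left, right = workspace.letters[i], workspace.letters[i + 1]
        if ord(right) - ord(left) == 1:
            # Success places a follow-up codelet on the rack, with higher urgency.
            rack.post(5.0, make_builder(i))
    return scout


def make_builder(i):
    """Codelet that actually binds letters i and i + 1 into a unit."""
    def builder(workspace, rack):
        workspace.bonds.append((i, i + 1))
    return builder


def run(letters, seed=0):
    """Seed the rack with bottom-up scouts and run until it is empty."""
    workspace = Workspace(letters)
    rack = Coderack(random.Random(seed))
    for i in range(len(letters) - 1):
        rack.post(1.0, make_scout(i))
    while rack.waiting:
        rack.run_one(workspace)
    return workspace
```

Calling run("abc") ends with bonds between positions 0-1 and 1-2, while run("abd") only bonds 0-1. The order in which the bonds appear depends on the random draws, but the final set does not, which is a small-scale version of random choices adding up to a stable outcome.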

Thus far this sounds a little like a production system, and even more like a biochemical situation in which there are a number of chemical components (the representations) and enzymes (the codelets) which act on them. But added to this is a further twist, a network called the "slipnet", which guides the generation of "top-down" codelets. (A bunch of "bottom-up" codelets, which build and bind low-level structures, are automatically thrown into the soup at the beginning.) The slipnet is a permanent structure with nodes which stand for per/concepts like predecessor or successor which, if highly activated, place particular codelets on the coderack. Links connect the nodes, allowing spreading activation, and there are also connections from nodes to links, which modulate their weights. Basically, the slipnet represents permanent conceptual knowledge and the structure of associations between concepts. For example, when we think of successor, we may also think of predecessor, because it is a complementary relation. This would be represented by a link between nodes for "predecessor" and "successor" in the slipnet, and also a link to this link from a node for "complement".
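The successor/predecessor/complement example can be mocked up as follows. This is a loose sketch: the update rule, the base weight of 0.2, and the way an active label node strengthens its link are all assumptions made for illustration, and the class and function names are hypothetical rather than taken from the program.

```python
class Slipnet:
    """Toy network: nodes with activations, links optionally labelled by a node."""

    def __init__(self):
        self.activation = {}  # node name -> activation in [0, 1]
        self.links = []       # (source, target, base_weight, label or None)

    def add_node(self, name, activation=0.0):
        self.activation[name] = activation

    def add_link(self, source, target, base_weight, label=None):
        self.links.append((source, target, base_weight, label))

    def link_weight(self, link):
        _, _, base_weight, label = link
        if label is None:
            return base_weight
        # Assumption: the more active the label node, the stronger the link.
        return base_weight + (1.0 - base_weight) * self.activation[label]

    def spread(self):
        """One step of spreading activation along every link."""
        incoming = {name: 0.0 for name in self.activation}
        for link in self.links:
            source, target, _, _ = link
            incoming[target] += self.activation[source] * self.link_weight(link)
        for name in self.activation:
            self.activation[name] = min(1.0, self.activation[name] + incoming[name])


def predecessor_after_one_step(complement_activation):
    """Activate 'successor' fully and see how much reaches 'predecessor'."""
    net = Slipnet()
    net.add_node("successor", 1.0)
    net.add_node("predecessor", 0.0)
    net.add_node("complement", complement_activation)
    net.add_link("successor", "predecessor", 0.2, label="complement")
    net.spread()
    return net.activation["predecessor"]
```

In this toy, predecessor_after_one_step(0.0) leaves "predecessor" only weakly active, while predecessor_after_one_step(1.0) drives it to full activation: having "complement" in mind makes the jump from successor to predecessor much easier, which is the node-to-link modulation in miniature.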

3) Evaluation

All of this just to solve letter analogies, you say! (And I have left out a lot!) One of H's points is that you NEED this kind of complexity to really capture what is interesting about human thought processes. For this reason they criticize models like SME. It is not that SME is not valid for doing what Forbus and colleagues say it is doing, modeling one aspect of human thought mechanisms. But they question how much insight you get from just looking at one component in isolation. ...

Q: How many levels do you need in a model to really do what H wants to do? Maybe the microdomains are still too shallow and impoverished to lend any insight other than into the activity of hacking up programs to solve a certain class of problems... Also related to this, some modelers such as Edelman and colleagues (Neural Darwinism) have argued forcefully for the need to include a motor component in any model, otherwise important interactions are left out. (They make the same argument for perc-motor as H does for perc-cog. In the field of neuroscience, at least, evidence for the utility of this motor "philosophy" has been building...)

Q: Regarding the slipnet, the idea of having modulating links from nodes to other links seems to capture very well the idea that in thinking you can trigger _relations_ by association, not just other entities (see example above). But I wonder whether this mechanism is too powerful, in that it can potentially generate useful associations far more readily than a human with the same knowledge. Or maybe it is too weak, and humans are able to leap intuitively to associations that the slipnet could never dream of. From both intuition and discussions of the other models (SME, etc.), this seems to be a delicate area.

Q: One of the most distasteful features of the slipnet to many is the fact that it is hardwired, and some of this problem may come from the fact that many of its nodes represent quite high level things that clearly have a complex structure. Has the idea been explored of building the slipnet with a kind of hierarchical feature-entity-proposition structure like the LISA network or something similar? With more structure visible, it would begin to seem more plausible as the kind of thing that humans could acquire through experience, and perhaps it would open a way to add learning to the model.



Contact: Gilles Fauconnier
Web Design: Omar Alhassoon