Seventh Speaker: Arthur Markman

Questions and Summary

by Mike Hayward and Ezra Van Everbroeck



MARKMAN SUMMARY
Mike Hayward
Dept. of Cognitive Science

This summary covers the higher-level aspects of Markman's work on analogy: the constraints governing the mapping process (the "Constraints on analogical inference" paper) and some of the issues he and his colleagues debate with Hofstadter (the "Analogy just looks like..." paper). The lower-level details of the MAC/FAC and SME models are described in the other summary.

Paper 1: Constraints on analogical inference
--------------------------------------------

This paper focuses on identifying the principles/constraints guiding the following key processes: (1) creating a mapping between corresponding elements of a base and target, and (2) carrying out a set of inference procedures which copy over some subset of the structure of the base into the target.

Specifically, Markman is hoping to identify principles which determine *which* information will be copied over.

Mappings are correspondences between structured mental representations, which consist of entities (e.g. the sun), attributes (e.g. hot(sun)), relations (e.g. orbit(planets,sun)), and functions (e.g. mass(sun)). All of these are "elements" which may be mapped, copied, substituted, etc.

According to Markman, the most general description of the candidate inference generation process is "copying with substitution and generation" (CWSG): for any element in the base domain with a corresponding element in the target, copy over all of the representational structure "attached" to that element. In the copy, substitute target-domain entities whenever a base-domain entity with a known mapping occurs. If no such mapping exists, copy the base-domain entity unchanged.

Here's a simple example. Given:

base: English Department
facts: causes( obtain( Eng_faculty, grant ), hire( Eng_faculty, RAs ) )

target: Computer Science Department
facts: obtain( CS_faculty, grant )
we would get the inference

causes( obtain( CS_faculty, grant ), hire( CS_faculty, RAs ) )

because the obtain() predicate matched, so the entire structure it was embedded in (the "causes" predicate) was copied. "Eng_faculty" and "grant" had mappings onto target-domain objects (by virtue of their being arguments of the shared predicate "obtain"), so they were substituted. "RAs" had no such mapping, and so was copied unchanged.
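The example above can be sketched in a few lines of Python. This is only an illustration of CWSG as described here, not Markman's implementation: the nested-tuple encoding and the helper names `align` and `cwsg` are my own assumptions.

```python
# Facts are nested tuples: (predicate, arg1, arg2, ...); entities are strings.
# This encoding and both function names are illustrative assumptions.

def align(base_expr, target_expr):
    """Entity correspondences implied by two flat facts that share a predicate.

    Returns None when the predicate name or the number of arguments differs.
    """
    if base_expr[0] != target_expr[0] or len(base_expr) != len(target_expr):
        return None
    return dict(zip(base_expr[1:], target_expr[1:]))


def cwsg(expr, mapping):
    """Copy a base structure, substituting target entities where a mapping is known.

    Entities without a mapping are copied over unchanged.
    """
    if isinstance(expr, str):
        return mapping.get(expr, expr)
    predicate, *args = expr
    return (predicate, *(cwsg(arg, mapping) for arg in args))


base_fact = ("causes",
             ("obtain", "Eng_faculty", "grant"),
             ("hire", "Eng_faculty", "RAs"))
target_fact = ("obtain", "CS_faculty", "grant")

# The shared obtain() predicate puts its arguments into correspondence.
mapping = align(base_fact[1], target_fact)

# Copy the whole causes() structure that obtain() was embedded in.
inference = cwsg(base_fact, mapping)
print(inference)
```

Note that nothing in this sketch asks whether the match that triggered the copy was an important one, which is exactly the weakness discussed next.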

But CWSG on its own is not very particular about what gets copied: even the most insignificant correspondence between entities (like "grant" in each domain) might cause huge structures to be copied over. Unwarranted inferences may result.

So, what additional constraints are needed? Markman focuses on two:

(1) Systematicity. Matching on minor entities (like "grant") is not good enough; we need correspondences between relational structures. Matching predicate structures are referred to as "shared system facts", and only material connected to shared system facts will be copied over. The bigger the shared system of interrelated facts, the better.

(2) One-to-one mappings. Sometimes correspondences can occur from many base elements to one target element (or vice-versa). This is problematic for CWSG, because there may be more than one possible substitution for an element. The key question here is: do we build a single many-to-one homomorphism, or many one-to-one isomorphisms? Markman suggests that humans do the latter, generating several candidate interpretations each of which is fully specified and internally consistent.
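To make the "many one-to-one isomorphisms" option concrete, here is a minimal sketch that splits a set of candidate correspondences into maximal one-to-one interpretations. The brute-force enumeration and the department names are assumptions chosen for illustration; none of the models discussed is claimed to work this way.

```python
from itertools import combinations


def is_one_to_one(pairs):
    """True if no base element and no target element occurs twice in pairs."""
    bases = [base for base, _ in pairs]
    targets = [target for _, target in pairs]
    return len(set(bases)) == len(bases) and len(set(targets)) == len(targets)


def interpretations(candidates):
    """All maximal one-to-one subsets of the candidate (base, target) pairs.

    Brute force over subsets, so only suitable for tiny examples.
    """
    consistent = [set(subset)
                  for size in range(1, len(candidates) + 1)
                  for subset in combinations(candidates, size)
                  if is_one_to_one(subset)]
    # Keep a subset only if no other consistent subset strictly contains it.
    return [s for s in consistent if not any(s < other for other in consistent)]


# Hypothetical correspondences: two base departments compete for Music.
candidates = [("base_CS", "target_Music"),
              ("base_Physics", "target_Music"),
              ("base_Physics", "target_History")]

for interpretation in interpretations(candidates):
    print(sorted(interpretation))
```

Each interpretation printed is internally consistent, so running copy-with-substitution inside any one of them can never produce a mixed-up substitution.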

The paper then goes on to describe a series of human experiments exploring the behavioral evidence for these constraints.

The Experiments
---------------

The basic design is as follows: subjects are given descriptions of three departments at a base school, and three (different) departments at a target school. The base school descriptions contain 2 key conditional statements of the form "X causes Y" for each department. The target school descriptions contain 2 key facts per department, which correspond to antecedents of the conditionals found in the base descriptions. So, if "X causes Y" in the Computer Science department of the base school, and "X" is true of the Music department in the target school, then an appropriate analogical inference would be that "Y" is true of the Music department in the target school (with appropriate substitutions made into Y).

The studies mix up the correspondences, so that a single target department might have facts that correspond to antecedents in two different base departments, or vice-versa.

Subjects were then asked which departments were in correspondence with each other (with explicit instructions to allow many-to-one mappings), as well as what outcomes they might predict given the key facts in the target domain.

Results
-------

On the question of systematicity:
Many more of the subjects' inferences were based on shared system facts than would be expected by chance. That is, they commonly matched key predicates in the target domain to the identical antecedents in the base domain. This is taken as evidence that systematicity plays an important role in determining which inferences are made. It should be noted, however, that in many (most?) cases shared system facts accounted for only about half of the inferences made.

On the question of the one-to-one mapping: In all of the studies, there was not a single instance of an inconsistent object substitution (of the kind CWSG was prone to, if it were to allow many-to-one relations). Markman takes this as evidence that people were in fact strictly conforming to the one-to-one mapping constraint. Note that, as I understand it, such an inconsistent substitution would require a subject to respond with something as implausible as: "If the CS department received the grant, I predict the Music department would hire more RAs".

Markman concludes with a brief review of models, summarized here:
SME (Forbus, Gentner, etc) - provides both of these key constraints
IAM (Keane et al) - provides both of these key constraints
ACME (Holyoak & Thagard) - doesn't enforce the strict one-to-one mapping; therefore subject to inconsistent substitutions
LISA (Hummel & Holyoak) - unconstrained in current design, but these constraints could be integrated



Paper 2: Analogy just looks like high level perception:
Why a domain-general approach to analogical mapping is right
------------------------------------------------------------

Doug Hofstadter has levelled many criticisms against structure-mapping and related approaches that Markman & many others use to model analogy. This paper is presented as a reply to those criticisms, though much of it is spent actively attacking Hofstadter's model, Copycat.

The structure-mapping approach decomposes analogy into representation, access, mapping, evaluation, adaptation, verification, and schema-abstraction. SME implements this theoretical framework, applying it to structured representations composed of formal logic predicates reminiscent of symbolic AI. Ezra's summary should describe SME in some detail.

For the rest of this summary, I will preface comments by Hofstadter's camp with "CFH:" (Chalmers, French, & Hofstadter), and by Markman's camp with "FGMF:" (Forbus, Gentner, Markman, & Ferguson).

CFH:
* analogy is a form of high-level perception;
* the mapping process cannot be separated from the perceptual process;
FGMF Reply:
* cognitive and perceptual processes are mutually dependent but not inseparable.
* perceptual processing *can* be modularized (see example on pg. 16)

CFH:
* the key to analogy is gist extraction
FGMF:
* the key to analogy is mapping & inference

CFH:
* SME uses very small representations with the key information already handpicked & brought to the forefront; this isn't remotely as rich as real human experience & most of the hard work is already done.
FGMF Reply:
* SME *is* given some irrelevant information that it must ignore
* SME *has* been used on some "large" representations (Markman lists some examples, such as Phineas, which learns physical theories by analogy).
* SME does not necessarily require hand-constructed data; it has been run on the output of other programs (again, Phineas is an example of this).

CFH:
* SME is not psychologically realistic
FGMF Reply:
* SME has been used to explain the observed developmental shift from object matching to relational matching
* SME predicts the confusion observed when two equally attractive mappings are possible
* SME has led to a new framework for understanding "dissimilarity" (as discussed in Markman's talk of 02/25)

CFH:
* SME is too limited to model complete "discovery" of an analogy.
FGMF Reply:
* those simulations were not intended to model the full discovery process, just the mapping subprocess.

CFH:
* analogies can flexibly & gradually evolve in a situation; Copycat's simulated annealing system deals with this. SME, on the other hand, has to match predicates exactly & inflexibly.
FGMF Reply:
* SME can create more than one interpretation (e.g., one that matches objects and one that matches relations).
* SME allows non-identical function mappings in the case where higher-order structures are being mapped.
* SME uses domain-general constraints (like structural consistency), as compared to Copycat's domain-specific ones

CFH:
* research task is to get output that matches human behavior, as observed in "casual discussions with a handful of friends".
* aims to model the analogy-generation of brilliant minds, not typical ones.
FGMF:
* must model the *processes*, not just the output.
* a range of methods are needed, and traditional techniques from cognitive psychology apply.
* a general model applicable to *all* humans is the goal

FGMF on the offensive:
* Copycat *can* be decomposed cleanly into perceptual/representational and cognitive/comparative components.
* Copycat has a restricted input format that doesn't much resemble real-world perceptual stimuli.
* Copycat, with its dependence on perception, has left out memory access as a source of input.
* Copycat lacks schema-abstraction/learning mechanisms
* Copycat can only make classes of correspondences that were foreseen & hard-coded by its designers.
* As a direct consequence of its unification of perception and comparison, CFH's model predicts that there should be *no* domain-independent theories of analogy.


Some Additional Issues to Consider
----------------------------------
* CFH argue that SME has no "depth" to its concepts; they're just labels. The FGMF reply is basically: "neither does Copycat". For instance, Copycat has no representation of geometric similarities of letters, only identity and sequencing. Is the argument that Copycat could *not* capture geometric similarities, or that it simply doesn't yet have that component? Could it completely characterize the conceptual space of such a small microworld, or is it the case that even the microworld approach cannot escape the combinatorial explosion?

* In a related issue, structured representations typically cannot represent similarity between predicates intrinsically, like a distributed representation can. Does this make the handling of "fuzzy" matches (like "big()" to "tall()") inelegant/impossible?

* The paper briefly alludes to a process of re-representation which might help with the n-ary restriction (the fact that the unary predicates tall(A) & short(B) can't match the binary predicate taller_than(C,D) the way humans might). What principles might govern such a general process as re-representation?

* Both groups recognize the need to scale the models up to larger, more complex (and thus more psychologically realistic) knowledge bases. What might the major hurdles be, when trying to scale up SME?

* When Holyoak visited, he commented on SME's "utter disregard for the capacity limits of working memory and attention"; can this model work in some serial manner that wouldn't overstress working memory with a giant "copy with substitution"?





Summary and questions by Ezra Van Everbroeck
Dept. of Linguistics
ezra@ling.ucsd.edu



Here's the complement to Mike's summary. I'll be dealing with the lower-level, implementational aspects of the models described in the MAC/FAC paper, and the 'Analogy just looks like ...' paper. I'll start with the former, because it's older (and presupposed in the latter), and because it provides a much more detailed description of a working model.

MAC/FAC: A Model of Similarity-based Retrieval ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Forbus, Gentner & Law (henceforth FGL) set out to build a comprehensive computational model of analogy. This is somewhat harder than one might at first expect, as they point out that there are actually three kinds of analogy:

* Analogies just based on surface similarities between two items: e.g. someone's round glasses (i.e. the target) may remind you of the wheels of a bicycle (i.e. the base), two round objects which are also connected by a metal frame.

* Analogies just based on structural similarities between base and target: e.g. visualizations of the structures of molecules always remind me of the linguistic analysis of sentences; in both cases, you have independent elements (atoms/words) which are grouped into more complex structures.

* Analogies based on both structural and surface similarities: e.g. seeing one computer may remind you of another one. In this kind of 'literal similarity' match, base and target both tend to appear alike to our senses, and also share many other functional and structural properties.

Any model of human analogy will have to be able to come up with these three kinds of analogies. Moreover, psychological experiments have also shown that surface similarities dominate in retrieval processes, but that structural similarities are considered more important for higher-level inference processes. So, despite the roundness of both bicycle wheels and glasses, you are unlikely (I hope) to infer that glasses need to be inflated once in a while. On the other hand, knowing that the words in sentences are bound together by certain forces like semantic restrictions, syntactic subcategorization frames, and pragmatic factors is much more likely to suggest to me that the atoms in a molecule are also kept together because of some other forces. Thus, the inferencing process works when you have the target item in working memory, you are reminded of the base item (stored in LTM), and you transfer knowledge you have about the base to the target. In the framework of FGL, such transfer is subject to two constraints, however:

* One-to-one mapping: one and the same item in the base cannot be linked to more than one item in the target - so, trivially, there is much less chance of a monocle reminding you of a bicycle than someone's glasses.

* Parallel connectivity: if you map function(A, B) in the base onto function(C, D) in the target, then the properties of A and B should also hold of C and D.
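As a rough illustration, both constraints can be written as checks on a proposed mapping. The sketch assumes the usual structure-mapping reading of parallel connectivity (when two expressions are matched, their arguments must be matched to each other, position by position); the dictionary encoding and function names are mine, not FGL's.

```python
# A mapping sends base items to target items. Items are either entity
# names (strings) or expressions written as tuples (predicate, arg1, ...).
# The encoding and function names are illustrative assumptions.

def targets_unique(mapping):
    """One-to-one: no target item is the image of two different base items."""
    return len(set(mapping.values())) == len(mapping)


def has_parallel_connectivity(mapping):
    """Matched expressions must have their arguments matched pairwise."""
    for base, target in mapping.items():
        if not isinstance(base, tuple):
            continue  # plain entities carry no arguments to check
        if not isinstance(target, tuple) or len(base) != len(target):
            return False
        for base_arg, target_arg in zip(base[1:], target[1:]):
            if mapping.get(base_arg) != target_arg:
                return False
    return True


base_fact = ("orbit", "planet", "sun")
target_fact = ("orbit", "electron", "nucleus")

good = {base_fact: target_fact, "planet": "electron", "sun": "nucleus"}
bad = {base_fact: target_fact, "planet": "nucleus", "sun": "electron"}

print(targets_unique(good), has_parallel_connectivity(good))
print(targets_unique(bad), has_parallel_connectivity(bad))
```

The second mapping is still one-to-one, but it crosses the arguments of the matched orbit() facts, so only the connectivity check rejects it.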

According to FGL, the two older generations of analogy models fail to account for the known facts about human analogy: (symbolic) case-based reasoning models are very good at working with structured representations of memory items, but are not scalable, and they cannot account for the fact that people pay more attention to surface similarities in retrieval; (connectionist) feature-based models, on the other hand, are much more scalable and can pay attention to surface similarities, but have serious problems modeling the structural relations which are so important for inferencing. Hence, the need for MAC/FAC (Many Are Called/but Few Are Chosen) to combine the best of both worlds.

The MAC/FAC model works in two stages: in the first one (MAC), a computationally cheap process looks for similarities between the item in working memory and the items in LTM, and then selects the one which matches best, plus any other items which match closely (within 10% of the winner), for further processing by the FAC. The similarity matching at this stage is implemented by comparing normalized feature vectors: each item in LTM has a vector associated with it which specifies which predicates (roughly, concepts) play a role in the definition of the item, and how often each predicate occurs in the definition. Crucially, though, these vectors do not contain structured representations, just dumb sums. It is not hard to see that it is much faster to compare such feature vectors than to compare intricate structures: the latter inherently requires serial processing, while the former can be done in a parallel fashion. The speed of MAC does come at the price of loss of accuracy, because there is no guarantee that the items being compared share any complex representation whatsoever, just that the same predicates are used somewhere in their definitions.

Once MAC has found the best overall matches in memory for the target item, the FAC module takes a more critical look at its output. Using parallel SME (Structure Mapping Engine) processors, a full-blown structure comparison is made between the target and each of the items which were output by MAC. The aim of this comparison is to find the best global match, as determined by the number of correspondences which can be made between the target and the base, their soundness (i.e. whether the individual correspondences lead to motivated structural correspondences), their depth (i.e. elaborate structural mappings are more important than a collection of more superficial correspondences), and by the number of candidate inferences which the base suggests for the target. It is at this stage that the constraints of one-to-one mapping and parallel connectivity mentioned above play an important role in making sure that the mappings remain sound. In this way, FAC looks for literal similarity between target and base, acting as a structural filter for the more superficial similarity-oriented output of MAC. Just like its companion, FAC outputs the best match it has found, along with any other items whose match scores are within 10% of the winner.
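The MAC stage is simple enough to sketch. What follows is my own minimal reconstruction from the description above, not FGL's code: the nested-tuple encoding, the function names, and the reading of "within 10% of the winner" as a score of at least 90% of the best score are all assumptions.

```python
from collections import Counter
from math import sqrt


def predicates(expr):
    """Yield every predicate name in a nested (predicate, arg, ...) tuple."""
    if isinstance(expr, tuple):
        yield expr[0]
        for arg in expr[1:]:
            yield from predicates(arg)


def content_vector(description):
    """Normalized predicate counts for a list of facts; structure is discarded."""
    counts = Counter(p for fact in description for p in predicates(fact))
    norm = sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {p: c / norm for p, c in counts.items()}


def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(weight * v.get(p, 0.0) for p, weight in u.items())


def mac(probe, memory, window=0.10):
    """Names of the best-scoring memory item and any within `window` of it."""
    probe_vector = content_vector(probe)
    scores = {name: dot(probe_vector, content_vector(description))
              for name, description in memory.items()}
    best = max(scores.values())
    return [name for name, score in scores.items() if score >= best * (1 - window)]


# Hypothetical memory items. "reversed" swaps cause and effect.
probe = [("causes", ("obtain", "CS_faculty", "grant"),
                    ("hire", "CS_faculty", "RAs"))]
memory = {
    "english": [("causes", ("obtain", "Eng_faculty", "grant"),
                           ("hire", "Eng_faculty", "RAs"))],
    "reversed": [("causes", ("hire", "Eng_faculty", "RAs"),
                            ("obtain", "Eng_faculty", "grant"))],
    "weather": [("causes", ("rain", "clouds"), ("wet", "ground"))],
}

print(mac(probe, memory))
```

MAC cannot tell "english" from "reversed", because both use the same predicates equally often; it is left to the structural comparison in FAC to separate them.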

With all this in mind, it is reassuring to find that FGL found that MAC/FAC scored qualitatively similar to human subjects on a story retrieval and inference task. The setup was that the subjects had to read a large number of stories, and were later asked which of these stories they were reminded of when they read new stories. Roughly, the following retrievability hierarchy was observed:

+Surface         +Surface         -Surface         -Surface
+Structure  >=   -Structure  >=   +Structure  >=   -Structure

However, when these same subjects were asked to judge how good the inferences between stories were, they rated the ones which shared structural features as much more useful than the ones which just shared surface similarities. FGL are happy to report that MAC/FAC was able to simulate these two results: the MAC part was somewhat more sensitive to surface similarities and output a number of candidates; the FAC part then further reduced the number of candidates by selecting the ones with the best structural match. (No comparisons of numbers between human subjects and MAC/FAC, though, because FGL don't provide them.)

In the second half of the paper, FGL present three sensitivity analyses to give some insight into how MAC/FAC works. First, they show that the output selection percentages (i.e. the 10% window for MAC and FAC) are important, because the model stops working if the MAC percentage goes above 20% (so it's not the case that the model is so powerful that it could learn anything anyhow). Second, they demonstrate that the feature vectors of MAC can be changed considerably without truly affecting the qualitative behavior of the model (so nothing crucially depends on the input representation). Third, they illustrate that both surface and structural information is needed in the representations for the model to work correctly (so it's like humans in needing both).

Then they move on to a comparison of MAC/FAC with the ARCS model of Thagard et al. Unsurprisingly, we find that MAC/FAC outperforms ARCS in many different ways (speed, accuracy, similarity to psychological data).

Finally, the discussion section presents a number of weaknesses of MAC/FAC and suggestions as to how the model could be extended to deal with them. On the double, the issues raised here are:

* Retrieval failure: sometimes people are not reminded of anything at all, so MAC/FAC should be able to fail in a similar manner;

* Goal awareness: there is some evidence which suggests that people's goals will influence (perhaps even drive) their cognitive behavior, so MAC/FAC could benefit from a type of representation which takes overall goals into account;

* Vector size: in their current incarnation, the MAC representations would grow without bounds as the number of predicates increases, so these vectors may need to either become more abstract, or factorized into a small set of primitives to remain computationally cheap and efficient;

* Inter-item effects: there is some psychological evidence that different possible analogies for a given target interact, so MAC/FAC should probably not be processing different candidates in complete isolation of each other;

* Iterative access: psychological evidence suggests that people can use one analogy to find a better one, so MAC/FAC may need an output-to-input connection, so that it can reconsider its own previous output at a later stage;

* Performance-oriented models: MAC/FAC only models analogy, so it should be part of something bigger and more comprehensive in terms of human cognition and behavior;

* Expert behavior: psychological studies show that intensive training and encoding of certain information makes it easier to access and more likely to be used for structural similarities, so MAC/FAC needs a way of distinguishing expert knowledge from other types of knowledge.

So, it seems that there is still some work left for FGL.


Comments
^^^^^^^^
* What is the relationship between analogy and metaphor? The latter too is usually described as 'understanding in terms of', with knowledge about one domain helping you out in another one. Presumably, metaphor is just a type of analogy, but then it would be interesting to see whether MAC/FAC can work with poetic materials.

* The one-to-one mapping constraint may be overly strict: it could prevent us from mapping the three people making up a Roman triumvirate onto a single dictator.

* The choice of the vocabulary for the different representations is of crucial importance. FGL point at re-representations as one way of dealing with this (e.g. ORANGE(X) and RED(X) are unrelated, but COLOR(X, RED) and COLOR(X, ORANGE) are not), but this brings us to COLOR as a primitive term.

* The same issue arises with the potential complexity of the MAC content vectors: they should contain all the information associated with an item (including zeros for the non-relevant predicates), but that would be an enormous amount for a psychologically plausible model. Either using more abstract terms or more primitive ones (or both) makes it even more important to know how we can know to choose which terms to use.

* The content vector mechanism would also rank an item defined with 10 statements and an item defined with the negation of these 10 statements as highly similar - they share all features, and the second one has some extra ones. Do we want opposites to be this salient?

* MAC and FAC work on different representations of the same items. To what extent is this dissociation desirable?

* With a MAC selection window of 10%, the FAC window can go up to 100% and the model data still conform to the human retrievability hierarchy. Hence, we can get rid of the FAC part?

* Far out: should the weird associations one experiences in dreams also be explained by a theory of analogy?


Analogy just looks like high level perception ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Mike's summary of this paper already covers most of the material in this paper, and a lot of the technical details are repetitions of the ones given in the MAC/FAC paper. So, I'll just describe Phineas here, a model developed for learning physical theories by analogy.

Phineas has a memory of physical systems, with qualitative descriptions of their behaviors (e.g. what happens to an object in terms of volume or temperature when it is heated or cooled). When it is presented with a similar description of new objects, it first checks whether it can straightforwardly apply one of its existing models. If Phineas can't, it tries analogy. Using the SME mechanism, it compares the structure of the new phenomenon with those of the phenomena it already knows about, again looking for correspondences between both surface and structural properties, trying to develop sound and deep structural correspondences, and checking how fruitful the overall match is with respect to the number of candidate inferences. Such inferences can lead Phineas to take another look at the new object and see whether they apply or not. If they do, Phineas stores what it has learnt about the new object for future use. To make it somewhat more concrete: the example in the paper has Phineas map heat flow onto the flow of liquids.

Hofstadter's Copycat is a completely different beast. It lives in a world populated with alphabetic strings and faces problems like: "If the string 'abc' is transformed into 'abd', then what happens to 'aabbcc'?". In order to find the answer, it has to use its blackboard architecture in which multiple parallel and probabilistic rules exchange hypotheses and conclusions. These rules are also only useful for the domain of alphabetic strings: e.g. that 'd' follows 'c', and that 'z' is the last letter in the alphabet. In Copycat, all the concepts employed by the rules are connected in a Slipnet, which specifies the likelihood of a concept 'slipping into' another one; e.g. 'successor' becoming 'predecessor'. When it's running, Copycat uses a form of simulated annealing to guide its processes: at the start, it has a high temperature, which makes it easier for concepts to slip; as the system cools down, Copycat will settle on a solution.

As its critics point out, the weights on the connections of the Slipnet are hand-coded, the concepts have been determined at the start, and Copycat cannot learn at all. Its domain-specificity also prevents it from being applicable for any other purpose (unlike SME), and raises the specter of the unscalable toy model. And even in terms of model complexity, Phineas is already capable of dealing with larger representations than Copycat is. Taken together with the higher-level issues, this suggests that Copycat is not the panacea Hofstadter proclaims it to be.


Comments
^^^^^^^^

* What exactly is the difference between a single module doing perception and cognition, and two modules which work interleaved? Bandwidth?

* Sometimes, the discussion slips from theoretical issues into implementational issues: demonstrating that Copycat is no good either does not make Phineas a better model.

* Phineas has actually published three papers already! (See page 20.)

* To argue for the superiority of SME, the authors use Phineas. But Phineas has many more components than SME, so it's unclear which components are doing the (good) work.

* The authors mention (top of p.7) that not all information about a concept will feed into SME, only the part relevant to the task at hand. But how do we determine this without first checking the relevance of each and every bit of information? If it's a general mechanism of attention, then we would like to see a model of that too.

Ezra Van Everbroeck - - - ezra@ling.ucsd.edu - - - Linguistics
Computing Support - - +1 (619) 534-8239 - - UC San Diego



Contact: Gilles Fauconnier
Web Design: Omar Alhassoon