Gradual Strategy Acquisition in a General Architecture for Intelligence

Christopher R. Waterson
Artificial Intelligence Lab
University of Michigan

EECS 592 Final Project
April 30, 1996

1. Introduction

In this paper, we present our implementation of a cognitive model for gradual strategy acquisition in a general artificial intelligence architecture. Specifically, we have implemented Jones and Van Lehn's (1994) GIPS algorithm in Soar (Laird et. al., 1987), replacing the probabilistic learning mechanism with Miller's (1993) Symbolic Category Acquisition (SCA) system. SCA is an incremental inductive learning algorithm similar to decision-tree learning that has successfully been used to model other human learning phenomena. In building this model, our intent was to (1) provide a process model for probabilistic learning phenomena that was descriptively modeled in GIPS, and (2) explore the use of SCA as the basis for a learning subsystem in a procedural context.

Background

Siegler and Jenkins (1989) provide a detailed analysis of the individual behaviors of several children, each of whom is just learning to add small numbers. They focused on changes to the procedure that each child used to perform addition, noting that there were significant high-level changes which occurred over large time scales (days and weeks). They showed that over large periods of time, the children seemed to discover a series of more efficient methods for computing the sum of two addends, and this series seemed to follow a fairly regular progression. Siegler and Jenkins termed this process of exploring alternative procedures strategy acquisition, defining a strategy to be a goal directed but non-compulsive procedure. In other words, a strategy describes a method by which a task can be accomplished, but this method may be one of several exhibited by the learner under different circumstances. They note that the progression of behaviors exhibited by the children is not well described by current computational learning theory; specifically, the learning does not seem to be a case of simple speedup, nor does it seem to be driven by problem-solving impasses in the normal sense.

In 1994, Jones & Van Lehn proposed a system called GIPS (for General Inductive Problem Solver), which provides a process model of the phenomena that Siegler and Jenkins observed. At its heart is a means-ends analysis performance system that learns operator selection and execution criteria with a probabilistic mechanism under supervised learning conditions. The system is remarkable in that it is able to accurately model the sequence of strategies through which most children seem to normally progress, and does so with a tight set of independently-motivated mechanisms.

Motivation

In this paper, we describe our system "Gipsosoar", an implementation of GIPS in the well-known Soar cognitive architecture. Our primary motivation for doing so is to provide a stronger process model of the gradual shift in a problem solver's high-level strategy over large periods of time. A process model is one that explains a phenomenon by specifying the underlying mechanisms might exist to produce the phenomenon. This can be contrasted to a descriptive model, which describes a phenomenon "externally", but does not make any claim as to why the phenomenon might arise.

A secondary motivation was to explore a the use of the SCA inductive learning system as one possible mechanism for providing knowledge-level learning of procedures. Most learning in current Soar systems is symbol-level learning (Diettrich, 1986), whereby Soar chunks the results of problem solving to avoid performing the problem solving again in the future. In general, this type of behavior is known as explanation-based generalization, and is limiting in that a system can never learn anything from explanation-based generalization that is not already entailed by its existing knowledge. Knowledge-level learning, on the other hand, is learning whereby a system can learn things that are not necessarily entailed by its knowledge base. Rosenbloom, Laird, and Newell (1987) describe data chunking, which is a process through which Soar can perform knowledge-level learning, and is the cornerstone upon which SCA is built. Our "grand hope" is that it will be possible to use these foundations to build a feasable performance architecture that can perform strategy acquisition in complex environements.

2. Technical Description

In this section, we will present a high-level overview of Jones & Van Lehn's GIPS system, pointing out differences between GIPS and our implementation where appropriate.

Problem Solving Process

Jones & Van Lehn use the term flexible means-ends analysis to describe GIPS' performance mechanism, which works roughly as follows (Jones & Van Lehn, Table 5, p. 19):

  1. Transform(Goals). First, the current state is compared to the specified Goals: if it can be determined that the current state of the world satisfies the Goals, Transform returns immediately. Next, An ordered set of operators is selected based on the current state of the world and the specified Goals that are to be achieved. The first operator in the set is Applied to the current state.
  2. Apply(Operator). Based on the current state of the world, a decision is made whether or not the specified operator can be executed.

Because operators are selected based on the current state of the world in addition to the goals at hand, it is possible for GIPS to exhibit a mixture of forward- and backward-chaining. Operators that are selected primarily based on the current state tend to produce opportunistic behavior, whereas operators that are selected primarily based on the goals at hand tend to produce goal-directed behavior.

Our implementation differs slightly from In Jones & Van Lehn's GIPS with respect to the execution of an operator. In Jones & Van Lehn's system no distinction is made between external and internal problem solving: all problem solving is performed internally ("in GIPS' head"). When GIPS decides to execute an operator, Jones & Van Lehn use an external supervisor to inform GIPS whether or not it has made the correct decision.

Gipsosoar distinctly separates internal from external problem solving using Soar's I/O mechanism. When Gipsosoar decides to execute an operator, an output command is immediately issued to the "motor system." Gipsosoar then uses perceptual cues to recognize whether or not the operator completed successfully.

Problem Solving Knowledge

There are five primary types of domain knowledge that Gipsosoar uses during problem solving, four of which are either explicit or implicit in GIPS, as well:

  1. Knowledge about when operators should be selected. Gipsosoar must recognize what combinations of states and goals are appropriate situations in which to select an operator.

  2. Knowledge about when operators should be executed. Gipsosoar must recognize what states are appropriate situations in which to execute an operator, versus recursively subgoaling on the operator's preconditions.

  3. Knowledge about the preconditions of operators. When a decision is made to subgoal on an operator's preconditions (instead of immediately executing the operator), Gipsosoar must recall those preconditions.

  4. Knowledge about the effects of operators. Gipsosoar must recognize whether or not an operator successfully completed execution. (This knowledge is not required in Jones & Van Lehn's implementation, because a supervisor forbids GIPS from executing an inappropriate operator.)

  5. Knowledge about the completion of goals. Gipsosoar must recognize when the current state satisfies the goals at hand.

It is not clear whether (2) and (3) are truly distinct types of knowledge. One could argue that knowledge about whether or not an operator should be executed is implicit in the knowledge of the operator's preconditions. We adopt the approach that Jones & Van :ehn took with GIPS; namely, that these two types of knowledge are related but distinct. However, we admit that this is an open area for research; see Bhattacharyya & Laird (in press) for a discussion of the relationship between recognition and recall knowledge in the context of SCA.

Problem Solving Knowledge & Learning

GIPS and Gipsosoar both perform knowledge-level learning (Diettrich, 1986) from their problem solving experiences, learning new facts not necessarily entailed by the system's existing knowledge. There are three primary areas where learning occurs in GIPS, one of which is not present in our implementation:

  1. Learning operator selection criteria. When GIPS has been able to successfully complete a Transform (i.e., an operator has successfully been applied and all of the Transform's goals have been satisfied), the system is in a position to expand its problem solving knowledge with respect to recognizing when operators should be selected.

    At the end of a successful Transform, there is an operator Op that has been successfully applied and resulted in the completion of the Transform. There may be several "FailedOps", operators that were applied but resulted in dead-ends. Using an inductive learning algorithm, GIPS can treat the initial state of the Transform (InitState) as a situation that represents and appropriate time to select Op. Likewise, it can treat InitState as a situation that represents an inappropriate time to select each of the FailedOps.

  2. Learning operator execution criteria. After GIPS attempts to execute an operator (either by asking the supervisor as in Jones & Van Lehn's implementation, or by executing the operator in an external environment as in Gipsosoar), the system is in a position to expand its problem solving knowledge with respect to recognizing when operators should be executed.

    Immediately after an operator has been executed in Gipsosoar (or, immediately after asking the supervisor in GIPS), the system will have feedback as to the successful completion of the operator (explicitly from the environment in Gipsosoar, implicitly from the supervisor in GIPS). Again using an inductive learning algorithm, GIPS can treat the initial state of the Apply phase as either (1) an appropriate situation to execute Op if the operator executes successfully, or (2) an appropriate situation to subgoal on Op's preconditions if the operator fails to execute successfully.

  3. Learning operator preconditions. In GIPS, knowledge about when operators should be executed can be used to infer the preconditions of operators. At a first level of approximation, Jones & Van Lehn's system will (1) add a new feature to an operator's set of preconditions when that feature appears in many situations where subgoaling is appropriate and few situations where execution is appropriate, and (2) remove a feature from an operator's set of preconditions when that feature appears in few situations where subgoaling is appropriate an many situations where execution is appropriate.

    This capability requires that GIPS be able to "introspect" on its own long-term knowledge, reasoning in a meta-level way about the relationship between an operator's execution knowledge and the operator's preconditions. In general, the Soar architecture enforces that long-term knowledge is opaque: one cannot explicitly examine ones knowledge. It is therefore an open research question as to how one could modify an operator's preconditions in Gipsosoar; we point the curious reader again to Bhattacharyya & Laird (in press) for one possible alternative.

Learning Mechanisms

Jones & Van Lehn's implementation of GIPS uses a learning mechanism inspired by STAGGER (Schlimmer 1986). For each feature in each target concept that STAGGER learns about, several running totals are maintained, including the total number of instances of the concept seen, the number of times a feature appeared in a positive concept, and the number of times a feature appeared in a negative concept. Using these running totals, STAGGER is able to compute the odds that a particular feature is likely to predict a positive or negative example. The odds for individual features can be aggregated to produce relative likelihoods of target concepts given a test example, amongst which a stochastic algorithm can choose.

GIPS uses separate concepts for operator selection criteria (called selection concepts) and operator execution criteria (called execution concepts). Selection concepts are trained on features from both the current state and the set of goals at hand. GIPS uses the relative likelihood of concepts to produce the ordered set of operators for a particular Transform. Execution concepts are trained on features of the state alone. GIPS stochastically chooses between SUBGOAL and EXECUTE for the execution concept based on the relative likelihood of one with respect to the other. In this way, GIPS will occasionally attempt to EXECUTE or SUBGOAL on an operator even though a large amount of evidence has been accrued to the contrary.

Gipsosoar's learning mechanism is based around Miller's Symbolic Category Acquisition (SCA). SCA works by learning more and more specific rules to map sets of features to concepts. When presented with an example to classify, SCA attempts to find the most specific rule that matches the training example, by removing ("abstracting") features from the training example one by one. For example, a rule:

{size:large;predict:friendly}

would be dominated by:

{size:large,teeth:sharp;predict:enemy}

for the complete feature vector {size:large,teeth:sharp,type:whale}. When presented with a training example, SCA attempts to find the most specific rule that predicts the correct category, and creates a new rule by adding the feature that was last abstracted to the feature set. For example, if

{size:large,teeth:sharp,type:whale;predict:friendly}

were presented as a training example, SCA would abstract away features one by one until the rule:

{size:large;predict:friendly}

is matched. It would then create a new rule by adding the last feature abstracted (say, type:whale), producing the new rule:

{size:large,type:whale;predict:friendly}

SCA behaves somewhat like a decision tree, and has been demonstrated to illustrate cognitively plausible interference effects while learning concepts (Miller, 1993).

Gipsosoar uses SCA to generate an ordered set of selected operators for the Transform by forcing SCA to return all predictions that it would make rather than just the first prediction it makes. The intuition is that more specific predictions are more likely to be correct than more general predictions, but in the even that a specific prediction fails, the best choice to try next is the next-most-specific prediction. As with GIPS, Gipsosoar trains its selection concept on both the features of the current state and the goals at hand. Implementation of SCA to predict execution concepts is fairly straightforward: SCA is trained from the features of the current state and must predict either EXECUTE or SUBGOAL. As with decision tree learning, choosing correct features is critical to converging on the correct goal concept in SCA. Our hope was that, in fact, when SCA chose an incorrect feature on which to base a prediction, this would lead to interesting interference effects that would produce strategy shifts like the ones observed by Siegler & Jenkins.

3. Experimental Results

In this section, we review the results of our experiments with Gipsosoar. We have primarily experimented with learning operator execution criteria, and the example we provide here illustrates a vivid example.

Problem Statement

We performed our experiments in the ubiquitous blocks world microworld, and were able to produce a variety of strategy shifts in problem solving. For purposes of illustration, we'll demonstrate the results of running Gipsosoar several times on a simple stacking problem, where Gipsosoar begins with incomplete knowledge of one of its operator execution criteria. Figure 1 shows the initial state of the world: blocks (1) and (2) are on the table, with block (3) on top of block (1).


Figure 1. Initial State

Figure 2 shows the final state of the world, where block (3) is on top of block (1), and block (1) is on top of block (2).


Figure 2. Final State

Note that even in this simple problem, interesting dynamics arise. The system must remove one goal that is currently satisfied in order to satisfy complete set of goals in the final state.

Initial Knowledge

Gipsosoar is given the following knowledge:

  1. Operator Selection Criteria
  2. Operator Execution Criteria
  3. Operator Preconditions
  4. Effects of Operators
  5. Completion of Goals

Note that the knowledge is incomplete in the following ways:

Trial Runs

Our first run immediately illustrates the lack of knowledge about when it is appropriate to subgoal on the Move-Block operator, as Gipsosoar attempts to move Block 1 on top of Block 2 while Block 1 is still buried under Block 3

Transform-1({(Block 1 on-top-of Block 2) (Block 3 on-top-of Block 1)})
        Select operator (Move-Block 1 to 2).
        Apply-1(Move-Block 1 to 2)
                Execute(Move-Block 1 to 2) (Incorrect!)

Result Block 3 is now on Block 2

                Detect that execution failed, subgoal on Move-Block preconditions
                Transform-2({(Block 1 clear) (Block 2 clear)})
                        Select operator (Move-Block 3 to Table)
                        Apply-2(Move-Block 3 to Table)
                                Execute(Move-Block 3 to Table) (Success!)

Result Block 3 is now on the Table

                                Apply-2 completes, successfully
                        Transform-2 completes, successfully
                Re-apply(Move-Block 1 to 2)
                Execute(Move-Block 1 to 2) (Success!)

Result Block 1 is now on Block 2

                Apply-1 completes, successfully
        Transform-3({(Block 3 on-top-of Block 1)})
                Select operator (Move-Block 3 to 1)
                Apply-3(Move-Block 3 to 1)
                        Execute(Move-Block 3 to 1) (Success!)

Result Block 3 on now on Block 1

                        Apply-3 completes, successfully
                Transform-3 completes, successfully
        Transform-1 completes, successfully

In our next two trials, Gipsosoar makes the same mistake again, providing a trace identical to the first trial. Finally, in our fourth trial, Gipsosoar successfully predicts to subgoal on the (Move-Block 1 to 2) operator, producing the following trace:

Transform-1({(Block 1 on-top-of Block 2) (Block 3 on-top-of Block 1)})
        Select operator (Move-Block 1 to 2).
        Apply-1(Move-Block 1 to 2)
                Subgoal(Move-Block 1 to 2) (Incorrect!)
                Transform-2({(Block 1 clear) (Block 2 clear)})
                        Select operator (Move-Block 3 to Table)
                        Apply-2(Move-Block 3 to Table)
                                Execute(Move-Block 3 to Table) (Success!)

Result Block 3 is now on the Table

                                Apply-2 completes, successfully
                        Transform-2 completes, successfully
                Re-apply(Move-Block 1 to 2)
                Execute(Move-Block 1 to 2) (Success!)

Result Block 1 is now on Block 2

                Apply-1 completes, successfully
        Transform-3({(Block 3 on-top-of Block 1)})
                Select operator (Move-Block 3 to 1)
                Apply-3(Move-Block 3 to 1)
                        Execute(Move-Block 3 to 1) (Success!)

Result Block 3 on now on Block 1

                        Apply-3 completes, successfully
                Transform-3 completes, successfully
        Transform-1 completes, successfully

It is fairly easy to see why such behavior occurs. SCA's execution concept starts with what amounts to an empty decision tree with respect to the SUBGOAL execution concept for Move-Block. Since SCA's search proceeds from specific to general, SCA requires at least three training instances before it can be assured of building a rule that is specific enough to mask the incorrect EXECUTE prediction (which has two conditions). Since SCA chooses randomly between operators of equal specificity, other training runs have required only two instances before the SUBGOAL execution concept is successfully applied.

This example does not illustrate learning that occurs with selection concepts, or learning that occurs across a variety of tasks. Although we have done some preliminary experiments that appear promising in these areas, these await a thorough treatment in future work.

4. Discussion

We feel that the current model we have created is successful as a "proof of concept". Specifically, we have demonstrated that a symbolic learning mechanism can successfully describe behaviors that, at a high level, appear to be probabilistic. This provides a certain strength to our model because we can now make predictions as to how learning might occur given a particular set of training examples, and when and why "gross" strategy shifts such as described by Siegler & Jenkins might occur.

A logical next step to test the cognitive plausibility of our hypothesis then would be to build the children's addition domain and attempt first to fit aggregate data such as the general progression through strategies that are described by Siegler & Jenkins (the SUM-to-MIN transition). If this proved to be successful, an even more ambitious test would be to attempt to fit one child's individual data. The hope would be that one could discover a specific ordering of training examples that would lead to quick acquisition of an optimal strategy.

Nevertheless, there are some distinct difficulties with our model that will need to be overcome before we would considered it to be cognitively plausible. First and foremost, we have implemented our system in such a way as to defeat some of the natural explanation-based learning that would normally occur as part of the Soar problem solving process. Specifically, we do not chunk the results of SCA's prediction process, nor do we chunk results from the goal hierarchy that arises from the means-ends nature of the problem solving. A natural next step would be to explore how chunking might be combined with Gipsosoar's knowledge-level learning mechanism to provide a combination of knowledge-level learning and symbol-level speedup.

A second problem with our model is that, we allow SCA to choose completely randomly from features in the world. This approach is acceptable in toy domains like the blocks world because the set of possible features is small: the number of irrelevant features is roughly equal to the number of relevant ones. In a more realistic domain, however, this would not be the case: the number of insignificant features would far outweigh the number of features that are critical for decision making. These would doubtless swamp SCA's learning mechanism and reduce the learning rate significantly. A logical extension to our system would then be an attention mechanism that would encode knowledge about which features are likely to be important for SCA to consider.

Finally, our current implementation does not learn to recall operator preconditions. As we have noted, this is a difficult problem in the Soar architecture, as long-term knowledge is opaque. Our hope is that deeper exploration into the basic mechanisms of SCA will lead to some insight as to how it is possible to extract operator preconditions from the operation execution criteria.

In some senses, our model provides an interesting architecture that was not available in the original GIPS system; specifically, an explicit separation of the internal and external problem solving components. Like GIPS, our model does some problem solving "in its head" as it performs operator subgoaling; however, unlike GIPS, our model performs opportunistic actions "in the real world". This allows us to remove the supervisor from the system, because now the external environment can be used to provide feedback on the success of operators. The down-side of this type of problem solving is that one can "leap before looking", and end up in a failure state from which no solution is possible.

5. Conclusions

We have taken a first step towards providing a process model for the long-term shifts in strategy akin to those described by Siegler & Jenkins. Although we make no claims as to the cognitive plausibility of the current model, this "proof of concept" provides a starting point from which further exploration can be made.

Our model provides certain strengths by virtue that it is a process model that it explicitly describes why strategy shifts occur. Specifically, our model predicts that high-level strategy shifts occur due to (1) learning of more an more specific rules about operator selection and execution criteria, and (2) interference effects between those rules that results from incorrect or incomplete knowledge.

6. Summary

In this paper, we have presented a knowledge-level learning system in the context of the Soar cognitive architecture that is able to produce gradual strategy shifts as described by Siegler & Jenkins. We have discussed the performance algorithm shared by the two systems and noted where differences occur (primarily with respect to operator execution). We have explicitly identified the sources of knowledge in Gipsosoar, including both knowledge that is "a priori" and that which is learned through performance. We have discussed the differences between the learning mechanisms GIPS and Gipsosoar, and have demonstrated that Gipsosoar's symbolic learning mechanism is able to reproduce strategy changes similar to those exhibited by GIPS. Finally, we have identified areas of future research, including further integration into the Soar architecture, feature selection and "attention", and precondition recall. It is clear that Gipsosoar is strongly rooted in Jones & Van Lehn's GIPS system. We feel that, in some respects, Gipsosoar provides a "deeper" process model for explaining how and why strategy shifts occur during problem solving.

7. References

S. Bhattacharyya and J. E. Laird. "A cognitive model of recall motivated by inductive learning." To be presented at the Canadian Conference on AI (1996).

T. G. Diettrich. "Learning at the knowledge level." Machine Learning 1, 287-315 (1986).

R. M. Jones and K. Van Lehn. "Acquisition of Children's Addition Strategies: A Model of Impasse-Free, Knowledge-Level Learning." Machine Learning 16, 11-36 (1994).

J. E. Laird, A. Newell, and P. Rosenbloom. "Soar: An Architecture for General Intelligence," Artificial Intelligence 33 1-64 (1987).

C. Miller. "Modeling Concept Acquisition in the Context of a Unified Theory of Cognition," Ph.D. Thesis. CSE Technical Report 157-93, The University of Michigan. 1993.

P. S. Rosenbloom, J. E. Laird, A. Newell. "Knowledge level learning in Soar." In Proceedings of AAAI-87.

J. C. Schlimmer and R. H. Granger. "Incremental learning from noisy data." Machine Learning 1, 317-354 (1986).

R. S. Siegler and E. Jenkins. How Children Discover New Strategies. Lawrence Erlbaum Associates, Inc. (1989).