Ran some training sets last night and collected the data. In each training instance, I started the problem solver with all four blocks on the table. Ran 10 sessions with 50 instances each. Zero learning occurred. Now, this could be because it didn't have enough time to pick up on other features; i.e., my heuristics weren't good enough. I'll need to play around with that a little more to see. Not very encouraging, though.
Got a meeting with John today, so I'm going to try to list out the open issues and questions that I have to deal with.
sp {(<o1> ^instance-of block)
(<o1> ^name 3)
(<t> ^name table)
-->
(<o1> ^on-top-of <t>)
}
In other words, to build a tower, the third block must be on the table. I'm not sure that the representation is rich enough for the system to learn that it can't lift a block over another block (noise?), so I'm not going to worry about this.
I think it's pretty clear how to use SCA to deal with operator selection (a la 1-incremental IMPROV), but I think that there could be some value added with the operator preconditions.
Allright. It seems to be holding up pretty well now. I've actually managed to see it through about fifty training instances with no apparent slowdown, so I'm optimistic that it should be able to hold up long enough to get some training data collected.
The next thing that I'm going to work on is making the target features useful: in other words, predicting features that actually can be used to alter the problem solver's goals. To do this, I'm first going to just try to make the problem solver concentrate on the "clear" and "on-top-of" features when selecting the target feature to learn for an execution concept.
I think that I'm having some trouble learning rules with "on-top-of" in the RHS. Since it has two unbound variables, I'm ending up with a lot of chunks with "undef-value" and "undef-object" as the prediction, which are (right now) useless during recall. I need to check into the learning process to see what happens when these kinds of chunks are forwarded as predictions. Are they augmented with all possible values/objects and used? Or are they just discarded?
I've put in a set of rules that restrict the features selected as the target feature, and now I'm getting absolutely zero recall for new preconditions. Kinda makes sense, because of what I'm including in the LHS -- I don't ever get a chance to iteratively recall things. So I tried including the same things on the RHS that I'm including in the LHS, but it hasn't bought me too much.
Just for kicks and because I'm not completely convinced that this thing is really learning anything, I've created a Tcl script that runs several trials and counts how many times it actually solves the problems. My hope is to be able to look over these trials and actually see the thing improving. I'm not confident that it is, but I'll hold off on my judgement for a bit. I've just set it to running on the simple case where all blocks are on the table and its trying to learn that it has to stack block 2 on 3 before putting 1 on 2. I'm doing twenty runs of 50 sessions each.
Spent some time this afternoon talking with Bob about my problem. Helped to clarify some things I think. I'm essentially trying to learn an exponential number of chunks: exponential in the number of heuristics that I use, that is. You could look at each chunk that I want to learn as a "cube" TTT => 3, TTF or TFT or FTT => 2, etc. So if I do want to do this, I should be a little bit concerned. I think I might be able to do something like this if I could make sure that the chunks I build aren't over general: that is, I can't have chunks like TTT => 3, TT => 2, T => 1, etc., because then I get conflicting values for the evaluation of the operator.
Bob suggested that I think about doing things "up front"; i.e., computing the "score" for each feature and then hand-coding some preferences to sort them out. (Well, maybe I'm putting words in his mouth, but we talked about it a little bit anyway.)
Anyway, the reason that I was thinking about all of this stuff in the first place was so that I could more quickly learn things like "can't build towers taller than three" (which I think would be represented by a precondition that block 3 starts on the table), or "can't lift a block over another block." This one would actually be a little bit more difficult to learn because I don't think that the features that are observable really make things all that apparent.
I think what I want to do now is finish up in as quick a way as possible the heuristics that guide learning so that I have something to hold up to the light and look at. I want it to be able to learn the above conditions, at least in some trial or another even if it can't learn them consistently. I guess right now the thing that scares me about the incremental learning junk is that is's so weak that I might run out of juice with Soar before I get there. An easy way to speed things up I guess would be just to learn several execution concept iterations for each selection concept iteration, although it's not clear that this will really help either.
After a little bit more struggling, I think that I'm now able to create a reasonable set of chunks for ordering the operators. I ended up working out the ATTENTION problem space where I create operator ties between all of the SELECT-FEATURE proposals. After a little fancy footwork I was able to construct chunks that seem to be just as specific as they need to be. I'm a little bit concerned because some of them look a little too general, but it seems like things are working fairly well so I'm gonna say "it works for now" and wait until it breaks to fix it.
Well, I've gotten the heuristics far enough along so that I Can do at least 30 or so training instances with things going ballistically exponential on me. Basically, I've forced it to sort on the "name" and "color" tags first, which are pretty good at getting things to be unique. On some experiments that I ran last night, I had pretty good success learning the correct selection concept for ordering block stacking. I limited the training instances to those where it where it was easy to discover that you should always put block 2 on block 3 before stacking block 1 on block 2, and it was able to successfully learn and even transfer the learning to other situations. I'm trying a completely random set of test cases now. As I'd mentioned yesterday, my results don't look as good: I'm not sure why.
Things didn't work so good for learning the execution concepts for instances that stray outside of the problem solver's initial knowledge. Well, before I get into that, let me just say that I think that I've got problems with my selection concepts on these problems, too. Specifically, if I give it a training instance where block 3 is on block 4, it kinda thrashes around, eventually learning an overly strong rejection rule for stacking block 1 on block 2 that prohibits it from ever solving the problem again. I thought maybe the way to deal with that problem was to weaken the learning algorithm so that it never learns about operator rejection rules.
Now, onto execution concepts. One problem seems to be that I'm never recalling any! This actually makes sense, because I've extinguished the ability for any to ever be recalled by enforcing the "name" and "color" order on my chunks. The cue that is presented to SFA for recall is an operator; something like:
<op> ^instance-of operator ^name move-block
^block <b1> ^destination <d>
^status execute
But all of my execution concept chunks look like my selection chunks; i.e., they concentrate on objects with names and then extend into other features of the object. So maybe what I need to do here is force the execution concept learning to focus on the operator first.
Okay, I think I've fixed that by making any SELECT-FEATURE proposal "worst" that doesn't select a feature in common with the operator. I'm kind of relying on some heresay that Soar will sort out "worst" preferences using the "better" stuff.
So now I've run into the next problem. It works so good that I'm now in a little bit of trouble. I'm recalling features like "the destination object was red", and setting up goals to achieve "make the destination object red"! How do I fix this? Well, my first thought was to add some additional knowledge that says "you make features like 'red' and 'name' into goals, but you can can make features like 'on-top-of' and 'clear' into goals." The problem with this is that it might interfere with the recall. An alternative solution might be to make goals like "make block 1 red" just automatically "satisfied", i.e., make them ignored. Think I'll try that for now because it's a pretty clear extension to the domain rules file.
Hmm. I'm getting close but I'm still not there yet. I'm having trouble sorting out my heuristics for determining which features to select when building a new chunk. I'm going to try to list them out here so that I can make some sense of them.
One way to do this is to just drop to a subspace and then try to figure out which is best. Or I could "decorate" each proposal with augmentations like "^in-common-with-target t", then just have rules that sort out these decorations. This ends up requiring a ton of productions. You can't count them out. I guess the best thing to do then is to just take off the indifferent preference and let them all tie, then sort them out using a subspace.
Well, that leads to a combinatoric explosion (again). I run out of gas when I try to build the chunks that sort out the 30 or so productions that need to be ordered. Tried just passing the scores back up, but because I end up with some over-general chunks, I get attribute conflicts (which makes me suspicious that even if I was able to get the preferences to work in the subspace, I'd end up with operator conflicts).
So I've just finished up a set of experiments where I learn all of the identifiers for the blocks first. Ends up getting me up to about a dozen or so items in the LHS of the chunks, but even then things go to hell in a bucket and my matches start to go ballistic. It's least robust enough so that I can automate the learning junk and get up to about 400 chunks after thirty trials or so, but it still runs out of gas. It's hard for me to say if its performance is actually improving, at least in the case where I'm trying to get it to stack block 2 on block 3 first. In fact, now that I'm watching it, it actually seems to be getting worse :(.
Thought I had an answer to my troubles and then, lost it. Basically discovered (as I'd suspected all along) that the reason I'm running out of memory is because of exponential match cost for my rules. Then, as I began to poke around in it a little bit today, I discovered that the way the conditions get ordered is different between whether the conditions are loaded with "sp" or by the chunks getting build during execution. So I sent a note off to Soar bugs, and found out that indeed, they are different. But, it doesn't make much of a difference to me, sort of six one way or a half-dozen the other.
So now I'm back to square one, staring the real problem in the face. Which is, how do I figure out which features in the world to concentrate on. My current heuristic (which doesn't seem to work all that well) is to focus on features that are common to either the match set of a rule (for selection chunks), or to the target feature that we're trying to learn about (for execution chunks).
I wonder what would happen if I got a little more aggressive and tried to aggregate together several features, in hopes that the resulting rule would be more specific. My hunch here is that I'm getting this "bulge" over which it's hard to pass: specifically, until you reach a certain point, you know the match cost has to go up because you don't have enough conditions to restrict it. Hmm. Food for thought.
First thing that I'm going to try is replacing the SET-COUNT rules in the style of how Soar is supposed to "un-learn" bad operator proposals. Right now, I just let a bunch of preferences sort things out, prefering larger set counts to smaller. Unfortunately, this creates an exponential number of matches as the number of proposals grows. So instead, I'll learn to reject the old proposal and accept the new one. This should alleviate some of the problems.
Continued testing things. I think that some of my problems from last Wednesday may have been caused by stale files. I did a "make" first thing and discovered that some of the things were out of date. Anyway, I'm still running into huge problems with thrashing. I can't get more than ten runs in before I completely run out of gas. I've been tracing the chunks that are firing and being built, and they don't seem outrageous. I mean, I can't find any that seem to have an exponential number of matches. I wonder if I'm doing something sloppily and I'm just getting a ton of retractions and firings happening over and over again? I guess that would account for slowness, but not the memory usage.
A quick look at stats says that Soar is spending beaucoup time doing matches. Which says that maybe I better figure out a better way to do my chunks.Looks like Soar gets up to about 16 Mb before knine crashes. Maybe I'll have better luck at home.
Made some decent progress today as far as building debugging tools goes. I think that I got my Tcl script that sucks out SCA and SFA rules going about as best as its going to go. Found some problems in my Java code that's building up the rule tree. Observations that I've made so far are...
Spent all of yesterday and some of tonight working on a tool that can actually wade through the morass of rules spewed out by SCA and make some sense of them. Turns out that this thing actually was useful for doing debugging -- I discovered that I was learning on inconsistent instances! Specifically, I think that I use the new (i.e., transformed) feature set to learn everything for each selection concept. (I kinda think that I knew about this in the back of my head.) So I think what I'm going to have to do for selection concepts is take a "snapshot" of each and store it with the operator that I'm attempting to apply before I apply the operator. Then use that to train afterwards.
Hmm. Actually now that I look at things, it's a little less clear that this is the case. I actually remember thinking about this a couple of days ago. Basically, my stuff munges together the "goal" and the "state" features, which isn't quite right. So some of the contradictory stuff that I was seeing was an artefact of this. Not sure if all of it is, but the ones I've tracked down have been. Need to enhance my stuff to take those things into account, I guess...
Spent the end of the day yesterday futzing around with heuristics for selecting features to try to improve the performance when learning.
Thought a little bit more about what Doug and I talked about the other day: specifically, just kludging the learning algorithm to learn everything to make sure that it works before trying to make it work incrementally. Brings back a problem that I was dealing with way back when while I was working with Gary Pelton's ideas. If you want to data chunk the entire state, how do you do it for an arbitrarily structured feature set without touching the features?
On the bus today, I think I may have come up with an answer. What you do is you number each feature in the superstate uniquely, starting with one. Then in the data chunking substate, you iterate from 1 to n (which you'll be able to do because you'll get a state no-change or something when you get to n+1). You propose object, attribute, and value featurelets just like you'd do in my earlier implementation of Gary's stuff. Only now you can reject featurelet proposals for anything that doesn't match the specific feature that you're working on. In my previous work on Gary's stuff, I'd been keeping the "attended feature" in the superstate, which meant that I could only learn about one feature at a time, building up incremental rules to do so. This led to "schizophrenia" when an overly general rule would match, and this I essentially solved with SFA. Problem with SFA is that you only learn about a single feature in the LHS, which makes learning much slower. So, now I think I can do multiple features if the need should arise.
Now what I think I'm going to try as a less drastic measure for increasing the learning speed is to train the execution concept with each feature in the feature set as the "target" (or RHS) feature.
But first, I need to put down the problem with goals being clobbered in a sub-transform. Here's the deal. Start out with two goals, G1 and G2. In transform T1, we apply an operator O1 that solves one of two goals, say G1. When that completes, we create a sub-transform T2 whose parent is T1. In T2, we solve goal G2 but in the process clobber G1. Now we return "success" to T1, but T1 doesn't know what to do. I don't think Randy had an answer for it. I sent him a quick e-mail with a couple of proposed solutions:
The last solution seems nice in that we want to realize that "now its ok to select O1 whereas initially it wasn't". Maybe that just has to get sorted out in the execution concepts? Nah...it's not a subgoal for the operator. Hmm. What is it then? Is there a better way to define interacting subgoals? Constraints between goals? I'm kind of "implicitly" learning whatever-it-is in the selection concept: e.g., if we have goals (ON A B) and (ON B C), then choose (MOVE-BLOCK B C).
Hmm. In this light, it seems that (3) isn't correct, because it's not so much that we want to reject selection of O1, we just want to select the other operator first. In other words, our "basic" rule is "if you have a GOAL(ON A B), then choose (MOVE-BLOCK A B)". Our induced, better rule is "if you have a GOAL(ON A B) and a GOAL(ON B C), then choose (MOVE-BLOCK B C)." If we were to use option (3) above, we'd end up learning "if you have a GOAL(ON A B) and a GOAL(ON B C), then reject (MOVE-BLOCK A B)". This last rule is clearly incorrect.
So it seems like method (2) it is, for now. With this method, when we successfully build the entire tower, we'll have the two goals as features that we can use for learning the selection concept. When we fail, we don't learn anything new. Bummer is that it's going to give up every once and a while. Well, maybe more than just every once and a while. But we'll have to see if it's just awful.
Ok, here's Randy's response...
From rjones@eecs.umich.edu Fri Oct 11 11:41:19 1996 Date: Fri, 11 Oct 1996 11:29:00 -0400 From: Randy JonesTo: "Christopher R. Waterson" Subject: Re: GIPS Hi Chris, GIPS does not have any explicit mechanism for dealing with interacting goals. However, it is able to learn about interacting goals given an appropriate training sequence of problems that let GIPS see the right failures and successes. Let me see if I can set up an abstract example. Probably the initial knowledge base treats all goals independently, so given interacting goals G1 and G2, GIPS might pick one at random to work on first (that is, GIPS stochastically selects an operator whose actions satisfy G1). When that is accomplished, GIPS then selects an operator to achieve G2, but in the course of getting that operator to apply, G1 gets undone. GIPS has no mechanism to notice that a previously satisfied goal is now unsatisfied, but it *will* see that the problem is not yet solved, so it will continue searching for a solution. Now, it's possible that the system will not be able to find a solution, depending on how its knowledge base is currently tuned, but if it *does* find a solution, it will learn a bunch of positive and negative instances for the operators that achieve G1 and G2, which will hopefully sort everything out. After learning, the selection concepts for the operators will take into account both G1 and G2 together, rather than being independent. I think this corresponds to your alternative number three. The way GIPS avoids an infinite regress, however, is that it detects when the system comes to a cycle, and returns failure. So, for example, if the new transform T3 is the same as transform T1, and they are both on the current solution path, return failure. But this does mean that on some problems we eventually just quit. I have a favorite example of this type of learning, which I have written up a few times but has never been published. Let me see if I can find it...okay, I found two pretty similar papers, and have attached them to this message. The particular parts you should be interested in are the discussion of the "one-way rocket" problems. (I am sending the papers with the MIME quoted-printable encoding...let me know if you have problems). Randy
And another follow up...
From rjones@eecs.umich.edu Fri Oct 11 12:52:54 1996 Date: Fri, 11 Oct 1996 12:49:57 -0400 From: Randy JonesTo: "Christopher R. Waterson" Subject: Re: GIPS In message , "Christopher R. Waterson" writes: >Randy - > >On Fri, 11 Oct 1996, Randy Jones wrote: > >> Probably the initial knowledge base treats all goals independently, so given >> interacting goals G1 and G2, GIPS might pick one at random to work on first >> (that is, GIPS stochastically selects an operator whose actions satisfy G1). >> When that is accomplished, GIPS then selects an operator to achieve G2, but >> in the course of getting that operator to apply, G1 gets undone. GIPS has >> no mechanism to notice that a previously satisfied goal is now unsatisfied, >> but it *will* see that the problem is not yet solved, so it will continue >> searching for a solution. > >Right -- my question is *how*. So we've returned to from our first >subtransform. In the top-level transform, G1 is now unsatisfied, while G2 >*is* satisfied. What do we do? > >Let's say we now subtransform on G1 (I think this corresponds more closely >to your description than my #3, which I'll discuss in a minute). Two >things could happen: 1) we succeed and solve all goals, or 2) we solve G1 >and clobber G2. Think about each: > >1. Say we solve both goals. Now GIPS will think (incorrectly) that >selecting O1 was "the right thing to do" in the top-level transform, >positively reinforcing the following rule: "if GOAL(ON A B) and GOAL(ON B >C), then choose (MOVE-BLOCK A B)". > >In each subtransform, there is only a single goal that we can use in the >feature set, either GOAL(ON A B) or GOAL(ON B C), so at no time during >this problem solving instance will we ever learn the rule "if GOAL(ON A B) >and GOAL(ON B C), then choose (MOVE-BLOCK B C)". > >Since initially it's just as likely that we choose the correct operator >(MOVE-BLOCK B C) as we choose the incorrect one (MOVE-BLOCK A B), we'll >eventually end up reinforcing each one just about the same amount. Of >course, we could get lucky and get a feedback loop going where one gets >reinforced more often than the other; however, it's just as likely that it >will be the incorrect choice as the correct one! You are mostly right here. If the goal interaction is such that you will never fail (or loop) by picking operators in the "wrong" order, then you will probably not learn to pick them in the wrong order. You would have to punish GIPS when it picked the wrong order, and currently the only way to do that would be to call clobbering an already achieved goal a failure. However, you might be surprised that you *could* learn the proper ordering in this case, although it would happen pretty slowly. Let's look at a GIPS trace of the A on B, B on C problem: TRANSFORM "A on table, B on table, C on table" to "A on B, B on C" APPLY stack(A,B) to "A on table, B on table, C on table" EXECUTE TRANSFORM "A on B, B on table, C on table" to "A on B, B on C" APPLY stack(B,C) to "A on B, B on table, C on table" TRANSFORM "A on B, B on table, C on table" to "B clear, C clear" APPLY unstack(A,B) to "A on B, B on table, C on table" EXECUTE TRANSFORM "A on table, B on table, C on table" to "B clear, C clear" ACHIEVED ACHIEVED APPLY stack(B,C) to "A on table, B on table, C on table" EXECUTE ACHIEVED TRANSFORM "A on table, B on C, C on table" to "A on B, B on C" APPLY stack (A,B) to "A on table, B on C, C on table" EXECUTE TRANSFORM "A on B, B on C, C on table" to "A on B, B on C" ACHIEVED ACHIEVED ACHIEVED Note that when the problem is solved, GIPS will store a positive instance for selecting stack(B,C) when "B on C" is unsatisfied. However, it will stores positive instances for stack(A,B) both when "A on B, B on C" is unsatisfied and when "A on B" is unsatisfied but "B on C" is satisfied. I'd have to run the system to make sure I'm right on this, but I think this would give stack(B,C) a slight competitive advantage when both "A on B" and "B on C" are unsatisfied because of the extra training. I may be wrong about that, but I don't think so. We should run it and find out. >2. Say we clobber G2. Now we have the problem of detecting an infinite >regress. Unfortunately, G1 and G2 could each be *sets* of goals, not just >single goals themselves, so we need to remember each set of goals that >we've tried to subtransform on. If we detect that we are subtransforming >on the same set we've tried before do we know we're in a loop. This could >end up being pretty expensive. Matching sets of goals is exactly what GIPS does. It is not that expensive as long as you are matching total sets and not doing partial matching (total matching is what Soar does, and it's pretty efficient...if it had to do partial matching, it would be much less efficient). >I guess what I'm getting at is that I'm leaning more and more towards just >letting the transform fail. The drawback is that the system "is a >quitter". Maybewe end up being overly pessimistic with respect to what we >think we can complete. The up side is that we will eventually learn the >correct relationship between the two goals. I think going this route is fine. It means putting in a mechanism for detecting "goal clobbers" and turning that into failure. I don't think this is a problem, because even though the system will "quit", it will just be able to backtrack and find the better solution if one exists. The problem will be if you run into something like the Sussman anomaly or the one-way rocket problem, where there is *no* solution without clobbering goals unless you stop doing strict means-ends analysis. But GIPS-in-Soar should presumably be able to break out of this behavior as GIPS does, with the appropriate training regimen. Randy
I'm wrestling with the state no-change that occurs instead of a sub-transform. One goal is achieved in the top state, and another cannot be achieved because of some craziness in the domain. So I think my mystery boils down to figuring out why the SUB-TRANSFORM operator doesn't get proposed.
Well, I managed to track down the cause of the "really expensive training" problem: I'd forgotten to include the ^count attribute on the execution concept seed prediction chunks, so I was never copying the count & match set from the operator to the prediction structure. Don't know why this was making stuff go bananas, but it was. So how does one do defensive programming in Soar? Should I write a bunch of productions that try to ensure the state is valid and halt the system if it isn't? Might not be such a bad idea...
As for my other problem, I think I've figured out what's going on. You start with two goals, A and B. You achieve A, but do so in such a way that to achieve B you'll have to clobber A. So now you subtransform with your only goal to achieve B and (wham!) A gets clobbered. Your subtransform completes successfully, but now A needs to be satisfied. For some reason, we're not subtransforming again on A, which would seem to make sense to do (of course, the larger issue of interacting subgoals raises its ugly head).
On the trip to Pittsburgh Monday I started thinking about Doug's thesis and what my stuff does differently. Doug learns correct operator proposal rules, or in GIPS terms, operator "selection concepts." GIPS does this as well, but it also is able to induce some of the goal structure from the problem with the execution concepts. Here are a couple of meandering thoughts on the subject:
Once an operator is selected (we know pretty well what it means to learn about selecting an operator), it may not necessarily apply right away: for example, it might be an absract operator.
Anyway, there's a lot to think about here. Probably best saved
until after I get a chance to read Doug's thesis.
Thursday, October 2, 1996
Found another problem today as I started getting back into debug mode. When an APPLY/EXECUTE fails, GIPS immediately subgoals. In the resulting TRANSFORM, it recalls all the subgoals, but since they're all immediately satisfied the TRANSFORM terminates. I used to have it faked out so that the TRANSFORM would terminate in failure, forcing the APPLY to fail as well, but somehow that got broken: now the TRANSFORM terminates successfully, which ends up sending Soar into space. Gotta fix it.
Ok one down. No I found another bug with the RETRANSFORM operator. It's not blowing away the ^sca structure on the problem space so I end up with an attribute impasse when SELECT-OPERATOR gets chosen again. Argh.