Wednesday, March 8, 2000

Feh. Reviewed string stuff with scc. It looks great. He needs to talk with warren more to make sure that everybody buys in. Helped hyatt fix his last to PDT+ bugs. Started working on a document for XPConnectifying the DOM.

Thursday, March 9, 2000

Put up a document on lanugage-agnostic script support. It's pretty feeble, right now; mostly just recycled emails and stuff coughed up from a bug that I'd filed on myself. I need to start digging into the code to grok what's going on.

Tinkered with btek all afternoon to get its hard drive upgraded.

Friday, March 10, 2000

Late last night I had a moment of clarity with respect to the problem with the template builder and assertion ordering (the whole RETE thing was a bit of a red herring).

Specifically, we'll maintain a state machine that represents the template rule match state for an RDF resource in the template builder's database.

The state machine will be constructed from the <rule> tags in the <template>. The "alphabet" will be the condition tests that a rule needs to make; in other words, the state machine will test the condition of a rule to change from one state to another. The set of states and transition function will be constructed such that, when a rule matches, the state machine will be in an "accepting" state.

The construction will be as follows.

To determine which rules are matched, construct a string from all of the conditions that are currently true. If an accepting state can be reached in Mr from M's start state, then rule r is currently matched.

Tuesday, March 14, 2000

Two hour meeting yesterday to go over tree control for the next six months. No concrete action items. I think jar was going to take the "green, green, red, yellow, red, red" plan to the PDT meeting and see if it was reasonable. chofmann was going to work up something that we could post to describe "green", "yellow", and "red" states. I should bug 'em to make sure shit gets done.

Sunday night, I managed to get an implementation for a finite automata, as well as code that converts an NFA to a DFA. Monday morning, I built code that will do the original construction that I described Friday from a set of rules. The bad news is, it pretty much is O(2n). Ack.

So, I tinkered with some alternate constructions yesterday without much luck. I started considering trying to just use the NFA directly, maintaining the current state in a bitmap, and then started to think maybe it's best to go back to the drawing board here.

Mozilla meeting. Feh.

String design review with scc, warren, and rickg. Went pretty well: I took some notes just in case anyone forgets what they're supposed to be doing. The plan-of-record right now is that scc will land his interface changes plus mods to nsString this week, with rickg to land nsStringX next week or the week after. (This was a point of contention, actually.) We didn't get into any of the nitty-gritty details of why rickg and scc were fighting, bug hopefully if this crap arises again in the future, we'll have the stringjockeys alias to deal.

scc and I talked a bit about this rule matching problem. No great idea with respect to the DFA construction, unfortunately. One interesting realization was that there is an "implicit" condition on the rule; specifically, (?a ^child ?b) (replace ^child with whatever the active containment properties are). So a rule really looks something like:

{(?a ^child ?b)
 (?b ^is-container true)
 (?b ^can-subscribe true)
 ==<
 (build "<blah blah blah />")}

Th at is pretty important, because even if I did work out a DFA construction that used a reasonable amount of space, I'd still have been screwed by the fact that I didn't remember the partial state from the "implicit" rule. So maybe RETE wasn't such a dead-end after all...

Wednesday, March 15, 2000

I just can't leave this DFA idea alone. Pulled out my old Logic Synthesis and Verification Algorithms text from school. There were some clever schemes in there for encoding states of DFAs. I thought that maybe I could use one of those to remember the partial state in an NFA. All the NFAs that I construct seem to have a very reasonable number of states and a reasonably sized state transition table. I guess I'm thinking that it would be cool if I could work out some bit-twiddling to handle the non-determinism.

I'd like to make it eventually be possible to do something like:

<rule>
(?a ^related-sites ?b)
(?b ^instance-of rdf-sequence)
(?b ^member ?c)
==<
(<treechildren>
  <treeitem uri="?c" />
 </treechildren>)
</rule>

In other words, let you do arbitrary graph traversal to construct a content model. Maybe the XUL syntax would be something like:

<rule>
  <conditions>
    <condition source="?a" property="http://blah#rdf-foo" target="?b" />
    <condition source="?b" property="http://blah#rdf-bar" target="rdf-sequence" />
    <condition source="?b" property="http://blah#rdf-baz" target="?c" />
  </conditions>

  <action onaction="doSomeJS();" >
    <treechildren>
      <treeitem uri="?c" ... />
    </treechidren>
  </action>
</rule>

Feh. Maybe I'll make love to the template stuff.

Talked to hyatt for a couple hours yesterday afternoon about doing the RDF for skins. It seems like he has a lot done already, and is just looking for a bit of help with respect to laying out the RDF graph. He'd like me to help, but since I'm strapped for time over the next few days, I'm thinking he'll probably be able to finish up most of it himself.

Made it to the Sloan library last night and picked up Forgy's paper on RETE. Read it, noodled on it for a bit. Seems like the hard part is going to be figuring out how to use the network to drive the initial construction of the content model. Once that's done, using the network to handle incremental updates should be pretty straight-forward.

Thursday, March 16, 2000

Updated the Performance Laundry List with fifty or sixty new "perf" bugs that had appeared since February 4th in preparation for meeting with Intel tomorrow.

Wrote a quick document called "Mozilla Needs Linux Help". I'm not sure where (or even if) I should post it. I'll bug somebody later.

Mitchell and I prepared for battle with Intel tomorrow.

Talked to David Ascher of ActiveState fame about Python stuff, well, mostly jband talked to David about Python and XPConnect stuff.

Kicked out some RDF/XML files for hyatt's chrome registry stuff.

Fixed a last-minute Beta bug (maybe) that has to do with incorrectly re-serializing RDF/XML out when the data is not straight ASCII.

Friday, March 17, 2000

Trip up to Intel to talk about Getting Involved. Lots of notes.

Monday, March 20, 2000

Finally read Miranker's paper on TREAT: as far as I could tell, the main idea is to avoid storing the computed joins ("beta memories"), and just use locality to look up the results in the discrimination nodes ("alpha memories") when you needed them. It looks like something like this will be the right thing to do here.

There are a couple of mismatches between the kind of system that RETE and TREAT are applied to, and what we've got. First of all, both RETE and TREAT seem to deal with frame-like structures where you hold a reference to the frame in the alpha memory, and the frame percolates downward through the discrimination network when its slots match. We've got a much looser "semantic network" kind of representation, where maintaining all of the assertions about a particular node in the network may become unwieldy.

I think this may become a problem if we have several discrimination nodes "stacked up" in a linear fashion: although we mail fail the first test, we'd pass the second and third. This means that we either need to do a combinatorial "network" like I was thinking about for the DFA representation, above, or that we'll need to return to the graph to run tests each time an element "falls through" to a new discrimination node.

Secondly, we're not going to get all of the working memory elements "added" to the network at initialization time. Instead, the network will need to be built up incrementally, if possible. For example:

           (Root)
             |
      +------+-------+
      |              |
      |              v
      |        (?c color blue)
      |              |
      v              v
(?a child ?b)  (?c type foopy)
      |              |
      +->(?b == ?c)<-+
             |
             v
          (Rule 1)

Here, we start with the root resource in a template, proceed down matching the "child" predicate, and then potentially need to proceed back up through "type" and "color" to backfill the join criteria. That isn't all that hard to do, but it'd be nice to do it in such a way that avoided continual recomputations of the alpha memories (specifically, remembering things that don't match).

Okay, so let's play around with a "real" example, the "new folder picker".

1: {(?a child ?b) (?b cancreatesubfolders true) (?b iscontainer true) (?b isempty false)}
2: {(?a child ?b) (?b cancreatesubfolders false)}
3: {(?a child ?b) (?b cancreatesubfolders false (?b iscontainer true) (?b isempty true)}
4: {(?a child ?b) (?b cancreatesubfolders true)}

                                    (Root)
                                      |
      +--------------------+----------+-------------------+
      |                    |                              |
      v                    v                              v
(?a child ?b) (?c cancreatesubfolders true) (?d cancreatesubfolders false)
  |  |  |  |     |                |                |            |
  |  |  |  +-----|----------------|----------------|-------+    |
  |  |  +------+ |                |            +---+       |    |
  |  +-------+ | |                |            |           v    v
  |        +-|---+                v            v         (?b == ?d)
  |        | | |                (?c iscontainer t)           |
  v        v | |                  |            |             v
  (?b == ?c) | |            +-----+            +------+     (2)
      |      | |            |                         |
      |      | |            v                         v
      v      | |     (?c isempty f)             (?c isempty t)
     (4)     | +------------|----------------+        |
             |              |                |        |
             +->(?b == ?c)<-+                v        v
                    |                        (?b == ?c)
                    v                            |
                   (1)                           v
                                                (3)

Looking at the above makes me think that we're missing the piece that connects it into the content model. There are two problems. First, how do we "seed" the network to get it started. Second, if a new assertion arrives (foo cancreatesubfolders false), how do we go back "up" through (?a child ?b)?

So let's say we extend the syntax to allow addressing elements by id; e.g., rule 1 would be written like:

<rule>
  <element attribute="id" value="?a" />
  <rdf source="?a" property="child" value="?b" />
  <rdf source="?b" property="cancreatesubfolders" value="true" />
  <rdf source="?b" property="iscontainer" value="true" />
  <rdf source="?b" property="isempty" value="false" />
</rule>

This might make the network look like:

                                    (Root)
                                      |
      +--------------------+----------+-------------------+
      |                    |                              |
      v                    |                              |
(element id ?a)            |                              |
      |                    |                              |
      v                    v                              v
(?a child ?b) (?c cancreatesubfolders true) (?d cancreatesubfolders false)
  |  |  |  |     |                |                |            |
  |  |  |  +-----|----------------|----------------|-------+    |
  |  |  +------+ |                |            +---+       |    |
  |  +-------+ | |                |            |           v    v
  |        +-|---+                v            v         (?b == ?d)
  |        | | |                (?c iscontainer t)           |
  v        v | |                  |            |             v
  (?b == ?c) | |            +-----+            +------+     (2)
      |      | |            |                         |
      |      | |            v                         v
      v      | |     (?c isempty f)             (?c isempty t)
     (4)     | +------------|----------------+        |
             |              |                |        |
             +->(?b == ?c)<-+                v        v
                    |                        (?b == ?c)
                    v                            |
                   (1)                           v
                                                (3)

This feels a little bit like we're opening Pandora's box with respect to element matching. We can do quick lookups for elements with an id attribute, but what if somebody specified something else? (Also, strictly speaking, this would not suffice to "seed" the template, because you'd want the ref attribute.) Maybe a better way to write it would be:

This leaves the door open for adding attribute and tag conditions, as well.

Feh. I could meander about with respect to syntax; maybe that's not such a good idea right now. Let's get back to matching.

I think there something wrong with the way I've drawn the above network. I think that I can't have a discrimination node with more than two parents; e.g., ?c iscontainer t). How could the matches computed by one parent be shared by the other? So let's try again...

                     (Root)
                       |
      +----------------+----------------+
      |                                 |
      v                                 v
(element id ?a)                 (?c iscontainer t)
      |                           |             |
      v                           v             v
(?a child ?b) (?c cancreatesubfolders true) (?c cancreatesubfolders false)
  |  |  |  |     |          |                         |         |
  |  |  |  +-----|----------|-------------------------|----+    |
  |  |  +------+ |          |                         |    |    |
  |  +-------+ | |          |                         |    v    v
  |        +-|---+          |                         |  (?b == ?c)
  |        | | |            |                         |      |
  v        v | |            |                         |      v
  (?b == ?c) | |            v                         |     (2)
      |      | |      (?c isempty f)                  |
      |      | |            |                         v
      v      | |            |                  (?c isempty t)
     (4)     | +------------|----------------+        |
             |              |                |        |
             +->(?b == ?c)<-+                v        v
                    |                        (?b == ?c)
                    v                            |
                   (1)                           v
                                                (3)

Started banging out some code to do the matching. I have this thing called the MatchState that will basically build the conflict set. Every node in the network is an InnerNode, except for the InstantionNodes that are that the leaves of the network. DiscriminationNodes are used to run individual "tests" (e.g., (?c isempty f)), and JoinNodes are used to propogate the results of joins.

I'm thinking that what should happen is that, when I'm given a new "working memory element" (which I'm calling Element for now), I'll hash into the network to find the DiscriminationNodes that would be affected by the changing Element. (There may be some redundancy in the network if it's built in a degenerate fashion, or the rules are simply complicated enough so that there can't be sharing.)

For each node that I initially discover as being affected by the test, I'll first propogate up through the parent chain to the root. I'll verify that the new Element actually passes the set-inclusion tests for each parent DiscriminationNode, stopping if it turns out that the Element fails.

If we can trace a path back to the root, then that means that the Element has satisfied all of the conditions to this point. Now, we propogate it downward through the network.

While going down, we may encounter a JoinNode. We'll either enter through the left- or the right-branch of that node, and will need to propogate set inclusion back up the other branch to verify that we've actually satisfied the conditions of the join. If this fails, we halt. If it succeeds, we continue to propogate the addition down the network.

As we process JoinNodes, we'll begin to build up variable binding information that we'll need to use for the instantiation code.

Tuesday, March 21, 2000

Right. So after some more thought last night, I think I am starting to realize that we'll need to build up binding information for the full syntax. For example, you may have a rule whose conditions look like this:

<rule>
  <element id="?a" />
  <rdf source="?a" property="child" target="?b" />
  <rdf source="?b" property="color" target="?c" />
</rule>

That is, you want to bind the color property into ?c and pull it through to the RHS. So that network would look something like:

              (Root)
                |
      +---------+---------+
      |                   |
      v                   |
(element id ?a)           |
      |                   |
      v                   v
 (?a child ?b)      (?c color ?d)
      |                   |
      +--->(?b == ?c)<----+
                |
                v
               (1)

Hrmph. I'm not sure I'm doing this right. I'm wondering if I'd really end up with networks that looked like this:

                    (Root)
                      |
       +--------------+---------------+
       |              |               |
       v              v               v
(element id ?a)  (?b child ?c)  (?d color ?e)
       |              |               |
       +->(?a == ?b)<-+               |
              |                       |
              +-------->(?c == ?d)<---+
                             |
                             v
                            (1)

This network is a bit more scary with respect to propogating tests back "up"; for example, if I'm given the assertion (foo color blue), I need to skip a level to compute how to bind ?c! Ack!

So let's say that, as we construct rules, each variable in a rule gets a unique identifier (like I've done in the last network). That means that part of negotiating the network involves assigning bindings to these variables. So let's take the (foo color blue) example and try to work out what would happen:

  1. We hash into the network to hit the (?d color ?e) node. We assign bindings {(?d foo) (?e blue)}
  2. Check "up" to see if the binding is good; we hit the root so it's fine.
  3. Propogate the addition down through the network. Our next stop is (?c == ?d). Here, we can add to the bindings (?c foo).
  4. Now we need to check "up" the LHS to (?a == ?b).
  5. Now this is the tricky part: (?a == ?b) needs to determine if there is anything consistent in the database with the bindings {(?c foo) (?d foo) (?e blue)}. It needs to do a "join" operation over the datasets of its ancestors. Okay, complexity alert: let's talk about this below.
  6. After computing the join, it will have assigned bindings to ?a and ?b, which can then be returned to (?c == ?d).
  7. (?c == ?d) can then propogate the bindings downward to the instantiation node (1), and voila, in goes the rule to the conflict set. [As an aside, this makes me realize that we'll need to always remember the entire conflict set: newly matched rules don't automatically fire.]

Allright, so let's talk about computing that join. There are several problems here.

I guess what's going unsaid here is that I was hoping that the discrimination nodes could be implemented in such a way that they wouldn't actually have to store anything. Problem The Second, above, makes it very difficult for me to do that.

Forging onward through adversity, let's say that our nodes advertise what bindings they participate in. This is a first step towards solving our "Big m" problem. (?b child ?c) indicates the transitive closure of variables that it binds (both itself, and its ancestors). We see that he binds ?c, where (element id ?a) doesn't bind anything that we have data for yet. So up we go to (?b child ?c).

Now maybe we have some way of asking a datasource whether or not it can efficiently do GetSources(). (Or, if it can do GetSources() at all!) If not, then we'll just suck it up and maintain an "alpha memory" in the node. So, in this case, maybe it's an in-memory datasource that can quickly answer our query, and we just do GetSources(child, foo) to figure out the set of parents. Voila, a binding for ?b and an m of one!

Then over to (element id ?a). Since the join node can bind ?a to ?b, we should be able to do a quick lookup in the document's element map to see if there is actually some content in the content model that we can whack. (We'd also need to verify that the content is something under the purview of the current builder, of course.)

Another Hard Part is dealing with the fact that we might've found several bindings for ?b; that is, ?c might've been the child of several different nodes in the RDF graph. Since that's the case, it seems like we'll need to somehow beef up our notion of the "current bindings". I'm guessing that the naive way to do this is the correct way: specifically, keep track of sets of products explicitly, rather than trying to create a product of sets. For example,

{(?a foo) (?b bar)}
{(?a foo) (?b baz)}

Rather than:

{(?a {foo}) (?b {bar, baz})}

My intuition here is that you could construct a case where one of the products-of-sets was not valid.

* * *

Okay, much later. I have a first cut at an implementation. Still need to work out how to do the big mega join thing, but I think I have a bit of a clue. Still some klunkiness with regards to the JoinNode and what assumptions we can make about variable binding when we arrive at the node. Put a ton o' comments into the code to talk about it. Hopefully I'll work out the kludges.

I made over-use of the ->* operator. I think I should just revert that to static callbacks that take the node as a parameter. Maybe. I dunno.

Wednesday, March 22, 2000

So I've been thinking more about this variable binding stuff. I have this idea that it's probably safe to assume that, in a set of bindings, the same variables have been bound (albeit to different values). The cool thing about that is that you can look at the variables that have been bound in the first binding, and use those to deduce what's left to bind.

It would also mean that you could construct bindings doing some kind of copy-on-write mechanism; e.g.,

                         +-- (?c bop) <-- (?d bop)
                         |
(?a foo) <-- (?b bar) <--+
                         |
                         +-- (?c bip) <-- (?d bip)

I'm kinda suspicious that this may be required for fan-out as we go down the network. For example, take a discrimination node that has two children: one set of bindings may need to be worked out for the first kid (which winds its way down to rule one), and another set of bindings worked out for the second kid.

Okay, thinking about this a bit more, I'm beginning to see that I may be prematurely optimizing with this match state stuff. The "upward" and "downward" operations seem distinctly different here: going up, we are strictly constraining an existing set of bindings; going downward, we are dealing with fan-out and possibly alternative expansions on the same binding.

* * *

Okay, got things "working" in the more complex case that I illustrated above (after making everything recursive instead of iterative, and doing a brain-dead implementation for the Instantiation and InstantiationSet objects). A long way still to go, but the join stuff seems to be hanging together -- at least in the simple case I mentioned above. Whee!

Thursday, March 23, 2000

Gotta figure out how to do the conflict set. There are a couple of things it needs to be able to do.

Hrm. I'm trying to figure out how to find all of the active rules for a given "master URI" so that I can sort them and ensure that the one with the highest priority is fired. Let's say I have a rule that looks like this:

<rule nc:CanSubscribe="true" iscontainer="true" isempty="false">
  ...
</rule>
This translates in more verbose syntax to:
<rule>
  <element id="?a" />
  <rdf source="?a" property="child" target="?b" />
  <rdf source="?b" property="nc:CanSubscribe" target="true" />
  <rdf source="?b" property="iscontainer" target="true" />
  <rdf source="?b" property="isempty" target="false" />
</rule>

How do I know which variable binding is important to "identifying" the rule? Or is it even correct to be thinking about a variable binding identifying the rule at all? It seems like maybe I could look for all bindings with identical matches for ?a and ?b (e.g., because of the property="child" stuff), but that'll break down as soon as you introduce a rule that "skips" children.

Maybe we have to do it based on the RHS of the rule. We could look at the "topmost" uri="..." magic, figure out its binding, and look for all rules with that binding? That should work in the simple world where all the rules will likely share the same prefix (the first two conditions of the expanded rule, above). We could always look for ?b.

So maybe a more correct thing to do would be look at a new instantiation, figure out how it's bound to the uri="..." magic, then figure out amongst all the rules, how each of their magic is bound. For example, our rule above bound it to ?b. Another rule may have uri="..." bound to ?q. So we grovel through the conflict set searching for bindings of the newly instantiation URI to ?b and ?q.

Friday, March 24, 2000

Hacking hacking hacking. Something I need to do: get rid of these "default" frigging containment properties. They need to die die die in favor of explicit ones. The only default thing I want to have left when the smoke clears is RDF containers.

Allright, I think I have the LHS stuff pretty much working (at least, to a first order approximation). Now I need to hook up the conflict set and the RHS and see what happens.

Saturday, March 25, 2000

Woohoo! Got the bookmarks window comin' up baby. I get an assertion in the filesystem datasource as soon as I try to open a folder, but hey! Not bad. Anyway, I've been littering the code with XXXwaterson that I need to remember to go strip out.

Sunday, March 26, 2000

On the drive home I started thinking about what I need to do to determine which rules are matched. I think that I need to maintain a map that is keyed by the content node beneath which we're building and the URI of the "pivot" node. As we propogate changes through the network, we need to remember the <content, URI> pairs for newly matched rules. Then, we'll examine the set to see which rule has the highest priority. In the incremental update case, it may be useful to remember if anything was matched before, so we know whether or not to nuke old content.

Boy, this makes me realize that we'll also need to recursively propogate incremental updates. If a newly matched rule causes different to be created, we need to nuke the element's kids and re-create. The new kids may be constructed differently; e.g., because of a condition that depends on the parent tag. Not rocket science, but something to remember.

Spent most of the evening banging through string stuff with scott.

Tuesday, March 28, 2000

Struggling with the conflict set some more.

I got asynchronous additions working (for the most part) today, and was able to re-factor a lot of the code that lived in the old OnAssert, OnUnassert, and OnChange. I figured out the difference between Iterator and ConstIterator, and made those translations. I'm pretty much using Iterators everywhere now. What a C++ geek!

Argued with jar and warren about "Netscape Confidential" bugs, the Mozilla-vs.-Netscape cage match du jour.

Wednesday, March 29, 2000

Triaged all my M14 and M15 bugs. Yay!

So the problem that I'm staring down right now has to do with rule retraction. I've got an OnUnassert coming in with a source, property, and target: how does this affect current conflict set? My initial (broken) stab at this was to see if a node CanPropogate the assertion, and if so, remove each instantiation in the conflict set that contains its "seed" bindings.

This ends up being too aggressive. For example, just because I can bind to a parent element, doesn't mean I should remove every instantiation that has to do with the parent element. I guess I was equating a binding with a working memory element, which is wrong.

So if I wanted to pretend that I had working memory elements, what I'd have to do is remember which assertions actually matched in an instantiation, and index the conflict set that way; e.g., <foo, child, bar>. Then when this assertion comes through on an OnUnassert, I'd just nix any instantiations that "contained" it. This gets a bit trickier with respect to all the "computed" working memory elements that some of the test nodes "create". For example, emptiness and containerhood.

Allright, so here's an idea. Say we have the following network:

             (root)
               |
               v
          (content ?a)
               |
               v
           (?a ^id ?b)
               |
               v
         (?b ^child ?c) [A]
               |
      +--------+--------+
      v                 v
(?c ^color blue) (?c color red) [B]
      |                 |
      v                 v
     [R1]        (?c empty true) [C]
                        |
                        v
                       [R2]

And the following instantiations:

[R1]
1: {(?a node1) (?b parent1) (?c child1)}
2: {(?a node1) (?b parent1) (?c child2)}
3: {(?a node2) (?b child3) (?c child4)}

[R2]
4: {(?a node1) (?b parent1) (?c child3)}

Now, if I remove (parent1 ^child child2), I'd like to see instantiation 2 get nixed. If I remove (child3 ^color red), I'd like to see 4 get nixed. So dig this. Let's sayI ask the test node if it matches the assertion without doing any backend lookup, and got the variable bindngs back. Then I could remove any instantiations from the conflict set that properly contains those variable bindings. (By "properly contain", I mean, that has all of the variable bindings returned from the test node, not just some of them.)

For example, say I get OnUnassert(parent1 ^child child2). Node A is the only test that will pass given these values, and will bind ?b to parent1 and ?c to child2. The only instantiation that contains both these bindings is 2. So we can safely remove it from the conflict set.

Similarly, OnUnassert(child3 ^color red). Node B is the only node that will pass this test. It will bind ?c to child3. It seems that it is now safe to remove instantiation 4 from the conflict set.

So this works out fairly well for the container-membership and property-value tests, but I'm not so sure about the container instance tests. Let's say we get OnUnassert(child3 ^child child4), which will make child3 be "empty". It's easy to see how we'd remove instantiation 3 from the conflict set; however, how would we remove instantiation 4? (And worse, what if there was another rule with a condition (?c empty false) which would all of a sudden be matched?)

If we bend the rules a little bit, and say that the container-instance test can be clever, and is allowed to check the graph, then maybe we can circumvent the first problem.

But what about the second problem? Ack. I'll work on that later.

Okay so on to removal from the conflict set. Right now I have two maps in the conflict set. The first maps a binding to the set of instantiations in which that binding participates. The second maps a binding pair <content-binding, member-binding> to the set of instantiations in which that pair participates. The second table is used to identify a "cluster" of rules that compete amongst one another to match. (It's redundant, because you can certainly filter the results from the first map to determine which rules are matched.)

Typically, the maps are used as follows:

Thursday, March 30, 2000

Awright. It seems like my plan almost worked, but not quite. See, by the time I get the OnUnassert, I can't just run FilterInstantiations because the test will fail in exactly the case where I want it to succeed. FilterInstantiations goes to the RDF graph and says, "hey, is this instantiation consistent?" Which it won't be, because we've just revoked it.

The problems with relying on CanPropogate by itself are 1) you'll run the risk of yanking too many instantiations on a partial match, and 2) even if you do get a full match, you don't actually condition anything on the test itself (e.g., the container-instance tests).

Feh. Ok, here's the deal. Let's say I add a "negated" argument to CanPropogate, which means, could you propogate this if it we're true. Or something to that effect.

Oh wow. I just realized that I totally fucked this up. I need to remember working memory elements that actually match, not just the bindings. Doh! An example, please.

Say we have the following graph:

        (A)
         |
   +-----+-------------------+
   |     |                   |
 child child               child
   |     |                   |
   v     v                   v
  (B)   (C)--color-->blue   (D)--color-->red
   |     |                   |
 color child               color
   |     |                   |
   v     v                   v
  red   (E)--color-->blue  blue

And these rules:

1: (content ^id ?1) (?1 ^child ?2) (?2 ^color blue) (?2 ^empty true)
2: (content ^id ?1) (?1 ^child ?2) (?2 ^color blue)
3: (content ^id ?1) (?1 ^child ?2) (?2 ^color red)

This example breaks the bindings-only algorithm that I was trying to use. Specifically, removing (D ^color blue) would cause all instantiations containing (?2, D) to be nuked, "hiding" the fact that (D ^color red) could be used to match rule 3.

What this means (I think) is that the Instantiation now needs to remember the "working memory elements" that have been used to match, as well as the variable/value bindings. Since we don't really have "working memory elements", the trick will be to somehow use the CanPropogate mechanism to figure out how to turn a lone RDF assertion into a working memory element that can be used to yank instantiations out of the conflict set.

It also means that as we propogate tests downward through the graph, we'll also be extending the instantation objects with "working memory elements" that trigger matches. I'm not sure I know what will happen in the case where we have to Constrain back up the graph. Man, am I fucked.

Allright, I think that the way to do this is not to propogate two lists around (as I was first inclined), but rather to treat the working memory elements as "support" for a Binding in an Instantation. My intuition is that this will make a lot more sense when handling joins. I should probably work out why I think that.

There are two cases we need to deal with. The first is the "constrained" case where a propogation is coming down from the left or right ancestor. One of our variables will be bound, and we need to ensure that the other ancestor has consistent bindings. The second is the "unconstrained" case, where a propogation is coming up from a descendant, and neither our left nor right variable is bound.

In the constrained case, I think we can just assign the working-memory support from the constrained variable to the unconstrained variable. Hmm. Strictly speaking, it's probably not even all that relevant. We probably don't care what support is issued from one variable to another.

Regardless, I like the support idea a lot. It seems cleaner. Let me take a quick peek at how we use bindings right now and see if it's consistent.

Friday, March 31, 2000

Allright, I got the "memory element" stuff in last night. After making the above list, I decided that having elements literally support a binding was a wrong-headed thing to do, and went back to having elements support the instantiation directly.

It all works great, except for one thing. I need to figure out how to handle the RebuildContainer call. I'd made the mistake of assuming that I could just look at the content element for its id, and use that to yank elements from the conflict set. Unfortunately, by the time I get asked to rebuild the container, the id (or ref) attribute has long since been changed to the new value.

I could revert the code to just rip stuff out of the content model, but then the conflict set would be updated. Hmm. Currently I'm treating ContentIdTestNode::Elements as being equal iff they have the same content node and same ID. Maybe that's a bit over-zealous: a content node will never have multiple IDs: maybe I can just index the thing on the node alone...

* * *

Finally got Quantify to work with a large (~2,000) mail folder. Interesting results:

I also have noticed that switching away from a large folder seems to be pretty expensive as well. My first guess is that it's because we're removing content from the tree in a random order, rather than from back to front. So putting the old "rip all the nodes out" code back should work fine. I'll also need to leave the conflict set maintenance code in there.

Did a quick n' dirty hashtable implementation for the KeySet. It puts us right in the game, baby! I'm not noticing a visible difference between the old and new builders in an opt build. Both are around 3-4 seconds for a 2,500 message folder.