XML autocompletion according to RNG schema

121 views Asked by At

Let RNG be a relax NG specification, XML be a valid XML document relative to RNG, designate n be the nth character of the XML document, B a node of XML and XML[n = B] the XML document where the node B is inserted at the character n.

I want to define a function suggest(RNG, XML, n) → {Node} = O such that

  • for any B ∈ O, XML[n = B] is a valid XML document relative to RNG and,
  • for any C ∉ O, XML[n = C] is not valid.

For now, I have tried two approaches:

  1. Using a double recursion on XML and RNG, at the elementXML[n] containing the n the character and the RNG <element> RNG[n] matching XML[n], and return a tuple (boolean, suggestions), then decide what to return next based on those two elements. For instance, if the current RNG node is a choice between A and B, if fn(A)[0] is true and fn(B)[0] is false, return (true, fn(A)[1]), if they are both false then (false, []), and if they are both true, ..., well (true, fn(A)[1] + fn(B)[1]) might not be too bad, but certainly isn't always the right anwser.

  2. Using a "state machine", with four piles (say *r with the children of RNG[n], *x with the children of XML[n], *c with the current choices that are being solved, and *s with the output suggestions), such that

  • if *r equals *x, then pop *r and *x, if *r doesn't equal *x, then pop *r and push it in *s,
  • if *r is a choice between A and B, then push (*r, *x, B, null) in *c and push A in *r.
  • if *r is a ref, pop it and push the matching definition, if it is a definition process it as a group,
  • if *r is a group, push its element to *r
  • if *c[0] equals *r, and *c[2] is a node, then pop it, set *r and *x to *c[2],*c[0] and *c[1] respectively, push (*r, *x, null, *s) in *c
  • if *c[0] equals *r and *c[2] is null, then pop it, compare *s and *c[3], then, decide (and that's quite hard), which one to keep.
  • simplify the other nodes in terms of choices and groups (not so hard), such that they only have two elements.

But both have not been so successful:

  1. Both a are horrifingly slow on a large schema
  2. The first one fails frequently because of too much recursion, especially if the RNG contains multiple <ref>s, and it gets quite difficult to check for infinite loops too.
  3. The second one uses an awful lot of memory, because I have to deep copy each XML nodes on every <choice> I encounter.
  4. Writing tests is hard because of the number of possible cases, and I can't find a way to generate them programatically. So actually, I am not even sure my function produces a valid XML document.
  5. It is still not clear to me how I should be handling <choice>s. For instance, take <choice><element name="a" /><group><element name="a" /><element name="b" /></group></choice>: how should I encode this so that all valid nodes are suggested?

I know that multiple editors (e.g. http://www.thaiopensource.com/nxml-mode/ for Emacs) offer this option, but I am not sure they do indeed produce valid documents, and I am not familiar enough to read through Lisp.

So I am searching for a "good way" to implement this on DOM, or, even better a JavaScript library that would allow to do that.

0

There are 0 answers