| <html> |
| <body> |
| |
| <p> |
| This is a Free Software DOM Level 3 implementation, supporting these features: |
| <ul> |
| <li>"XML"</li> |
| <li>"Events"</li> |
| <li>"MutationEvents"</li> |
| <li>"HTMLEvents" (won't generate them though)</li> |
| <li>"UIEvents" (also won't generate them)</li> |
| <li>"USER-Events" (a conformant extension)</li> |
| <li>"Traversal" (optional)</li> |
| <li>"XPath"</li> |
| <li>"LS" and "LS-Async"</li> |
| </ul> |
| It is intended to be a reasonable base both for experimentation |
| and for supporting additional DOM modules as clean layers. |
| </p> |
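| <p> A minimal sketch of probing for the feature names listed above, using |
| the standard <code>DOMImplementation.hasFeature</code> call. It assumes the |
| DOM is obtained through the default JAXP factory, which may or may not |
| resolve to this package; <code>FeatureProbe</code> is a hypothetical name. </p> |

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;

// Hypothetical helper, not part of this package.
class FeatureProbe {
    // Returns the DOMImplementation behind whatever JAXP factory is configured.
    static DOMImplementation defaultImplementation() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // A null version asks "is any version of this feature supported?".
    static boolean supports(String feature) {
        return defaultImplementation().hasFeature(feature, null);
    }

    public static void main(String[] args) {
        String[] features = {
            "XML", "Events", "MutationEvents", "HTMLEvents", "UIEvents",
            "USER-Events", "Traversal", "XPath", "LS", "LS-Async"
        };
        for (String feature : features) {
            System.out.println(feature + ": " + supports(feature));
        }
    }
}
```

| <p> The answers depend on which implementation the factory loads, so treat |
| the printed values as a report on your configuration rather than on this |
| package. </p> |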
| |
| <p> |
| Note that while DOM does not specify its behavior in the |
| face of concurrent access, this implementation does. |
| Specifically: |
| <ul> |
| <li>If only one thread at a time accesses a Document, |
| or if several threads cooperate for read-only access, |
| then no concurrency conflicts will occur.</li> |
| <li>If several threads mutate a given document |
| (or send events using it) at the same time, |
| there is currently no guarantee that |
| they won't interfere with each other.</li> |
| </ul> |
| </p> |
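| <p> Since concurrent mutation isn't protected, callers have to serialize it |
| themselves. One way to do that, sketched under the rules above (shared |
| readers, exclusive writers), is a read/write lock around the Document. |
| <code>GuardedDocument</code> is a hypothetical wrapper, not something this |
| package provides. </p> |

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

// Hypothetical wrapper: all access to the Document goes through the lock.
class GuardedDocument {
    private final Document doc;
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    GuardedDocument(Document doc) {
        this.doc = doc;
    }

    // Convenience factory so callers need not handle checked exceptions.
    static GuardedDocument empty() {
        try {
            return new GuardedDocument(
                    DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Many readers may hold the read lock at once.
    <T> T read(Function<Document, T> reader) {
        lock.readLock().lock();
        try {
            return reader.apply(doc);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Mutation (and event dispatch) takes the exclusive write lock.
    void mutate(Consumer<Document> writer) {
        lock.writeLock().lock();
        try {
            writer.accept(doc);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

| <p> This only helps if no code keeps a reference to the Document or its |
| nodes outside the wrapper. </p> |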
| |
| <h3>Design Goals</h3> |
| |
| <p> |
| A number of DOM implementations are available in Java, including |
| commercial ones from Sun, IBM, Oracle, and DataChannel as well as |
| noncommercial ones from Docuverse, OpenXML, and Silfide. Why have |
| another? Some of the goals of this version: |
| </p> |
| |
| <ul> |
| <li>Advanced DOM support. This was the first generally available |
| implementation of DOM Level 2 in Java, and one of the first Level 3 |
| and XPath implementations.</li> |
| |
| <li> Free Software. This one is distributed under the GPL (with |
| "library exception") so it can be used with a different class of |
| application.</li> |
| |
| <li>Second implementation syndrome. I can do it simpler this time |
| around ... and heck, writing it only takes a bit over a day once you |
| know your way around.</li> |
| |
| <li>Sanity check the then-current Last Call DOM draft. Best to find |
| bugs early, when they're relatively fixable. Yes, bugs were found.</li> |
| |
| <li>Modularity. Most of the implementations mentioned above are part |
| of huge packages; take all (including bugs, of which some have far |
| too many), or take nothing. I prefer a menu approach, when possible. |
| This code is standalone, not beholden to any particular parser or XSL |
| or XPath code.</li> |
| |
| <li>OK, I'm a hacker, I like to write code.</li> |
| </ul> |
| |
| <p> |
| This also works with the GNU Compiler for Java (GCJ). GCJ promises |
| to be quite the environment for programming Java, both directly and from |
| C++ using the new CNI interfaces (which really use C++, unlike JNI). </p> |
| |
| |
| <h3>Open Issues</h3> |
| |
| <p>At this writing:</p> |
| <ul> |
| <li>See below for some restrictions on the mutation event |
| support ... some events aren't reported (and likely won't be).</li> |
| |
| <li>More testing and conformance work is needed.</li> |
| |
| <li>We need an XML Schema validator (actually we need validation in the DOM |
| full stop).</li> |
| </ul> |
| |
| <p> |
| I ran a profiler a few times and removed some of the performance hotspots, |
| but it's not tuned. Reporting mutation events, in particular, is |
| rather costly -- it started at about a 40% penalty for appendChild calls; |
| I've got it down to around 12%, but it'll be hard to shrink it much further. |
| The overall code size is relatively small, though you may want to be rid of |
| many of the unused DOM interface classes (HTML, CSS, and so on). |
| </p> |
| |
| |
| <h2><a name="features">Features of this Package</a></h2> |
| |
| <p> Starting with DOM Level 2, you can really see that DOM is constructed |
| as a bunch of optional modules around a core of either XML or HTML |
| functionality. Different implementations will support different optional |
| modules. This implementation provides a set of features that should be |
| useful if you're not depending on the HTML functionality (lots of convenience |
| functions that mostly don't buy much except API surface area) and user |
| interface support. That is, browsers will want more -- but what they |
| need should be cleanly layered over what's already here. </p> |
| |
| <h3> Core Feature Set: "XML" </h3> |
| |
| <p> This DOM implementation supports the "XML" feature set, which basically |
| gets you four things over the bare core (which you're officially not supposed |
| to implement except in conjunction with the "XML" or "HTML" feature). In |
| order of decreasing utility, those four things are: </p> <ol> |
| |
| <li> ProcessingInstruction nodes. These are probably the most |
| valuable thing. Handy little buggers, in part because all the APIs |
| you need to use them are provided, and they're designed to let you |
| escape XML document structure rules in controlled ways.</li> |
| |
| <li> CDATASection nodes. These are of limited utility since CDATA |
| is just text that prints funny. These are of use to some sorts of |
| applications, though I encourage folk to not use them. </li> |
| |
| <li> DocumentType nodes, and associated Notation and Entity nodes. |
| These appear to be useless. Briefly, these "Type" nodes expose no |
| typing information. They're only really usable to expose some lexical |
| structure that almost every application needs to ignore. (XML editors |
| might like to see them, but they need true typing information much more.) |
| I strongly encourage people not to use these. </li> |
| |
| <li> EntityReference nodes can show up. These are actively annoying, |
| since they add an extra level of hierarchy, are the cause of most of |
| the complexity in attribute values, and their contents are immutable. |
| Avoid these.</li> |
| |
| </ol> |
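| <p> As a small illustration of the first item, here is a sketch that adds a |
| processing instruction ahead of the root element and reads it back, using |
| only core DOM calls. The class and method names are made up for the |
| example, and the Document comes from the default JAXP factory. </p> |

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;

// Hypothetical demo class.
class PiDemo {
    // Builds <?target data?><root/> in a fresh Document.
    static Document withInstruction(String target, String data) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("root");
        doc.appendChild(root);
        ProcessingInstruction pi = doc.createProcessingInstruction(target, data);
        doc.insertBefore(pi, root);   // PIs may sit outside the root element
        return doc;
    }

    // Returns the data of the first top-level PI with the given target, or null.
    static String firstInstructionData(Document doc, String target) {
        for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
                ProcessingInstruction pi = (ProcessingInstruction) n;
                if (target.equals(pi.getTarget())) {
                    return pi.getData();
                }
            }
        }
        return null;
    }
}
```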
| |
| <h3> Optional Feature Sets: "Events", and friends </h3> |
| |
| <p> Events may be one of the more interesting new features in Level 2. |
| This package provides the core feature set and exposes mutation events. |
| No gooey events though; if you want that, write a layered implementation! </p> |
| |
| <p> Three mutation events aren't currently generated:</p> <ul> |
| |
| <li> <em>DOMSubtreeModified</em> is poorly specified. Think of this |
| as generating one such event around the time of finalization, which |
| is a fully conformant implementation. This implementation is exactly |
| as useful as that one. </li> |
| |
| <li> <em>DOMNodeRemovedFromDocument</em> and |
| <em>DOMNodeInsertedIntoDocument</em> are supposed to get sent to |
| every node in a subtree that gets removed or inserted (respectively). |
| This can be <em>extremely costly</em>, and the removal and insertion |
| processing is already significantly slower due to event reporting. |
| It's much easier, and more efficient, to have a listener higher in the |
| tree watch removal and insertion events through the bubbling or capture |
| mechanisms, than it is to watch for these two events.</li> |
| |
| </ul> |
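| <p> The "listener higher in the tree" approach can be sketched as follows: |
| one bubbling listener on an ancestor sees <em>DOMNodeInserted</em> for every |
| insertion beneath it. This assumes the DOM in use supports "MutationEvents" |
| and that its nodes implement <code>EventTarget</code>; |
| <code>InsertionWatcher</code> is a hypothetical name. </p> |

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.events.EventListener;
import org.w3c.dom.events.EventTarget;

// Hypothetical demo class.
class InsertionWatcher {
    // Counts the DOMNodeInserted events that bubble up to the root element.
    static int countInsertionsSeenAtRoot() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("root");
        doc.appendChild(root);
        if (!(root instanceof EventTarget)) {
            throw new UnsupportedOperationException("this DOM lacks the Events feature");
        }

        final int[] seen = { 0 };
        EventListener counter = evt -> seen[0]++;
        // useCapture = false: the listener runs during the bubbling phase.
        ((EventTarget) root).addEventListener("DOMNodeInserted", counter, false);

        Element section = doc.createElement("section");
        root.appendChild(section);                       // event targets section
        section.appendChild(doc.createElement("para"));  // event targets para
        return seen[0];
    }
}
```

| <p> Both insertions are reported to the single listener on |
| <code>root</code>, so no per-node listeners are needed. </p> |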
| |
| <p> In addition, certain kinds of attribute modification aren't reported. |
| A fix is known, but it couldn't report the previous value of the attribute. |
| More work could fix all of this (as well as reduce the generally high cost |
| of childful attributes), but that's not been done yet. </p> |
| |
| <p> Also, note that it is a <em>Bad Thing™</em> to have the listener |
| for a mutation event change the ancestry for the target of that event. |
| Or to prevent mutation events from bubbling to where they're needed. |
| Just don't do those, OK? </p> |
| |
| <p> As an experimental feature (named "USER-Events"), you can provide |
| your own "user" events. Just name them anything starting with "USER-" |
| and you're set. Dispatch them through bubbling, capturing, or |
| whatever takes your fancy. One important thing you can't currently do is |
| pass any data (like an object) with those events. Maybe later there |
| will be a "UserEvent" interface letting you get some substantial use |
| out of this mechanism even if you're not "inside" of a DOM package.</p> |
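| <p> A rough sketch of sending such an event with the standard Events API: |
| create a generic event, give it a "USER-" name, and dispatch it at a node. |
| It assumes the Document implements <code>DocumentEvent</code> and accepts |
| "Events" as the event class; the event name and class name are invented for |
| the example. </p> |

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.events.DocumentEvent;
import org.w3c.dom.events.Event;
import org.w3c.dom.events.EventTarget;

// Hypothetical demo class.
class UserEventDemo {
    // Dispatches an event of the given type at a leaf and returns the type
    // observed by a bubbling listener on the root (null if none arrived).
    static String dispatchAndRecord(String eventType) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("root");
        Element leaf = doc.createElement("leaf");
        doc.appendChild(root);
        root.appendChild(leaf);

        final String[] seen = { null };
        ((EventTarget) root).addEventListener(eventType, e -> seen[0] = e.getType(), false);

        Event event = ((DocumentEvent) doc).createEvent("Events");
        event.initEvent(eventType, true, false);   // bubbles, not cancelable
        ((EventTarget) leaf).dispatchEvent(event);
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(dispatchAndRecord("USER-refresh"));
    }
}
```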
| |
| <p> You can create and send HTML events. Ditto UIEvents. Since DOM |
| doesn't require a UI, it's the UI's job to send them; perhaps that's |
| part of your application. </p> |
| |
| <p><em>This package may be built without the ability to report mutation |
| events, gaining a significant speedup in DOM construction time. However, |
| if that is done then certain other features -- notably node iterators |
| and getElementsByTagName -- will not be available.</em></p> |
| |
| |
| <h3> Optional Feature: "Traversal" </h3> |
| |
| <p> Each DOM node has all you need to walk to everything connected |
| to that node. Lightweight, efficient utilities are easily layered on |
| top of just the core APIs. </p> |
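| <p> For instance, a depth-first walk needs nothing beyond |
| <code>getFirstChild</code> and <code>getNextSibling</code>. The helper below |
| is a sketch with hypothetical names, not a utility shipped with this |
| package. </p> |

```java
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical utility layered on the core Node API only.
class TreeWalk {
    // Visits node and all of its descendants in document order.
    static void visit(Node node, Consumer<Node> action) {
        action.accept(node);
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            visit(c, action);
        }
    }

    static int countElements(Node start) {
        final int[] count = { 0 };
        visit(start, n -> {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                count[0]++;
            }
        });
        return count[0];
    }

    // Builds <root><a><c/>text</a><b/></root> for demonstration.
    static Document sample() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("root");
        Element a = doc.createElement("a");
        a.appendChild(doc.createElement("c"));
        a.appendChild(doc.createTextNode("text"));
        root.appendChild(a);
        root.appendChild(doc.createElement("b"));
        doc.appendChild(root);
        return doc;
    }
}
```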
| |
| <p> Traversal APIs are an optional part of DOM Level 2, providing |
| a not-so-lightweight way to walk over DOM trees, if your application |
| didn't already have such utilities for use with data represented via |
| DOM. Implementing this helped debug the (optional) event and mutation |
| event subsystems, so it's provided here. </p> |
| |
| <p> At this writing, the "TreeWalker" interface isn't implemented. </p> |
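| <p> With "TreeWalker" out of the picture, the Traversal entry point to use |
| is <code>NodeIterator</code>. A minimal sketch, assuming the Document |
| implements <code>DocumentTraversal</code> (and, for this package, that it |
| was built with mutation event support); <code>IteratorDemo</code> is a |
| hypothetical name. </p> |

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;

// Hypothetical demo class.
class IteratorDemo {
    // Lists element names in document order using a NodeIterator.
    static List<String> elementNames(Document doc) {
        NodeIterator it = ((DocumentTraversal) doc)
                .createNodeIterator(doc, NodeFilter.SHOW_ELEMENT, null, false);
        List<String> names = new ArrayList<>();
        for (Node n = it.nextNode(); n != null; n = it.nextNode()) {
            names.add(n.getNodeName());
        }
        it.detach();   // release the iterator once finished
        return names;
    }

    // Builds <root><a><c/></a><b/></root> for demonstration.
    static Document sample() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("root");
        Element a = doc.createElement("a");
        a.appendChild(doc.createElement("c"));
        root.appendChild(a);
        root.appendChild(doc.createElement("b"));
        doc.appendChild(root);
        return doc;
    }
}
```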
| |
| |
| |
| <h2><a name='avoid'>DOM Functionality to Avoid</a></h2> |
| |
| <p> For what appears to be a combination of historical and "committee |
| logic" reasons, DOM has a number of <em>features which I strongly advise |
| you to avoid using</em> in your library and application code. These |
| include the following types of DOM nodes; see the documentation for the |
| implementation class for more information: <ul> |
| |
| <li> CDATASection |
| (<a href='DomCDATA.html'>DomCDATA</a> class) |
| ... use normal Text nodes instead, so you don't have to make |
| every algorithm recognize multiple types of character data |
| |
| <li> DocumentType |
| (<a href='DomDoctype.html'>DomDoctype</a> class) |
| ... if this held actual typing information, it might be useful |
| |
| <li> Entity |
| (<a href='DomEntity.html'>DomEntity</a> class) |
| ... neither parsed nor unparsed entities work well in DOM; it |
| won't even tell you which attributes identify unparsed entities |
| |
| <li> EntityReference |
| (<a href='DomEntityReference.html'>DomEntityReference</a> class) |
| ... permitted implementation variances are extreme, all children |
| are readonly, and these can interact poorly with namespaces |
| |
| <li> Notation |
| (<a href='DomNotation.html'>DomNotation</a> class) |
| ... only really usable with unparsed entities (which aren't well |
| supported; see above) or perhaps with PIs after the DTD, not with |
| NOTATION attributes |
| |
| </ul> |
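| <p> If the document arrives through a JAXP parser, one way to keep |
| CDATASection and EntityReference nodes out of the tree in the first place |
| is to ask the factory to coalesce CDATA into Text and to expand entity |
| references. This is a sketch using standard |
| <code>DocumentBuilderFactory</code> settings; <code>PlainTextParse</code> |
| is a hypothetical name, and parsers may differ in how fully they honor |
| these settings. </p> |

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class.
class PlainTextParse {
    // Parses XML text so that character data shows up as ordinary Text nodes.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setCoalescing(true);              // fold CDATA into adjacent Text
            factory.setExpandEntityReferences(true);  // no EntityReference nodes
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<r>a<![CDATA[<b>]]>c</r>");
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```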
| |
| <p> If you really need to use unparsed entities or notations, use SAX; |
| it offers better support for all DTD-related functionality. |
| It also exposes actual |
| document typing information (such as element content models).</p> |
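| <p> To make that concrete, here is a sketch of a SAX handler that collects |
| notation declarations, unparsed entity declarations, and element content |
| models. It relies on the standard SAX2 <code>DTDHandler</code> callbacks and |
| the <code>declaration-handler</code> property; <code>DtdDeclarations</code> |
| and the sample DTD are invented for illustration. </p> |

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler that records what the DTD declares.
class DtdDeclarations extends DefaultHandler2 {
    static final String SAMPLE =
            "<!DOCTYPE doc ["
          + "<!NOTATION gif PUBLIC 'image/gif'>"
          + "<!ENTITY logo SYSTEM 'logo.gif' NDATA gif>"
          + "<!ELEMENT doc EMPTY>"
          + "]><doc/>";

    final List<String> notations = new ArrayList<>();
    final Map<String, String> unparsedEntities = new LinkedHashMap<>();
    final Map<String, String> contentModels = new LinkedHashMap<>();

    @Override
    public void notationDecl(String name, String publicId, String systemId) {
        notations.add(name);
    }

    @Override
    public void unparsedEntityDecl(String name, String publicId,
                                   String systemId, String notationName) {
        unparsedEntities.put(name, notationName);   // entity -> its notation
    }

    @Override
    public void elementDecl(String name, String model) {
        contentModels.put(name, model);             // element -> content model
    }

    static DtdDeclarations scan(String xml) {
        DtdDeclarations handler = new DtdDeclarations();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DeclHandler callbacks are only delivered if registered explicitly.
            parser.setProperty("http://xml.org/sax/properties/declaration-handler", handler);
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }
}
```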
| |
| <p> Also, when accessing attribute values, use methods that provide their |
| values as single strings, rather than those which expose value substructure |
| (Text and EntityReference nodes). (See the <a href='DomAttr.html'>DomAttr</a> |
| documentation for more information.) </p> |
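| <p> In practice that means calling <code>Element.getAttribute</code> (or |
| <code>Attr.getValue</code>) rather than walking the Attr node's children. |
| A short sketch, with a hypothetical class name: </p> |

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class.
class AttrValues {
    // Returns the named attribute of the root element as one string.
    static String attributeValue(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // getAttribute hides any Text/EntityReference substructure.
            return doc.getDocumentElement().getAttribute(name);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(attributeValue("<a title='x &amp; y'/>", "title"));
    }
}
```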
| |
| <p> Note that many of these features were provided as partial support for |
| editor functionality (including the incomplete DTD access). Full editor |
| functionality requires access to potentially malformed lexical structure, |
| at the level of unparsed tokens and below. Access at such levels is so |
| complex that using it in non-editor applications sacrifices all the |
| benefits of XML; editor applications need extremely specialized APIs. </p> |
| |
| <p> (This isn't a slam against DTDs, note; only against the broken support |
| for them in DOM. Even despite inclusion of some dubious SGML legacy features |
| such as notations and unparsed entities, |
| and the ongoing proliferation of alternative schema and validation tools, |
| DTDs are still the most widely adopted tool |
| to constrain XML document structure. |
| Alternative schemes generally focus on data transfer style |
| applications; open document architectures comparable to |
| DocBook 4.0 don't yet exist in the schema world. |
| Feel free to use DTDs; just don't expect DOM to help you.) </p> |
| |
| </body> |
| </html> |
| |