
 <html>
 <body>

 <p>
 This is a Free Software DOM Level 3 implementation, supporting these features:
 <ul>
 <li>"XML"</li>
 <li>"Events"</li>
 <li>"MutationEvents"</li>
 <li>"HTMLEvents" (won't generate them though)</li>
 <li>"UIEvents" (also won't generate them)</li>
 <li>"USER-Events" (a conformant extension)</li>
 <li>"Traversal" (optional)</li>
 <li>"XPath"</li>
 <li>"LS" and "LS-Async"</li>
 </ul>
 It is intended to be a reasonable base both for
 experimentation and supporting additional DOM modules as clean layers.
 </p>
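 <p>
 The feature names above are the strings you would hand to
 <code>DOMImplementation.hasFeature</code>.  A minimal sketch of such a probe
 follows; it assumes the implementation is obtained through JAXP, so it reports
 on whichever parser your runtime selects (not necessarily this package), and
 the <code>FeatureProbe</code> class name is purely illustrative.
 </p>

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;

// Hypothetical helper: asks the JAXP-selected DOM implementation which
// of the feature strings listed above it claims to support.
class FeatureProbe {
    static final String[] FEATURES = {
        "XML", "Events", "MutationEvents", "HTMLEvents", "UIEvents",
        "USER-Events", "Traversal", "XPath", "LS", "LS-Async"
    };

    static DOMImplementation implementation() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // A null version asks "is any version of this feature supported?"
    static boolean supports(String feature) {
        return implementation().hasFeature(feature, null);
    }

    public static void main(String[] args) {
        for (String feature : FEATURES) {
            System.out.println(feature + ": " + supports(feature));
        }
    }
}
```

 <p>
 Which lines print <code>true</code> depends entirely on the implementation
 that JAXP picks up; only the list above describes this package.
 </p>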

 <p>
 Note that while DOM does not specify its behavior in the
 face of concurrent access, this implementation does.
 Specifically:
 <ul>
 <li>If only one thread at a time accesses a Document,
 or if several threads cooperate for read-only access,
 then no concurrency conflicts will occur.</li>
 <li>If several threads mutate a given document
 (or send events using it) at the same time,
 there is currently no guarantee that
 they won't interfere with each other.</li>
 </ul>
 </p>
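 <p>
 Since concurrent mutation carries no guarantee, callers that share a document
 between threads have to serialize writers themselves.  One possible approach,
 sketched here under the assumption that every writer agrees to lock on the
 <code>Document</code> object itself (a convention of this example, not
 something the package enforces), looks like this:
 </p>

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example: several threads append children to one document,
// taking a shared lock so that only one mutation runs at a time.
class SerializedMutation {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static int appendFromThreads(int threads, int perThread) {
        final Document doc = newDocument();
        final Element root = doc.createElement("root");
        doc.appendChild(root);

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    // Node creation and insertion both happen under the lock.
                    synchronized (doc) {
                        root.appendChild(doc.createElement("item"));
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        // All writers have finished, so an unsynchronized read is safe here.
        return root.getChildNodes().getLength();
    }
}
```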

 <h3>Design Goals</h3>

 <p>
 A number of DOM implementations are available in Java, including
 commercial ones from Sun, IBM, Oracle, and DataChannel as well as
 noncommercial ones from Docuverse, OpenXML, and Silfide.  Why have
 another?  Some of the goals of this version:
 </p>

 <ul>
 <li>Advanced DOM support. This was the first generally available
 implementation of DOM Level 2 in Java, and one of the first Level 3
 and XPath implementations.</li>

 <li> Free Software.  This one is distributed under the GPL (with
 "library exception") so it can be used with a different class of
 application.</li>

 <li>Second implementation syndrome.  I can do it simpler this time
 around ... and heck, writing it only takes a bit over a day once you
 know your way around.</li>

 <li>Sanity check the then-current Last Call DOM draft.  Best to find
 bugs early, when they're relatively fixable.  Yes, bugs were found.</li>

 <li>Modularity.  Most of the implementations mentioned above are part
 of huge packages; take all (including bugs, of which some have far
 too many), or take nothing.  I prefer a menu approach, when possible.
 This code is standalone, not beholden to any particular parser or XSL
 or XPath code.</li>

 <li>OK, I'm a hacker, I like to write code.</li>
 </ul>

 <p>
 This also works with the GNU Compiler for Java (GCJ).  GCJ promises
 to be quite the environment for programming Java, both directly and from
 C++ using the new CNI interfaces (which really use C++, unlike JNI). </p>


 <h3>Open Issues</h3>

 <p>At this writing:</p>
 <ul>
 <li>See below for some restrictions on the mutation event
 support ... some events aren't reported (and likely won't be).</li>

 <li>More testing and conformance work is needed.</li>

 <li>We need an XML Schema validator (actually we need validation in the DOM
 full stop).</li>
 </ul>

 <p>
 I ran a profiler a few times and removed some of the performance hotspots,
 but it's not tuned.  Reporting mutation events, in particular, is
 rather costly -- it started at about a 40% penalty for appendChild calls, and
 I've got it down to around 12%, but it'll be hard to shrink it much further.
 The overall code size is relatively small, though you may want to be rid of
 many of the unused DOM interface classes (HTML, CSS, and so on).
 </p>


 <h2><a name="features">Features of this Package</a></h2>

 <p> Starting with DOM Level 2, you can really see that DOM is constructed
 as a bunch of optional modules around a core of either XML or HTML
 functionality.  Different implementations will support different optional
 modules.  This implementation provides a set of features that should be
 useful if you're not depending on the HTML functionality (lots of convenience
 functions that mostly don't buy much except API surface area) and user
 interface support.  That is, browsers will want more -- but what they
 need should be cleanly layered over what's already here. </p>

 <h3> Core Feature Set:  "XML" </h3>

 <p> This DOM implementation supports the "XML" feature set, which basically
 gets you four things over the bare core (which you're officially not supposed
 to implement except in conjunction with the "XML" or "HTML" feature).  In
 order of decreasing utility, those four things are: </p> <ol>

     <li> ProcessingInstruction nodes.  These are probably the most
     valuable thing. Handy little buggers, in part because all the APIs
     you need to use them are provided, and they're designed to let you
     escape XML document structure rules in controlled ways.</li>

     <li> CDATASection nodes.  These are of limited utility since CDATA
     is just text that prints funny. These are of use to some sorts of
     applications, though I encourage folk to not use them. </li>

     <li> DocumentType nodes, and associated Notation and Entity nodes.
     These appear to be useless.  Briefly, these "Type" nodes expose no
     typing information.  They're only really usable to expose some lexical
     structure that almost every application needs to ignore.  (XML editors
     might like to see them, but they need true typing information much more.)
     I strongly encourage people not to use these.  </li>

     <li> EntityReference nodes can show up.  These are actively annoying,
     since they add an extra level of hierarchy, are the cause of most of
     the complexity in attribute values, and their contents are immutable.
     Avoid these.</li>

     </ol>
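 <p>
 As a small illustration of the first item, the sketch below creates a
 processing instruction with the standard <code>org.w3c.dom</code> calls and
 finds it again by target.  The <code>PiDemo</code> class, the
 <code>report</code> element and the stylesheet name are made up for the
 example.
 </p>

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;

// Hypothetical example: a PI placed before the root element, then looked up.
class PiDemo {
    static Document sample() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("report");
        doc.appendChild(root);
        ProcessingInstruction pi = doc.createProcessingInstruction(
                "xml-stylesheet", "type=\"text/xsl\" href=\"report.xsl\"");
        // PIs may sit outside the root element, as a child of the document.
        doc.insertBefore(pi, root);
        return doc;
    }

    // Returns the first child PI of 'parent' with the given target, or null.
    static ProcessingInstruction findInstruction(Node parent, String target) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE
                    && target.equals(((ProcessingInstruction) c).getTarget())) {
                return (ProcessingInstruction) c;
            }
        }
        return null;
    }
}
```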

 <h3> Optional Feature Sets:  "Events", and friends </h3>

 <p> Events may be one of the more interesting new features in Level 2.
 This package provides the core feature set and exposes mutation events.
 No gooey events though; if you want that, write a layered implementation! </p>

 <p> Three mutation events aren't currently generated:</p> <ul>

     <li> <em>DOMSubtreeModified</em> is poorly specified.  Think of this
     as generating one such event around the time of finalization, which
     is a fully conformant implementation.  This implementation is exactly
     as useful as that one. </li>

     <li> <em>DOMNodeRemovedFromDocument</em> and
     <em>DOMNodeInsertedIntoDocument</em> are supposed to get sent to
     every node in a subtree that gets removed or inserted (respectively).
     This can be <em>extremely costly</em>, and the removal and insertion
     processing is already significantly slower due to event reporting.
     It's much easier, and more efficient, to have a listener higher in the
     tree watch removal and insertion events through the bubbling or capture
     mechanisms, than it is to watch for these two events.</li>

     </ul>
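 <p>
 The "listener higher in the tree" approach can be sketched as follows.  This
 is only an illustration: it assumes the nodes handed out by your DOM
 implementation also implement <code>org.w3c.dom.events.EventTarget</code>
 (true when the "MutationEvents" feature is present), and the
 <code>InsertionWatcher</code> class and element names are hypothetical.
 </p>

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.events.EventListener;
import org.w3c.dom.events.EventTarget;

// Hypothetical example: one bubbling listener on the root element records
// every DOMNodeInserted event raised anywhere beneath it.
class InsertionWatcher {
    static List<String> watch() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("root");
        doc.appendChild(root);
        if (!(root instanceof EventTarget)) {
            throw new UnsupportedOperationException(
                    "this DOM implementation does not expose events");
        }

        List<String> inserted = new ArrayList<>();
        EventListener listener =
                evt -> inserted.add(((Node) evt.getTarget()).getNodeName());
        // useCapture=false: the listener runs as events bubble up to root.
        ((EventTarget) root).addEventListener("DOMNodeInserted", listener, false);

        Element section = doc.createElement("section");
        root.appendChild(section);                       // direct child
        section.appendChild(doc.createElement("para"));  // deeper descendant
        return inserted;
    }
}
```

 <p>
 A single registration on the root sees insertions at any depth, which is the
 reason the per-node "IntoDocument" and "FromDocument" events are rarely worth
 their cost.
 </p>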

 <p> In addition, certain kinds of attribute modification aren't reported.
 A fix is known, but it couldn't report the previous value of the attribute.
 More work could fix all of this (as well as reduce the generally high cost
 of childful attributes), but that's not been done yet. </p>

 <p> Also, note that it is a <em>Bad Thing&trade;</em> to have the listener
 for a mutation event change the ancestry for the target of that event.
 Or to prevent mutation events from bubbling to where they're needed.
 Just don't do those, OK? </p>

 <p> As an experimental feature (named "USER-Events"), you can provide
 your own "user" events.  Just name them anything starting with "USER-"
 and you're set.  Dispatch them through bubbling, capturing, or whatever
 takes your fancy.  One important thing you can't currently do is
 pass any data (like an object) with those events.  Maybe later there
 will be a "UserEvent" interface letting you get some substantial use
 out of this mechanism even if you're not "inside" of a DOM package.</p>
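 <p>
 A user event can be created and dispatched with the standard DOM Level 2
 event calls.  The sketch below assumes the document implements
 <code>DocumentEvent</code>, that its nodes implement
 <code>EventTarget</code>, and that <code>createEvent("Events")</code> yields a
 plain event object; the event name <code>USER-refresh</code> and the
 <code>UserEventDemo</code> class are invented for the example.
 </p>

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.events.DocumentEvent;
import org.w3c.dom.events.Event;
import org.w3c.dom.events.EventTarget;

// Hypothetical example: dispatch a custom event on a child element and
// count how often a bubbling listener on its parent is invoked.
class UserEventDemo {
    static int dispatch(String type) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("root");
        Element child = doc.createElement("child");
        doc.appendChild(root);
        root.appendChild(child);

        final int[] hits = {0};
        ((EventTarget) root).addEventListener(type, evt -> hits[0]++, false);

        Event event = ((DocumentEvent) doc).createEvent("Events");
        event.initEvent(type, true, false);   // bubbles, not cancelable
        ((EventTarget) child).dispatchEvent(event);
        return hits[0];
    }

    public static void main(String[] args) {
        System.out.println("listener calls: " + dispatch("USER-refresh"));
    }
}
```

 <p>
 Note that the event carries nothing beyond its type and target, which is
 exactly the limitation described above.
 </p>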

 <p> You can create and send HTML events.  Ditto UIEvents.  Since DOM
 doesn't require a UI, it's the UI's job to send them; perhaps that's
 part of your application.  </p>

 <p><em>This package may be built without the ability to report mutation
 events, gaining a significant speedup in DOM construction time.  However,
 if that is done then certain other features -- notably node iterators
 and getElementsByTagName -- will not be available.</em></p>


 <h3> Optional Feature:  "Traversal" </h3>

 <p> Each DOM node has all you need to walk to everything connected
 to that node.  Lightweight, efficient utilities are easily layered on
 top of just the core APIs. </p>
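 <p>
 For instance, a depth-first walk needs nothing beyond
 <code>getFirstChild</code> and <code>getNextSibling</code>.  The
 <code>CoreWalk</code> helper below is a hypothetical sketch of such a utility,
 here counting element nodes.
 </p>

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example: recursive traversal using only core Node methods.
class CoreWalk {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts element nodes in the subtree rooted at 'node' (inclusive).
    static int countElements(Node node) {
        int count = node.getNodeType() == Node.ELEMENT_NODE ? 1 : 0;
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            count += countElements(c);
        }
        return count;
    }
}
```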

 <p> Traversal APIs are an optional part of DOM Level 2, providing
 a not-so-lightweight way to walk over DOM trees, if your application
 didn't already have such utilities for use with data represented via
 DOM.  Implementing this helped debug the (optional) event and mutation
 event subsystems, so it's provided here.  </p>

 <p> At this writing, the "TreeWalker" interface isn't implemented. </p>
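 <p>
 The node iterator half of the Traversal module is used roughly as shown
 below.  This sketch assumes the document object implements
 <code>DocumentTraversal</code>, which only holds when the "Traversal" feature
 is supported; the <code>IteratorDemo</code> class and the sample tree are
 illustrative.
 </p>

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;

// Hypothetical example: list element names in document order.
class IteratorDemo {
    static Document sample() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element a = doc.createElement("a");
        Element c = doc.createElement("c");
        doc.appendChild(a);
        a.appendChild(doc.createElement("b"));
        a.appendChild(doc.createTextNode("text"));
        a.appendChild(c);
        c.appendChild(doc.createElement("d"));
        return doc;
    }

    static List<String> elementNames(Document doc) {
        NodeIterator it = ((DocumentTraversal) doc).createNodeIterator(
                doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
        List<String> names = new ArrayList<>();
        for (Node n = it.nextNode(); n != null; n = it.nextNode()) {
            names.add(n.getNodeName());
        }
        it.detach();   // release the iterator once finished
        return names;
    }
}
```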


 <h2><a name='avoid'>DOM Functionality to Avoid</a></h2>

 <p> For what appear to be a combination of historical and "committee
 logic" reasons, DOM has a number of <em>features which I strongly advise
 you to avoid using</em> in your library and application code.  These
 include the following types of DOM nodes; see the documentation for the
 implementation class for more information: <ul>

     <li> CDATASection
     (<a href='DomCDATA.html'>DomCDATA</a> class)
     ... use normal Text nodes instead, so you don't have to make
     every algorithm recognize multiple types of character data

     <li> DocumentType
     (<a href='DomDoctype.html'>DomDoctype</a> class)
     ... if this held actual typing information, it might be useful

     <li> Entity
     (<a href='DomEntity.html'>DomEntity</a> class)
     ... neither parsed nor unparsed entities work well in DOM; it
     won't even tell you which attributes identify unparsed entities

     <li> EntityReference
     (<a href='DomEntityReference.html'>DomEntityReference</a> class)
     ... permitted implementation variances are extreme, all children
     are readonly, and these can interact poorly with namespaces

     <li> Notation
     (<a href='DomNotation.html'>DomNotation</a> class)
     ... only really usable with unparsed entities (which aren't well
     supported; see above) or perhaps with PIs after the DTD, not with
     NOTATION attributes

     </ul>
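 <p>
 When the document comes from a JAXP parser, one way to keep CDATASection
 nodes out of the tree is to ask the factory to coalesce them into ordinary
 text.  The following is a sketch of that option, not a description of how this
 package is wired up internally; the <code>CoalesceDemo</code> class is
 hypothetical.
 </p>

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example: parse with coalescing so CDATA sections arrive
// merged into the neighbouring Text node.
class CoalesceDemo {
    static Node firstChildOfRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setCoalescing(true);   // CDATA becomes plain Text
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getFirstChild();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

 <p>
 With this setting, algorithms only have to recognize one kind of character
 data node.
 </p>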

 <p> If you really need to use unparsed entities or notations, use SAX;
 it offers better support for all DTD-related functionality.
 It also exposes actual
 document typing information (such as element content models).</p>
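 <p>
 A sketch of the SAX route follows.  It relies on the standard
 <code>DTDHandler</code> callbacks plus the SAX2 extension
 <code>DeclHandler</code>, registered through the
 <code>declaration-handler</code> property; parsers are not required to
 support that property, so treat this as an assumption about your parser.  The
 <code>DtdProbe</code> class and the sample DTD are made up for illustration.
 </p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical example: collect DTD declarations that DOM does not expose.
class DtdProbe extends DefaultHandler2 {
    final Map<String, String> contentModels = new LinkedHashMap<>();
    final List<String> notations = new ArrayList<>();
    final Map<String, String> unparsedEntities = new LinkedHashMap<>();

    @Override
    public void elementDecl(String name, String model) {
        contentModels.put(name, model);            // element content model
    }

    @Override
    public void notationDecl(String name, String publicId, String systemId) {
        notations.add(name);
    }

    @Override
    public void unparsedEntityDecl(String name, String publicId,
                                   String systemId, String notationName) {
        unparsedEntities.put(name, notationName);  // entity -> its notation
    }

    static DtdProbe scan(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            DtdProbe probe = new DtdProbe();
            parser.setProperty(
                    "http://xml.org/sax/properties/declaration-handler", probe);
            parser.parse(new InputSource(new StringReader(xml)), probe);
            return probe;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static final String SAMPLE =
            "<!DOCTYPE doc [\n"
          + "<!ELEMENT doc (#PCDATA)>\n"
          + "<!NOTATION gif SYSTEM \"image/gif\">\n"
          + "<!ENTITY logo SYSTEM \"logo.gif\" NDATA gif>\n"
          + "]>\n"
          + "<doc>hi</doc>";
}
```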

 <p> Also, when accessing attribute values, use methods that provide their
 values as single strings, rather than those which expose value substructure
 (Text and EntityReference nodes).  (See the <a href='DomAttr.html'>DomAttr</a>
 documentation for more information.) </p>
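 <p>
 In practice that means calling <code>Element.getAttribute</code> (or
 <code>Attr.getValue</code>) and never descending into an attribute's child
 nodes.  A brief sketch, with a hypothetical <code>AttrValueDemo</code> class
 and a made-up <code>co</code> entity:
 </p>

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example: read an attribute as one string, leaving any
// Text/EntityReference substructure beneath the Attr node untouched.
class AttrValueDemo {
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the whole value as a string ("" if the attribute is absent).
    static String attributeText(Element element, String name) {
        return element.getAttribute(name);
    }

    static final String SAMPLE =
            "<!DOCTYPE a [<!ENTITY co \"Example Corp\">]>"
          + "<a owner=\"&co; Ltd\"/>";
}
```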

 <p> Note that many of these features were provided as partial support for
 editor functionality (including the incomplete DTD access).  Full editor
 functionality requires access to potentially malformed lexical structure,
 at the level of unparsed tokens and below.  Access at such levels is so
 complex that using it in non-editor applications sacrifices all the
 benefits of XML; editor applications need extremely specialized APIs. </p>

 <p> (This isn't a slam against DTDs, note; only against the broken support
 for them in DOM.  Even despite inclusion of some dubious SGML legacy features
 such as notations and unparsed entities,
 and the ongoing proliferation of alternative schema and validation tools,
 DTDs are still the most widely adopted tool
 to constrain XML document structure.
 Alternative schemes generally focus on data transfer style
 applications; open document architectures comparable to
 DocBook 4.0 don't yet exist in the schema world.
 Feel free to use DTDs; just don't expect DOM to help you.) </p>

 </body>
 </html>