
 <!DOCTYPE html PUBLIC
 	'-//W3C//DTD XHTML 1.0 Transitional//EN'
 	'http://www.w3.org/TR/xhtml1/DTD/transitional.dtd'>

 <html><head>
     <title>package overview</title>
 <!--
 /*
  * Copyright (C) 1999,2000,2001 The Free Software Foundation, Inc.
  */
 -->
 </head><body>

 <p> This package contains &AElig;lfred2, which includes an
 enhanced SAX2-compatible version of the &AElig;lfred
 non-validating XML parser, a modular (and hence optional)
 DTD validating parser, and modular (and hence optional)
 JAXP glue to those.
 Use these like any other SAX2 parsers. </p>

 <ul>
     <li><a href="#about">About &AElig;lfred</a><ul>
 	<li><a href="#principles">Design Principles</a></li>
 	<li><a href="#name">About the Name &AElig;lfred</a></li>
 	<li><a href="#encodings">Character Encodings</a></li>
 	<li><a href="#violations">Known Conformance Violations</a></li>
 	<li><a href="#copyright">Licensing</a></li>
 	</ul></li>

     <li><a href="#changes">Changes Since the Last Microstar Release</a><ul>
 	<li><a href="#sax2">SAX2 Support</a></li>
 	<li><a href="#validation">Validation</a></li>
 	<li><a href="#smaller">You Want Smaller?</a></li>
 	<li><a href="#bugfixes">Bugs Fixed</a></li>
 	</ul></li>

 </ul>

 <h2><a name="about">About &AElig;lfred</a></h2>

 <p>&AElig;lfred is an XML parser written in the Java programming language.</p>

 <h3><a name="principles">Design Principles</a></h3>

 <p>In most Java applets and applications, XML should not be the central
 feature; instead, XML is the means to another end, such as loading
 configuration information, reading meta-data, or parsing transactions.</p>

 <p> When an XML parser is only a single component of a much larger
 program, it cannot be large, slow, or resource-intensive.  With Java
 applets, in particular, code size is a significant issue.  The standard
 modem is still not operating at 56 Kbaud, or sometimes even with data
 compression.  Assuming an uncompressed 28.8 Kbaud modem, only about
 3 KBytes can be downloaded in one second; compression often doubles
 that speed, but a V.90 modem may not provide another doubling.  When
 used with embedded processors, similar size concerns apply.  </p>

 <p> &AElig;lfred is designed for easy and efficient use over the Internet,
 based on the following principles: </p> <ol>

 <li> &AElig;lfred must be as small as possible, so that it doesn't add too
    much to an applet's download time. </li>

 <li> &AElig;lfred must use as few class files as possible, to minimize the
    number of HTTP connections necessary.  (The use of JAR files has made this
    less of a concern.) </li>

 <li> &AElig;lfred must be compatible with most or all Java implementations
    and platforms. (Write once, run anywhere.) </li>

 <li> &AElig;lfred must use as little memory as possible, so that it does
    not take away resources from the rest of your program.  (It doesn't force
    you to use DOM or a similar costly data structure API.)</li>

 <li> &AElig;lfred must run as fast as possible, so that it does not slow down
    the rest of your program. </li>

 <li> &AElig;lfred must produce correct output for well-formed and valid
    documents, but need not reject every document that is not valid or
    not well-formed. (In &AElig;lfred2, correctness was a bigger concern
    than in the original version; and a validation option is available.) </li>

 <li> &AElig;lfred must provide full internationalization from the first
     release.  (&AElig;lfred2 now automatically handles all encodings
     supported by the underlying JVM; previous versions handled only
     UTF-8, UTF-16, ASCII, and ISO-8859-1.)</li>

 </ol>

 <p>As you can see from this list, &AElig;lfred is designed for production
 use, but neither validation nor perfect conformance was a requirement.
 Good validating parsers exist, including one in this package,
 and you should use them as appropriate.  (See conformance reviews
 available at <a href="http://www.xml.com/">http://www.xml.com</a>.)
 </p>

 <p> One of the main goals of &AElig;lfred2 was to significantly improve
 conformance, while not significantly affecting the other goals stated above.
 Since the only use of this parser is with SAX, some classes could be
 removed, and so the overall size of &AElig;lfred was actually reduced.
 Subsequent performance work produced a notable speedup (over twenty
 percent on larger files).  That is, the tradeoffs between speed, size, and
 conformance were re-targeted towards conformance and support of newer APIs
 (SAX2), with a positive performance impact. </p>

 <p> The role anticipated for this version of &AElig;lfred is as a
 lightweight Free Software SAX parser that can be used in essentially every
 Java program where the handful of conformance violations (noted below)
 are acceptable.
 That certainly includes applets, and
 nowadays one must also mention embedded systems as being even more
 size-critical.
 At this writing, all parsers that are more conformant are
 significantly larger, even when counting the optional
 validation support in this version of &AElig;lfred. </p>


 <h3><a name="name">About the Name <em>&AElig;lfred</em></a></h3>

 <p>&AElig;lfred the Great (AElfred in ASCII) was King of Wessex and,
 some say, King of England, at the time of his death in 899 AD.
 &AElig;lfred introduced a widespread literacy program in the hope that
 his people would learn to read English, at least, if Latin was too
 difficult for them.  This &AElig;lfred hopes to bring another sort of
 literacy to Java, using XML, at least, if full SGML is too difficult.</p>

 <p>The initial &AElig; ligature ("AE") is also a reminder that XML is
 not limited to ASCII.</p>


 <h3><a name="encodings">Character Encodings</a></h3>

 <p> The &AElig;lfred parser currently builds in support for a handful
 of input encodings.  Of course these include UTF-8 and UTF-16, which
 all XML parsers are required to support:</p> <ul>

     <li> UTF-8 ... the standard eight bit encoding, used unless
     you provide an encoding declaration or a MIME charset tag.</li>

     <li> US-ASCII ... an extremely common seven bit encoding,
     which happens to be a subset of UTF-8 and ISO-8859-1 as well
     as many other encodings.  XHTML web pages using US-ASCII
     (without an encoding declaration) are probably more
     widely interoperable than those in any other encoding. </li>

     <li> ISO-8859-1 ... includes accented characters used in
     much of western Europe (but excluding the Euro currency
     symbol).</li>

     <li> UTF-16 ... with several variants, this encodes each
     sixteen bit Unicode character in sixteen bits of output.
     Variants include UTF-16BE (big endian, no byte order mark),
     UTF-16LE (little endian, no byte order mark), and
     ISO-10646-UCS-2 (an older and less used encoding, using a
     version of Unicode without surrogate pairs).  This is
     essentially the native encoding used by Java.  </li>

     <li> ISO-10646-UCS-4 ... a seldom-used four byte encoding,
     also known as UTF-32BE.  Four byte order variants are supported,
     including one known as UTF-32LE.  Some operating systems
     standardized on UCS-4 despite its significant size penalty,
     in anticipation that Unicode (even with surrogate pairs)
     would eventually become limiting.  UCS-4 permits encoding
     of non-Unicode characters, which Java can't represent (and
     XML doesn't allow).
     </li>

     </ul>

 <p> If you use any encoding other than UTF-8 or UTF-16 you should
 make sure to label your data appropriately: </p>

 <blockquote>
 &lt;?xml version="1.0" encoding="<b>ISO-8859-15</b>"?&gt;
 </blockquote>

 <p> Encodings accessed through <code>java.io.InputStreamReader</code>
 are now fully supported for both external labels (such as MIME types)
 and internal types (as shown above).
 There is one limitation in the support for internal labels:
 the encodings must be derived from the US-ASCII encoding, so
 the EBCDIC family of encodings is not recognized.
 Note that Java defines its
 own encoding names, which don't always correspond to the standard
 Internet encoding names defined by the IETF/IANA, and that Java
 may even <em>require</em> use of nonstandard encoding names.
 Please report
 such problems; some of them can be worked around in this parser,
 and many can be worked around by using external labels.
 </p>

 <p>Note that if you are using the Euro symbol with a fixed-length
 eight bit encoding, you should probably be using the encoding label
 <em>iso-8859-15</em> or, with a Microsoft OS, <em>cp-1252</em>.
 Of course, UTF-8 and UTF-16 handle the Euro symbol directly.
 </p>
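
 <p>As a rough illustration of the labeling advice above, the following
 sketch parses a byte stream whose XML declaration names its encoding.
 It is not part of this package and rests on several assumptions: it obtains
 whatever SAX parser the JVM supplies through JAXP
 (<code>SAXParserFactory</code>) rather than naming an &AElig;lfred2 class,
 and the class <code>EncodingLabelSketch</code> and its methods are
 hypothetical names.  ISO-8859-1 is used because every Java platform is
 required to support it. </p>

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: not part of the aelfred2 package.
class EncodingLabelSketch {

    // Builds a small document as ISO-8859-1 bytes; the e-acute is the
    // single byte 0xE9, which would not be valid UTF-8 in this position.
    static byte[] latin1Document() {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                   + "<word>caf\u00e9</word>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Parses the bytes with the JVM's default SAX parser (an assumption;
    // any SAX2 parser could be substituted) and returns the character data.
    static String textOf(byte[] document) {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // Character data may arrive in several chunks.
                text.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new ByteArrayInputStream(document)), handler);
        } catch (Exception e) {
            // Checked exceptions are wrapped only to keep the sketch short.
            throw new IllegalStateException("parse failed", e);
        }
        return text.toString();
    }
}
```

 <p>Because the parser is handed raw bytes rather than a
 <code>Reader</code>, the encoding declaration is the only thing telling
 it how to decode them; that is the situation in which the internal label
 matters. </p>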


 <h3><a name="violations">Known Conformance Violations</a></h3>

 <p>Known conformance issues should be of negligible importance for
 most applications, and include: </p><ul>

     <li> Rather than following the voluminous "Appendix B" rules about
     what characters may appear in names (and name tokens), the Unicode
     rules embedded in <em>java.lang.Character</em> are used.
     This means mostly that some names are inappropriately accepted,
     though a few are inappropriately rejected.  (It's much simpler
     to avoid that much special case code.  Recent OASIS/NIST test
     cases may make these rules realistically testable.) </li>

     <li> Text containing "]]&gt;" is not rejected unless it fully resides
     in an internal buffer ... which is, thankfully, the typical case.  This
     text is illegal, but sometimes appears in illegal attempts to
     nest CDATA sections.  (Not catching that boundary condition
     substantially simplifies parsing text.) </li>

     <li> Surrogate characters that aren't correctly paired are ignored
     rather than rejected, unless they were encoded using UTF-8.  (This
     simplifies parsing text.)  Unicode 3.1 assigned the first characters
     to those character codes, in early 2001, so few documents (or tools)
     use such characters in any case. </li>

     <li> Declarations following references to an undefined parameter
     entity are not ignored. (Not maintaining and using state
     about this validity error simplifies declaration handling; few
     XML parsers address this constraint in any case.) </li>

     <li> Well formedness constraints for general entity references
     are not enforced.  (The code to handle the "content" production
     is merged with the element parsing code, making it hard to reuse
     for this additional situation.) </li>

 </ul>

 <p> When tested against the July 12, 1999 version of the OASIS
 XML Conformance test suite, an earlier version passed 1057 of 1067 tests.
 That contrasts with the original version, which passed 867.  The
 current parser is top-ranked in terms of conformance, as is its
 validating sibling (which has some additional conformance violations
 imposed on it by SAX2 API deficiencies as well as some of the more
 curious SGML layering artifacts found in the XML specification). </p>

 <p> The XML 1.0 specification itself was not without problems,
 and after some delays the W3C has come out with a revised
 "second edition" specification.  While that doesn't resolve all
 the problems identified in the XML specification, many of the most
 egregious problems have been resolved.  (You still need to drink
 magic Kool-Aid before some DTD-related issues make sense.)
 To the extent possible, this parser conforms to that second
 edition specification, and does well against corrected versions
 of the OASIS/NIST XML conformance test cases.  See <a href=
 "http://xmlconf.sourceforge.net">http://xmlconf.sourceforge.net</a>
 for more information about SAX2/XML conformance testing. </p>


 <h3><a name="copyright">Copyright and distribution terms</a></h3>

 <p>
 The software in this package is distributed under the GNU General Public
 License (with a special exception described below).
 </p>

 <p>
 A copy of the GNU General Public License (GPL) is included in this distribution,
 in the file COPYING.  If you do not have the source code, it is available at:

     <a href="http://www.gnu.org/software/classpath/">http://www.gnu.org/software/classpath/</a>
 </p>

 <pre>
   Linking this library statically or dynamically with other modules is
   making a combined work based on this library.  Thus, the terms and
   conditions of the GNU General Public License cover the whole
   combination.

   As a special exception, the copyright holders of this library give you
   permission to link this library with independent modules to produce an
   executable, regardless of the license terms of these independent
   modules, and to copy and distribute the resulting executable under
   terms of your choice, provided that you also meet, for each linked
   independent module, the terms and conditions of the license of that
   module.  An independent module is a module which is not derived from
   or based on this library.  If you modify this library, you may extend
   this exception to your version of the library, but you are not
   obligated to do so.  If you do not wish to do so, delete this
   exception statement from your version.

   Parts derived from code which carried the following notice:

   Copyright (c) 1997, 1998 by Microstar Software Ltd.

   AElfred is free for both commercial and non-commercial use and
   redistribution, provided that Microstar's copyright and disclaimer are
   retained intact.  You are free to modify AElfred for your own use and
   to redistribute AElfred with your modifications, provided that the
   modifications are clearly documented.

   This program is distributed in the hope that it will be useful, but
   WITHOUT ANY WARRANTY; without even the implied warranty of
   merchantability or fitness for a particular purpose.  Please use it AT
   YOUR OWN RISK.
 </pre>

 <p> Some of this documentation was modified from the original
 &AElig;lfred README.txt file.  All of it has been updated. </p>


 <h2><a name="changes">Changes Since the Last Microstar Release</a></h2>

 <p> As noted above, Microstar has not updated this parser since
 the summer of 1998, when it released version 1.2a on its web site.
 This release is intended to benefit the developer community by
 refocusing the API on SAX2, and improving conformance to the extent
 that most developers should not need to use another XML parser.  </p>

 <p> The code has been cleaned up (referring to the XML 1.0 spec in
 all the production numbers in
 comments, rather than some preliminary draft, for one example) and
 has been sped up a bit as well.
 JAXP support has been added, although developers are still
 strongly encouraged to use the SAX2 APIs directly.  </p>


 <h3><a name="sax2">SAX2 Support</a></h3>

 <p> The original version of &AElig;lfred did not support the
 SAX2 APIs. </p>

 <p> This version supports the SAX2 APIs, exposing the standard
 boolean feature descriptors.  It supports the "DeclHandler" property
 to provide access to all DTD declarations not already exposed
 through the SAX1 API.  The "LexicalHandler" property is supported,
 exposing entity boundaries (including the unnamed external subset) and
 things like comments and CDATA boundaries.  SAX1 compatibility is
 currently provided.</p>
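
 <p>The handler properties described above can be sketched roughly as
 follows.  The property URIs are the standard SAX2 identifiers for the
 "LexicalHandler" and "DeclHandler" extensions.  Everything else is an
 assumption made for illustration: the parser is whichever one JAXP
 supplies on the running JVM (not necessarily &AElig;lfred2), and the class
 <code>LexicalEventsSketch</code> and its method are hypothetical
 names. </p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical sketch: not part of the aelfred2 package.
class LexicalEventsSketch {

    // Parses the given XML text and returns a log of the declaration and
    // lexical events that the SAX2 extension handlers received.
    static List<String> extensionEvents(String xml) {
        final List<String> log = new ArrayList<String>();

        // DefaultHandler2 implements ContentHandler, LexicalHandler and
        // DeclHandler, so one object can serve all three roles.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void elementDecl(String name, String model) {
                log.add("elementDecl:" + name);
            }
            @Override
            public void comment(char[] ch, int start, int length) {
                log.add("comment:" + new String(ch, start, length));
            }
            @Override
            public void startCDATA() {
                log.add("startCDATA");
            }
            @Override
            public void endCDATA() {
                log.add("endCDATA");
            }
        };

        try {
            // Assumption: use the JVM's default SAX2 parser via JAXP.
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setProperty(
                "http://xml.org/sax/properties/lexical-handler", handler);
            reader.setProperty(
                "http://xml.org/sax/properties/declaration-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Checked exceptions are wrapped only to keep the sketch short.
            throw new IllegalStateException("parse failed", e);
        }
        return log;
    }
}
```

 <p>Called on a document with an internal subset, a comment, and a CDATA
 section, the log records the element declaration first and then the
 comment and CDATA boundaries in document order.  None of these events is
 visible through the SAX1 callbacks alone. </p>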


 <h3><a name="validation">Validation</a></h3>

 <p> In the 'pipeline' package in this same software distribution is an
 <a href="../pipeline/ValidationConsumer.html">XML Validation component</a>
 using any full SAX2 event stream (including all document type declarations)
 to validate.  There is now an <a href="XmlReader.html">XmlReader</a> class
 which combines that class and this enhanced &AElig;lfred parser, creating
 an optionally validating SAX2 parser. </p>

 <p> As noted in the documentation for that validating component, certain
 validity constraints can't reliably be tested by a layered validator.
 These include all constraints relying on
 layering violations (exposing XML at the level of tokens or below,
 required since XML isn't a context-free grammar), some that
 SAX2 doesn't support, and a few others.  The resulting validating
 parser is conformant enough for most applications that aren't doing
 strange SGML tricks with DTDs.
 Moreover, that validating filter can be used without
 a parser ... any application component that emits SAX event streams
 can DTD-validate its output on demand. </p>
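
 <p>How an optionally validating SAX2 parser reports validity errors can
 be sketched as below.  This is an illustration under stated assumptions
 rather than a use of the classes in this distribution: validation is
 switched on through the JAXP factory of whatever parser the JVM provides,
 and <code>ValidationSketch</code> and its method are hypothetical names.
 Validity errors arrive through the <code>ErrorHandler.error</code>
 callback and are recoverable, unlike well-formedness errors. </p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: not part of the aelfred2 or pipeline packages.
class ValidationSketch {

    // Parses the document with DTD validation enabled and returns the
    // messages of all recoverable (validity) errors that were reported.
    static List<String> validityErrors(String xml) {
        final List<String> errors = new ArrayList<String>();

        // DefaultHandler ignores error() by default; record it instead.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors.add(e.getMessage());
            }
        };

        try {
            // Assumption: the JVM's default JAXP parser supports validation.
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser()
                   .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Checked exceptions are wrapped only to keep the sketch short.
            throw new IllegalStateException("parse failed", e);
        }
        return errors;
    }
}
```

 <p>A document whose content matches its DTD yields an empty list, while
 one that omits a required child element yields at least one message, and
 in both cases parsing runs to completion. </p>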

 <h3><a name="smaller">You Want Smaller?</a></h3>

 <p> You'll have noticed that the original version of &AElig;lfred
 had small size as a top goal.  &AElig;lfred2 normally includes a
 DTD validation layer, but you can package without that.
 Similarly, JAXP factory support is available but optional.
 Then the main added cost due to this revision is for
 supporting the SAX2 API itself; DTD validation is as
 cleanly layered as allowed by SAX2.</p>

 <h3><a name="bugfixes">Bugs Fixed</a></h3>

 <p> Bugs fixed in &AElig;lfred2 include: </p>

 <ol>
     <li> Originally &AElig;lfred didn't close file descriptors, which
     led to file descriptor leakage on programs which ran for any
     length of time. </li>

     <li> NOTATION declarations without system identifiers are
     now handled correctly. </li>

     <li> DTD events are now reported for all invocations of a
     given parser, not just the first one. </li>

     <li> More correct character handling: <ul>

 	<li> Rejects out-of-range characters, both in text and in
 	character references. </li>

 	<li> Correctly handles character references that expand to
 	surrogate pairs. </li>

 	<li> Correctly handles UTF-8 encodings of surrogate pairs. </li>

 	<li> Correctly handles Unicode 3.1 rules about illegal UTF-8
 	encodings: there is only one legal encoding per character. </li>

 	<li> PUBLIC identifiers are now rejected if they have illegal
 	characters. </li>

 	<li> The parser is more correct about what characters are allowed
 	in names and name tokens.  Uses Unicode rules (built in to Java)
 	rather than the voluminous XML rules, although some extensions
 	have been made to match XML rules more closely.</li>

 	<li> Line ends are now normalized to newlines in all known
 	cases. </li>

 	</ul></li>

     <li> Certain validity errors were previously treated as well
     formedness violations. <ul>

 	<li> Repeated declarations of an element type are no
 	longer fatal errors. </li>

 	<li> Undeclared parameter entity references are no longer
 	fatal errors. </li>

 	</ul></li>

     <li> Attribute handling is improved: <ul>

 	<li> Whitespace must exist between attributes. </li>

 	<li> Only one value for a given attribute is permitted. </li>

 	<li> ATTLIST declarations don't need to declare attributes. </li>

 	<li> Attribute values are normalized when required. </li>

 	<li> Tabs in attribute values are normalized to spaces. </li>

 	<li> Attribute values containing a literal "&lt;" are rejected. </li>

 	</ul></li>

     <li> More correct entity handling: <ul>

 	<li> Whitespace must precede NDATA when declaring unparsed
 	entities.</li>

 	<li> Parameter entity declarations may not have NDATA annotations. </li>

 	<li> The XML specification has a bug in that it doesn't specify
 	that certain contexts exist within which parameter entity
 	expansion must not be performed.  Lacking an official erratum,
 	this parser now disables such expansion inside comments,
 	processing instructions, ignored sections, public identifiers,
 	and parts of entity declarations. </li>

 	<li> Entity expansions that include quote characters no longer
 	confuse parsing of strings using such expansions. </li>

 	<li> Whitespace in the values of internal entities is not mapped
 	to space characters. </li>

 	<li> General Entity references in attribute defaults within the
 	DTD now cause fatal errors when the entity is not defined at the
 	time it is referenced. </li>

 	<li> Malformed general entity references in entity declarations are
 	now detected.  </li>

 	</ul></li>

     <li> Neither conditional sections
     nor parameter entity references within markup declarations
     are permitted in the internal subset. </li>

     <li> Processing instructions whose target names are "XML"
     (ignoring case) are now rejected. </li>

     <li> Comments may not include "--".</li>

     <li> Most "]]&gt;" sequences in text are rejected. </li>

     <li> Correct syntax for standalone declarations is enforced. </li>

     <li> Setting a locale for diagnostics only produces an exception
     if the language of that locale isn't English. </li>

     <li> Some more encoding names are recognized.  These include the
     Unicode 3.0 variants of UTF-16 (UTF-16BE, UTF-16LE) as well as
     US-ASCII and a few commonly seen synonyms. </li>

     <li> Text (from character content, PIs, or comments) large enough
     not to fit into internal buffers is now handled correctly even in
     some cases which were originally handled incorrectly.</li>

     <li> Content is now reported for element types for which attributes
     have been declared, but no content model is known.  (Such documents
     are invalid, but may still be well formed.) </li>

 </ol>

 <p> Other bugs may also have been fixed. </p>

 <p> For better overall validation support, some of the validity
 constraints that can't be verified using the SAX2 event stream
 are now reported directly by &AElig;lfred2. </p>

 </body></html>
	<!DOCTYPE html PUBLIC
	'-//W3C//DTD XHTML 1.0 Transitional//EN'
	'http://www.w3.org/TR/xhtml1/DTD/transitional.dtd'>

	<html><head>
	<title>package overview</title>
	<!--
	/*
	* Copyright (C) 1999,2000,2001 The Free Software Foundation, Inc.
	*/
	-->
	</head><body>

	<p> This package contains Ælfred2, which includes an
	enhanced SAX2-compatible version of the Ælfred
	non-validating XML parser, a modular (and hence optional)
	DTD validating parser, and modular (and hence optional)
	JAXP glue to those.
	Use these like any other SAX2 parsers. </p>

	<ul>
	<li><a href="#about">About Ælfred</a><ul>
	<li><a href="#principles">Design Principles</a></li>
	<li><a href="#name">About the Name Ælfred</a></li>
	<li><a href="#encodings">Character Encodings</a></li>
	<li><a href="#violations">Known Conformance Violations</a></li>
	<li><a href="#copyright">Licensing</a></li>
	</ul></li>

	<li><a href="#changes">Changes Since the Last Microstar Release</a><ul>
	<li><a href="#sax2">SAX2 Support</a></li>
	<li><a href="#validation">Validation</a></li>
	<li><a href="#smaller">You Want Smaller?</a></li>
	<li><a href="#bugfixes">Bugs Fixed</a></li>
	</ul></li>

	</ul>

	<h2><a name="about">About Ælfred</a></h2>

	<p>Ælfred is a XML parser written in the java programming language.

	<h3><a name="principles">Design Principles</a></h3>

	<p>In most Java applets and applications, XML should not be the central
	feature; instead, XML is the means to another end, such as loading
	configuration information, reading meta-data, or parsing transactions.</p>

	<p> When an XML parser is only a single component of a much larger
	program, it cannot be large, slow, or resource-intensive. With Java
	applets, in particular, code size is a significant issue. The standard
	modem is still not operating at 56 Kbaud, or sometimes even with data
	compression. Assuming an uncompressed 28.8 Kbaud modem, only about
	3 KBytes can be downloaded in one second; compression often doubles
	that speed, but a V.90 modem may not provide another doubling. When
	used with embedded processors, similar size concerns apply. </p>

	<p> Ælfred is designed for easy and efficient use over the Internet,
	based on the following principles: </p> <ol>

	<li> Ælfred must be as small as possible, so that it doesn't add too
	much to an applet's download time. </li>

	<li> Ælfred must use as few class files as possible, to minimize the
	number of HTTP connections necessary. (The use of JAR files has made this
	be less of a concern.) </li>

	<li> Ælfred must be compatible with most or all Java implementations
	and platforms. (Write once, run anywhere.) </li>

	<li> Ælfred must use as little memory as possible, so that it does
	not take away resources from the rest of your program. (It doesn't force
	you to use DOM or a similar costly data structure API.)</li>

	<li> Ælfred must run as fast as possible, so that it does not slow down
	the rest of your program. </li>

	<li> Ælfred must produce correct output for well-formed and valid
	documents, but need not reject every document that is not valid or
	not well-formed. (In Ælfred2, correctness was a bigger concern
	than in the original version; and a validation option is available.) </li>

	<li> Ælfred must provide full internationalization from the first
	release. (Ælfred2 now automatically handles all encodings
	supported by the underlying JVM; previous versions handled only
	UTF-8, UTF_16, ASCII, and ISO-8859-1.)</li>

	</ol>

	<p>As you can see from this list, Ælfred is designed for production
	use, but neither validation nor perfect conformance was a requirement.
	Good validating parsers exist, including one in this package,
	and you should use them as appropriate. (See conformance reviews
	available at <a href="http://www.xml.com/">http://www.xml.com</a>)
	</p>

	<p> One of the main goals of Ælfred2 was to significantly improve
	conformance, while not significantly affecting the other goals stated above.
	Since the only use of this parser is with SAX, some classes could be
	removed, and so the overall size of Ælfred was actually reduced.
	Subsequent performance work produced a notable speedup (over twenty
	percent on larger files). That is, the tradeoffs between speed, size, and
	conformance were re-targeted towards conformance and support of newer APIs
	(SAX2), with a a positive performance impact. </p>

	<p> The role anticipated for this version of Ælfred is as a
	lightweight Free Software SAX parser that can be used in essentially every
	Java program where the handful of conformance violations (noted below)
	are acceptable.
	That certainly includes applets, and
	nowadays one must also mention embedded systems as being even more
	size-critical.
	At this writing, all parsers that are more conformant are
	significantly larger, even when counting the optional
	validation support in this version of Ælfred. </p>


	<h3><a name="name">About the Name <em>Ælfred</em></a></h3>

	<p>Ælfred the Great (AElfred in ASCII) was King of Wessex, and
	some say of King of England, at the time of his death in 899 AD.
	Ælfred introduced a wide-spread literacy program in the hope that
	his people would learn to read English, at least, if Latin was too
	difficult for them. This Ælfred hopes to bring another sort of
	literacy to Java, using XML, at least, if full SGML is too difficult.</p>

	<p>The initial Æ ligature ("AE)" is also a reminder that XML is
	not limited to ASCII.</p>


	<h3><a name="encodings">Character Encodings</a></h3>

	<p> The Ælfred parser currently builds in support for a handful
	of input encodings. Of course these include UTF-8 and UTF-16, which
	all XML parsers are required to support:</p> <ul>

	<li> UTF-8 ... the standard eight bit encoding, used unless
	you provide an encoding declaration or a MIME charset tag.</li>

	<li> US-ASCII ... an extremely common seven bit encoding,
	which happens to be a subset of UTF-8 and ISO-8859-1 as well
	as many other encodings. XHTML web pages using US-ASCII
	(without an encoding declaration) are probably more
	widely interoperable than those in any other encoding. </li>

	<li> ISO-8859-1 ... includes accented characters used in
	much of western Europe (but excluding the Euro currency
	symbol).</li>

	<li> UTF-16 ... with several variants, this encodes each
	sixteen bit Unicode character in sixteen bits of output.
	Variants include UTF-16BE (big endian, no byte order mark),
	UTF-16LE (little endian, no byte order mark), and
	ISO-10646-UCS-2 (an older and less used encoding, using a
	version of Unicode without surrogate pairs). This is
	essentially the native encoding used by Java. </li>

	<li> ISO-10646-UCS-4 ... a seldom-used four byte encoding,
	also known as UTF-32BE. Four byte order variants are supported,
	including one known as UTF-32LE. Some operating systems
	standardized on UCS-4 despite its significant size penalty,
	in anticipation that Unicode (even with surrogate pairs)
	would eventually become limiting. UCS-4 permits encoding
	of non-Unicode characters, which Java can't represent (and
	XML doesn't allow).
	</li>

	</ul>

	<p> If you use any encoding other than UTF-8 or UTF-16 you should
	make sure to label your data appropriately: </p>

	<blockquote>
	<?xml version="1.0" encoding="<b>ISO-8859-15</b>"?>
	</blockquote>

	<p> Encodings accessed through <code>java.io.InputStreamReader</code>
	are now fully supported for both external labels (such as MIME types)
	and internal types (as shown above).
	There is one limitation in the support for internal labels:
	the encodings must be derived from the US-ASCII encoding,
	the EBCDIC family of encodings is not recognized.
	Note that Java defines its
	own encoding names, which don't always correspond to the standard
	Internet encoding names defined by the IETF/IANA, and that Java
	may even <em>require</em> use of nonstandard encoding names.
	Please report
	such problems; some of them can be worked around in this parser,
	and many can be worked around by using external labels.
	</p>

	<p>Note that if you are using the Euro symbol with an fixed length
	eight bit encoding, you should probably be using the encoding label
	<em>iso-8859-15</em> or, with a Microsoft OS, <em>cp-1252</em>.
	Of course, UTF-8 and UTF-16 handle the Euro symbol directly.
	</p>


	<h3><a name="violations">Known Conformance Violations</a></h3>

	<p>Known conformance issues should be of negligible importance for
	most applications, and include: </p><ul>

	<li> Rather than following the voluminous "Appendix B" rules about
	what characters may appear in names (and name tokens), the Unicode
	rules embedded in <em>java.lang.Character</em> are used.
	This means mostly that some names are inappropriately accepted,
	though a few are inappropriately rejected. (It's much simpler
	to avoid that much special case code. Recent OASIS/NIST test
	cases may have these rules be realistically testable.) </li>

	<li> Text containing "]]>" is not rejected unless it fully resides
	in an internal buffer ... which is, thankfully, the typical case. This
	text is illegal, but sometimes appears in illegal attempts to
	nest CDATA sections. (Not catching that boundary condition
	substantially simplifies parsing text.) </li>

	<li> Surrogate characters that aren't correctly paired are ignored
	rather than rejected, unless they were encoded using UTF-8. (This
	simplifies parsing text.) Unicode 3.1 assigned the first characters
	to those character codes, in early 2001, so few documents (or tools)
	use such characters in any case. </li>

	<li> Declarations following references to an undefined parameter
	entity reference are not ignored. (Not maintaining and using state
	about this validity error simplifies declaration handling; few
	XML parsers address this constraint in any case.) </li>

	<li> Well formedness constraints for general entity references
	are not enforced. (The code to handle the "content" production
	is merged with the element parsing code, making it hard to reuse
	for this additional situation.) </li>

	</ul>

	<p> When tested against the July 12, 1999 version of the OASIS
	XML Conformance test suite, an earlier version passed 1057 of 1067 tests.
	That contrasts with the original version, which passed 867. The
	current parser is top-ranked in terms of conformance, as is its
	validating sibling (which has some additional conformance violations
	imposed on it by SAX2 API deficiencies as well as some of the more
	curious SGML layering artifacts found in the XML specification). </p>

	<p> The XML 1.0 specification itself was not without problems,
	and after some delays the W3C has come out with a revised
	"second edition" specification. While that doesn't resolve all
	the problems identified in the XML specification, many of the most
	egregious problems have been resolved. (You still need to drink
	magic Kool-Aid before some DTD-related issues make sense.)
	To the extent possible, this parser conforms to that second
	edition specification, and does well against corrected versions
	of the OASIS/NIST XML conformance test cases. See <a href=
	"http://xmlconf.sourceforge.net">http://xmlconf.sourceforge.net</a>
	for more information about SAX2/XML conformance testing. </p>


	<h3><a name="copyright">Copyright and distribution terms</a></h3>

	<p>
	The software in this package is distributed under the GNU General Public
	License (with a special exception described below).
	</p>

	<p>
	A copy of the GNU General Public License (GPL) is included in this distribution,
	in the file COPYING. If you do not have the source code, it is available at:

	<a href="http://www.gnu.org/software/classpath/">http://www.gnu.org/software/classpath/</a>
	</p>

	<pre>
	Linking this library statically or dynamically with other modules is
	making a combined work based on this library. Thus, the terms and
	conditions of the GNU General Public License cover the whole
	combination.

	As a special exception, the copyright holders of this library give you
	permission to link this library with independent modules to produce an
	executable, regardless of the license terms of these independent
	modules, and to copy and distribute the resulting executable under
	terms of your choice, provided that you also meet, for each linked
	independent module, the terms and conditions of the license of that
	module. An independent module is a module which is not derived from
	or based on this library. If you modify this library, you may extend
	this exception to your version of the library, but you are not
	obligated to do so. If you do not wish to do so, delete this
	exception statement from your version.

	Parts derived from code which carried the following notice:

	Copyright (c) 1997, 1998 by Microstar Software Ltd.

	AElfred is free for both commercial and non-commercial use and
	redistribution, provided that Microstar's copyright and disclaimer are
	retained intact. You are free to modify AElfred for your own use and
	to redistribute AElfred with your modifications, provided that the
	modifications are clearly documented.

	This program is distributed in the hope that it will be useful, but
	WITHOUT ANY WARRANTY; without even the implied warranty of
	merchantability or fitness for a particular purpose. Please use it AT
	YOUR OWN RISK.
	</pre>

	<p> Some of this documentation was modified from the original
	Ælfred README.txt file. All of it has been updated. </p>

	</p>


	<h2><a name="changes">Changes Since the Last Microstar Release</a></h2>

	<p> As noted above, Microstar has not updated this parser since
	the summer of 1998, when it released version 1.2a on its web site.
	This release is intended to benefit the developer community by
	refocusing the API on SAX2, and improving conformance to the extent
	that most developers should not need to use another XML parser. </p>

	<p> The code has been cleaned up (for example, the production numbers
	in comments now refer to the XML 1.0 specification rather than to
	some preliminary draft) and has been sped up a bit as well.
	JAXP support has been added, although developers are still
	strongly encouraged to use the SAX2 APIs directly. </p>


	<h3><a name="sax2">SAX2 Support</a></h3>

	<p> The original version of Ælfred did not support the
	SAX2 APIs. </p>

	<p> This version supports the SAX2 APIs, exposing the standard
	boolean feature descriptors. It supports the "DeclHandler" property
	to provide access to all DTD declarations not already exposed
	through the SAX1 API. The "LexicalHandler" property is supported,
	exposing entity boundaries (including the unnamed external subset) and
	things like comments and CDATA boundaries. SAX1 compatibility is
	currently provided.</p>
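
	<p> As a minimal sketch of how these handlers are registered, the
	following uses the standard SAX2 property identifiers for
	"LexicalHandler" and "DeclHandler". Two assumptions apply: the class
	name is hypothetical, and the parser bundled with the JDK stands in for
	the reader so that the example runs on its own. The same
	<em>setProperty</em> calls should apply to any SAX2 parser that
	supports these properties. </p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class; not part of this package.
final class Sax2HandlersDemo {
    // Standard SAX2 property identifiers.
    static final String LEXICAL_HANDLER =
        "http://xml.org/sax/properties/lexical-handler";
    static final String DECL_HANDLER =
        "http://xml.org/sax/properties/declaration-handler";

    // Parses the document and returns a log of the lexical and
    // declaration events that were reported.
    static List<String> events(String xml) throws Exception {
        final List<String> seen = new ArrayList<String>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override public void comment(char[] ch, int start, int length) {
                seen.add("comment:" + new String(ch, start, length));
            }
            @Override public void startCDATA() { seen.add("startCDATA"); }
            @Override public void endCDATA() { seen.add("endCDATA"); }
            @Override public void elementDecl(String name, String model) {
                seen.add("elementDecl:" + name);
            }
        };

        // Assumption: the JDK's bundled SAX2 parser stands in here.
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        reader.setProperty(LEXICAL_HANDLER, handler);
        reader.setProperty(DECL_HANDLER, handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return seen;
    }
}
```

	<p> Using one <em>DefaultHandler2</em> instance for all three roles keeps
	the sketch short; separate handler objects work equally well. </p>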


	<h3><a name="validation">Validation</a></h3>

	<p> The 'pipeline' package in this same software distribution contains an
	<a href="../pipeline/ValidationConsumer.html">XML Validation component</a>
	that validates using any full SAX2 event stream (including all document type
	declarations). There is now an <a href="XmlReader.html">XmlReader</a> class
	which combines that class and this enhanced Ælfred parser, creating
	an optionally validating SAX2 parser. </p>

	<p> As noted in the documentation for that validating component, certain
	validity constraints can't reliably be tested by a layered validator.
	These include all constraints relying on
	layering violations (exposing XML at the level of tokens or below,
	required since XML isn't a context-free grammar), some that
	SAX2 doesn't support, and a few others. The resulting validating
	parser is conformant enough for most applications that aren't doing
	strange SGML tricks with DTDs.
	Moreover, that validating filter can be used without
	a parser ... any application component that emits SAX event streams
	can DTD-validate its output on demand. </p>
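
	<p> From the application's side, an optionally validating SAX2 parser
	reports validity errors through <em>ErrorHandler.error()</em> and
	well-formedness violations through <em>ErrorHandler.fatalError()</em>.
	The sketch below shows that distinction. It rests on assumptions: the
	class name is hypothetical, and a validating parser obtained from the
	JDK through JAXP stands in for the validating parser in this package,
	so the exact set of errors reported may differ. </p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical demo class; not part of this package.
final class ValidityDemo {

    // Returns one entry ("warning", "error" or "fatal") per problem
    // the parser reported while reading the document.
    static List<String> problems(String xml) throws Exception {
        final List<String> found = new ArrayList<String>();

        // Assumption: the JDK's validating parser stands in here.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        reader.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) {
                found.add("warning");
            }
            public void error(SAXParseException e) {
                found.add("error");      // validity error: parsing continues
            }
            public void fatalError(SAXParseException e) throws SAXException {
                found.add("fatal");      // not well formed: stop parsing
                throw e;
            }
        });

        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // The fatal error was already recorded by the handler.
        }
        return found;
    }
}
```

	<p> A document whose content does not match its DTD yields "error"
	entries only, while a document with mismatched tags also yields a
	"fatal" entry. </p>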

	<h3><a name="smaller">You Want Smaller?</a></h3>

	<p> You'll have noticed that the original version of Ælfred
	had small size as a top goal. Ælfred2 normally includes a
	DTD validation layer, but you can package without that.
	Similarly, JAXP factory support is available but optional.
	The main added cost due to this revision is then that of
	supporting the SAX2 API itself; DTD validation is as
	cleanly layered as allowed by SAX2.</p>

	<h3><a name="bugfixes">Bugs Fixed</a></h3>

	<p> Bugs fixed in Ælfred2 include: </p>

	<ol>
	<li> Originally Ælfred didn't close file descriptors, which
	led to file descriptor leakage on programs which ran for any
	length of time. </li>

	<li> NOTATION declarations without system identifiers are
	now handled correctly. </li>

	<li> DTD events are now reported for all invocations of a
	given parser, not just the first one. </li>

	<li> More correct character handling: <ul>

	<li> Rejects out-of-range characters, both in text and in
	character references. </li>

	<li> Correctly handles character references that expand to
	surrogate pairs. </li>

	<li> Correctly handles UTF-8 encodings of surrogate pairs. </li>

	<li> Correctly handles Unicode 3.1 rules about illegal UTF-8
	encodings: there is only one legal encoding per character. </li>

	<li> PUBLIC identifiers are now rejected if they have illegal
	characters. </li>

	<li> The parser is more correct about what characters are allowed
	in names and name tokens. Uses Unicode rules (built in to Java)
	rather than the voluminous XML rules, although some extensions
	have been made to match XML rules more closely.</li>

	<li> Line ends are now normalized to newlines in all known
	cases. </li>

	</ul></li>

	<li> Certain validity errors were previously treated as
	well-formedness violations. <ul>

	<li> Repeated declarations of an element type are no
	longer fatal errors. </li>

	<li> Undeclared parameter entity references are no longer
	fatal errors. </li>

	</ul></li>

	<li> Attribute handling is improved: <ul>

	<li> Whitespace must exist between attributes. </li>

	<li> Only one value for a given attribute is permitted. </li>

	<li> ATTLIST declarations don't need to declare attributes. </li>

	<li> Attribute values are normalized when required. </li>

	<li> Tabs in attribute values are normalized to spaces. </li>

	<li> Attribute values containing a literal "&lt;" are rejected. </li>

	</ul></li>

	<li> More correct entity handling: <ul>

	<li> Whitespace must precede NDATA when declaring unparsed
	entities.</li>

	<li> Parameter entity declarations may not have NDATA annotations. </li>

	<li> The XML specification has a bug in that it doesn't specify
	that certain contexts exist within which parameter entity
	expansion must not be performed. Lacking an official erratum,
	this parser now disables such expansion inside comments,
	processing instructions, ignored sections, public identifiers,
	and parts of entity declarations. </li>

	<li> Entity expansions that include quote characters no longer
	confuse parsing of strings using such expansions. </li>

	<li> Whitespace in the values of internal entities is not mapped
	to space characters. </li>

	<li> General Entity references in attribute defaults within the
	DTD now cause fatal errors when the entity is not defined at the
	time it is referenced. </li>

	<li> Malformed general entity references in entity declarations are
	now detected. </li>

	</ul></li>

	<li> Neither conditional sections
	nor parameter entity references within markup declarations
	are permitted in the internal subset. </li>

	<li> Processing instructions whose target names are "XML"
	(ignoring case) are now rejected. </li>

	<li> Comments may not include "--".</li>

	<li> Most "]]>" sequences in text are rejected. </li>

	<li> Correct syntax for standalone declarations is enforced. </li>

	<li> Setting a locale for diagnostics only produces an exception
	if the language of that locale isn't English. </li>

	<li> Some more encoding names are recognized. These include the
	Unicode 3.0 variants of UTF-16 (UTF-16BE, UTF-16LE) as well as
	US-ASCII and a few commonly seen synonyms. </li>

	<li> Text (from character content, PIs, or comments) large enough
	not to fit into internal buffers is now handled correctly even in
	some cases which were originally handled incorrectly.</li>

	<li> Content is now reported for element types for which attributes
	have been declared, but no content model is known. (Such documents
	are invalid, but may still be well formed.) </li>

	</ol>
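
	<p> Several of the fixes listed above concern documents that a
	conforming parser must reject as not well formed. A small helper like
	the following can be used to spot-check such cases against any SAX2
	parser. It is a sketch under assumptions: the class and method names are
	hypothetical, and the parser bundled with the JDK is used so that the
	example runs on its own. </p>

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; not part of this package.
final class WellFormedCheck {

    // Returns true if the document parses without a fatal error.
    // DefaultHandler rethrows fatal errors as SAXParseException.
    static boolean isWellFormed(String xml) throws Exception {
        // Assumption: the JDK's bundled parser stands in here.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            parser.parse(new InputSource(new StringReader(xml)),
                         new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String[] documents = {
            "<a b=\"1\"/>",             // well formed
            "<a b=\"<\"/>",             // literal '<' in an attribute value
            "<a b=\"1\" b=\"2\"/>",     // attribute given twice
            "<a b=\"1\"c=\"2\"/>",      // no whitespace between attributes
            "<a><!-- x -- y --></a>",   // "--" inside a comment
            "<a>]]></a>",               // "]]>" in character content
            "<a><?XML x?></a>"          // reserved PI target
        };
        for (String doc : documents) {
            System.out.println(isWellFormed(doc) + "\t" + doc);
        }
    }
}
```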

	<p> Other bugs may also have been fixed. </p>

	<p> For better overall validation support, some of the validity
	constraints that can't be verified using the SAX2 event stream
	are now reported directly by Ælfred2. </p>

	</body></html>