A survey of XML standards: Part 2

XML processing standards

Level: Introductory

Uche Ogbuji (uche@ogbuji.net), Principal Consultant, Fourthought, Inc.

03 Feb 2004

The world of XML is vast and growing, with a huge variety of standards and technologies that interact in complex ways. It can be difficult for beginners to navigate the most important aspects of XML, and for users to keep track of new entries and changes in the space. Uche Ogbuji continues this series on XML standards by focusing on XML processing technologies.


XML started strong and has grown quite rapidly. It has proven itself a very valuable technology, but it can be an intimidating one, when one considers all the moving parts that fall under the term "XML". In this series of articles, I provide a summary of what I see as the most important XML technologies, and discuss how they each fit into the greater scope of things in the XML world. I also recommend tutorials and other useful resources for evaluating and learning to use each technology.

All the technologies I present here are standards, although that word is itself a bit slippery. Standards come in all forms, and multiple standards often compete in the same space. I follow the practical approach of defining a standard as any specification that is significantly adopted by a diversity of vendors, or is recommended by a respectable, vendor-neutral organization.

In my first article, I focused on core XML technologies. (See the sidebar in that article for an overview of the various standards development bodies and how specifications are categorized.) In this article, I cover standards relating to XML processing by developers. In the next, I shall present a selection of the most important XML applications (that is, vocabularies).

XSLT

Extensible Stylesheet Language Transformations (XSLT) 1.0 [W3C Recommendation] is a language for describing transforms from an input XML document to an output tree. The output tree can, for example, take the form of an HTML document or another XML format and, as such, XSLT can be a language for rendering XML into legacy browser display form or for scripted operations on XML files. The transform is itself defined as an XML document in a special vocabulary. XPath (covered earlier) is used for accessing the source document and general expression processing. Special instructions set up processing rules (XSLT is a declarative language) and direct the creation of the output tree. XSLT 1.0 is an extraordinarily successful language and it covers most common XML processing tasks. If you are familiar with XML it is easy to learn the basics of XSLT, though mastering the language takes some effort. It has a well-designed extensibility mechanism. Its declarative processing model allows for very maintainable and reusable code. The standard way to link an XML document to its XSLT stylesheet document is defined in Associating Style Sheets with XML documents, Version 1.0 [W3C Recommendation]. The XSLT specification has been widely translated.

As I mentioned, XSLT has a nice extension mechanism with which you can define additional capabilities using the language of your choice. But it's even nicer when you don't have to write extensions because someone else has done so for you. EXSLT [Community specification] is a standard set of such extensions defined in an implementation-agnostic way. EXSLT attempts to cover the most commonly needed extensions, such as date processing, regular expressions, and mathematical operations. Many XSLT implementations implement one or more EXSLT modules.

XSLT 2.0 [in development] offers some key improvements based on collective experience with XSLT 1.0, but it has the disadvantage of being closely tied to XPath 2.0, which I think is fundamentally flawed (see Part 1).

Recommended introductions and tutorials

  • W3Schools offers a brief XSLT tutorial.
  • ZVON offers a more in-depth XSLT tutorial.
  • IBM developerWorks offers several XSLT tutorials, including:
    • "Create multi-purpose Web content with XSLT" (March 2003)
    • "Python and XML development using 4Suite, Part 2: 4XPath and 4XSLT" (October 2001) which includes an introduction to XSLT
  • To get started with EXSLT, see "EXSLT by example" (developerWorks, February 2003).

References and other resources

  • ZVON offers an XSLT Reference.
  • Dave Pawson's XSL FAQ covers XSLT and XPath as well as XSL-FO (covered later in this series).
  • TopXML offers over 100 examples of XSLT stylesheets, arranged by category.
  • Jeni Tennison is famous for her clear and incisive explanations of XSLT arcana. Her XSLT Pages are an excellent reference for common XSLT questions and problems.


SAX

Simple API for XML (SAX) [Community specification] is an event-driven API. The developer registers handler code for specific events that are triggered by different parts of XML markup (such as start and end tags, text, entities). The parser then sends a stream of these events based on the input XML, which the handler code processes in turn.

SAX was essentially created on a marathon thread starting in late 1997 on the XML-DEV mailing list, which has long been the prime habitat for XML experts. David Megginson led the discussion, and the result was one of the most successful XML initiatives, with no large company or standards-body sponsorship. Before SAX, each parser had its own peculiar API for communicating XML structure to handler code, and SAX provided important unification. In general, parsers provide SAX drivers that translate low-level parser events into SAX standard events, allowing for portable code. SAX was developed with the Java language in mind, but has become popular across numerous languages and environments, although sometimes its Java-centricity complicates porting. SAX is currently in its second generation, which includes XML namespace processing and optional reporting of certain events relating to document structure.

In mainstream languages, event-based interfaces are usually implemented using callback functions, a style familiar from GUI programming and the like. In object-oriented languages, callbacks are usually registered methods for an object, using polymorphism to match the method name to the handler code, and using encapsulation to manage state in the handler between callbacks. This overall model of event-based programming is known as a push model and has a reputation for being difficult for many programmers to master. Most models that are considered easier to program, however, require random access to the document, and thus can lead to inefficiencies, so SAX has a reputation for being the most efficient standard way to process XML, if not the easiest.
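The push model described above can be sketched with the SAX driver in Python's standard library. This is a minimal illustration rather than a recommended design: the `TitleCollector` class, the `collect_titles` helper, and the `<books>`/`<title>` vocabulary are all hypothetical names made up for the example. Note how the handler object has to carry state (`_in_title`, `_buffer`) from one callback to the next.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element (hypothetical vocabulary)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False   # state kept between callbacks
        self._buffer = []

    def startElement(self, name, attrs):
        # Called by the parser for each start tag.
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        # Called by the parser for each end tag.
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def collect_titles(xml_text):
    """Parse xml_text and return the list of title strings, in document order."""
    handler = TitleCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles


SAMPLE = (
    "<books>"
    "<book><title>XML Basics</title></book>"
    "<book><title>SAX in Depth</title></book>"
    "</books>"
)

if __name__ == "__main__":
    print(collect_titles(SAMPLE))
```

The parser never builds a tree here; the handler sees each event once and keeps only what it chooses to keep, which is where the efficiency comes from.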

Recommended introductions and tutorials

  • See the developerWorks tutorial "Understanding SAX" by Nicholas Chase (July 2003).
  • Sun offers a SAX tutorial for Java technology users.
  • My article "Taking Applications to the Next Level with XML, Part 3: The Toolbox of XML APIs" covers SAX and DOM (see below).
  • Perl programmers should see "Using Perl with XML (part 1)," which covers SAX.

References and other resources

  • XML.org's focus on SAX is a useful hub resource.


DOM

Programming languages for XML

XML has been very popular with programmers from the very first. Here are some useful resources for programmers of various languages looking to process XML:

Java technology: IBM alphaworks XML page; The Apache XML page; Sun's community page of Java Technology and XML

C/C++: "C/C++ developers: Fill your XML toolbox" (developerWorks, September 2001)

Python: Special Interest Group for XML Processing in Python; "Python & XML" column on XML.com; "The State of the Python-XML Art, 2003"; "Uche Ogbuji's Akara site on XML processing in Python"

Perl: "Perl developers: Fill your XML toolbox" (developerWorks, June 2001); Perl-XML Project; "Perl & XML" column on XML.com; XMLperl.com

Other: PHP XML Classes; <rubyXML/>; XML and Scheme

Document Object Model (DOM) [W3C Recommendation] is an object model for XML documents that can be used for direct access to parts of an XML document. In DOM, the document is modeled as a tree, where each component of the XML syntax (such as an element or text content) is represented by a node. DOM is an API that allows you to navigate this tree, moving from parent to child node, to siblings, and more, taking advantage of special properties of certain types of nodes (for instance, elements can have attributes while text nodes have text data). DOM is designed to be language-neutral. The Object Management Group's (OMG) CORBA Interface Definition Language (IDL) [ISO International Standard, number 14750] is used to express DOM node and support interfaces.
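As a rough sketch of this style of navigation, the following uses `xml.dom.minidom` from the Python standard library, which implements a subset of the DOM interfaces. The `<library>`/`<book>` vocabulary and the `book_titles` helper are hypothetical, chosen only to show movement from parent to child to sibling and the difference between element nodes (which carry attributes) and text nodes (which carry data).

```python
from xml.dom import minidom

SAMPLE = (
    '<library>'
    '<book id="b1"><title>XML Basics</title></book>'
    '<book id="b2"><title>DOM in Depth</title></book>'
    '</library>'
)


def book_titles(xml_text):
    """Return a dict mapping each book's id attribute to its title text."""
    doc = minidom.parseString(xml_text)
    titles = {}
    node = doc.documentElement.firstChild      # parent -> first child
    while node is not None:
        # Only element nodes have a tag name and attributes.
        if node.nodeType == node.ELEMENT_NODE and node.tagName == "book":
            title = node.getElementsByTagName("title")[0]
            # The text lives in a child text node, not on the element itself.
            titles[node.getAttribute("id")] = title.firstChild.data
        node = node.nextSibling                 # move on to the next sibling
    return titles


if __name__ == "__main__":
    print(book_titles(SAMPLE))
```

The whole document is parsed into memory before any navigation starts, which is what makes random access like this possible.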

DOM actually originated as an object model for standardizing scripting operations on HTML and XML objects in Web browsers. In some places this translates to awkwardness when it is used as a standalone programming API. DOM is evolving through several levels, each of which builds added capabilities on the prior one. Level 1 covered the basics. Level 2 added namespace support, a UI event model, iterators, and more. Level 3 adds APIs for loading from and saving to XML document files, integrating XPath, support for validation, and more.

DOM is generally much easier to master than SAX because it does not involve callbacks and sophisticated state management, but DOM implementations generally keep all XML nodes in memory, which can be very inefficient for larger documents. While many languages have DOM implementations, DOM tries to be language-neutral. Adherents of particular languages often complain that DOM is awkward and doesn't take advantage of any language's particular strengths. As a result, many language-specific tree APIs have flourished.

Recommended introductions and tutorials

  • See the developerWorks tutorial "Understanding DOM" by Nicholas Chase (July 2003).
  • W3Schools offers a tutorial that focuses on the use of DOM Level 1 for HTML and XML in browser JavaScript.
  • Perl programmers should see "Using Perl with XML (part 1)," which covers DOM.
  • Python programmers should also check out the DOM page on the standard Python Library Reference.

References and other resources

  • ZVON offers nice reference guides, complete with comprehensive JavaScript examples, for DOM Level 1 and DOM Level 2.


XAPI

XML Database API (XAPI) [in development] is a vendor- and language-neutral (though object-oriented) API for XML databases, developed by XML:DB, an interest group of developers of XML database management tools. XAPI covers storage, retrieval, modification, and querying of data in an XML database, with support for transaction management. It's similar to the likes of ODBC and JDBC. Like the DOM, XAPI is specified using OMG IDL and is organized by levels of capability. Level 0 is the base API and Level 1 adds XPath support (the XPathQueryService). XAPI is broadly implemented in native XML database management tools, especially open-source tools such as Apache Xindice and Sleepycat Berkeley DB XML. Despite this, there are few Web resources besides the XML:DB specification itself. The API Use Cases provide some sketchy examples of the API in the Java language.



XUpdate

XUpdate [in development] defines update facilities for modifying data in XML documents. Even though it comes from the XML:DB group, XUpdate is designed to work on regular XML documents as well as XML in database collections and even virtual XML data models. XUpdate is an XML vocabulary similar to XSLT, although it is much simpler than XSLT and is a very accessible vocabulary overall. Like XSLT, it uses XPath for accessing the document to be modified, and has specialized elements that define output operations. XUpdate is also widely implemented, mostly among open-source tools such as XML DBMS and XML difference and patching tools. The XUpdate Use Cases draft also serves as an excellent introduction to XUpdate.

Recommended introductions and tutorials

  • Arun Gaikwad's "Introduction to Xindice" covers XUpdate towards the end (developerWorks, September 2002).
  • "Develop Python/XML with 4Suite, Part 4: Composition and updates" includes a section that introduces XUpdate (developerWorks, October 2002).
  • X-Hive's online XUpdate demo is a great way to learn the language through experimentation.


XQuery

XQuery 1.0: An XML Query Language [in development] is a specification for querying XML data sources -- documents and databases. XQuery is pretty much a complete programming language, constituting a superset of XPath. XQuery is being developed in tandem with XPath 2.0, and is just as controversial because of its complexity, which many argue is unnecessary. The XQuery 1.0/XPath 2.0 system is defined in a daunting array of specifications that cover semantics, syntax, and the core function libraries:

  • XML Query Use Cases [in development] anchors XQuery by setting forth usage scenarios with XQuery examples.
  • XQuery 1.0 and XPath 2.0 Data Model [in development] defines precisely the information contained in the input to an XSLT 2.0 or XQuery processor, as well as all permissible values of expressions in XSLT 2.0, XQuery, and XPath 2.0.
  • XQuery 1.0 and XPath 2.0 Formal Semantics [in development] gives a precise formal meaning to each of the expressions of the XPath 2.0 and XQuery 1.0 specification in terms of their data model.
  • XPath 2.0 [in development] defines the core syntax of XPath 2.0.
  • XQuery 1.0 and XPath 2.0 Functions and Operators [in development] defines common processing tasks used in expressions.
  • XQuery 1.0 [in development] defines the core syntax of XQuery 1.0.
  • XML Syntax for XQuery 1.0 (XQueryX) [in development] provides an optional XML representation of XQuery.
  • XSLT 2.0 and XQuery 1.0 Serialization [in development] defines how data model values look in XML, HTML, and text, in effect replacing the XSLT section on processor output.
  • XSLT 2.0 [in development] is not directly part of the XQuery family but is tightly coupled with XPath 2.0 and XQuery 1.0, and is completely dependent on the former.

Recommended introductions and tutorials

  • "An introduction to XQuery," by Howard Katz, introduces XQuery and offers some examples, updated to the most recent working drafts (developerWorks, September 2003).
  • "Process XML using XML Query," by Nicholas Chase, teaches XQuery and looks at the changes in XPath 2.0. It covers somewhat older working drafts but the changes since this tutorial are minor enough that I still recommend it (developerWorks, September 2002).
  • Per Bothner wrote the article "What is XQuery?" as well as a recent update covering the latest drafts.

References and other resources

  • xquery.com is a good hub resource for XQuery, and includes a Wiki, a collaborative resource index and discussion page.


SQL/XML

SQL/XML [ISO International Standard ISO/IEC 9075-14:2003] is a new section of the SQL standard covering a whole raft of XML-related extensions to SQL. SQL/XML was originally developed by the "SQLX Informal Group of Companies", which includes IBM, and then in committee at the American National Standards Institute (ANSI -- the standards organization in which SQL is maintained). The scope of SQL/XML encompasses (quoted from Andrew Eisenberg and Jim Melton):

  • Specifications for the representation of SQL data (specifically rows and tables of rows, as well as views and query results) in XML form, and vice versa.
  • Specifications associated with mapping SQL schemata to and from XML schemata. This may include performing the mapping between existing arbitrary XML and SQL schemata.
  • Specifications for the representation of SQL schemata in XML.
  • Specifications for the representation of SQL actions (insert, update, delete).
  • Specifications for messaging for XML when used with SQL.

SQL/XML has very little overlap with XQuery, and the involved parties in both standards generally work together.

References and other resources

  • "SQL/XML and the SQLX Informal Group of Companies [PDF]" by Andrew Eisenberg and Jim Melton outlines the SQL/XML effort.
  • "XML programming with SQL/XML and XQuery [PDF]" by J. E. Funderburk, S. Malaika, and B. Reinwald (IBM Systems Journal, Vol. 41, No. 4, 2002) provides a very thorough examination of the intersection of all these XML and DBMS technologies.
  • The SQL/XML standard is now officially available only by paying ISO (or the relevant affiliate for your country) for a copy, but if you wish to get a good sense of the standard, an earlier draft of SQL/XML [PDF] is available.


CSS

Cascading Style Sheets (CSS) [W3C Recommendation] is a system for applying presentation style to markup. It is best known for its use in styling HTML Web pages, but especially since the release of CSS Level 2, it is very well suited to presenting XML documents on the Web and on other media. Mapping XML documents to output structure is performed using the display property. The standard way to link an XML document to its CSS stylesheet document is defined in Associating Style Sheets with XML documents Version 1.0 [W3C Recommendation].

Recommended introductions and tutorials

  • "On Display: XML Web Pages with Mozilla" by Simon St. Laurent is an old article, but covers the basics well using examples on the Mozilla browser (and comparisons to MSIE 5).
  • ZVON's "CSS 2 Tutorial" teaches how to use CSS 2 to display XML documents.
  • Dr. David Mertz's developerWorks tip, "Using CSS2 to display XML documents," is a brief introduction with a detailed example (December 2001).


XForms

XForms 1.0 [W3C Recommendation], not to be confused with the X Window System GUI library of the same name, is a specification of Web forms for XML data processing that can be used on a wide variety of platforms and media. XForms separates a form's purpose from its presentation, that is, what the form does from how the form looks. It is an XML vocabulary that can be used to develop form UIs for manipulating XML content. XForms started out as part of the XHTML family, but has taken on a life of its own. It is more complex than it needs to be, but a sound enough technology to help bring order to the chaotic world of Web forms.

Recommended introductions and tutorials

  • "What Are XForms?," by Micah Dubinko, gives a general overview of the technology.
  • "Get ready for XForms," by Joel Rivera and Len Taing, introduces XForms using several very detailed examples (developerWorks, September 2002).
  • "Understanding XForms," by Nicholas Chase, drills even more deeply into a series of examples (developerWorks, December 2002).


SOAP

SOAP [W3C Recommendation] (which officially is no longer an acronym despite the capitalization) is a protocol for using XML to communicate between systems that are connected using lower-level Internet protocols. Some users consider SOAP to be the foundation of XML Web services, a set of technologies for managing and organizing the interaction of systems connected using XML data formats and Internet communications protocols. SOAP was originally developed among a small, odd assortment of individuals from a diverse mix of companies, including IBM. It quickly grew in popularity because it provided similar capabilities to earlier efforts towards XML messaging, but with a more solid architecture and more commercial support. Development of SOAP passed to the W3C, which developed SOAP 1.2, having made a lot of architectural improvements but having also made a lot of controversial compromises. The SOAP protocol defines an XML envelope format which can contain a pseudo-XML payload (the fact that the actual payload of a SOAP message is restricted from using the full capabilities of XML is a matter of huge contention).

Web services don't have to use SOAP, and a large group of people advocate the idea of simply exchanging raw XML documents directly over HTTP, an approach loosely advocated under the banner of "REpresentational State Transfer (REST)". REST itself is the name given to the architectural style of the Web by one of its architects, Roy Fielding. Advocates of REST style for Web services complain that SOAP is complex, stunts its XML payload, and doesn't take enough advantage of the fundamental strengths of the Web. Among SOAP advocates, recent emphasis has shifted from SOAP's RPC roots to what is called the document-literal style of SOAP. In the RPC style, the data to be transmitted is marshaled into discrete data types in a special XML payload format (called the SOAP encoding). In the document-literal style, the XML payload consists of more natural XML formats that generally tend to be more descriptive and human-readable.
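To make the envelope-and-payload structure concrete, here is a minimal sketch that builds a document-literal style message with Python's standard `xml.etree.ElementTree`. The envelope namespace is the one defined for SOAP 1.2. Everything else is an assumption made up for illustration: the `http://example.org/orders` namespace, the `purchaseOrder` and `quantity` elements, and the `build_envelope` helper. A real service would define its own payload vocabulary, and this sketch omits the optional SOAP header and any transport concerns.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
ORDER_NS = "http://example.org/orders"                # hypothetical payload namespace


def build_envelope(order_id, quantity):
    """Return a SOAP 1.2 envelope, as a string, wrapping a small XML payload."""
    ET.register_namespace("env", SOAP_ENV)  # only affects the serialized prefix
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # Document-literal style: the body carries application XML directly,
    # rather than parameters marshaled with the SOAP encoding.
    order = ET.SubElement(body, f"{{{ORDER_NS}}}purchaseOrder", {"id": order_id})
    ET.SubElement(order, f"{{{ORDER_NS}}}quantity").text = str(quantity)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_envelope("po-42", 3))
```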

The SOAP edifice

A huge array of standards build on SOAP -- many more than I can cover in this article. Some good sources of information on these standards are:

  • IBM developerWorks' Web services standards listing
  • W3C Web Services Activity home page
  • webservices.xml.com

One antecedent of SOAP that is still in fairly wide use is XML Remote Procedure Calls (XML-RPC) [Community specification]. XML-RPC defines procedure calls encoded in XML and communicated over HTTP. It retains some popularity because of its simplicity (the full specification is less than 10 printed pages), and the fact that most languages and many application frameworks now have standard or readily available XML-RPC implementations. It does have some very notable weaknesses, including very primitive data typing and lack of support for character encodings (an astonishing flaw given its use of XML).
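The simplicity of XML-RPC is easy to see on the wire. The sketch below uses `xmlrpc.client` from the Python standard library to encode a procedure call as XML and decode it again, without contacting any server. The method name `sample.add` and the two helper functions are hypothetical, chosen only for illustration.

```python
import xmlrpc.client


def encode_call(method, *params):
    """Serialize a procedure call into an XML-RPC <methodCall> document."""
    return xmlrpc.client.dumps(params, methodname=method)


def decode_call(payload):
    """Parse an XML-RPC request and return (method name, parameter tuple)."""
    params, method = xmlrpc.client.loads(payload)
    return method, params


if __name__ == "__main__":
    payload = encode_call("sample.add", 2, 3)
    print(payload)               # the XML that would be POSTed over HTTP
    print(decode_call(payload))
```

Printing the payload shows each parameter wrapped in a simple type element, which illustrates the limited set of data types the protocol offers.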

Recommended introductions and tutorials

  • The W3C has an official primer on SOAP, which I recommend because of its focus on the XML transport format.
  • Perl programmers can look at Paul Kulchenko's "Quick Start with SOAP," which is an older article, but because it focuses on the developer's API rather than the actual transport format, is still mostly applicable. I do recommend becoming additionally familiar with SOAP's wire format.
  • Python programmers can check out The Python Web services developer column on IBM developerWorks.
  • I recommend following up on document-literal style SOAP. See "Reap the benefits of document style Web services" by James McCarthy (developerWorks, June 2002).
  • For a good introduction to the idea and motivation of REST, see Paul Prescod's "Second Generation Web Services" and "REST and the Real World".
  • Perl users who are interested in XML-RPC should start with "Using XML-RPC for Web services: Getting started with XML-RPC in Perl" and the follow-up, "XML-RPC Middleware," by Joe Johnston (developerWorks, March 2001).
  • Python users who are interested in XML-RPC should start with "XML-RPC for Python," by Mike Olson and Uche Ogbuji (developerWorks, September 2002).
  • Eric Kidd's "XML-RPC HOWTO" discusses how to use the protocol in the Java language, C, C++, Perl, Ruby, and .NET.


WSDL

According to the official definition, Web Services Description Language (WSDL) Version 1.2 [in development] is "an XML format for describing network services as a set of endpoints operating on messages containing either document-oriented or procedure-oriented information." It defines, at a series of levels of abstraction, the components of the end-to-end communications in a Web service. WSDL originated as a joint project between IBM and Microsoft, but has since moved to the W3C for the development of WSDL 1.2. WSDL is generally placed alongside SOAP as a core Web services technology, but it can be used to describe other protocols besides SOAP.

Recommended introductions and tutorials

  • An article on IBM developerWorks that covers an older version of WSDL is "Deploying Web services with WSDL" by Bilal Siddiqui (November 2001).
  • Find a wide range of additional Web services resources, including WSDL content, on the developerWorks SOA and Web services zone.


More to come

In this article I have surveyed the most important XML standards relating to application development. In the next I shall survey the most important general XML vocabularies.



Resources

  • Read the first installment of this series on XML standards, which focuses on what Uche Ogbuji considers to be the core XML technologies (developerWorks, January 2004). In Part 3, the author looks at the most important XML vocabularies (developerWorks, February 2004). Part 4 is a detailed cross-reference of all the standards covered in the series (developerWorks, March 2004).

  • Find a brief listing of some XML standards on IBM developerWorks.

  • Read The XML Bible, 2nd Edition, by Elliotte Rusty Harold (John Wiley & Sons, 2001), if you need to gain as solid a foundation in XML as possible, but are only willing to buy one book.

  • Visit Web sites of the most significant organizations where XML standards are developed:
    • W3C (World Wide Web Consortium)
    • OASIS (Organization for the Advancement of Structured Information Standards)
    • The ISO (International Organization for Standardization), especially through the project ISO/IEC 19757 - Document Schema Definition Languages (DSDL)

  • Simon St. Laurent's Outsider's Guide to the W3C is a FAQ that clarifies many aspects of the organization that brought you HTML and XML.

  • Look up nearly any aspect of XML technology in Robin Cover's The Cover Pages, an XML resource guide of staggering comprehensiveness.

  • Visit the xmlhack news site for XML developers, which Uche Ogbuji helps to edit.

  • Find more XML resources on the developerWorks XML zone, including Uche Ogbuji's Thinking XML column.

  • IBM's DB2 database provides not only relational database storage, but also XML-related tools such as the DB2 XML Extender which provides a bridge between XML and relational systems. Visit the Information Management section of developerWorks to learn more about DB2.

  • Find out how you can become an IBM Certified Developer in XML 1.1 and related technologies.


About the author

Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge management applications. Mr. Ogbuji is also a lead developer of the Versa RDF query language. He is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can contact Mr. Ogbuji at uche@ogbuji.net.