XML: Building with Oracle XML Database

来源:百度文库 编辑:神马文学网 时间:2024/04/30 10:00:58
Building with Oracle XML Database
By Magnús Kristjánsson
Exploiting XML DB capabilities, Theriak Software built a production-grade survey application.
As a software development shop involved with XML technologies for several years, Theriak Software (formerly eMR) has witnessed firsthand the many recent improvements Oracle has made in these technologies. This article explores some of the techniques we used to successfully build a production-grade, generic survey system we codeveloped with deCODE Genetics, Inc., using the XML DB capability in Oracle9i Release 2.
Based in Reykjavik, Iceland, deCODE Genetics develops new drugs and DNA-based diagnostic protocols based on human genetics research tied to population data, much of which is gathered via survey. Study participants fill out extensive questionnaires that vary from research project to research project; at any given time, deCODE is conducting dozens of research projects, each of which may involve thousands of participants, numerous surveys, and hundreds of questions. To automate the survey process, we developed a Web-based survey system, trademarked as Questor by deCODE.
The system allows researchers to easily author questions and compose new study-specific questionnaires from a repository of pretested questions, gathering results over the Web. Likewise, study participants complete the surveys by using a Web browser, and the system securely stores all answers in the database.
The survey system is now in its second major release, having been redesigned to take advantage of some of the new or improved XML DB features of Oracle9i Release 2, such as support for XML Schema, the XML DB Repository feature, XPath, and XDBUriType.
Reimplementing Questor to use these and other native XML capabilities has led to both increased simplicity (fewer tables, less code) and significantly improved performance—the system runs twice as fast as the initial release.
XML Schema and XML DB
One of the major changes we made in going from Questor release 1.0 to release 2.0 was to implement the native XMLType and to use XML Schema to define the documents that would be important to any survey system—questions, surveys, and responses.
XML Schema describes the structure of a set of XML documents. It is also used to validate that a given XML document conforms to the structure it defines. It also facilitates automatic definition of O-R (object-relational) database structures for storing XML data.
For XML DB to use an XML schema, it must first be registered with (loaded into) the database. The registerSchema() function, in the DBMS_XMLSCHEMA PL/SQL package, accomplishes this task. It parses and validates the XML schema, creates entries in the Oracle data dictionary that describe the XML schema, creates SQL object definitions (based on complexTypes defined in the XML schema), and creates an XMLType table for each global element defined in the XML schema.
Oracle XML DB supports XML Schema‘s annotation mechanism, which enables adding vendor-specific information to XML Schema. Oracle uses the annotation mechanism to give developers control over the mapping between an XML schema and the SQL object model, naming, and storage.
For example,Listing 1 shows one of the XML Schema definition files used in Questor for survey data. It shows schema annotation being used to specify a default table (xdb:defaultTable="HLQ_SURVEY_XML_DTB"). Register-ing the XML schema shown inListing 1 creates a default table (HLQ_SURVEY_XML_DTB) in the Questor database schema.
To use annotations, you must declare Oracle‘s XDB namespace in the XML schema file, as in
xmlns:xdb=http://xmlns.oracle.com/xdb
 
Listing 1 also shows an annotation (xdb:SQLType="CLOB") being used to specify the storage option.
XML DB supports both structured XML O-R storage and CLOB storage. Traditional CLOB storage provides document fidelity, whereas object-relational storage maintains DOM (document object model) fidelity by decomposing the XML into underlying O-R structures. You can use both types of storage in your application.
O-R storage facilitates query-rewrite by the Oracle query optimizer, so that XPath queries can directly access the underlying object-relational columns. XPath queries can be evaluated without construction of the full XML document in memory, a feature that can result in better query performance.
The Questor system was designed to optimize presentation, so question documents, for example, contained everything needed to render a question in the Web browser. As a result, we chose CLOB storage for the XML content, and we maintained several relational tables from the previous versions of the system.
XML DB Repository Functionality
Another unique feature of XML DB we used in Questor 2.0 is the XML DB Repository. This component provides path-based access, using open protocols, to resources (XML or other types of files) stored and maintained in a folder hierarchy, similar to a file system. As with traditional file systems, access to resources can be restricted by means of Access Control Lists (ACLs), and resources can be subject to versioning.
XML DB Repository can store any kind of data, not just XML files. Anything that can fit in a BLOB, such as a PDF file, a JPG file, or an Excel spreadsheet, can be foldered and stored in XML DB Repository. Oracle XML DB has an embedded protocol handler that enables content in the XML DB Repository to be directly accessible via WebDAV, HTTP, or FTP.
Programming with XML DB Repository
The DBMS_XDB built-in package provides a wide range of functions for managing Repository resources and ACLs on resources, as well as for managing the XML DB configuration. Together with RESOURCE_VIEW and PATH_VIEW, these tools give developers query-based access to Oracle XML DB.
We used the XML DB Repository to store all Questor XML documents as foldered resources.Listing 2 shows how to create a schema owner and set up the necessary Repository folders. The Repository folder hierarchy reflects that Questor documents are of different types.
Given a schema owner named HLQ2, the code inListing 2 creates two document folders: /home/HLQ2/questions and /home/HLQ2/surveys.
Optionally, we can limit access to a given folder and all its subfolders, by specifying an ACL, as follows:
EXEC DBMS_XDB.SETACL(‘/home/&1‘, ‘/sys/acls/all_owner_acl.xml‘);
 
InListing 3, we then specify that user HLQ2 owns the resources in the folders, by changing the appropriate resource metadata in the RESOURCE_VIEW, using the existsNode() function and XPath to locate the resources. Note the use of the UNDER_PATH operator, which returns the paths under a given path.
At this point, we have folders within which to store documents and can use FTP or HTTP-based protocols, the XDBUriType, the DBMS_XDB package, or the RESOURCE_VIEW to store, retrieve, or manipulate the XML data. In our implementation, we created our own Questor XML API, using a selection of the above techniques, as the following sections outline.
Questor Architecture
Let‘s look at the Questor architecture in more detail. Questions, surveys, and responses in Questor are stored as foldered resources in the XML DB Repository. All XML instance data is stored in default XML tables (as CLOBs) owned by the Questor database schema, as specified in the respective XML schemas.
In addition to the CLOBs used for XML documents, all key data needed for document lookup—such as survey, question, and language code—is extracted from the documents as they get checked for integrity and validation.
We chose this approach partly because we wanted to keep some of the relational data, such as participant account and subscription information, from the original system. We also took this approach because our main goal was to optimize the XML documents for presentation purposes.
This key lookup data is stored and maintained in relational tables, with links between XML Repository and XML instance data. We simply link the relational information to survey contents and participant answers in XML format.
Whether access is via FTP, HTTP, or the PL/SQL-based Questor XML API, an XML document is stored in a folder whenever it is sent to the Repository.
When a document is sent to the Repository, the Oracle XML DB server does the following six things:
1. Calls dbms_xdb.createResource, using the XML instance document and the requested Repository path as parameters
2. Inspects the instance document and identifies its XML schema reference
3. Locates the registered XML schema
4. Determines into which default XML table (as annotated in the XML schema) the document should be inserted
5. Executes the insert trigger on the default XML table in the Questor database schema
The insert trigger, shown inListing 4, calls routines that perform various validation and integrity checks on the incoming XML document, raising exceptions if the document has errors and thus causing the insert operation to abort. If there are no errors, the trigger code extracts key values from the XML document that can be used for subsequent document lookup and stores them in the Link Table.
The Link Table is a relational table that links the XML data to the relational data model. As shown inListing 4, the XDBUriType() constructor generates the key that gets stored in the Link Table for subsequent retrieval operations by the Questor XML API.
Calling these check routines in a default table trigger ensures the quality of the XML documents, regardless of access method (FTP, HTTP, or PL/SQL).
6. Inserts a row in XML DB‘s XDB$RESOURCE table with a reference to the XML instance document in the default XML table (in the Questor database schema).
Native XML and the Questor XML API
Let‘s take a closer look at how the Questor API goes about storing, retrieving, and updating XML Repository documents.
We developed a custom API that consists of some PL/SQL packages in the Questor database schema. These packages, in turn, are wrapped as Java methods, running as servlets in the middle tier.
We first created an object type (hlq_dockey_ty) that can be used to wrap all document keys. As shown inListing 5, our document key (dockey) object type comprises a URI, a rowid, and a REF, as well as several member functions for manipulating the instances of this object type.
Next, we constructed a document key object of type hlq_dockey_ty, as shown inListing 6.
Once these objects and functions were available, we combined them into a single function to insert or update an XML document. As shown inListing 7, if the document already exists, the function will update the document; if it doesn‘t exist, the function will create the XML document.
The code for retrieving an XML document via a document key is straightforward:
function hlq_get_survey_xml(p_survey_code varchar2,p_language_code varchar2)return clobisbeginreturn hlq_get_survey_dockey(p_survey_code,p_language_code).get_clob();end;
 
Indexing and Searching
Oracle XML DB provides several alternatives for indexing XML, including function-based indexes using XPath expressions, and traditional Oracle Text indexes. We used a combination of these techniques for the Questor system, as follows:
Function-based index. Creating a function-based index by using XPath is straightforward. To get quick access to the number of questions each participant has answered at any point, we created an index on the survey participants‘ answers, as shown inListing 8, storing the answer in the element /sml/survey/ progress, under the answeredResponses attribute of hlq_userdata_xml_dtb. Then we created a function-based index on that column, as follows:
create index hlq_userdata_xml_dtb_answeredon hlq_userdata_xml_dtb(sys_nc_rowinfo$.extract(‘/sml/survey/progress/@answeredResponses‘,hlq_sml.hlq_get_sml_ns).getNumberVal());
 
Be sure to enable query rewrite, so that the index will be used, as follows:
alter sessionset query_rewrite_enabled=‘TRUE‘;alter sessionset query_rewrite_integrity=‘TRUSTED‘;
 
If you don‘t enable query rewrite, the index won‘t be used—even if the WHERE clause uses exactly the same extract() expression as the index. Traditional Oracle Text index. The Oracle Text index has been extended to support XMLType columns, and the CONTAINS clause also supports XPath syntax in the INPATH and HASPATH operators.
For either the INPATH or the HASPATH operator to work, you must create the index with PATH_SECTION_GROUP, which is used by default.
You can create all the basic Oracle Text indexes on an XML Type column, as well as on the sys_nc_rowinfo$ (also called object_value) pseudocolumn of a default XML table. Instead of specifying the sys_nc_rowinfo$ pseudocolumn in the query, we could use an alias,—say, T—for the default XML table and write value(t).
Here‘s an example of creating an Oracle Text index of type ctxsys.context on the question XML default table:
create indexhlq_question_xml_ctx on hlq_question_xml_dtb(sys_nc_rowinfo$)indextype is ctxsys.context;
 
After creating the index, you can issue traditional queries with the CONTAINS clause, but with a new twist—using the INPATH or HASPATH operator. For example, in the query inListing 9, we use INPATH to determine the number of questions containing fragments including the words smoke and pipe.
The result inListing 9 shows that only 2 questions are about smoking a pipe.
 
DOWNLOAD Oracle Database 10g
VISIT theXML technology center
LEARN more aboutXML DB
READ more about XML
XML: To CLOB or Object?
SQL in, XML Out
From XML to Storage and Back
Similarly, we can use HASPATH to search for existence in XPath. InListing 10, we want to ascertain the total number of multiple-choice ("MC") questions available in the question table.
Apparently, 6,290 questions are multiple-choice, as shown inListing 10. The use of // in the XPath expression causes the index to scan all elements, regardless of their location in the XML.
Note that the domain index is used, as shown in the execution plans in Listings9 and10, and both Elapsed Time and Statistics indicate that it‘s very efficient.
Conclusion
The most important advice we can give developers considering building systems with XML DB is to make sure to start with a well-designed XML schema. Spend at least as much time designing and refining the XML schemas as you would creating a data model for SQL-based applications. One development team design goal should be to design the XML schema—and thus the structure and content of the XML instance documents—to achieve a natural data granularity and grouping that best fits the application‘s main access and usage patterns.
Overall, despite various ramp-up issues, we are pleased with XML DB, because it has enabled us to deliver a working application with less complexity, more flexibility, and better performance than was previously possible with traditional relational techniques. In short, Oracle XML DB enabled us to use XML effectively as the core technology for designing and developing Questor.
Magnús Kristjánsson (magnus@theriak.is) is an informatics architect with Theriak Software in Iceland and a key developer of Questor. He is an active participant in the Oracle Database 9i Release 2 and Oracle 10g beta programs, and his primary focus has been on the native XML support in the Oracle database (XML DB). Kristjánsson was previously a consultant and solutions architect for Oracle in Oslo, Norway, where he worked on several projects involving XML and other core database technologies.