DBMS - April 1997 - Server Side

来源:百度文库 编辑:神马文学网 时间:2024/04/20 20:20:23
DBMS, April 1997The Return of LOBs
DBMS Vendors are Finally Giving Large Objects the Respect They Deserve.
Binary large objects (BLOBs) have been available in some DBMSs for a long while, but it is only now, with the influences of the object-oriented and Web worlds, that these datatypes are coming into their own. In some DBMSs, large objects are also called LOBs, where they drop the "binary" connotation to cater to large character data sets -- for example, word-processing documents and hyperlinked documents, such as Web pages.
For the remainder of this column, I will use the LOB acronym, except where a LOB is explicitly called a BLOB in the context of a particular product.
Borland International Inc.‘s InterBase, initially developed and marketed by InterBase Inc., already had BLOBs as a standard datatype in its first release in 1986. At that stage we -- as young technocrats -- were looking for problems where we could apply this interesting solution. One of our InterBase clients at the time, the Electronic Engineering faculty of the University of Stellenbosch, was doing some interesting work on voice recognition. The group‘s researchers analyzed voice samples and broke the samples into words, syllables, phonetic sounds, and symbols, which they then stored in an InterBase database. They stored the voice samples and the frequency graphs in operating system files, which they tracked from the InterBase database.
No matter how much we tried to persuade them, they didn‘t want to store the voice samples and the frequency graphs as BLOBs in the database, because they were uncomfortable changing to this new approach. Raw data such as theirs had always been stored in operating system files, so, they reasoned, why change it? This kind of thinking has led to BLOBs‘ underutilization and lack of popularity.
Only with the relatively recent interest in object-oriented databases and object-oriented extensions to relational DBMS products are LOBs gaining the interest and acceptance that they should have received ages ago. The current hype around the Web is also furthering this interest. Most Web pages have graphical objects such as bitmaps and icons, binary objects such as sound and video, and unstructured textual data such as hypertext documents, all of which must be stored in the database. As a result, all of the major players on the DBMS market are following InterBase‘s early lead and including LOB support in their products.
Many DBMSs have datatypes that you could call MOBs: medium-sized objects. These are datatypes in which you can store a "smallish" document such as a 2,000-character memo or a smallish binary object such as a 2KB bitmap. In this month‘s column, however, I focus only on real LOBs: objects that at this stage seem almost limitless in size. Note, however, that most of the DBMS products currently limit the LOBs they support to 2GB in size.
In the remainder of this column, I investigate how the major players in the DBMS world (namely Borland InterBase, CA-OpenIngres, IBM DB2, Informix, Microsoft SQL Server, and Sybase SQL Server) address LOBs in their latest product offerings.
Defining LOBs
In each of the major DBMSs, LOBs are defined in a different way. Some DBMSs use extentions to the existing "standard" datatypes to define LOBs. For example, in CA-OpenIngres 1.2 you will find LONG BYTE and LONG VARCHAR datatypes. You can store noncharacter data of up to 2GB in size per data item in a LONG BYTE column, and you can store character strings of up to 2GB in length in a LONG VARCHAR column. Oracle 7 release 7.3 supports the LONG and LONG RAW datatypes, where the LONG datatype can be used to store variable-length character data containing up to 2GB of information and the LONG RAW datatype is intended for binary data or byte strings, up to 2GB per row.
Some of the other systems follow more appropriate naming conventions for their LOB datatypes. Borland‘s InterBase 4.0 supports LOBs through its BLOB datatype. You can specify the subtype of an InterBase BLOB as 0 for binary data, 1 for ASCII data, and a negative number for a user-defined subtype. In IBM‘s DB2 release 2.1.1 you will find BLOB, CLOB, and DBCLOB datatypes to store data items larger than 4KB in size. DB2‘s Binary Large Object (BLOB) datatype is a byte string used to store binary data such as photographs, voice, and video. A Character Large Object (CLOB) datatype is a string in which the sequence of characters can be either single- or multibyte characters or a combination of both. An example of a CLOB is a word-processing document, such as a resume.
Other variations on this theme are Informix release 7.12, with its BYTE and TEXT datatypes to support large binary data items and large character streams, and Microsoft SQL Server 6.5 and Sybase SQL Server 11 with their IMAGE and TEXT datatypes.
At closer inspection, the reason for the differences between the various LOB datatypes becomes obvious. The LOB datatypes are not defined in the ANSI SQL-92 standard; therefore the various DBMS vendors have had a free reign in naming their LOB datatypes. I expect this will also be standardized in future, just like the standard datatypes. However, just like the current situation with the standard datatypes, each of the vendors will probably keep its own proprietary extensions to the basic LOB datatypes.
Accessing LOBs
Although accessing a data item in a LOB column may seem trivial through an interactive interface, the same is definitely not true from an application developed in a 3GL programming language, or even from an application developed in a 4GL application development environment. With each of the major DBMS vendors‘ LOB offerings, you can still easily access data columns smaller than 2KB in size through standard SQL functions. However, for anything larger than 2KB, you must code so-called "data handlers" to select data from or to insert, update, and delete the data in a column with a LOB datatype. A data handler is a routine that you use to read and write the LOB data from or into a particular column. The data handler must convert the data from the representation used in the application program to the representation used in the database. (See Figure 1.)
For example, in CA-OpenIngres you must use a data handler function to transform the data between the application program and the database, and vice versa. The data handler can then be specified in a SQL query wherever an expression or host language variable is acceptable. In the data handler function, you can use the PUT DATA and GET DATA embedded SQL statements and the ENDDATA SQL function to transmit or receive the components of the LOB data item, one segment at a time.
In Microsoft SQL Server 6.5, LOBs can be manipulated using a set of built-in Transact-SQL functions. These functions act like a SQL API to the LOBs. The functions include TEXTPTR to obtain a valid text pointer, READTEXT to read a segment of the object, and WRITETEXT and UPDATETEXT, to replace the whole object or change parts of the object, respectively. Alternatively, programmers can access LOBs using the relevant calls in the DB-LIBRARY API. The DB-LIBRARY API implements the same functionality that is available in Transact-SQL.
This problem is in part a result of the so-called "impedance mismatch" between the relational data model and the database support found in host programming languages such as C and C++. The relational DBMSs process sets of rows in single operations, giving sets of rows as the result. However, the host programming languages can only process single records at a time; they do not cater to the relational constructs such as a set of rows in a query result, nor do they cater to the nonstandard datatypes such as LOBs.
Programmers can also use ODBC to manipulate LOBs by using the SQLGetData and SQLPutData ODBC API calls in conjunction with SQLFetch ODBC operations. The functionality of these API calls is similar to the native DB-LIBRARY and Transact-SQL interfaces of Microsoft SQL Server and Sybase SQL Server. Using this kind of API can hide all of the respective DBMSs‘ implementation differences from the programmers.
Limitations
Now, before you get all excited and rush off to code your own data handlers in C, take note that each of the LOB implementations is subject to a number of limitations: A LOB datatype cannot be indexed in CA-OpenIngres, Informix, Microsoft SQL Server, Oracle, or Sybase SQL Server. In fact, in most of these systems, a LOB datatype cannot be part of the table‘s key, either. You cannot order LOB data in CA-OpenIngres, Microsoft SQL Server, Oracle, or Sybase SQL Server. For example, you cannot use a column of a LOB datatype in an order by or a group by clause of a SQL select statement. In many systems, you cannot even use a LOB column in a SQL where clause. Some systems, however, such as Microsoft SQL Server and Sybase SQL Server, let you apply the like operator to a character LOB in a where clause. In most systems, you cannot apply the standard datatype conversion functions or the standard character string manipulation functions to data items of the LOB datatypes. In fact, in most systems, data columns of the LOB datatypes cannot even be compared to each other, not to mention to columns of the standard datatypes. In some of the major DBMSs, such as Oracle, you can only have one column of a LOB datatype per table.
Use It If You Dare. . .
As I have illustrated, all of the major players in the DBMS market support LOBs in some form or another. These implementations are still far from ideal, however. In all of them, you must use some data handler or lower-level routines to "transport" the LOB data between the application and the database. The host languages do not cater to relational database concepts, including null values, missing values, or new datatypes such as LOBs. Therefore, although a DBMS can store LOB data, you may still have trouble getting the LOB data from the application program to the database and back again.
But what are your options? You can use a standard API such as ODBC to hide the LOB‘s implementation and access details from you, but then you are at the mercy of the API‘s developers. You can only implement the type of access to your LOB data that their interface allows you to implement. Alternatively, you should investigate the object-oriented databases, where a LOB is implemented as just another object class, with the access details hidden in the object class‘s methods.
The other area of concern is the internal representation of the BLOB datatypes on different platforms. In a client/server environment with different platforms, you must pay careful consideration to the format and internal representation of the BLOB data to ensure that it is usable on all platforms. You must avoid platform-specific conversions and representations, which can render the BLOB data useless or unreadable on other platforms.
Martin Rennhackkamp is the owner and principal consultant of The Data Base Approach, a corporation specializing in relational and distributed databases, based in Cape Town, South Africa. You can reach Martin via the Internet at mr@dba.co.za._xyz