The new saga of relational databases

How to survive (or rather, coexist with) objects

May 4, 1997

The elegance of the relational database model has survived for twenty years. The RDBMS has been so successful that five of the top-ten independent software vendors (ISVs) sell relational databases—Microsoft, Computer Associates, Oracle, Sybase, and Informix—not to mention IBM's DB2. But the RDBMS is under considerable technology pressure due to the limited and simple types of data understood by the RDBMS, notably integers, scientific floating-point, character strings, date/time, and money.

The technology pressure is coming from countless real-world applications that demand more information from the data. When business managers ask the simple question, "what are the 13-week average sales for our top-five profitable products?", they do not expect that a highly trained programmer must write and test several pages of SQL code, or that the query will not work next week without rework. This happens because the RDBMS does not understand time series, moving averages, or ranking, so the programmer must force-feed the RDBMS a program embodying these "complex data types".

A reborn RDBMS, called a Universal Server, is emerging. It allows IS organizations, RDBMS suppliers, and ISVs to extend the RDBMS with complex data, functions, and access techniques. These new features give the Universal Server RDBMS far greater extensibility and flexibility, better scalability for complex data, and a better fit with new technologies such as intranets, relational OLAP (Online Analytical Processing), and new development toolsets.

- How will the customer know that the cyberstore is trustworthy?

- How can a payment be made securely (preventing someone else from paying with your information, and preventing your payment from being diverted to an unintended destination)?

- How will the seller verify that the customer is trustworthy?

OODBMSs Are Not Enough

In the early 1990s, object-oriented DBMSs (OODBMSs) seemed set to provide an early answer to IS' needs. Designed to support object-oriented programming, OODBMSs allow developers to handle high-level and complex data types, especially CAD/CAM graphics and text databases. However, OODBMSs were created "de novo" and thus lacked the RDBMS features that IS has come to love: superb query capabilities with the Structured Query Language (SQL), excellent OLTP performance, and a huge ISV industry providing complementary utilities and application solutions. To match RDBMSs' scalability, flexibility, and robustness, OODBMSs had to implement the full array of RDBMS features, such as multithreading and SMP support, distributed-database features, and open gateways, from scratch. In effect, they were chasing a moving target from far in the rear.

Aberdeen believes that a real-world answer to IS' needs is now arriving: complex-datatype support added to existing, installed RDBMSs to create a Universal Server. Instead of forcing suppliers and users to reinvent the RDBMS, the Universal Server acts as a series of integrated and compatible extensions to the RDBMS. The Universal Server promises to carry the RDBMS into the future by meeting its most significant upcoming challenge—extensibility—and to ensure that enterprises' RDBMS investments can continue to be leveraged—in effect, an "RDBMS investment protection plan".

What A Universal Server Adds

What does a Universal Server add to an enterprise's RDBMS? Practically speaking, Universal Servers deliver the following new or upgraded capabilities:

- More support for complex data types, via specific operations (e.g., searching a video archive for a visual pattern) and storage of new types of data (e.g., multidimensional, text, multimedia, and spatial). Many applications can benefit from the simplicity created by having the data in the right form for the application and for the application user. For example, a text-search capability applied to a comments field can extract useful repeated information that defies today's RDBMS query capabilities.

- More support for complex operations on simple and complex data types, because support for more complex functions is built into the extensions. For example, users of decision support systems can benefit greatly from having statistics and mathematics libraries in the core RDBMS, since they do not have to "reinvent the wheel" to implement complex data analysis.

- Greater efficiency in high-level data access and computation. Tuning the Universal Server's query optimizer for particular types of complex data, for example, can yield major improvements in query speed on those data types. Likewise, complex computations such as pattern matching and economic-order-quantity functions scale better, because developers do not have to optimize access to complex data themselves: the Universal Server can do it for them.

- Better fit with today's development tools, development processes, and GUIs. Today's development tools and processes have rapidly increased productivity by operating at a high level on objects, components, and templates (and by layering higher-level constructs on top of base components), yet developers must still typically program RDBMS access at the simple-data level. Likewise, data-displaying GUIs based on object-oriented technology must link to crude relational data items. Universal Server programming interfaces operate at the same high level as today's advanced development toolsets, which could significantly improve programmer productivity for large-scale, data-intensive applications.

- Better fit with Internet/intranet architectures. As enterprises focus on scaling their Internet and intranet architectures and then connecting them to back-end databases, they face challenges in merging text- and multimedia-heavy Web pages with simple-data RDBMSs. Universal Servers will allow 'Net implementers to "have their cake and eat it too", combining complex-datatype-rich Web content with highly scalable RDBMS technology.

- Effective ROLAP support. Complex-datatype requirements translate into support for more complex queries as data miners drive ever deeper into ever-larger data-warehouse databases. Today's Relational OnLine Analytical Processing (ROLAP) and RDBMS suppliers' bit-mapped indexing, star schemas, and aggregation support can deliver order-of-magnitude improvements in complex-query speed, but further improvements require that multidimensionality, aggregation, and time-series support be driven farther into the RDBMS's core, and especially into the query optimizer. Thus, Universal Server support for multidimensional and time-series complex data types lets both data-warehouse designers and query-application developers take advantage of new complex-query speedups. Moreover, it incorporates multidimensionality into the core RDBMS, with significant performance advantages over approaches that use separate OLAP engines.
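Bit-mapped indexing, one of the ROLAP techniques mentioned above, keeps one bitmap per distinct value of a column, so a query with several equality conditions can be answered by ANDing bitmaps instead of scanning rows. The following Python sketch is illustrative only: the fact table, the column names, and the `BitmapIndex` class are hypothetical, and real products use compressed, disk-resident bitmaps rather than Python integers.

```python
from collections import defaultdict


class BitmapIndex:
    """One bitmap (stored as a Python int) per distinct value of a column."""

    def __init__(self, values):
        self.bitmaps = defaultdict(int)
        for row_id, value in enumerate(values):
            # Set bit number `row_id` in the bitmap for this value.
            self.bitmaps[value] |= 1 << row_id

    def lookup(self, value):
        # dict.get does not create entries, so unknown values yield an empty bitmap.
        return self.bitmaps.get(value, 0)


def rows_from_bitmap(bitmap):
    """Decode a bitmap back into a sorted list of row ids."""
    rows = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows


# Hypothetical fact table with one entry per sale.
region = ["east", "west", "east", "north", "east", "west"]
product = ["tv", "tv", "radio", "tv", "tv", "radio"]

region_idx = BitmapIndex(region)
product_idx = BitmapIndex(product)

# WHERE region = 'east' AND product = 'tv' becomes a single bitwise AND.
matches = region_idx.lookup("east") & product_idx.lookup("tv")
print(rows_from_bitmap(matches))  # [0, 4]
```

Because each additional condition costs only one more AND over compact bitmaps, this approach suits the low-cardinality dimension columns typical of star schemas.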

Most important of all, the Universal Server adds extensibility to the RDBMS. The Universal Server's open support for user-defined data types means that IS has far greater flexibility to adapt to new user demands and technologies requiring new data types down the road. Moreover, the new importance of complex-datatype support plus the new extensibility tools together constitute a golden opportunity for RDBMS suppliers, VARs, and IS to participate in a new market delivering customized and vertical-industry-specific complex-datatype support modules.
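To make user-defined types and functions more concrete, here is a toy Python sketch of a catalog in which a new data type and a function over it are registered and then used in a query-like filter. Everything in it (the `TypeRegistry` class, the `Point` type, the `distance` function, and the store data) is hypothetical. It only imitates the general idea of an extensible server letting queries call user-supplied code, and it is not any vendor's extension API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Point:
    """A hypothetical user-defined spatial type."""
    x: float
    y: float


class TypeRegistry:
    """Toy catalog mapping type names and function names to Python objects."""

    def __init__(self):
        self.types: Dict[str, type] = {}
        self.functions: Dict[str, Callable] = {}

    def register_type(self, name, cls):
        self.types[name] = cls

    def register_function(self, name, fn):
        self.functions[name] = fn

    def call(self, name, *args):
        # A real engine would resolve the name when the query is parsed;
        # this toy version looks it up on every call.
        return self.functions[name](*args)


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two Points."""
    return ((a.x - b.x) ** 2 + (a.y - b.y) ** 2) ** 0.5


catalog = TypeRegistry()
catalog.register_type("point", Point)
catalog.register_function("distance", distance)

# Rough analogue of: SELECT name FROM stores WHERE distance(location, :home) < 5
stores = [("downtown", Point(1, 1)), ("airport", Point(10, 10))]
home = Point(0, 0)
nearby = [name for name, loc in stores if catalog.call("distance", loc, home) < 5]
print(nearby)  # ['downtown']
```

The point of the design is that the new type and its function are added without changing the "engine" (here, the filter expression), which is the kind of extensibility the paragraph above describes.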

Business managers asking the simple question, "what are the 13-week average sales for our top-five profitable products?", do not understand the amazing feats a SQL-skilled RDBMS programmer must perform to implement a business-simple 13-word question. The programmer must first calculate the profitability of products, then rank by profitability, and finally calculate the 13-week average sales. The query is useless next week because the 13-week average changes every week. But Universal Server ROLAP extensions that define functions for ranking, profitability, and time series would make the programmer's job much easier and the much shorter program more likely to be error-free. In short, the immediate business benefits of Universal Servers are about making it simpler for programmers, end users with desktop query tools, and ISVs to express their data needs in terms much closer to business reality. This will foster greater programmer productivity, faster "data knowledge" activities by end users, and much more sophisticated tools from ISVs.
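The three steps the programmer must perform can be sketched in Python as follows. This is a minimal illustration under stated assumptions: sales arrive as hypothetical `(product, week, revenue, cost)` tuples, profitability is measured over the same 13-week window, and weeks without sales count as zero in the average. Because the window is computed from a `current_week` parameter, the same function can be rerun next week without rework, which is roughly what built-in time-series and ranking functions would provide.

```python
from collections import defaultdict

WINDOW_WEEKS = 13
TOP_N = 5


def top_products_moving_average(sales, current_week, top_n=TOP_N, window=WINDOW_WEEKS):
    """Return [(product, average weekly revenue)] for the top_n most profitable products.

    sales: iterable of hypothetical (product, week, revenue, cost) tuples.
    """
    first_week = current_week - window + 1
    # Keep only the trailing window; it moves automatically with current_week.
    in_window = [s for s in sales if first_week <= s[1] <= current_week]

    # Step 1: profitability and total revenue per product over the window.
    profit = defaultdict(float)
    revenue = defaultdict(float)
    for product, _week, rev, cost in in_window:
        profit[product] += rev - cost
        revenue[product] += rev

    # Step 2: rank by profit, highest first (ties broken by name for determinism).
    ranked = sorted(profit, key=lambda p: (-profit[p], p))[:top_n]

    # Step 3: 13-week average sales; weeks with no sales count as zero.
    return [(p, revenue[p] / window) for p in ranked]


# Hypothetical data: 7 products over 20 weeks, with constant weekly figures.
sales = [
    (product, week, 100 + 10 * i, 90)
    for week in range(1, 21)
    for i, product in enumerate("ABCDEFG")
]

print(top_products_moving_average(sales, current_week=20))
# [('G', 160.0), ('F', 150.0), ('E', 140.0), ('D', 130.0), ('C', 120.0)]
```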

However, Aberdeen believes that the long-term benefits of Universal Server technology are even more significant, and will accrue to numerous commercial applications in virtually all industries. For example:

- Bill-of-materials explosion or economic-order-quantity calculation, so difficult with today's RDBMSs, will become relatively straightforward and allow much more effective just-in-time resource planning.
- Enterprises can query their videotape records and on-site camera video feeds for particular patterns. For example, video cameras monitoring an assembly line can feed video data into a Universal Server database that can detect anomalies such as defects and trigger corrective action, thus improving product quality at lower cost.

Oracle 7.3

Oracle 7.3 has folded the Oracle Video Server Option, Oracle ConText Option, and Oracle Spatial Data Option into Oracle7. The Video Server database remains separate, while ConText, with its text database merged with Oracle7's simple data, is scheduled to ship in a few weeks. ConText is a strong text extension to Oracle 7.3, and Oracle's Developer/2000 development toolset provides a server-side multimedia-datatype toolkit. However, these complex-datatype extensions are still distinct database servers, not fully integrated with Oracle7 and not highly extensible. For more extensive integration and user-driven extensibility, Oracle customers must wait for Oracle's "object" release, tentatively called Oracle Universal Database, which is currently scheduled for sometime next year. In the relational OLAP area, IRI's OLAP database functionality is not yet integrated with Oracle7, although Oracle is expected to announce significant IRI-based extensions to its products in summer 1996.

IBM DB2

IBM's DB2 Common Server (for OS/2 and Unix platforms) provides functions to access parts of a data type, and the ability to insert into the database, from a client file or CD-ROM, a data type too large for main memory. DB2 also includes bundles of triggers, user-defined data types, and user-defined functions for particular data types, called Relational Database Extenders (e.g., a text server, imaging server, audio server, and video server). For example, Extenders will support fingerprint analysis and SQL querying of image content such as color, shape, or pattern. The text Extender may be particularly valuable to users in the long term, because it includes information-retrieval technology. IBM has not yet driven this complex-datatype support deep into the DB2 architecture, nor has it provided a sophisticated client/server development toolset for creating user-defined data types. Extenders are not yet included in DB2 Parallel Edition or DB2/MVS.

Computer Associates

CA's two-database strategy includes Jasmine, an OODBMS with a multimedia- and Internet-enabled toolset, plus its CA-Ingres product. CA currently has no plans to combine the two or otherwise offer Universal Server functionality. CA-Ingres has not yet fully integrated complex-datatype extensibility or driven it into the architecture.

Sybase and Microsoft have not yet implemented complex-datatype support comparable to Informix's. Sybase has announced efforts to include such support in the past, which validates the importance of Universal Servers to its customers. Sybase has also announced plans to provide an Adaptive Server combined with Sybase's Object-Connect middleware to allow ISVs to link "snap-in" complex data types with the System 11 RDBMS. Aberdeen anticipates that both Sybase and Microsoft will emphasize providing "base" APIs and class libraries for their customers, giving them added flexibility but requiring them to do more of the work of implementing a Universal Server.

In the opinion of Bloor Research, Illustra is a database product that addresses a wide area of application that is currently poorly served by database products. Our analysis of the market indicates that there will be areas of application for products like Illustra in virtually all large organizations, particularly in the financial sector, retailing, health care, pharmaceuticals, the oil industry, manufacturing, engineering, transport, media and publishing, leisure, education, telecommunications and other utilities, government, defense, and scientific research. Illustra's approach to query optimization is a major advantage of the product, and one that other vendors will probably be obliged to imitate if they wish to compete on performance in Illustra's main areas of application. Illustra's portfolio of DataBlades is broad and impressive; it gives Illustra a significant lead over competing products, as well as a useful means of outsourcing the extension of its product.

A review of competitive products and of the evolution of relational databases indicates that the kinds of capabilities Illustra specializes in are now becoming mainstream requirements for database vendors. In our opinion, Illustra had a significant lead over other vendors in the ORDBMS field by virtue of a combination of features: its global optimization, its flexible architecture, and its assembled set of DataBlades. We have some concerns as to whether Informix will be able to meet its timetable for integrating Illustra with Informix DSA. Once Illustra and Informix DSA are merged, we expect the resulting product to maintain the lead in scalable relational performance and the management of complex data. It is clear that the merger of the two products has put Informix's competitors in the position of having to play 'catch-up', and some of them will surely have technical difficulties in doing so.
What is also difficult to determine is how strongly the demand for ORDBMS capabilities will grow. If it grows quickly, and our expectation is that it will, then Informix will prosper.

Illustra in Overview

Illustra Information Technologies Inc. was set up in 1992 by Professor Michael Stonebraker, previously the technical guru behind Ingres. Its product, the Illustra database, is a database server that can handle query access to complex objects such as images, video, sound, and so forth. The product is a development of the Postgres database, the result of a research project initiated by Professor Stonebraker at the University of California, Berkeley. The project attempted to define a database that could store all types of data but that loosely adhered to the relational approach. In practice, this meant support for an extended form of SQL as an access mechanism, which in turn meant the ability to access stored objects such as images or video by their content.

With Illustra, Michael Stonebraker designed and developed what he believed the next generation of database to be. Illustra is not a relational database in the traditional sense, but in most respects it behaves like one. It provides a framework within which all kinds of data can be stored and accessed using a relatively simple query language based on an early draft of the SQL-3 standard. Nor is Illustra an object database, although it can also behave like one in many respects, as it supports features such as inheritance and polymorphism that are dear to the hearts of OO enthusiasts. It can be used in both roles, hence it is referred to as an object-relational database. In effect, it extends the relational approach to encompass the management of complex data. Its major area of application is providing query access to complex data, which neither relational databases nor object databases are particularly good at.

Illustra is not the only product of this kind. The database industry has long realized that there is a need to store complex data types (images, sound, etc.) in many areas of application, and all the major database vendors claim to be working on solutions in this area. However, at the time Illustra was purchased by Informix, it was clear that Illustra was building up a market lead over its competitors. The company's revenues were growing at above 500%, it was establishing offices across the world, and its customer base was expanding fast. Our own belief at Bloor Research is that the market for databases of this type is in its infancy and that many applications needing a database like Illustra may arise. Now that Informix has acquired Illustra, the plan is to unite the two technologies, Informix's powerful engine and Illustra's capability with complex data, to produce what Informix is calling the Universal Server. In this paper we attempt to provide a perspective on this effort by discussing Illustra's capabilities in detail and examining Informix's general plan to unite the two technologies.

Competitive Products to Illustra

The Major Database Vendors

The potential market for ORDBMS is difficult to estimate, because many areas of application (multimedia, Web sites, geographical data, and so forth) have only recently become important areas for software development. Our analysis earlier in this paper suggests that the potential area of application is large, probably larger than the current RDBMS market, which is approximately $4 billion per annum. This view is confirmed by the increasing number of products emerging that include object-relational capabilities, and by the fact that all the major database vendors are providing some such capabilities or planning to provide them in the near future. The following is not intended as a competitive analysis, but as confirmation that the database vendors recognize that a market opportunity exists.

Microsoft SQL Server

As far as we are aware, Microsoft intends to provide access to objects through Microsoft SQL Server in a future release. This will probably coincide with the advent of 'Cairo' and be integrated with Windows NT's approach to object storage.

Sybase

Sybase has been working on object extensions to its version of SQL Server. As far as we are aware, there is no firm release date set for this.

Oracle

In its latest release, version 7.3, Oracle has added a multidimensional access technique, similar in function to Illustra's R-tree, and has improved Oracle's ability to store and access complex objects. Some of these capabilities were originally promised by Oracle for version 8 but appear to have been brought forward and released earlier.

IBM and DB2

IBM has been working on support for complex data types within DB2 for some time and now provides support for text, audio, video, image, and fingerprint data, although currently this is available only in the AIX version of DB2. This support does not yet extend to parallel operation. IBM's approach is similar to Illustra's in its view of global optimization, but as far as we are aware, there is no feature equivalent to the DataBlade, which allows users to define data types, functions, and access methods.

CA and CA-Ingres

Computer Associates' acquisition of ASK/Ingres provided it with a relational database that already had object storage capabilities, although not query optimization of object access. CA intends to enhance these capabilities, and there is currently a cooperative venture with Fujitsu to link these capabilities with Fujitsu's ODB2, an object database of Japanese origin.

Other Object Relational Products

UniSQL/X

UniSQL/X is an ORDBMS product from UniSQL Inc. of Austin, Texas. The company also offers a product called UniSQL/M, an enhanced version of UniSQL/X that provides a global schema for accessing multiple databases, including RDBMS, ODBMS, and navigational DBMS.

Total ORDB and Total ObjectHub

These are products from Cincom, based on the UniSQL products. Total ORDB combines object, relational, and navigational capabilities, and Total ObjectHub is a layer on top that integrates heterogeneous data sources.

Odapter

This is a product from Hewlett-Packard that integrates OO applications with relational and other data sources, using an object-wrapper approach. It allows complex objects to be stored in a relational database (either Oracle7 or Hewlett-Packard's own Allbase).

Omniscience ORDBMS

This is a relatively new product from Omniscience Object Technology Inc. of Santa Clara, California. The product is aimed at C++ users, allowing them to mix OO and relational programming approaches.

The Problem for Relational Vendors

As explained earlier in this paper, Illustra's approach to SQL optimization of access to complex data is distinctly different from that of a typical relational database. The simple approach originally adopted by most relational database vendors for storing objects is illustrated in the following diagram:

Effectively, an OO wrapper is placed around the optimizer. This splits the SQL query into a part that passes through the relational database optimizer and a part that accesses the stored BLOB. At best, the database will be able to resolve the query to the point where the number of BLOB accesses is minimized. For content access to some BLOBs, this may be a sensible approach. However, there is no global optimization capability within this architecture, so a query that accesses several different kinds of BLOB (video and sound, for example) will be difficult to optimize. The greater the number of complex data types added, the more fragmented the optimization process becomes. In other words, this approach to optimization can deliver only 'point solutions' at best: solutions that are appropriate only to the storage of one type of BLOB.

Moreover, this architecture does not easily allow the user to define access techniques, functions, and so forth within the database, so in many instances such functionality will be defined in the client application and sit on the client side of the client/server divide. The consequences are inefficient data access, heavy network traffic, and possible overloading of client resources.

The global optimization approach taken by Illustra is not fragmented. It takes into account the cost of content access to many complex data types at once, and it allows for the cost of function execution in the database. This affects every stage of the optimization process. Relational database vendors that wish to support complex data must choose among completely re-engineering the optimization process, tailoring it for each new complex data type added, or compromising between the two approaches.
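The difference between fragmented and global optimization can be illustrated with one classic rule for expensive predicates. Assume each filter has an estimated per-tuple cost and selectivity, and the filters are independent. Under those assumptions, applying the filters in ascending order of rank = (selectivity - 1) / cost minimizes the expected total cost. An optimizer that treats BLOB predicates as opaque cannot make this trade-off. The Python sketch below illustrates the general technique and is not a description of Illustra's optimizer. The predicate names, costs, and selectivities are hypothetical. The sketch checks the rank ordering against a brute-force search over all orderings.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass
class Predicate:
    name: str
    cost_per_tuple: float  # estimated cost of evaluating the predicate once
    selectivity: float     # estimated fraction of tuples that pass (0..1)

    @property
    def rank(self):
        # Lower rank means more filtering per unit of cost, so run it earlier.
        return (self.selectivity - 1.0) / self.cost_per_tuple


def pipeline_cost(preds, n_tuples):
    """Expected total cost of applying preds, in order, to n_tuples input rows."""
    total, remaining = 0.0, float(n_tuples)
    for p in preds:
        total += remaining * p.cost_per_tuple
        remaining *= p.selectivity
    return total


# Hypothetical predicates: two expensive content-based tests on BLOBs, one cheap column test.
preds = [
    Predicate("video_contains_pattern(clip, 'defect')", cost_per_tuple=500.0, selectivity=0.01),
    Predicate("audio_matches(track, 'alarm')", cost_per_tuple=50.0, selectivity=0.2),
    Predicate("line_id = 7", cost_per_tuple=1.0, selectivity=0.1),
]

by_rank = sorted(preds, key=lambda p: p.rank)
best = min(permutations(preds), key=lambda order: pipeline_cost(order, 10_000))
print([p.name for p in by_rank] == [p.name for p in best])  # True
```

In this example the rank ordering runs the cheap column test first, then the audio test, then the video test, for an expected cost of 160,000 units. Running the video test first would cost over 5,000,000.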


© 1996-1997 João Alexandre Sartorelli.
All rights reserved.