Use Rational Data Architect to integrate data sources
A five-step process for success

Level: Introductory
Davor Gornik (dgornik@us.ibm.com), Product Manager, IBM
21 Mar 2006
No doubt about it -- information integration is challenging. Many business decisions must be documented and many transformations must be performed. IBM Rational® Data Architect can document your decisions and automate part of this process. Read this article to explore a tool-supported process for federation design in just five steps.
When attempting to integrate data sources, you need to consider many activities. Rational Data Architect can help document decisions and automate parts of your tasks. In this article, you are introduced to a process you can use and modify for your specific data integration needs. The five steps to a successful design, covered in this article, are:
Annotating existing infrastructure
Mapping data sources to each other
Creating a federated model
Mapping federation data sources
Generating federation code
 


Rational Data Architect is a data modeling and integration design tool that helps data architects understand information assets and their relationships and dependencies, map assets to each other, and create integration schemas. Architected for teams of any size, Rational Data Architect combines data modeling with mapping discovery and model and database analysis -- all in a single, integrated tool. In addition, Rational Data Architect supports enterprise standards enforcement. Rational Data Architect uses a heterogeneous approach that facilitates federation design and is an essential tool for information integration projects.
Rational Data Architect provides tools that can dramatically reduce design and development hours. This new software, built on the open source Eclipse platform, helps data architects model, discover, map, and analyze data across multiple information sources, automating information integration in complex environments.
The first step of the process helps users assess their current situation. Although this phase involves some automated steps, such as reverse engineering, most of this process is done manually because every annotation is basically a high-probability guess. It is essential to have the participation of both the original designers and the users of the data source.
To annotate your existing infrastructure:
Connect to the existing data source.
To be able to access the data structure, you need to follow standard connectivity protocols. You need to know the type of the data source, the driver used to connect to it, and the login information (in most cases, login and password). Rational Data Architect uses standard JDBC connectivity to connect to the data source. All further communication with the data source is performed using native queries to the system tables of the data source.
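The idea of connecting once and then reading the structure from system tables can be sketched in a few lines. This is only an illustration, not how Rational Data Architect is implemented: it assumes Python's built-in sqlite3 module and SQLite's sqlite_master catalog as stand-ins for a JDBC connection and a vendor's system tables, and the CLIENT table is a hypothetical example.

```python
import sqlite3


def list_tables(conn):
    """Return table names by querying the catalog (SQLite's system table)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def describe_table(conn, table):
    """Return (column name, declared type) pairs for one table."""
    # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
    return [(row[1], row[2]) for row in conn.execute(f'PRAGMA table_info("{table}")')]


# Hypothetical in-memory data source standing in for a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CLIENT (CLNR CHAR(16) PRIMARY KEY, CLNAME VARCHAR(80))")

for table in list_tables(conn):
    print(table, describe_table(conn, table))
```

A real data source would need its own driver and login information in place of the in-memory connection, and its own catalog queries.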
Select the subset of available data structures from the data source.
Many data sources include data that is irrelevant for understanding stored information, such as counters, temporary helper tables used to sort data, and multilingual text for the user interface. It is much easier to eliminate such data structures at the beginning of the process.
Rational Data Architect allows filtering at any level of the data structure in the Database Explorer, shown in Figure 1. We will define a filter that will leave only relevant information.

Create a model from the selected subset.
There are two main reasons to create a model from the data source:
Most databases are not able to capture business-relevant annotations and documentation at the level of detail needed for a successful integration process.
Change management. Integrations have to be designed on a stable structure of data. If the structure of the data source changes, you need to implement an update for the integration, resulting in a new version of the model.
A physical data model that can be created from the data source is basically an abstracted copy of the data structure from the data source. See Figure 2.
 

Document data structures in the model.
While a model displays most of the detail specified in the data source, this is not enough to understand the data. For example, CLNR specified as CHAR(16) is not something that every developer would interpret in exactly the same way. In this activity, you add documentation to every element in the model, including every column, every table, every constraint, and every trigger. You should also specify business-relevant names, to make the model easier to read.
It's also strongly recommended that you create context-relevant diagrams. However, this does not mean you should create a huge diagram gathered from the walls of many meeting rooms. Instead, create small diagrams with approximately seven essential elements. (You can have fewer, but avoid more, if possible.)
Create or verify a glossary related to the model.
Together with activity 4, you can start creating a glossary that defines the meaning of names in the data source. Designers and developers have always sought to use names that make their jobs easier. Even under severe constraints on the length of names, naming standards were used for simplification reasons. Consistency depended on the discipline and life cycle of each data source.
You can refer to a glossary in Rational Data Architect, which includes a list of valid business names with possible abbreviations, shown in Figure 3. For example, the abbreviation CL could stand for client and the abbreviation NR could mean number. Some data sources could have even more extreme, non-intuitive abbreviations, such as J9 to mean client or O1 to indicate identifier. Rational Data Architect does not limit the number of glossaries that can be used at the same time, although I personally recommend that you use only one glossary per model. (This is, by the way, not a technical recommendation, but a recommendation based on user experience.)
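To make the role of a glossary concrete, here is a minimal sketch of expanding an abbreviated column name into a business name. The CL and NR entries come from the example above; the ADDR entry, the function name, and the greedy longest-match strategy are assumptions made for illustration and do not describe the tool's internal behavior.

```python
# Hypothetical glossary: abbreviation -> business name.
# CL and NR are taken from the text; ADDR is an invented extra entry.
GLOSSARY = {"CL": "client", "NR": "number", "ADDR": "address"}


def expand_name(name, glossary):
    """Expand a physical name using glossary abbreviations, longest match first."""
    upper = name.upper()
    by_length = sorted(glossary, key=len, reverse=True)
    words = []
    i = 0
    while i < len(upper):
        for abbr in by_length:
            if upper.startswith(abbr, i):
                words.append(glossary[abbr])
                i += len(abbr)
                break
        else:
            # No abbreviation matches at this position.
            if upper[i] != "_":
                words.append(upper[i])  # unknown fragment, kept for manual review
            i += 1
    return " ".join(words)


print(expand_name("CLNR", GLOSSARY))
print(expand_name("CL_ADDR", GLOSSARY))
```

Greedy matching can pick the wrong split when abbreviations overlap, which is one reason the resulting business names still need review by people who know the data source.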

These five activities to annotate your current situation may seem short, but most are very time-intensive and involve a lot of manual work.


The integration process typically includes integrating from more than one data source, and each data source needs to be annotated before you can proceed. After annotating the existing infrastructure, you understand each data source separately, but are still unclear about the overlapping and related information from all data sources.
Mapping existing data sources is optional, because it does not produce results that are required to further automate the process. However, it's highly recommended that you do the mapping, to increase your understanding of the completeness of data for integration, and to foresee possible collisions of data between different data sources.
To map data sources to each other:
Create a new mapping model between each pair of data source models.
A mapping is a dependency between two data structures that is not implemented in the realization of the data source. A mapping model is a summary of mappings between two independent data sources or data models. The number of mapping models rapidly increases with the number of data sources. You could have one mapping model for two sources, three mapping models for three sources, six mapping models for four sources -- all counting just one direction of models. If you are working with many data sources, you don't typically have to create all of the models. Instead, you can use some of them as references and create mapping models only to those models, as shown in Figure 4.
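The counts quoted above follow the formula for unordered pairs, n(n-1)/2, while mapping everything to a single reference model needs only n-1 mapping models. The short sketch below checks both numbers; the source names and function names are hypothetical.

```python
from itertools import combinations


def pairwise_mapping_models(sources):
    """Every unordered pair of sources needs its own mapping model."""
    return list(combinations(sources, 2))


def reference_mapping_models(sources, reference):
    """Map every other source only to one chosen reference model."""
    return [(reference, s) for s in sources if s != reference]


sources = ["A", "B", "C", "D"]  # hypothetical data source models
print(len(pairwise_mapping_models(sources)))        # n * (n - 1) / 2
print(len(reference_mapping_models(sources, "A")))  # n - 1
```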

Discover (automatically or manually) mappings between the data source structures.
Remember the glossary created in the previous section? At this point, the glossary can help you automate an activity. Mapping discovery can use glossaries to create better suggestions for possible mappings. Each mapping expresses the rule for creating the target structure from the source structure. For example, suppose you have a mapping between a driver's license as a target and a birth certificate as a source. A mapping to the "name" on the driver's license would be a concatenation of the "first name," "middle name," and "last name" from the birth certificate. This is an example of a mapping that includes transformation. Models typically have hundreds of such elements. It is possible to define all of the mappings manually, but it would take weeks of work.
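The driver's license example can be written down as a small transformation function. This is only a sketch of what such a mapping rule expresses; the dictionary keys and the function name are assumptions, and in practice the rule would be an expression stored with the mapping rather than Python code.

```python
def map_license_name(birth_certificate):
    """Target 'name' = first, middle, and last name joined by single spaces."""
    parts = (
        birth_certificate.get("first name"),
        birth_certificate.get("middle name"),
        birth_certificate.get("last name"),
    )
    # Skip missing or empty parts so no double spaces appear.
    return " ".join(p for p in parts if p)


# Hypothetical source record.
record = {"first name": "Ada", "middle name": "", "last name": "Lovelace"}
print(map_license_name(record))
```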
Rational Data Architect can help you identify the simplest of all mappings, which realistically represent the vast majority: the one-to-one mappings. Those are mappings from "family name" to "surname," for example. In the first version of Rational Data Architect, mapping discovery can use a combination of up to five discovery algorithms.
The simplest discovery algorithm compares the names of model elements, and optionally uses glossary models to increase the precision of results by expanding abbreviations into business names before comparison. More complex mapping discovery uses externally purchased thesauruses to find synonyms, or even data samples from the data source, to validate possible mappings. The discovery of mappings has to be done for each mapping model and should be accompanied by documentation of individual mappings for easier readability of the model.
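A name-based discovery pass might look roughly like the sketch below. It is not one of the product's discovery algorithms: it assumes Python's difflib for string similarity, a small dictionary standing in for a thesaurus, and an arbitrary threshold of 0.8.

```python
from difflib import SequenceMatcher


def discover_mappings(source_names, target_names, synonyms=None, threshold=0.8):
    """Suggest one-to-one mappings by comparing business names."""
    synonyms = synonyms or {}

    def normalize(name):
        name = name.lower().strip()
        return synonyms.get(name, name)  # replace a known synonym, if any

    suggestions = []
    for s in source_names:
        best, best_score = None, 0.0
        for t in target_names:
            score = SequenceMatcher(None, normalize(s), normalize(t)).ratio()
            if score > best_score:
                best, best_score = t, score
        if best is not None and best_score >= threshold:
            suggestions.append((s, best, round(best_score, 2)))
    return suggestions


# Hypothetical thesaurus entry for the "family name" / "surname" example.
print(discover_mappings(["family name"], ["surname"], {"family name": "surname"}))
```

Without the synonym entry, the two names are too dissimilar to be suggested, which shows why a thesaurus or glossary improves the results.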
Complete annotations of data source models.
You can gain additional understanding of data source models from mapping models. For example, you might discover that some data structure in the first data source is related to a data structure in another data source. It could also be an invalidation notice specifying that part of the data should not be considered in the integration process because it is inaccurate. It is extremely valuable to complete the mapping between existing data sources, even if you do not intend to integrate information.
The results of the mappings should be explored from two perspectives:
Competing data from different models. Competing data could result in a more complex integration specification that either prioritizes data from one data source over the other or includes the most recent data.
Exclusivity of data structures. These structures should be examined to determine whether it's necessary to include them in the federated model.
 
Both examinations result in business decisions and are dependent on your reasons for information integration.
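The two ways of handling competing data mentioned above, prioritizing one source or taking the most recent value, can be sketched as follows. The record layout, source names, and function names are hypothetical, and which rule applies remains a business decision.

```python
from datetime import date


def resolve_by_priority(candidates, priority):
    """Pick the value from the source that appears first in `priority`."""
    return min(candidates, key=lambda c: priority.index(c["source"]))["value"]


def resolve_by_recency(candidates):
    """Pick the value with the latest update date."""
    return max(candidates, key=lambda c: c["updated"])["value"]


# Hypothetical competing values for the same client address.
candidates = [
    {"source": "CRM", "value": "12 Main St", "updated": date(2005, 3, 1)},
    {"source": "BILLING", "value": "99 Oak Ave", "updated": date(2006, 1, 15)},
]
print(resolve_by_priority(candidates, ["CRM", "BILLING"]))
print(resolve_by_recency(candidates))
```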


Gaining a good understanding of data sources is essential to validate whether you can complete the process of information integration. A main component of this process is specifying the target, or the schema, that will be visible after the integration. This step should unify the business demand that requires integration with the possibilities of your existing information.
Create a business (logical) model aimed at the solution.
A business model defines entities and relationships between entities, without consideration of the implementation platform. The model has to solve the business problem. If the business problem requires just a summary of all account standings, for example, then you don't need to include order details in the model.
Rational Data Architect implements this view as a logical data model, as shown in Figure 5.

A logical data model is not constrained regarding possible relationships between different entities. It can contain any kind of relationship, including subtyping and many-to-many relationships. During the design process of the logical model, the ongoing validation with business stewards, the owners of the business process, is extremely important. Only they can recognize if something is missing or if the model is not correct regarding relationships and rules.
To make the model even more understandable, you should create as many diagrams as required to express different business views. Documentation and annotation are the most important parts of models. Imagine how it would feel if someone gave you a model to read without a single line of additional documentation -- the model would lose some of its meaning and you could end up considering it nothing more than a nice drawing.
Turn the logical model into a physical implementation model.
The logical model expresses the business view of information. The next activity is to turn this model into a physical model that is constrained by the technology we'll use to realize it. This process is relatively straightforward for the first transformation and requires care during version upgrades of models.
Rational Data Architect allows you to transform a logical model to a physical model. During the transformation, Rational Data Architect automatically resolves all constraints of the target model, such as lack of many-to-many relationships or subtyping, and implements them correctly for the selected target. Rational Data Architect also lets you compare a logical model to the physical model, and update a physical model from this comparison, using the Compare & Synchronize function.
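One constraint mentioned above, the lack of many-to-many relationships in a relational target, is commonly resolved with an association table. The sketch below illustrates that general technique, not the tool's actual transformation; it assumes each entity table has a single-column key named after the table, uses hypothetical student and course entities, and runs against SQLite only to show that the generated DDL is valid.

```python
import sqlite3


def resolve_many_to_many(left, right):
    """Return DDL for an association table linking two entity tables.

    Assumes each entity table has a single-column key named <table>_id.
    """
    link = f"{left}_{right}"
    return (
        f"CREATE TABLE {link} (\n"
        f"  {left}_id INTEGER NOT NULL REFERENCES {left} ({left}_id),\n"
        f"  {right}_id INTEGER NOT NULL REFERENCES {right} ({right}_id),\n"
        f"  PRIMARY KEY ({left}_id, {right}_id)\n"
        f")"
    )


# Hypothetical logical entities with a many-to-many relationship.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY)")

ddl = resolve_many_to_many("student", "course")
conn.execute(ddl)
print(ddl)
```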
The resulting physical model is not the model that will actually be implemented as a schema in WebSphere Information Integrator; it is a prototype of the integration model, which will be created during the code generation and will replace tables with corresponding nicknames and views.


The fourth major step in this information integration design is to create the mapping between original data sources represented by physical models and the target federation model, also represented by a physical model. This mapping has to be complete and executable to be able to generate code.
The activities in this step are very similar to those in Mapping data sources to each other, with only a few alterations.
Create a new mapping model between each data source model and the federated model.
This step results in exactly the same number of models as the number of data sources. The summary of all of those models will define how to create the complete federated schema from existing data sources. There will very likely be competing specifications for an element in different data models. We don't address them in this activity, but will eliminate them later on.
Discover (automatically or manually) mappings between the data source structures.
As in the previous case, you need to discover mappings between source and federated schemas, as shown in Figure 6. This activity is almost identical to the activity between different source schemas discussed earlier. You need to take care of more complicated cases that span more than one table structure on the source by using mapping groups. A mapping group is comparable to a result set that you get with one selection of data from the source to receive federated data (or one "select" statement).
You can use the alternative view of mapping groups in Rational Data Architect to evaluate and define joins of any complexity. If joins already exist in the source model, they are automatically suggested in the mapping editor.
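Since a mapping group is comparable to one "select" statement, it can be pictured as a single query that joins the source tables and feeds several federated columns at once. The sketch below assumes SQLite and hypothetical CLIENT and ADDRESS tables; only the CLNR column name comes from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CLIENT  (CLNR TEXT PRIMARY KEY, CLNAME TEXT);
    CREATE TABLE ADDRESS (CLNR TEXT REFERENCES CLIENT (CLNR), CITY TEXT);
    INSERT INTO CLIENT  VALUES ('C1', 'Ada Lovelace');
    INSERT INTO ADDRESS VALUES ('C1', 'London');
""")

# One mapping group = one SELECT: the join plus the column mappings it feeds.
MAPPING_GROUP = """
    SELECT c.CLNR   AS client_number,
           c.CLNAME AS client_name,
           a.CITY   AS city
    FROM CLIENT c
    JOIN ADDRESS a ON a.CLNR = c.CLNR
"""

rows = conn.execute(MAPPING_GROUP).fetchall()
print(rows)
```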

Complete transformations for mappings of data source models.
To use mappings to create federation code, you need to define executable transformations. Whenever there is a need for a change of format, content, or structure of data, you need to specify how this will be performed. This requires transformation code that is known to the server -- in this case, WebSphere Information Integrator.
Use the expression builder or enter the transformation directly in the expression property of a mapping in Rational Data Architect. The expression builder already offers a selection of predefined WebSphere Information Integrator functions that can be used.
Next you need to define all necessary transformations from the source to the federation schema. There is just one problem: there might be too many transformations. Because independent mapping editors were used, you don't have any control over the number of mappings that are defined for each element (column) on the target. This is something that you should resolve if you want to generate code.


The final step is the transformation from models back to executable code. You'll do this from the mapping model. But how can you make sure that you generate the right code?
To receive valid code for information integration from all data sources:
Combine all mapping models into one.
First, you need an overview of every mapping defined from any of the data sources to the federated model. You can get this by overlaying all of the source models on one side and leaving the single federated model as the target on the other side. This step results in a very busy model with a lot of mappings, which should not be a big concern, because you'll eliminate many of them in the next step.
Rational Data Architect lets you combine two mapping models into one in several ways. The one we'll use combines two models with identical targets. We will repeat this until all of the models are joined into one.
Another possible way you could combine two mapping models is when the target of one is identical to the source of the other model.
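The two ways of combining mapping models can be sketched with mappings reduced to (source column, target column) pairs. This representation, the column names, and the function names are assumptions for illustration; real mappings also carry transformation expressions, which are ignored here.

```python
def merge_same_target(model_a, model_b):
    """Overlay two mapping models that share the same target model."""
    return model_a + model_b


def compose(first, second):
    """Chain two models where the target of `first` is the source of `second`."""
    return [
        (src, final)
        for (src, mid) in first
        for (mid2, final) in second
        if mid == mid2
    ]


# Hypothetical mappings from two sources into one federated model.
crm_to_fed = [("CRM.CLNR", "FED.client_number")]
billing_to_fed = [("BILLING.CUSTNO", "FED.client_number")]
print(merge_same_target(crm_to_fed, billing_to_fed))

# Hypothetical chain: staging -> CRM -> federated model.
staging_to_crm = [("STAGE.ID", "CRM.CLNR")]
print(compose(staging_to_crm, crm_to_fed))
```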
Eliminate competing mappings.
This activity is essential if you want to receive a single executable model. The result needs to be a single executable mapping for each of the target elements (columns). Combining all mapping models created many elements that are targets for more than one mapping. We will look at such elements and select one single mapping. All other mappings need to be removed.
Alternatively, you could also delete a mapping group if you decide that the whole mapping group (the join) should not be used.
You also need to delete all mapping groups that are empty. You can easily do this by selecting the mapping group details view in the resulting mapping model.
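Finding and eliminating competing mappings amounts to grouping mappings by target column and keeping one per target. The sketch below uses (source column, target column) pairs with hypothetical names; the choice of which mapping survives is supplied by the designer, not computed.

```python
from collections import defaultdict


def competing_targets(mappings):
    """Group mappings by target column and keep those with more than one."""
    by_target = defaultdict(list)
    for source, target in mappings:
        by_target[target].append(source)
    return {t: s for t, s in by_target.items() if len(s) > 1}


def keep_selected(mappings, choices):
    """Keep one chosen source per contested target; leave the rest untouched."""
    return [
        (source, target)
        for source, target in mappings
        if target not in choices or choices[target] == source
    ]


# Hypothetical combined mapping model.
combined = [
    ("CRM.CLNR", "FED.client_number"),
    ("BILLING.CUSTNO", "FED.client_number"),
    ("CRM.CLNAME", "FED.client_name"),
]
print(competing_targets(combined))
print(keep_selected(combined, {"FED.client_number": "CRM.CLNR"}))
```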
Generate target schema from mapping model.
From the model, you can generate the DDL, though you have to be careful. Remember that every physical model knows about the target capabilities. You need to select a model generated from WebSphere Information Integrator to receive federation code with nicknames and generated views.
While in the code generation wizard from the mapping model, Rational Data Architect allows for changes to the names for any generated element, as shown in Figure 7. The result of the code generation is a schema with all elements in the target integration model, as well as a script for code generation.
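To give a feel for what federation code with nicknames and views could look like, the sketch below assembles statements as plain strings. Everything here is an assumption to be checked against the WebSphere Information Integrator documentation: the statement shapes follow the general CREATE NICKNAME ... FOR server.schema.table and CREATE VIEW forms, the server, schema, table, and view names are hypothetical, and the script Rational Data Architect actually generates may differ.

```python
def generate_federation_ddl(server, tables, view_name, select_sql, renames=None):
    """Build nickname statements for remote tables plus one federated view.

    `renames` optionally maps a remote table name to the nickname to use,
    mirroring the idea that generated names can be changed before generation.
    """
    renames = renames or {}
    statements = []
    for remote_schema, remote_table in tables:
        nickname = renames.get(remote_table, remote_table)
        statements.append(
            f"CREATE NICKNAME {nickname} FOR {server}.{remote_schema}.{remote_table};"
        )
    statements.append(f"CREATE VIEW {view_name} AS {select_sql};")
    return "\n".join(statements)


# Hypothetical server, schema, table, and view names.
script = generate_federation_ddl(
    server="SRC1",
    tables=[("SALES", "CLIENT")],
    view_name="FED_CLIENT",
    select_sql="SELECT CLNR AS client_number FROM CLIENT_NICK",
    renames={"CLIENT": "CLIENT_NICK"},
)
print(script)
```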

Execute schema DDL with WebSphere Information Integrator.
It's rewarding to see the generated script and to know it's available for changes. I recommend generating code from the model itself because you can compare it with the target and generate code selectively.
When generating code, you'll use a connection to WebSphere Information Integrator -- the same as used to reverse engineer initial models.
And now you've finished the design process. At this point, it's time to think about test and deployment.


This article described a five-step process for federation design that will produce a federated schema. You also end up with a set of intermediate models that are completely reusable, and will shorten the process next time. This process also helps increase your understanding of the overall information infrastructure.
Rational Data Architect was created to help you with your information integration. I invite you to explore more about it using the download in Resources.