However, we think in terms of the business identifier. The Time dimension built by the wizard includes an attribute called CODE, which it uses as the business identifier; it is a number that represents the date for the level record. We will shortly use the New Dimension Wizard to create our Product dimension, and there we'll see how to specify a business identifier explicitly.
Here, it is created automatically for us by the wizard. Continuing to the last step, the wizard displays a progress bar as it performs each step, along with text in the main window indicating the step being performed. When it completes, we click on the Next button and it takes us to the final screen—the summary screen. This screen is a display of the objects it created and is similar to the previous display in step 5 of 6 that shows the pre-create settings.
At this point, these objects have been created and we press the Finish button. Now we have a fully functional Time dimension for our data warehouse.
We could use the Data Object Editor to create our Time dimension, but we would have to manually specify each attribute, level, hierarchy, and sequence to use. Then we would have to create the mapping to populate it. So we definitely saved quite a bit of time by using the wizard.
The Time Dimension Wizard does quite a bit for us. Not only does it create the Time dimension, but it also creates a couple of additional objects needed to support it, including a mapping to populate the dimension. That mapping is what we will deploy and run to actually build our Time dimension.
We can also see that a sequence was created under the Sequences node. This is the sequence the dimension will use for the ID attribute that it created automatically as the surrogate identifier. This completes our Time dimension, so let's look at the next dimension we're going to create. It is the dimension to hold the product information. The principles of the Time dimension apply to this dimension as well. The first thing we should consider is how each toy or gizmo sold by ACME is represented.
As with any retail operation, a Stock Keeping Unit (SKU) is maintained for each individual type of item sold. The SKU is a number assigned by the main office that uniquely identifies each type of product sold by ACME, and there could be tens of thousands of different items.
There could be more than one product with the same name, but they won't have the same SKU. An SKU number all by itself is not very helpful.
Therefore, in our Product dimension, we will want to make available more descriptive information about each product, such as a description. Every SKU can be grouped together by brand name—the toy manufacturer who makes the product—and then by the category of product, such as game, doll, action figure, sporting goods, and so on. Each category could in turn be grouped by department in the store. For each of those levels in the hierarchy (Department, Category, and Brand), we need to have a business identifier.
For that, the NAME will be sufficient, as there are no departments, categories, or brands that share the same name. NAME and DESCRIPTION will be defined as dimension attributes, which means they can appear on more than one level. Each level has a name (Item, Brand, Category, and Department) that identifies the level itself, but what about the names of the individual brands, or of the different categories or departments? There has to be a place to store those names and descriptions, and that is the purpose of these dimension attributes.
By labeling them as dimension attributes, they appear once for each level in the dimension. They are used to store the individual names and descriptions of the brands, categories, and departments.
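To picture how the dimension attributes repeat for each level, and how the surrogate ID and its sequence fit in, here is a rough sketch in plain SQL of what the underlying table might look like. All names and datatypes here are hypothetical; OWB derives the real table, column, and sequence names from the dimension definition.

    -- Hypothetical layout of the table underlying the Product dimension.
    -- NAME and DESCRIPTION, as dimension attributes, appear once per level.
    CREATE TABLE products_tab (
      id                      NUMBER PRIMARY KEY,   -- surrogate identifier
      sku                     VARCHAR2(20),         -- business identifier (Item level)
      item_name               VARCHAR2(100),
      item_description        VARCHAR2(400),
      brand_name              VARCHAR2(100),
      brand_description       VARCHAR2(400),
      category_name           VARCHAR2(100),
      category_description    VARCHAR2(400),
      department_name         VARCHAR2(100),
      department_description  VARCHAR2(400)
    );

    -- A sequence like the one the wizard creates to populate the surrogate ID.
    CREATE SEQUENCE products_seq START WITH 1 INCREMENT BY 1;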
The New Dimension Wizard is similar to the Time Dimension Wizard we used earlier, but is more generic so that it can be applied to other dimensions. As a result, there are more steps involved, because the wizard has to ask us more; it cannot make as many assumptions as it did with the Time dimension.
This wizard can be used with any dimension, and therefore things such as attributes, levels, and hierarchies are going to need to be defined explicitly. Choose New and then Using Wizard. The very first screen we'll see is the Welcome screen, which describes the steps we will be going through.
We can see that it requires more steps than the Time Dimension Wizard: We will have to provide a name for our dimension, and tell it what type of storage to use—relational or multidimensional—just as we did for the Time Dimension Wizard.
It will then ask us to define our dimension attributes. We didn't have to do that for the Time dimension. That wizard had a preset number of attributes it defined for us automatically because it knew it was creating a Time dimension. We then had to define the levels where we simply chose from a preset list of levels for the Time dimension.
Here we have to explicitly name the levels, and this is where we'll have to pay close attention to how the data will aggregate up through them. We will then choose our level attributes from the dimension attributes. Then, as we see in the previous figure, we will have to choose the slowly changing dimension type, which is how we want to handle changes to values in our dimension attributes over time.
We'll then get a last chance to review the settings, and then it will create the dimension for us, showing us the progress; this is similar to the last two steps of the Time Dimension Wizard. After reviewing the steps, we move to the next screen, where we enter a name for the dimension, which we will call Product. We'll then proceed to step 2, which is where we will select the ROLAP option for relational, as we did for the Time dimension. Proceeding to step 3, we will be able to list the attributes that we want contained in our Product dimension.
Scroll the window to the right if any columns that need to be changed are not visible. We can also expand the dialog box to show additional columns. Click on the drop-down box in the identifier column for SKU, and select Business. If we make a mistake and enter a value we decide not to keep, we can delete the row by right-clicking on the row number to the left of the row and then selecting Delete from the pop-up menu.
If we were to scroll that window all the way to the right, or expand it completely, we'd see even more columns, such as the Seconds Precision and Descriptor columns. If we press the Help button, it will explain what each column is. The Descriptor is applicable to MOLAP (multidimensional) implementations and provides six standard descriptions that can be assigned to columns.
It presets two columns, the Long description and the Short description. We can safely ignore them for our application. The next step is where we can specify the levels in our dimension. There must be at least one level identified, but we are going to have four in our Product dimension. They are to be entered on this screen in order from top to bottom with the highest level listed first, then down to the lowest level.
Moving on to the next screen, we get to specify the level attributes. At the top are the levels, and at the bottom is the list of attributes with checkboxes beside each. If we click on each level in the top portion of the dialog box, we can see in the bottom portion that the wizard has preselected attributes for us. We are not going to make any changes on this screen. The next step asks us to choose the slowly changing dimension type, which refers to the fact that dimension values will change over time.
Although this doesn't happen often, they will change, and hence the "slowly" designation. We will have the following three choices, which are related to the issue of whether or how we want to maintain a history of that change in the dimension:
Type 1: Do not keep a history. This means we basically do not care what the old value was and just change it.
Type 2: Store the complete change history. This means we definitely care about keeping that change, along with any change that has ever taken place in the dimension.
Type 3: Store only the previous value. This means we only care about seeing what the previous value might have been, but don't care what it was before that.
The Type 2 option, maintaining a complete history, requires additional attributes wherever we want to maintain historical information. We need attributes designated as triggering attributes; if any of these changes, a historical record is generated. We also need an Effective Date attribute and an Expiration Date attribute. The Effective Date is set when the record is entered. If a triggering attribute changes, the Expiration Date is set on the existing record and a new record is created with the updated information.
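To make the Type 2 mechanics concrete, here is a minimal sketch in plain SQL of what a change to a triggering attribute amounts to, reusing the hypothetical PRODUCTS_TAB table and PRODUCTS_SEQ sequence sketched earlier. OWB generates the equivalent logic in the mapping for us; this is only an illustration.

    -- Type 2 handling needs the Effective Date and Expiration Date attributes.
    ALTER TABLE products_tab ADD (effective_date DATE, expiration_date DATE);

    -- When a triggering attribute changes: expire the current record...
    UPDATE products_tab
       SET expiration_date = SYSDATE
     WHERE sku = '12345'
       AND expiration_date IS NULL;

    -- ...and create a new record carrying the updated value.
    INSERT INTO products_tab
      (id, sku, item_name, effective_date, expiration_date)
    VALUES
      (products_seq.NEXTVAL, '12345', 'Updated item name', SYSDATE, NULL);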
For the slowly changing Type 3 option, a new attribute will be required in the dimension to store only the most recent previous value. Moving on, we get our summary screen of the actions we performed. Here we can review our actions, and go back and make any changes if needed. It will look like the following, based on the selections we've made. Everything looks fine, so we move on to step 8.
This step creates the dimension, showing us a progress bar as it does its work. It will report a successful completion when it's done, and clicking on the Next button at this point brings us to the summary screen, where we see the above information followed by additional details about what the wizard has created for us based on our responses. What the wizard has created are the definitions of our dimension, the underlying table, and the other supporting objects in OWB.
The Store dimension
We can create our Store dimension in a similar manner using the wizard. We will not go through it in much detail as it is very similar to how we created the Product dimension.
The only difference is the type of information we're going to have in our Store dimension. This dimension provides the location information for our data warehouse, and so it will contain address information. The creation of this dimension is left as an exercise for the reader, using the following details about the dimension, which show how we will include the region and country information. It may seem a bit redundant to include a description as well as a name for the Country and Region levels, as our source data at the moment only includes one field to identify the country and region.
In step 7, the Pre-Create settings page, as shown next, we can see what we should have specified for the Store dimension. We can click on the Back button to go back and make any changes. With our three dimensions designed, the links to them and the measures will go together to make up the information stored in our cube. There is a wizard available for creating a cube, and we will make use of it to ease our task.
So let's start designing the cube with the wizard.

Creating a cube with the wizard
We will start the wizard in a similar manner to how we started up the dimension wizard. The following are the steps in the creation process. We proceed right to the first step, where we give our cube a name. In the next step, we select the storage type just as we did for the dimensions; we will select ROLAP for relational storage to match our dimension storage option, and then move on.
In the next step, we choose the dimensions to include with our cube. We have defined three and want them all included, so we can click on the double arrow in the center to move and select all the dimensions. If we had defined more dimensions than we were going to include with this cube, we would click on each one and then click on the single right arrow to move it over; or we could select multiple dimensions at one time by holding down the Ctrl key as we click on each dimension.
Then we click the single right arrow to move those selected dimensions. Moving on to the last step, we enter the measures we would like the cube to contain. Selecting Finish on this screen will close the dialog box and place the cube in the Project Explorer. This final screen is analogous to the second-to-last screen when creating a dimension; the dimension wizard presented us with a progress screen as its final step. Just as with the dimension wizard earlier, we get to see what the cube wizard is going to create for us in the Warehouse Builder.
The wizard shows us that it will be creating a table named SALES for us that will contain the referenced columns, which it figured out from the dimension and measure information we provided. At this point, nothing has actually been created in the database; only the definitions of the objects exist in the Warehouse Builder workspace. These definitions cover the tables corresponding to our three dimensions and the cube.
The foreign keys we can see in the previous image are the pointers to the dimension tables. They will make the connection between our cube and our dimensions when they are deployed to the database.
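As an illustration only (the wizard generates the real SALES table and its keys, and the actual names will differ), the cube's underlying table might look roughly like this, assuming each dimension table (PRODUCTS_TAB, STORES_TAB, and TIMES_TAB here) uses ID as its primary key:

    CREATE TABLE sales (
      quantity      NUMBER,
      sales_amount  NUMBER,
      product_id    NUMBER REFERENCES products_tab (id),  -- link to the Product dimension
      store_id      NUMBER REFERENCES stores_tab (id),    -- link to the Store dimension
      date_id       NUMBER REFERENCES times_tab (id)      -- link to the Time dimension
    );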
The aggregation the cube will perform for us when we view different levels is one of those behind-the-scenes capabilities we would get with the OLAP feature.
When we view the region amounts, they will automatically be summed up from the amounts of the various stores in the region without us having to do anything extra. This is a nice feature the multidimensional implementation gives us, but aggregations are not created for the pure relational storage option.
As we can generate either a relational or a multidimensional implementation, an aggregation method had to be specified anyway, and so it defaulted to sum. It is possible to make use of aggregations with a pure relational implementation by creating separate summing tables, and there are OLAP and data mining applications that can make use of them for more advanced implementations.
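If we did want pre-computed aggregations in a pure relational design, one approach is a separate summing table built with a GROUP BY. This is only a sketch with hypothetical table and column names, not something the wizard produces for us:

    -- Pre-aggregate store-level sales up to the Region level.
    CREATE TABLE sales_by_region AS
    SELECT st.region_name,
           SUM(s.sales_amount) AS total_sales
    FROM   sales s
           JOIN stores_tab st ON st.id = s.store_id
    GROUP  BY st.region_name;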
Our cube and dimensions are now complete. Let's take a look next at the Data Object Editor, where we can view and edit our objects. We did not have to use it to create a dimension, but more advanced implementations would definitely need to make use of it; for instance, to edit the cube to change the aggregation method that we just discussed. We can get to the Data Object Editor from the Project Explorer by double-clicking on an object, or by highlighting an object with a single click and then selecting Edit | Open Editor from the menu.
The large window where the objects are displayed is called the Canvas. As we're in the Data Object Editor, the objects in the Canvas will be the objects that we created to hold data, which in this case are our cube and dimensions. These boxes can be moved around and resized manually to suit our tastes. As we're working with cubes and dimensions, they are displayed on the Dimensional tab.
If we were working with the underlying tables, they would have appeared on the Relational tab. As this is the Data Object Editor, we can see other data objects in the Explorer. The Available Objects tab shows us objects that are available to include on our Canvas, while the Selected Objects tab shows the objects that are currently on the Canvas and highlights the one that is selected.
If nothing shows in the Configuration window, just select an object in the Canvas by clicking on it and its configuration will appear. It is here that we can change the deployment option for the object to deploy the OLAP metadata, if we want a relational implementation that also stores the OLAP metadata.
The Palette lists the objects that can be used in this editor; in this case, they are all data objects. The list of available objects will change as the tab in the Canvas is changed to view different types of objects. We can use the Palette to create objects on our Canvas by clicking on an entry and dragging it to the Canvas.
This creates a new object, whereas clicking and dragging from the Explorer places an already created object on the Canvas. There is also a small overview window in which we can click and drag a blue-colored box to view various portions of the main canvas, which scrolls as we move the blue box. We will find that in most editors, we quickly outgrow the available space to display everything at once and have to scroll around to see everything, so this can come in very handy for rapidly scrolling the window.
When we select an object, six tabs appear to display information about it. The names on those six tabs change depending on the type of object we have selected. Name: This tab displays the name of the dimension along with some other information specific to the dimension type we are looking at. In this case, it's a Time dimension created by the Time Dimension Wizard, and so it displays the range of data in our Time dimension.
Storage: Here we can see what storage option is set for our dimension object in the database, whether Relational or Multidimensional. If we wanted to switch between the two, this is where we could do it. For a relational implementation, we're able to specify a star or snowflake schema and whether we want to create composite unique keys.
A composite key is one made up of more than one column to define uniqueness for a record. Attributes: The attributes tab is where we can see the attributes that are defined for our dimension. It is here that we can also change the description of our attributes if we want, or add descriptions the wizard did not add.
We are able to edit some of the information on this tab for the Time dimension created by the wizard, but not all. We can check and uncheck boxes to indicate which of the various level types we want to use and which attributes are applicable to which level, but that is it.
We are not able to add or remove any levels or attributes. If we were to view one of the other dimensions we created, it would be fully editable. For those other dimensions we could also assign different names and descriptions to the attributes for each level. Hierarchies: This tab will let us specify hierarchy information for our dimension and will even let us create a new hierarchy.
It's possible that we may have selected more levels on the previous page and now need to assign them to a hierarchy.
There is also a Create Map button here that will automatically generate the mapping for us if we modify the hierarchies. This is one of the benefits of the Time dimension created by the wizard. Ordinary dimensions such as our Store and Product dimension will not have this Create Map button displayed on their Hierarchies tab. Data Viewer: The Data Viewer is a more advanced feature that allows us to actually view the data in an object we are editing.
This is only available for an object if it has been deployed to the database and has data loaded into it. It has a query capability to retrieve data and can apply a WHERE clause to get just the data we might need to see. If we select the cube instead, the tabs change slightly: Name: The cube has a name tab, like the dimensions, to display its name. Storage: It also has a storage tab as per the dimensions, with the addition of a checkbox for creating bitmap indexes. An index is a database feature that allows faster access to data.
It is somewhat analogous to the index of a book that allows us to get to a page in the book with the information we want much faster. A bitmap-type index refers to how it is stored in the database and is generally a better option to use for data warehouse implementations so it is checked by default. There is also a composite unique key checkbox for cubes as there was for dimensions.
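Roughly speaking, those two checkboxes correspond to SQL like the following when the cube is deployed; the statement forms are standard, but the object names here are hypothetical:

    -- A bitmap index on one of the cube's dimension reference columns.
    CREATE BITMAP INDEX sales_prod_bix ON sales (product_id);

    -- A composite unique key made up of all the dimension references.
    ALTER TABLE sales ADD CONSTRAINT sales_uk UNIQUE (product_id, store_id, date_id);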
Dimensions: Instead of attributes, the cube has a tab for dimensions. The dimensions referenced by a cube are basically its attributes. Measures: The next tab is for the measures of the cube.
It is for those values that we are storing in our cube as the facts that we wish to track. Aggregations: Instead of hierarchies, a cube has aggregations. There are various methods of aggregation that we can select, as seen in the drop-down box, the most common of which is sum, which is the default.
This is where the default aggregation method referred to earlier can be changed. There will be no aggregations in a pure relational implementation, so we will leave this tab set to the defaults and not bother changing it. Data Viewer: There is a tab for the data viewer to view cube data just as there is for a dimension. For pure relational implementations, it views the underlying table data. These are the main features of the Data Object Editor.
We can use it to view the objects the wizards have created for us, edit them, or create brand new objects from scratch. We can start with an empty canvas and drag new objects from the palette, or existing objects from the explorer, and then connect them. We will see other editors very similar to this from the next chapter when we start to look at ETL and mappings.
Designing the ETL process is the first step in building the mappings from source to target. We have sources and targets defined, and now we need to:
1. Extract the data from our sources.
2. Perform any transformations on that data to clean it up or modify it.
3. Load it into our target data warehouse structure.
We will accomplish this by designing mappings in OWB. Mappings are visual representations of the flow of data from source to target and of the operations that need to be performed on the data.
ETL
The process of extracting, transforming, and loading data can appear rather complicated. We do have a special term to describe it, ETL, which names the three steps just mentioned.
We're dealing with source data that sits on a different database system from our target, in a database from a vendor other than Oracle. Let's look from a high level at what is involved in getting that data from a source system to our target, and then consider whether or not to stage the data. We will then see how to automate that process in the Warehouse Builder, which will relieve us of much of the work.
Manual ETL processes
First of all, we need to be able to get data out of that source system and move it over to the target system. We can't begin to do anything until that is accomplished, but what means can we use to do so?
We know that the Oracle Database provides various methods to load data into it, and that could be one way to get data from our source system, since every database vendor provides some means of extracting data from its tables and saving it to flat files. Reading the documentation for Oracle's bulk loading utility, SQL*Loader, we see that we have to define a control file that describes the loading process and the definitions of the fields to be loaded.
In a nutshell, this is the process of extract, transform, and load: we extract the data from the source system by some method, transform it along the way as needed, and finally load it into the target structure.
The good news here is that the Warehouse Builder provides us the means to design this process graphically, and then generate all the code we need automatically so that we don't have to build all that code manually.
Staging
Staging is the process of copying the source data temporarily into a table (or tables) in our target database. Here we can perform any transformations that are required before loading the source data into the final target tables.
The source data could actually be copied to a table in another database that we create just for this purpose, but it doesn't have to be. This process involves saving data to storage at any step along the way to the final target structure, and it can incorporate a number of intermediate staging steps.
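As a trivial illustration (the table name and layout are hypothetical), a staging table can be nothing more than a plain table mirroring the source layout, loaded first and then read from when populating the dimensions and cube:

    CREATE TABLE pos_trans_stage (
      store_number  VARCHAR2(10),
      sale_date     DATE,
      item_sku      VARCHAR2(20),
      quantity      NUMBER,
      amount        NUMBER
    );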
The source and target designations will be affected during the intermediate steps of staging. Now, we'll look at the staging process before we actually design any ETL logic.

To stage or not to stage
There are a number of considerations to take into account when deciding whether or not to use a staging area for our source data. The points to consider to keep the process flowing as fast as possible are:
- The amount of source data we will be dealing with
- The amount of manipulation of the source data that will be required
- If the source data is in a database other than an Oracle Database, the reliability of the connection to that database and the performance of the link while pulling data across
If a failure occurs during an intermediate step of the ETL process, we will have to restart the process.
If such a failure occurs, we will have to consider the severity of the impact, as in the following cases:
- We have to go back to the source system and pull the data again because the first attempt failed.
- The source data may be changing while we are trying to load it into the warehouse, meaning that whatever data we pull the second time might be different from the data we started with and that caused the failure. This condition will make it difficult to debug the error that caused the failure.
Configuration of a staging area
A staging area is clearly an advantage when designing our ETL. So we'll want to create one, but we will need to decide where we want to create it—in the database or outside the database.
Our staging area in this case would be a folder on the file system, and the data would be stored in a flat file. The external table feature of the Oracle Database now renders moot some of the reasons for keeping staged source data in tables in the database: we can treat a flat file as essentially another table in the database.
This option is available to us in the Warehouse Builder for defining a flat file as a source, but not a target. So we must keep this in mind if we want to stage data at some other point during the process after the initial load of data. Earlier, we needed to have all our data in database tables if we were relying solely on SQL in the database. External tables allow us to access flat files using all the benefits of SQL for querying the data, so now that reason is not as big a factor as it once was.
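Since external tables keep coming up, here is a minimal sketch of one, assuming a directory object pointing at the folder holding the flat file; the directory path, file name, and column layout are all made up for illustration:

    -- Creating the directory object requires the CREATE ANY DIRECTORY privilege.
    CREATE OR REPLACE DIRECTORY data_dir AS '/data/acme';

    CREATE TABLE pos_trans_ext (
      store_number  VARCHAR2(10),
      sale_date     VARCHAR2(20),
      item_sku      VARCHAR2(20),
      amount        NUMBER
    )
    ORGANIZATION EXTERNAL (
      TYPE ORACLE_LOADER
      DEFAULT DIRECTORY data_dir
      ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY ','
      )
      LOCATION ('pos_transactions.csv')
    );

    -- The flat file can now be queried like any other table.
    SELECT COUNT(*) FROM pos_trans_ext;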
OWB handles this design work with what are called mappings. A mapping is composed of a series of operators that describe the sources and targets, and a series of operations that flow from source to target to load the data. It is all designed in a graphical manner using the Mapping Editor, which is available from the Design Center. To access the Mapping Editor, we need a mapping to work on.
So to begin with, we could create an empty mapping at this point. Mappings are created in the Mappings node, which we can find under the module we created to hold our data warehouse design, under the Databases | Oracle node in our project. Instead of creating a new mapping, which would have nothing in it yet, let's open the mapping the Time Dimension Wizard created for us and take a look at it for our initial exploration of the features in OWB for designing mappings.
We are not going to modify it, but we will use the displayed Mapping Editor to familiarize ourselves with its features. Mappings created automatically like this one do not bother with any layout details, and just place all the objects in the same spot on the canvas (the big window on the right). We could go on clicking on each object and dragging it to a new location, thus revealing the one beneath it, but that's too much work. The Mapping Editor, as with all the editors, provides a convenient Auto Layout option that will do all that for us.
Click on the Auto Layout button in the toolbar to spread everything out. This window is also referred to as the canvas. The Data Object Editor used the canvas to lay out the data objects and connect them, whereas this editor lays out the mapping objects and connects them.
This is the graphical display that shows the operators being used and the connections between the operators, indicating the data flow from source to target. The Explorer window has the same two tabs as in the Data Object Editor—the Available Objects tab and the Selected Objects tab. It displays the same information that appears in the Project Explorer, and allows us to drag and drop objects directly into our mapping.
Available Objects: This tab displays objects defined elsewhere in our project, and they can be dragged and dropped into this mapping. Selected Objects: This tab displays all the objects currently defined in our mapping. When an object is selected in the canvas, the Selected Objects tab will scroll to that object and highlight it. Likewise, if we select an object in the Selected Objects tab, the main canvas will scroll so that the object is visible, and it will be selected in the canvas.
Go ahead and click on a few objects to see what the tab does.

Mapping properties
The mapping properties window displays the various properties that can be set for objects in our mapping. It was called the Configuration window in the Data Object Editor.
When an object is selected in the canvas to the right, its properties will display in this window. To select an operator, we can scroll the Explorer window until we see it and then click on it, or we can scroll the main canvas until we see it and then click on the top portion of its frame. From the Palette, we can click on the object we want to place in the mapping and drag it onto the canvas. This list is customized based on the type of editor we're using: the Mapping Editor will display mapping objects.
The Data Object Editor will display data objects. Just as in the Data Object Editor, there is a small overview window in which we can click and drag the blue-colored box to view various portions of the main canvas; the main canvas will scroll as we move the blue box. Go ahead and give that a try. We will find that in most mappings, we'll quickly outgrow the available space to display everything at once and will have to scroll around to see everything. The operator selected here happens to be a Mapping Input Parameter operator. There are two major types of attribute groups—an input group and an output group.
An attribute group's name is edited in the Details window for that group. This window is accessible by right-clicking on the group name in the canvas and selecting Open Details. All of the operators are available to us from the Palette window in the Mapping Editor.
Source and target operators
The Warehouse Builder provides operators that we will use to represent the sources of our data and the targets into which we will load data. Cube Operator—an operator that represents a cube. This operator will be used to represent our cube in the mapping.
Dimension Operator—an operator that represents previously defined dimensions. External Table Operator—this operator represents external tables, which we have seen in Chapter 2. They can be used to access data stored in flat files as if they were tables. We will look at using an external table to access the flat file that we imported.
Table Operator—this operator represents a table in the database. We will need to store data in tables in our Oracle Database at some point in the loading of data. Constant—represents a constant value that is needed. It can be used to load a default value for a field that doesn't have any input from another source, for instance. View Operator—this operator represents a database view; source data is frequently retrieved via a view in the source database that can pull data from multiple sources into a single, easily accessible result set.
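As a hedged example of the kind of source-side view just described (every table, column, and view name here is made up), something like this on the source system could consolidate several tables into one easily accessible result set:

    CREATE OR REPLACE VIEW sales_source_v AS
    SELECT t.transaction_id,
           t.sale_date,
           t.amount,
           i.item_sku,
           r.store_number
    FROM   transactions t
           JOIN items     i ON i.item_id = t.item_id
           JOIN registers r ON r.register_id = t.register_id;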
Sequence Operator—can be used to represent a database sequence, which is an automatic generator of sequential unique numbers and is most often used for populating a primary key field. Construct Object—this operator can be used to actually construct an object in our mapping.

Data flow operators
Frequently, we will need to transform the source data into a new structure before it can be loaded, and that is the purpose of the data flow operators. They are dragged and dropped into our mapping between our sources and targets, and then connected to those sources and targets to indicate the flow of data and the transformations that will occur on that data as it is being pulled from the source and loaded into the target structure.
Some of the common data flow operators we'll see are as follows: Aggregator—there are times when source data is at a finer level of detail than we need and must be summed up; this is the purpose of the Aggregator operator. It is implemented behind the scenes using an SQL GROUP BY clause with an aggregation SQL function applied to the amount(s) we want to aggregate. Deduplicator—sometimes our data records will contain duplicate combinations that we want to weed out, so that we're loading only unique combinations of data.
The Deduplicator operator will do this for us. It's implemented behind the scenes with the SQL DISTINCT keyword, which returns only the combinations of data elements that are unique. Expression—this represents an SQL expression that can be applied to the output to produce the desired result. Any valid SQL code for an expression can be used, and we can reference input attributes as well as functions.
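To tie these data flow operators back to the SQL they correspond to, here is a hedged sketch that reuses the hypothetical SALES_SOURCE_V view from above and assumes SALE_DATE is a DATE column. The Aggregator corresponds to the GROUP BY, the Deduplicator to DISTINCT, and the Expression to the calculated columns:

    -- Aggregator: roll transaction-level rows up to daily totals per item.
    SELECT sale_date,
           item_sku,
           SUM(amount) AS daily_amount
    FROM   sales_source_v
    GROUP  BY sale_date, item_sku;

    -- Deduplicator plus Expressions: unique combinations with expressions applied.
    SELECT DISTINCT
           UPPER(store_number)     AS store_number,
           TRUNC(sale_date)        AS sale_day,
           ROUND(amount * 0.10, 2) AS estimated_tax
    FROM   sales_source_v;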