Monday, April 16, 2012

OWB Dimensional Modeling to Oracle RDBMS Object Translation

OWB Dimensional Modeling to object translation - This is a crucial thing in data warehousing. The reasons are -

  • You should know the objects which will be modified during ETL operation. It will be helpful for sql tuning and monitoring. You can understand execution plan well and do the necessary things.
  • It is also necessary from DBA perspective. Not only from storage perspective, is it also necessary to utilize parallelism.

In this discussion, we mainly focus on ROLAP. This is because for MOLAP most of dimensions and facts are managed by Oracle itself within the Analytical Workspace (AW). At first we start with dimension. A dimension will be translated in the following objects -

  • Table - Table will be available in Physical Binding. One dimension may create more than one table depending on implementation. If you choose star implementation, just one table will be created and if you choose snowflake implementation then for each of the level, one table will be created. Note that, in dimensional modeling you have to create at least one level for each dimension.
  • Level & Hierarchy - As mentioned in previous paragraph, every dimension has at least one level. For each dimension, one dimension object must be created. This object has the levels and hierarchy. Hierarchy is optional. If your dimension has hierarchy then it will be reflected in hierarchy of dimension object.
  • Surrogate Key - It is available in attributes tab. For each surrogate key, one sequence will be created.
It is well known that in data warehouse there are three types of slowly changing dimension (SCD). If you are using Type 2 or Type 3 SCD, then you have to define mapping for the dimension. In this case, there will be different input table which maps data to dimension output table. The mapping is always converted to pl/sql procedure.

Next comes the fact table. As we discuss ROLAP model, the fact translation is simple.

  • One table is created for each fact.
  • For each dimension, one bitmap index will be created.
  • For each dimension, one foreign key will be created also.
Multi-dimensional OLAP:
Within the context of the relational database, the AW is a Large Object (LOB) that is a multidimensional data type. It is stored in a column in a relational table and this table will be a partitioned table. The name of the relational table will be AW$workspacename. All dimensions and facts are maintained internally within this LOB column.

Sunday, April 8, 2012

Data Warehous Metadata Management in OWB

Metadata is all the information in the data warehouse environment that is not the actual data itself. We have operational source system metadata including source schemas and copybooks that facilitate the extraction process. Once data is in the staging area, we encounter staging metadata to guide the transformation and loading processes, including staging file and target table layouts, transformation and cleansing rules, conformed dimension and fact definitions, aggregation definitions, and ETL transmission schedules and run-log results. Even the custom programming code we write in the data staging area is metadata. So, metadata management is a crucial thing in any Data Warehouse.

Oracle Warehouse Builder gives the following facility for metadata management -

  1. Source Metadata:
    This includes importing of source metadata, data profiling and attribute analysis. OWB has all these capabilities fully. OWB has Database module, File module and Data Profiling module. File module can sample data. Data profiling module is very much helpful for attribute analysis and to get direction of your data modeling.

  2. Staging Metadata and Transformation:
    Staging data is designed through Database module. Mapping of staging data can be designed using can built in components (e.g. joiner, table operator, etc.) of mapping. Staging data can be designed through custom programming code also. For this instead of mapping, you have to use transformation. In custom programming, you have to maintain proper comments to maintain metadata. So, in this case, you yourself have to maintain the metadata.

  3. Process flow & Schedule Metadata:
    Process flow is maintained by Process Flow module. By this module you can design the sequence of process flow in GUI which is very much comprehensible. You can document success flow, error flow and dependency of processes in this module. Schedule metadata is managed by Schedule/Calendar module. Here you define a schedule and assign this to one or many process flow.

  4. Dimensional Modeling Metadata:
    This metadata is also maintained by Database module. If you design dimension and cube (ROLAP/MOLAP) by OWB, then all your dimensional modeling metadata will be maintained within OWB. Here the dimensional modeling will not be so much visible as you find in ER studio or similar tool but the modeling for a fact will be well documented in tabular form. That is for a cube you can see the dimensions and their levels in a tab, measures and aggregation in other tabs.

  5. Data Linage and dependency Metadata:
    This is another nice feature of OWB. Here you can see the linage and impacts of objects (like tables, file, dimension or cubes) under database module. Linage of an objects shows you from which objects this object has been derived and impact shows you which objects are dependent on this object. This is very helpful for change management. You can plan the impact of change of any object by managing this metadata.
These are the overall metadata management facilities in OWB.