Friday, December 27, 2013

Incorporate t-SQL Merge Operation in SSIS

SQL Server introduced the MERGE statement in SQL Server 2008. It is a very useful statement, especially in a data warehousing environment, so it is very likely that you will need it in an SSIS project. There can be two solutions to incorporate it in SSIS.
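
For reference, here is a minimal sketch of the T-SQL MERGE statement that both SSIS patterns below emulate (the dbo.Source and dbo.Destination tables and their columns are illustrative assumptions):

-- Hypothetical tables: dbo.Destination(Id, Name) and dbo.Source(Id, Name)
MERGE dbo.Destination AS d
USING dbo.Source AS s
   ON d.Id = s.Id
WHEN MATCHED THEN
    UPDATE SET d.Name = s.Name                -- existing row: update it
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Name) VALUES (s.Id, s.Name);  -- new row: insert it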

First One:

Using the Merge Join operator, do the following:

  1. Merge Join - Source LEFT JOIN Destination
  2. Conditional Split -
    • Destination ID is null - "OLEDB Destination" (insert the new rows)
    • Destination ID is not null - "OLEDB Command". Use an UPDATE operation within the "OLEDB Command" (see the sketch after this list)
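
Inside the "OLEDB Command", you would typically run a parameterized UPDATE like this sketch (the table and column names are illustrative assumptions; the ? placeholders are mapped to data flow columns in the editor):

-- hypothetical destination table; ? parameters map to pipeline columns
UPDATE dbo.Destination
SET    Name = ?
WHERE  Id = ?;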


Second One:

Using the "Slowly Changing Dimension" operator, do the followings -

  1. Add the "Slowly Changing Dimension" operator and configure it using the wizard
  2. Using an OLEDB connection, choose the destination table and set its ID as the "Business key"
  3. After clicking Next, select all other destination columns and set their Change Type to "Changing attribute"
  4. Keep all other settings at their defaults.


If you compare these two approaches, you will see how easy the second one is. The SCD wizard really does a lot for you: it encapsulates the sort, merge join, and conditional split, and auto-generates the OLE DB command. So no programmer should forget to use it in place of the first option, or in place of calling a Script Task from SSIS and making the package less programmer-friendly.

If the table is large, then the first option should be used. On the other hand, if the table is small, the performance of the first and second options is almost the same. For details, please read this article -
http://chrisjarrintaylor.co.uk/

Wednesday, May 2, 2012

Oracle Bulk Collect & PL/SQL Collection: Ultimate Way to Speed Up Fetching Cursors

PL/SQL should be used only when the work cannot be performed in plain SQL, as the SQL engine is faster than the PL/SQL engine. Cursors reside at the heart of PL/SQL, and they are one of the most prominent reasons for slow PL/SQL performance. But most of the time you can speed up cursor fetching for large datasets using BULK COLLECT.

This option enables you to retrieve multiple rows of data in a single request. The retrieved data is populated into a collection variable. Such a query runs significantly faster than it would without BULK COLLECT, because operations on collection variables are in-memory operations and thus much faster. There are three types of PL/SQL collection variables - nested tables, varrays, and associative arrays.

Here I will show you an example of BULK COLLECT using a nested table to speed up fetching. In the sample code, I use SYS_REFCURSOR instead of declaring an explicit cursor. It is a weak-typed cursor available since Oracle 9i. Strongly typed ref cursors can also be declared (e.g., through a package type such as refcursor_pkg.strong_ref_cursor). Here is the PL/SQL code:

declare
  TYPE type_base_rec IS RECORD (cx NUMBER, event_date DATE,
                                balance NUMBER);
  --instead of creating any schema-level type,
  --I use a record; it is available only within this PL/SQL block
  TYPE type_base_tbl IS TABLE OF type_base_rec;
  base_data    type_base_tbl;   -- collection that receives the fetched rows
  ref_cur_base SYS_REFCURSOR;
begin
  OPEN ref_cur_base FOR 'select c1, c2, c3 from Tab';
  FETCH ref_cur_base BULK COLLECT INTO base_data;
  CLOSE ref_cur_base;
end;
/

You can use the LIMIT option to cap the number of rows fetched in each call; with LIMIT you normally fetch inside a loop until the cursor is exhausted. For example -

FETCH ref_cur_base BULK COLLECT INTO base_data LIMIT 5000;

Now you can use the collection methods on the collection variable. Most of the time you need COUNT, FIRST, LAST, TRIM, EXTEND, DELETE, or indexing like base_data(i). A common example where BULK COLLECT can speed up your operation is writing files from the database.
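
As an illustration, here is a minimal sketch of batched fetching while writing a file (the DIR_DUMP directory object and the column list of the sales table are assumptions):

declare
  TYPE t_line_tbl IS TABLE OF VARCHAR2(200);
  lines    t_line_tbl;
  ref_cur  SYS_REFCURSOR;
  out_file UTL_FILE.FILE_TYPE;
begin
  out_file := UTL_FILE.FOPEN('DIR_DUMP', 'sales.csv', 'w');
  OPEN ref_cur FOR 'select cx || '','' || location || '','' || cost from sales';
  LOOP
    FETCH ref_cur BULK COLLECT INTO lines LIMIT 5000; -- at most 5000 rows per fetch
    FOR i IN 1 .. lines.COUNT LOOP
      UTL_FILE.PUT_LINE(out_file, lines(i));
    END LOOP;
    EXIT WHEN ref_cur%NOTFOUND; -- the last (possibly partial) batch is already written
  END LOOP;
  CLOSE ref_cur;
  UTL_FILE.FCLOSE(out_file);
end;
/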

Monday, April 16, 2012

OWB Dimensional Modeling to Oracle RDBMS Object Translation

OWB dimensional modeling to Oracle object translation is a crucial thing in data warehousing. The reasons are -

  • You should know which objects will be modified during ETL operations. This is helpful for SQL tuning and monitoring: you can understand the execution plans well and tune accordingly.
  • It is also necessary from a DBA perspective - not only for storage planning, but also for utilizing parallelism.

In this discussion, we mainly focus on ROLAP, because for MOLAP most dimensions and facts are managed by Oracle itself within the Analytic Workspace (AW). At first we start with dimensions. A dimension will be translated into the following objects -

  • Table - The table will be available in the physical bindings. One dimension may create more than one table depending on the implementation. If you choose a star implementation, just one table will be created; if you choose a snowflake implementation, one table will be created for each level. Note that in dimensional modeling you have to create at least one level for each dimension.
  • Level & Hierarchy - As mentioned above, every dimension has at least one level. For each dimension, one dimension object must be created. This object holds the levels and hierarchies. A hierarchy is optional; if your dimension has one, it will be reflected in the hierarchy of the dimension object.
  • Surrogate Key - It is available in the attributes tab. For each surrogate key, one sequence will be created (see the sketch below).
It is well known that in a data warehouse there are three types of slowly changing dimensions (SCD). If you are using a Type 2 or Type 3 SCD, then you have to define a mapping for the dimension. In this case, there will be a separate input table whose data is mapped to the dimension output table. The mapping is always converted to a PL/SQL procedure.
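
As referenced above, the surrogate key sequence that gets generated is simply something along these lines (the name is an illustrative assumption):

-- hypothetical name; one such sequence is generated per surrogate key
CREATE SEQUENCE customer_dim_seq;
-- the dimension mapping then populates the key with customer_dim_seq.NEXTVAL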

Next comes the fact table. Since we are discussing the ROLAP model, the fact translation is simple (a sketch of the generated DDL follows the list).

  • One table is created for each fact.
  • For each dimension, one bitmap index will be created.
  • For each dimension, one foreign key will also be created.
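
Here is a rough sketch of those generated objects for a hypothetical SALES fact with one CUSTOMER dimension (all names are illustrative assumptions):

-- the fact table itself
CREATE TABLE sales_fact (
  customer_key NUMBER,   -- surrogate key referencing the dimension
  revenue      NUMBER
);

-- one bitmap index per dimension key
CREATE BITMAP INDEX sales_fact_cust_bix ON sales_fact (customer_key);

-- one foreign key per dimension (assumes customer_dim.dimension_key is unique)
ALTER TABLE sales_fact ADD CONSTRAINT sales_fact_cust_fk
  FOREIGN KEY (customer_key) REFERENCES customer_dim (dimension_key);
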
Multi-dimensional OLAP:
Within the context of the relational database, the AW is a Large Object (LOB) of a multidimensional data type. It is stored in a column of a relational table, and this table is a partitioned table named AW$workspacename. All dimensions and facts are maintained internally within this LOB column.

Sunday, April 8, 2012

Data Warehouse Metadata Management in OWB

Metadata is all the information in the data warehouse environment that is not the actual data itself. We have operational source system metadata including source schemas and copybooks that facilitate the extraction process. Once data is in the staging area, we encounter staging metadata to guide the transformation and loading processes, including staging file and target table layouts, transformation and cleansing rules, conformed dimension and fact definitions, aggregation definitions, and ETL transmission schedules and run-log results. Even the custom programming code we write in the data staging area is metadata. So, metadata management is a crucial thing in any Data Warehouse.

Oracle Warehouse Builder provides the following facilities for metadata management -

  1. Source Metadata:
    This includes importing of source metadata, data profiling, and attribute analysis. OWB has all these capabilities: it provides a Database module, a File module, and a Data Profiling module. The File module can sample data. The Data Profiling module is very helpful for attribute analysis and for guiding your data modeling.

  2. Staging Metadata and Transformation:
    Staging data is designed through the Database module. Mappings for staging data can be designed using the built-in components of mappings (e.g., joiner, table operator). Staging data can also be loaded through custom programming code; for this, you use a transformation instead of a mapping. In custom code, you have to maintain proper comments, so in this case you yourself have to maintain the metadata.

  3. Process flow & Schedule Metadata:
    Process flows are maintained by the Process Flow module. With this module you can design the sequence of a process flow in a GUI, which is very comprehensible. You can document the success flow, error flow, and dependencies of processes in this module. Schedule metadata is managed by the Schedule/Calendar module: here you define a schedule and assign it to one or many process flows.

  4. Dimensional Modeling Metadata:
    This metadata is also maintained by the Database module. If you design dimensions and cubes (ROLAP/MOLAP) in OWB, then all your dimensional modeling metadata will be maintained within OWB. The dimensional model will not be as visible as in ER/Studio or a similar tool, but the model for a fact is well documented in tabular form: for a cube you can see the dimensions and their levels in one tab, and measures and aggregations in other tabs.

  5. Data Lineage and Dependency Metadata:
    This is another nice feature of OWB. Here you can see the lineage and impacts of objects (like tables, files, dimensions, or cubes) under the Database module. The lineage of an object shows you from which objects it has been derived, and the impact shows you which objects depend on it. This is very helpful for change management: you can plan the impact of a change to any object by managing this metadata.
These are the overall metadata management facilities in OWB.

Wednesday, March 28, 2012

Basic Administration Steps of Oracle 11g Warehouse Builder

In Oracle Warehouse Builder (OWB) ETL consists of two main areas:

  1. Mapping (or Data Flow Diagrams)
  2. Processes (or process flow diagrams which contain multiple mappings)
So, Administration also consists of two parts.

Detailed information can be found at http://docs.oracle.com/. Here is the summary -

Basics (This is for Mapping)

After installing OWB, you will get two locked users - OWBSYS and OWBSYS_AUDIT. You have to unlock these users.
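
For example, unlocking them could look like this (run as a DBA; the passwords are placeholders):

-- replace the placeholder passwords with your own
ALTER USER owbsys IDENTIFIED BY owbsys_pwd ACCOUNT UNLOCK;
ALTER USER owbsys_audit IDENTIFIED BY audit_pwd ACCOUNT UNLOCK;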

  1. Create a repository (which is called a workspace in Oracle 11g) and make a user its owner using the Repository Assistant
  2. Administer OWB using the Repository Browser. For this type of administration, use the user who owns the repository. To use the Repository Browser, you have to start the listener -
    • start the OWB browser listener - owb_home\owb\bin\win32\startOwbbInst.bat
    • stop the OWB browser listener - owb_home\owb\bin\win32\stopOwbbInst.bat
    • start the Web Repository Browser - owb_home\owb\bin\win32\openDB.bat
  3. Register other users to this repository
  4. Start/Stop Control Center Service when necessary

Extended (This is for process flow)

  1. Enable integration with Oracle Workflow, i.e., installation of Oracle Workflow. In Oracle 11g the installation software is provided, but for previous versions you will need the Oracle Companion CD. Location of the installer for Oracle 11g - ORACLE_HOME/owb/wf/install> wfinstall.bat
  2. After installation, a user will be created - OWF_MGR. Register the workflow user OWF_MGR in the workspace (with the OWB_USER role and the execute-any-procedure privilege)
  3. The Control Center Agent (CCA) manages deployment and execution of the mappings that are based on Code Templates. The CCA is not part of the Control Center service hosted within the Oracle Database, and it is very useful in a Control Center Agent topology. On a non-Oracle database host, this is the only part of OWB that can be used.
You can also see http://gerardnico.com/wiki/owb/owb.

If you want to create and maintain an Analytic Workspace (AW) in your database, then you must have two more roles: OLAP_USER and OLAP_DBA. The OLAP_USER role allows a user to create AWs in its own schema; for other schemas, you have to utilize the OLAP_DBA role.
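
Granting them might look like this (the user names are illustrative assumptions):

GRANT OLAP_USER TO etl_user;   -- can create AWs in its own schema
GRANT OLAP_DBA  TO etl_admin;  -- can create/maintain AWs in other schemas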

Wednesday, February 29, 2012

Analytic SQL Function: A Solution for Well-Managed ETL in ROLAP

Analytic SQL is not at all a new feature of Oracle; it has been available since Oracle 8i. Still, it is very useful in data analysis, especially if your data mart is designed on relational OLAP. But if you are proficient with an analytic tool like Analytic Workspace Manager, then you may not need analytic SQL so much.

I have found analytic SQL very useful in several data mart ETLs. Here I will show an example of how it can simplify your SQL. Suppose you want to find the best-performing location of every customer from the sales report. It is very simple if you use analytic SQL. Here it is -

select distinct cx,
       first_value(location) over (partition by cx
                                   order by rev desc, location) location
from (select cx, location, sum(cost) rev
      from sales
      group by cx, location);


In the first step, you get the aggregate data; then you run your analytic SQL on this aggregate data, because analytic functions work well on aggregate data. In the above example, the steps are the same: at the beginning, get the customer- and location-wise revenue.

Then partition your data by customer and, in each partition, sort the rows primarily by revenue in descending order and then by location. So the first row of each partition must hold the highest revenue among that customer's locations. If two locations tie for the maximum revenue, the one whose name comes first in lexical sorting is taken.

Here is a link for the beginner of analytic sql function -

http://www.oracle-base.com/.

I have found AVG, FIRST_VALUE, LAST_VALUE, RANK, ROW_NUMBER, CORR, and NTH_VALUE to be the most useful. Especially ROW_NUMBER, which removes the limitations of rownum and can be used for data paging.
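
For example, a minimal paging sketch with ROW_NUMBER on the same hypothetical sales table, returning rows 11-20 by revenue:

select *
from (select s.*,
             row_number() over (order by cost desc) rn  -- stable row numbering
      from sales s)
where rn between 11 and 20;  -- second page of 10 rows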

Wednesday, January 4, 2012

Replacing complex LIKE with REGEXP_LIKE from Oracle 10g

Oracle Database 10g offers four regular expression functions: REGEXP_LIKE, REGEXP_INSTR, REGEXP_SUBSTR, and REGEXP_REPLACE. You can use these equally in your SQL and PL/SQL statements. Among these, REGEXP_LIKE is the most useful. Like the LIKE operator, it returns a boolean value, so it can easily be used in WHERE and HAVING clauses. The header of this function is -

REGEXP_LIKE(source, regexp, modes)

To learn more about this function, visit http://www.oracle-developer.net/.

Also, here is a recommended site for regular expressions: http://www.regular-expressions.info

The third parameter (match modes) is optional and rarely used.
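
When you do need it, the mode 'i', for example, makes the match case-insensitive (a minimal sketch):

-- returns the row because 'i' ignores case
select * from dual where regexp_like('ABC', '^abc$', 'i');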

Here I will show you how it makes the query simple and faster -

'^(880|88018)?7777[0-9]{0,2}$' - this regular expression matches every string that starts with 880 or 88018 (or neither), followed by 7777, and then ends immediately or after at most two more digits. So it matches the following strings -

8807777, 880187777, 7777, 77776, 777758.

So, if you want to query a database table with this filter on a column, you can write the following query -

select * from table where regexp_like(colA, '^(880|88018)?7777[0-9]{0,2}$')

or

select * from table where colA = '7777' or colA='8807777' or colA='880187777'
or colA like '7777_' or colA like '8807777_' or colA like '880187777_'
or colA like '7777__' or colA like '8807777__' or colA like '880187777__'

Thus the regexp_like() function makes the query simple, comprehensible, and easy to manage. In some cases you may find it absolutely hard to write a query using LIKE that can easily be written using regexp_like().