Wednesday, February 23, 2011

OLAP Reports Using C# in a .NET Application

If you are familiar with BI tools and OLAP systems, you know it is inefficient to perform the same join operations over and over for different reports that share the same joins. Likewise, if the same report is viewed by many users, loading the same data into separate memory locations is wasteful. This is why BI tools use OLAP cubes and universes. Here I will show how you can do the same thing using the singleton pattern in C# on the .NET Framework.

The main idea is to create a singleton class that loads the data of the universe/cube just once. Different sessions, and different presentations of the same set of data, will then read the database only once and keep the result in memory; the same dataset is shared among all sessions and similar reports. Yes, it requires a large memory allocation, but it is highly scalable. If you build this kind of report without doing this, your BI reporting will surely lose scalability. Here is the sample code,

using System.Data;

namespace WebSingleton
{
    public sealed class LoadUniverse
    {
        // Static members are 'eagerly initialized', that is,
        // immediately when the class is loaded for the first time.
        // .NET guarantees thread safety for static initialization.
        private static readonly LoadUniverse _instance = new LoadUniverse();

        private DataSet _universe;

        // Note: constructor is 'private'
        private LoadUniverse()
        {
            //
            // TODO: Init universe data by loading the different DataTables
            // of the DataSet using join operations
            //
        }

        public static LoadUniverse Instance
        {
            get { return _instance; }
        }

        public DataSet Universe
        {
            get { return _universe; }
        }
    }
}

So, when you design reports, you should read the "Universe" property of the singleton instance. Because it is a singleton, every module of your reporting application uses the same object reference, and the private constructor is called just once. To load the DataTables of the DataSet, use the complex queries that join your fact and dimension tables. This helps a lot if your database is designed for an operational (OLTP) system but you also have to do reporting (OLAP) on the same database. From a DataTable you can then fill your own generic List<T> and use LINQ to query your universe/OLAP cube whenever you need further filtering or aggregation, as in the sketch below.
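Here is a minimal sketch of that last step. The SalesFact class, the SALES_FACT table, and the REGION/AMOUNT column names are hypothetical; substitute your own fact table's schema. Note that AsEnumerable() requires a reference to System.Data.DataSetExtensions.

using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical typed row for one fact table; adjust to your own columns.
public class SalesFact
{
    public string Region { get; set; }
    public decimal Amount { get; set; }
}

public static class ReportHelper
{
    // Materialize the in-memory universe rows into a typed list once,
    // then run LINQ filters/aggregations against that list.
    public static List<SalesFact> ToSalesFacts(DataTable table)
    {
        return table.AsEnumerable()
                    .Select(r => new SalesFact
                    {
                        Region = r.Field<string>("REGION"),
                        Amount = r.Field<decimal>("AMOUNT")
                    })
                    .ToList();
    }
}

// Usage: aggregate directly from the shared singleton dataset.
// var facts = ReportHelper.ToSalesFacts(
//     WebSingleton.LoadUniverse.Instance.Universe.Tables["SALES_FACT"]);
// var totalsByRegion = facts.GroupBy(f => f.Region)
//     .Select(g => new { Region = g.Key, Total = g.Sum(f => f.Amount) });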

One more thing worth mentioning: if you do not force construction at application startup, the object is constructed lazily, so the first viewer of a report sees a very slow first load. You can therefore create it in global.asax in the following manner -

void Application_Start(object sender, EventArgs e)
{
    // Code that runs on application startup.
    // Touching Instance forces the singleton (and the universe) to load now.
    WebSingleton.LoadUniverse loadUniverse = WebSingleton.LoadUniverse.Instance;
}

Some developers use a static class instead of the singleton pattern, but it should not be used here: a static class cannot implement an interface or be passed around as an object reference. http://www.dotnetperls.com/ discusses the comparison between the two nicely.
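As a small illustration of the difference, the sketch below (the IUniverseSource and ReportRenderer names are hypothetical) shows what a singleton instance allows that a static class cannot: implementing an interface and being injected into consumers, e.g. to swap in a fake universe for testing.

using System.Data;

// A singleton instance can implement an interface; a static class cannot.
public interface IUniverseSource
{
    DataSet Universe { get; }
}

// A consumer that accepts any IUniverseSource, real or fake.
public class ReportRenderer
{
    private readonly IUniverseSource _source;

    public ReportRenderer(IUniverseSource source)
    {
        _source = source;
    }

    public int RowCount(string tableName)
    {
        return _source.Universe.Tables[tableName].Rows.Count;
    }
}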

Wednesday, February 9, 2011

Soft Parse, Bind Variable & dataobj_to_partition in OLAP system

Using bind variables to get soft parses is key to application performance. It is nicely discussed at http://www.akadia.com, though most of the time the focus is on OLTP systems; you can find more on this at http://www.oracle-base.com/. Bind variables are also very helpful in preventing SQL injection attacks if you are using dynamic SQL.
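As a quick illustration, here is a minimal PL/SQL sketch (the orders table and customer_id column are hypothetical) contrasting the two styles of dynamic SQL:

declare
  l_cust_id number := 42;   -- hypothetical value, e.g. from user input
  l_cnt     pls_integer;
begin
  -- Literal concatenation: a new cursor is hard parsed for every
  -- distinct value, and the statement is open to SQL injection.
  execute immediate
    'select count(*) from orders where customer_id = ' || l_cust_id
    into l_cnt;

  -- Bind variable: one shared cursor, soft parsed after the first
  -- execution, and safe from injection.
  execute immediate
    'select count(*) from orders where customer_id = :1'
    into l_cnt using l_cust_id;
end;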

I have found it very useful in an Oracle 11g data warehouse as well. Oracle 11g comes with a new feature, adaptive cursor sharing, which is described at http://www.oracle-base.com/. This feature gives you the real benefit of soft parsing: the execution plan can adapt to the query and the data, so your plans should not degrade just because statistics are imperfect, and you do not need to spend as much time gathering statistics in your ETL procedures.

But in a data warehousing environment, most of the bulky tables are partitioned, and if you do not prune those tables by partition, your ETL will suffer. If you are using execute immediate, you can combine partition pruning with a bind variable in the following manner -

execute immediate
  'select count(*) from schema.tbl partition (dataobj_to_partition(schema."tbl", :1))'
  into tmp_var using l_object_id;

Here l_object_id holds the object id of the partition of table "tbl" (see the sketch below for one way to look it up). Thus, using bind variables, you can improve the performance of your OLAP system as well.
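For completeness, here is a minimal sketch, assuming a hypothetical partitioned table SALES_FACT in the current schema, that looks up each partition's data object id from user_objects and reuses the same pruned, bind-variable statement for every partition:

declare
  l_cnt pls_integer;
begin
  for p in (select subobject_name, data_object_id
              from user_objects
             where object_name = 'SALES_FACT'
               and object_type = 'TABLE PARTITION')
  loop
    -- One shared cursor for all partitions; only the bind value changes.
    execute immediate
      'select count(*) from sales_fact'
      || ' partition (dataobj_to_partition(sales_fact, :1))'
      into l_cnt using p.data_object_id;
    dbms_output.put_line(p.subobject_name || ': ' || l_cnt || ' rows');
  end loop;
end;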

Use V$SQLAREA to Find SQL Processing Time

Oracle Database allocates memory from the shared pool (library cache) when a new SQL statement is parsed, to store the statement in a shared SQL area. The amount of memory depends on the complexity of the SQL. When the library cache (LC) is fully occupied, Oracle deallocates items from it using a least recently used (LRU) algorithm. So if your SQL has not been aged out yet, you can find it in v$sqlarea or gv$sqlarea (the latter matters in a RAC environment).

v$sqlarea holds very precious information for SQL tuning. The columns I find most useful are
  • SQL_ID (the id of your SQL statement; compare this value across executions to identify soft vs. hard parses).

  • PLAN_HASH_VALUE (the id of the query execution plan).

  • FIRST_LOAD_TIME (date & time when this SQL was first loaded into the LC).

  • LAST_LOAD_TIME (the last date & time this SQL was reused from the LC, i.e. soft parsed).

  • LAST_ACTIVE_TIME (these SQL statements are the implicit cursors kept in the private SQL area; this is the last time the cursor was executed or fetched, or put another way, the time at which the query plan was last active).

  • OPTIMIZER_COST (the cost assigned by the Oracle query optimizer).
So, (last_active_time - last_load_time)*24*60 gives you, approximately, the elapsed time in minutes taken to process the SQL (including parse, execute & fetch).

If you want to know this for a SQL statement like insert into tbl_my_sample ..., you can write the following query -

select sql_id,
       plan_hash_value,
       last_load_time,
       last_active_time,
       (last_active_time - last_load_time) * 24 * 60 as interval_in_min,
       optimizer_cost
  from gv$sqlarea
 where sql_text like 'insert into tbl_my_sample%'
 order by last_active_time desc;

A description of V$SQLAREA can be found at http://download.oracle.com/docs/