Tuesday, October 4, 2011

High-Level Steps for Creating a Dashboard in BO XI R2

Here are the steps -

1. Build a metric universe & export it to Performance Management
  • This universe need not have any calendar.
  • All the measures must have a self join, i.e., in the WHERE clause the event date should fall between a start date and an end date (see the sketch after this list).
    Example - DATE_ID between @Prompt('Begin_Date','D',,mono,free)
    and @Prompt('End_Date','D',,mono,free)
  • The SELECT clause should not contain any aggregate function such as sum, avg, etc.
  • Import the universe into Performance Management -
    Performance Management > Setup > System Setup > Universes > Add Universe
  • Select the time dimension you require from the time configuration -
    Use Performance Management > Setup > Time Config
2. Create/Edit Metrics
  • Create the Metrics -
    Performance management>Dashboard Manager>Metrics>
  • Select a proper calculation interval
  • Check "Stop at current period" to stop the calculation at the current period; this lets you see the dashboard up to today.
  • After creating a metric, don't forget to refresh it; otherwise you will not be able to see the data.
  • Use purge and refresh after editing the calculation period.
3. Create Metric Analytics
  • Create the analytics -
    Performance management>Dashboard Manager>Metric Analytics>
  • Use "Save as" to save it in the proper folder
  • Choose a proper render style and graph properties
4. Create a dashboard
  • Add a dashboard
  • Save the dashboard in the proper directory
5. Place the metric reports on the dashboard
  • Open the dashboard from the directory.
  • Select Customize, then pick the analytics that you want to see in your dashboard.
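
To make step 1 more concrete, here is a minimal sketch of how the Select and Where of such a measure object might look in the universe designer. The table and column names (TBL_SALES, REVENUE, DATE_ID) are placeholders of mine; only the @Prompt pattern is taken from the steps above.

-- Hypothetical measure object in the metric universe (names are illustrative)
Select: TBL_SALES.REVENUE            -- no aggregate function in the Select, as noted above
Where:  TBL_SALES.DATE_ID between @Prompt('Begin_Date','D',,mono,free)
                              and @Prompt('End_Date','D',,mono,free)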

Monday, September 5, 2011

A Solution for File-Based Data Collection in Exadata

When I used Oracle Exadata Database in 2010, it was the first instance of this awesome product in South Asia. The biggest problem we faced during the data warehouse implementation on this machine was that Oracle forbade us to collect file-based data directly on Exadata. This is because -
  • There is little space available on the local disks of the RAC instances, and filling them could create problems such as running out of space or compromising instance high availability.
  • The external storage of Exadata is used for Automatic Storage Management (ASM), i.e., from Oracle Enterprise Linux (OEL) these are raw disks and the Oracle database engine manages this storage. So, this storage cannot be accessed in the conventional way.
This problem can be solved in three ways -
  1. Read an external table over a DB link.
  2. Store the data in a database table using SQL*Loader, then read the table over a DB link. The problem with this solution and the previous one is that you cannot use the parallel processing of Exadata.
  3. Use the Database File System (DBFS). This is the ultimate solution, as in this case external tables can be accessed in parallel, so the extreme performance of Exadata can be utilized.
Configuring DBFS is really the job of the system & DB admins. They configure the DBFS server on Exadata and the client on the staging-area OS. We, the ETL team, mainly use the DBFS client. Here are the steps -
  1. First, run the mount script on the client, which connects to the DBFS service (not the database service) on Exadata. This script actually runs the DBFS client with the following parameters -

    dbfs_client dbfs_user/dbfs_pass@SRV_DBFS -o allow_root -o direct_io /ORABASE/DBFS_DIR

  2. After executing the above script, the mount point on the DBFS server will be available under the mount point of the DBFS client. Note that the original name of this mount point can be different on the DBFS server. For example, in our case,

    In DBFS server the mount point is /u01/app/DBFS_DIR/
    In DBFS client the mount point is /ORABASE/DBFS_DIR/

    So now, if you create a directory on the DBFS server under the server mount point, the directory will be available under the same name below the DBFS client mount point. The system & DB admins should ensure that the server and client users are in the same user group, so that no privilege-related problem appears when accessing directories within the DBFS mount point.

  3. One piece of advice - collect the files on the client's own local file system first, then copy them to the DBFS client mount point; this gives optimum performance.

  4. Now the rest of the work takes place on the Exadata server. Create an Oracle directory on an OS directory under the DBFS server mount point, which is visible to OEL as long as the DBFS server is up and running.

  5. Create an external table in the Exadata database server using the above database directory (see the sketch below).
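
As a rough illustration of steps 4 and 5, the SQL below creates an Oracle directory on a path under the DBFS server mount point and an external table on top of it. All names (dbfs_stage, the file and column names) are hypothetical, and the access parameters depend on your file format; treat this as a sketch, not the exact DDL we used.

-- Step 4: Oracle directory on an OS path under the DBFS server mount point
CREATE OR REPLACE DIRECTORY dbfs_stage AS '/u01/app/DBFS_DIR/stage';

-- Step 5: external table reading a (hypothetical) comma-separated file from DBFS
CREATE TABLE stg_events_ext (
  event_id   NUMBER,
  event_name VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY dbfs_stage
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('events.csv')
)
REJECT LIMIT UNLIMITED
PARALLEL;   -- lets the file be read in parallel, which is the point of using DBFS here
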
Thus, you can create and use external tables within Exadata and benefit from its power.

Wednesday, June 15, 2011

Calculation Context: Ultimate Solution for Complex Queries in Business Objects

Calculation contexts give you more control over how a formula or measure is evaluated. To understand calculation contexts, you need to be familiar with basic report concepts.

A report contains two kinds of objects:

  • dimensions - which are types of data about your business that can have measures associated with them (for example: products, years, states)

  • measures - that you can calculate in relation to dimensions (for example: sales revenue, number of sales).

For example, a report could show sales revenue (a measure) by year (a dimension). In this case the sales revenue results are calculated by year.

The important thing to remember about measures is that they are semantically dynamic. This means that the results returned by a measure depend on the dimension(s) with which the measure is associated. In other words, the results returned by a measure depend on the context in which the measure is placed.

Web Intelligence calculates measures according to default contexts depending on where the measures appear in a report. However, you can change these default contexts. This is what is meant by defining the calculation context. You define the calculation context by including context operators and context keywords (Report, Section, Break, Block and Body) in your formulas.

A calculation has an input context and an output context. You can specify neither, either or both explicitly. The input context is the set of dimensions used to make the calculation. The output context is a set of dimensions that functions like a break in the calculation - it causes the calculation to output values as if they are calculated in the footer of a table break (or an SQL COMPUTE BY clause).

Example

The final column in the table below is calculated as follows:

calculation - minimum

input context - year, quarter

output context - year

This combination of input and output contexts tells Web Intelligence to sum the revenue for each year/quarter combination (the dimensions used to make the calculation), then output the minimum such value in each year (the value that would appear in the break footer if the table contained a break on Year with a minimum calculation in the break footer).

Year   Quarter   Product     Revenue   Minimum
2005   Q1        Product 1   10000     41000
2005   Q1        Product 2   15000     41000
2005   Q1        Product 3   16000     41000
2005   Q2        Product 1   17000     41000
2005   Q2        Product 2   17000     41000
2005   Q2        Product 3   16000     41000
2006   Q1        Product 1   14000     33000
2006   Q1        Product 2   15000     33000
2006   Q1        Product 3   16000     33000
2006   Q2        Product 1   12000     33000
2006   Q2        Product 2   11000     33000
2006   Q2        Product 3   10000     33000

This is how the calculation looks when made explicit in a table:

Year   Quarter   Revenue
2005   Q1        41000
2005   Q2        50000
       Min:      41000
2006   Q1        45000
2006   Q2        33000
       Min:      33000

In Web Intelligence formula syntax, this can be expressed as either

Min ([Revenue] In ([Year];[Quarter])) In ([Year])

or

Min ([Revenue] ForAll ([Product])) In ([Year])

The input context appears inside the function parentheses and the output context appears after the function parentheses. The set of dimensions in each context is itself enclosed in parentheses and the dimensions separated by semi-colons.

You use the context operators In, ForEach and ForAll to set up input and output contexts. You can also use context keywords to specify the data referenced in the contexts.

For more information, please download the manual from help.sap.com and read chapter 3 - Understanding calculation contexts.

Sunday, May 29, 2011

Pagination in Data Views: How to Emulate MySQL's LIMIT in Oracle

The solution provided here is probably not new to many. I try to combine the best solutions I know into one post.

Those who are acquainted with MySQL surely know about its LIMIT keyword. Oracle does not have a LIMIT keyword. But you can implement data-view pagination nicely using two features of Oracle - ROWNUM and REF CURSOR.

create or replace procedure paging(
  p_pagenum  IN  number,
  p_pagesize IN  number,
  p_cursor   OUT SYS_REFCURSOR) is
begin
  open p_cursor for
    select data
    from (
          -- number the ordered rows and keep everything up to the end of the requested page
          select rownum rnum, a.*
          from (
                select object_name data
                from   tbl_source
                order by created
               ) a
          where rownum <= p_pagesize * p_pagenum
         )
    -- then discard the rows that belong to the previous pages
    where rnum > (p_pagesize * p_pagenum) - p_pagesize
    order by rnum;
end paging;

This procedure is sufficient for pagination. It is somewhat complex because ROWNUM is a pseudo-column. From the SELECT statement it is clear that loading the first page will be fastest and loading the last page will be slowest, because for the last page the inner query has to work through all the preceding rows.
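
As a minimal usage sketch (assuming SQL*Plus, a page size of 20, and requesting page 3):

-- bind a ref cursor variable, call the procedure, and print the page
VARIABLE rc REFCURSOR
EXEC paging(p_pagenum => 3, p_pagesize => 20, p_cursor => :rc);
PRINT rc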

Next comes the REF CURSOR. It is a handle on the source data, like other cursors. Instead of fetching all the data at once, it fetches rows as they are requested and so provides streaming. But unlike other cursors, a ref cursor can be consumed by other PL/SQL programs or by high-level languages like C# or Java.

See also - http://www.oradev.com/ref_cursor.jsp and http://support.microsoft.com/kb/322160

If you only use ROWNUM and do not use a REF CURSOR, you will still get better performance. In that case, you have to write the SQL inline in your Java or C# code, as sketched below.
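
For that case, here is a sketch of the inline statement; :page_num and :page_size are bind parameters supplied from the host language, and the table and column names are the same illustrative ones used in the procedure above.

select data
from (
      select rownum rnum, a.*
      from (
            select object_name data
            from   tbl_source
            order by created
           ) a
      -- keep everything up to the end of the requested page
      where rownum <= :page_size * :page_num
     )
-- discard the rows belonging to the earlier pages
where rnum > (:page_size * :page_num) - :page_size
order by rnum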

Tuesday, March 29, 2011

Oracle External Table with strong data type & datetime mask

Those who are familiar with Oracle data warehousing and ETL must be familiar with Oracle external tables. An external table is really nothing but a way of exposing file content as an Oracle table through SQL*Loader-style parsing.

I have seen some developers who always define external tables without any strong data types, i.e., they declare all columns as varchar/varchar2. So, even though the data should be a datetime or number field, they declare the field as varchar2 and do the type casting in stored procedures or views. This is certainly a bad practice, for the following reasons -

  1. Explicit type casting is slower than imposing the type at parse (load) time.

  2. If the data quality is not good, using varchar2 for all column types is horrible. With strongly typed columns, the rows in the file that do not meet the type constraints go to the bad file, and the log file tells you why those records were discarded. With weakly typed columns, you get lost during type casting: the mismatched records are loaded into the external table rather than the bad file, and when the casting fails in your procedures or views you get an exception without being able to identify which column and row in the file caused it.
Using strong data types in an external table is not at all difficult. You can do it simply by specifying the type in the external table definition. Your file access parameters can stay the same; you just need to change the column types in the external table. But there is one exception, and that is the date type column.

For a date type column, you need to change the access parameters as well. It is not complex at all; rather, it is very easy. Instead of using char as the field type in the file access parameters, use date and specify the date mask. As an example, to load date data into a date type column of an external table we can use -

EVENT_TIME date 'YYYYMMDDHH24MISS'
instead of
EVENT_TIME char
where the sample data in this field is - 20110522180652
As you can see in the example, the date mask takes the same pattern as the to_date() function. For more detail you can visit download.oracle.com and forums.oracle.com.
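
Putting the pieces together, here is a hedged sketch of an external table definition that combines strong column types with the date mask shown above. The table, directory, and file names are made up for illustration, and the field list follows the mask syntax used in this post.

CREATE TABLE ext_events (
  event_id   NUMBER,
  event_time DATE,
  event_name VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY ext_dir            -- hypothetical Oracle directory
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    BADFILE ext_dir:'ext_events.bad'   -- rows violating the types land here
    LOGFILE ext_dir:'ext_events.log'   -- with the discard reason logged here
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
    (
      event_id,
      event_time date 'YYYYMMDDHH24MISS',   -- e.g. 20110522180652
      event_name
    )
  )
  LOCATION ('events.dat')
)
REJECT LIMIT UNLIMITED;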

Friday, March 4, 2011

Task scheduler in Business Object XI

The task scheduler in BO (or any other BI tool) is very significant. By scheduling and viewing instances (an instance is a version of an object or report that contains data from the time the object was run), you can ensure that the objects have the most up-to-date information available for viewing, printing, and distributing. It can be used for -
  1. regular (or periodic) report refreshing, so that users face no delay when they view a report or dashboard.
  2. checking the data integrity of the data mart on a regular basis without running the SQL manually.
  3. sending emails with the report attached on a regular basis.
Here are the steps:
  • Configure your job server from -
    Administration Launch Pad > Central Management Console > Servers > the respective server (e.g. Web_IntelligenceJobServer).
    Select the proper destination. For example, if you want to use email, you have to configure the mail template and the SMTP account.
  • Go to InfoView, find your report, and click the Schedule hyperlink. Then set the schedule time (daily, monthly, ...) and the destination (default location, inbox, or the email group configured on the job server).
  • Make sure the proper OS authentication is configured. You can check it from -
    Performance Management > Set up > Parameters > Scheduler Parameters > Scheduler Parameter.
  • You can also troubleshoot your scheduler from -
    Performance Management > Set up > Tools > Check & Cleanup > Scheduled Program
For more detail, visit http://www.sdn.sap.com and download the guide.

Wednesday, February 23, 2011

OLAP Reports Using C# in a .NET Application

If you are familiar with BI tools and OLAP systems, you know that it is inefficient to perform the same join operation multiple times for different reports that use the same joins. Also, if the same report is visited by many users, loading the same data into different memory locations is also inefficient. For this reason, BI tools use OLAP cubes & universes. Here I will show you how you can do the same thing using the singleton pattern in C# & the .NET framework.

The main idea is to create a singleton class that loads the universe/cube data just once. So, different sessions & different presentations of the same set of data read the database only once and keep the result in memory. The same dataset is then shared among all sessions and similar reports. Yes, it requires a large memory allocation, but it is highly scalable. If you have to build this type of report and you are not doing this, your BI reporting surely loses scalability. Here is the sample code:

using System.Data;

namespace WebSingleton
{
    public sealed class LoadUniverse
    {
        // Static members are 'eagerly initialized', that is,
        // immediately when the class is loaded for the first time.
        // .NET guarantees thread safety for static initialization.
        private static readonly LoadUniverse _instance = new LoadUniverse();

        private DataSet _universe;

        // Note: the constructor is 'private'
        private LoadUniverse()
        {
            //
            // TODO: initialize the universe data by loading the different DataTables
            // of the DataSet using join queries against the fact and dimension tables
            //
        }

        public static LoadUniverse Instance
        { get { return _instance; } }

        public DataSet Universe
        { get { return _universe; } }
    }
}

So, when you design reports, you should call the "Universe" property of the singleton class. As it is a singleton, the same object reference is used in all modules of your reporting application and the private constructor is called just once. To load the DataTables of the DataSet, you should use complex queries that join your fact and dimension tables. This helps a lot if your database is designed for an operational (OLTP) system but you have to perform reporting (OLAP) on the same database. From a DataTable you can then load your own generic List&lt;T&gt; and use LINQ to query your universe/OLAP cube if you need further filtering or aggregation.

Another thing I should mention: if you don't force construction at application startup, the object will be constructed lazily. Thus, the first report viewer will find the first report load very slow. So, you can create it in Global.asax in the following manner -

void Application_Start(object sender, EventArgs e)
{
    // Code that runs on application startup:
    // touching the Instance property forces eager construction of the singleton
    WebSingleton.LoadUniverse loadUniverse = WebSingleton.LoadUniverse.Instance;
}

Some developers may use a static class instead of the singleton pattern, but that should be avoided. http://www.dotnetperls.com/ discusses the comparison between the two nicely.

Wednesday, February 9, 2011

Soft Parse, Bind Variable & dataobj_to_partition in OLAP system

Using bind variables for soft parsing is key to application performance. It is nicely discussed at http://www.akadia.com, but most of the time the focus is on OLTP systems; you can find this at
http://www.oracle-base.com/. Bind variables are also very helpful in preventing SQL injection attacks if you are using dynamic SQL.

But I have found them very useful in an Oracle 11g data warehouse as well. Oracle 11g comes with a new feature, adaptive cursor sharing, which you can read about at http://www.oracle-base.com/. This feature gives the real benefit of soft parsing: your query execution plan adapts to the query and the data. Thus, your query plan should not go astray for lack of proper statistics, or, put another way, you don't need to spend so much time gathering statistics in your ETL procedures.

But in a data warehousing environment, most of the bulky tables are partitioned. If you don't prune your tables by partition, your ETL will suffer. If you are using EXECUTE IMMEDIATE, you can write your SQL in the following manner -

execute immediate 'select count(*) from schema.tbl partition (dataobj_to_partition(schema."tbl", :1))' into tmp_var using $object_id

Here $object_id is the object id of the partition of the table "tbl". Thus, by using bind variables, you can improve the performance of your OLAP system as well.
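
Here is a hedged PL/SQL sketch of the idea, mirroring the statement above: it loops over the partitions of a hypothetical SCHEMA.TBL, picks up each partition's data object id from ALL_OBJECTS, and binds it into one reusable statement.

declare
  l_cnt number;
begin
  for p in (select data_object_id
            from   all_objects
            where  owner       = 'SCHEMA'
            and    object_name = 'TBL'
            and    object_type = 'TABLE PARTITION')
  loop
    -- the same cursor is soft parsed for every partition thanks to the bind variable
    execute immediate
      'select count(*) from schema.tbl partition (dataobj_to_partition(schema."tbl", :1))'
      into l_cnt
      using p.data_object_id;
    dbms_output.put_line(p.data_object_id || ' -> ' || l_cnt);
  end loop;
end;
/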

Use V$SQLAREA to know SQL Processing Time

Oracle Database allocates memory from the shared pool (library cache) when a new SQL statement is parsed, in order to store it in the shared SQL area. The amount of memory depends on the complexity of the SQL. If the entire library cache (LC) is already occupied, Oracle deallocates items from the LC using a least recently used (LRU) algorithm. So, if your SQL has not been aged out yet, you can find it in v$sqlarea or gv$sqlarea (the latter is important in a RAC environment).

v$sqlarea holds very valuable information for SQL tuning. The columns I find most useful are
  • SQL_ID (the SQL id of your statement; compare this value to tell a soft parse from a hard parse).

  • PLAN_HASH_VALUE (ID of the query execution plan).

  • FIRST_LOAD_TIME (date & time when this SQL was first loaded into the LC).

  • LAST_LOAD_TIME (last date & time when this SQL was reused from the LC, i.e., soft parsed).

  • LAST_ACTIVE_TIME (these SQL statements are actually the implicit cursors sent to the private SQL area; this time indicates the last time the cursor was fetched, or in other words, the time at which the query plan was last active).

  • OPTIMIZER_COST (cost given by the Oracle query optimizer).
So, (LAST_ACTIVE_TIME - LAST_LOAD_TIME)*24*60 gives you, approximately, the number of minutes elapsed to process the SQL (including parse, execute & fetch).

If you want to know this for a SQL statement like insert into tbl_my_sample ..., you can write the following query -

select sql_id,
       plan_hash_value,
       last_load_time,
       last_active_time,
       (last_active_time - last_load_time) * 24 * 60 interval_in_min,
       optimizer_cost
from   gv$sqlarea
where  sql_text like 'insert into tbl_my_sample%'
order by last_active_time desc

A description of V$SQLAREA can be found at http://download.oracle.com/docs/