Monday, January 24, 2011

Execute Procedure in Background - Dynamic Scheduling

To execute procedure in background, most of the time you will be advised to execute the procedure using OS shell. This is not practical if you need it frequently or you have no privilege on OS server.

In Data warehousing environment, often we want to execute several transformations based on same operational data. That is, after loading the operational data in staging area, we have to do several transformation on this raw data. The obvious solution of this problem is to put the transformations in different procedure and schedule them depending on raw data loading. In this case you have to estimate the raw data loading acutely. Otherwise, if you execute a single procedure and within that call the three procedures (one raw data load and two transformation) sequentially, then you have to wait for the other transformation.

In stead all of the above options, you can schedule the two transformation after the raw data load with the main procedure to run them in parallel. For this you have to schedule the two transformations and pass null value as interval parameter in DBMS_JOB.SUBMIT procedure call. As interval is set to null this procedure will be scheduled to run only once. So after execution of the procedure you will not find these jobs dba_jobs view.

You can use the following procedure for this -

CREATE OR REPLACE PROCEDURE proc_run_bg(proc_name_with_param varchar2, delay_time number default -1) IS
job_id NUMBER;
BEGIN
--schedule the procedure to run it after the delay time mentioned in parameter

DBMS_JOB.SUBMIT
(
job => job_id
,what => proc_name_with_param||';'
,next_date => sysdate+(delay_time/(24*60))
,interval => NULL
);

commit;

END proc_run_bg;
/

A sample procedure utilizing this proc_run_bg procedure -


CREATE OR REPLACE PROCEDURE proc_etl(refdate date) IS
BEGIN

load_operational_data(refdate);

proc_run_bg('proc_transform1('''||refdate||''')');

proc_run_bg('proc_transform2('''||refdate||''')');

/* proc_transform1 & proc_transform2 will run in parallel and proc_etl will be completed just after submitting of these two procedure to dbms_job. It does not execute these transformation directly.*/

END proc_etl;
/


I have named this procedure proc_run_bg as in essence it runs the procedure in background after the time (in minute) elapses that you passed as parameter. Thus you can execute two procedure within one session literally but not in reality!!

This is very useful to me for dynamic scheduling as dependent procedures can be scheduled at run time instead of using fixed expression. That is, without scheduling dependent procedures to run everyday at 7 am. (by estimating that staging data get loaded before 7 am.) , I use it to run two transformations just after loading the staging data.

Saturday, January 15, 2011

Buffer Sort: Reason and Problems

Buffer sort uses the buffering mechanism of a traditional sort, but it does not do the sort itself.

The system simply buffers the data, in the UGA or PGA, to avoid multiple table scans against real data blocks. The whole sort mechanism is reused, including the swap to disk (TEMP tablespace) when not enough sort area memory is available, but without sorting the data.

So it is similar to temporary table but difference is temporary table gets stored in SGA thus buffer cache; so it introduces latching delay. Now comes the reason of buffer sort:-
  1. If sql query has sub-query, then the sub-query result will be buffered in PGA using buffer sort. e.g. select * from A where A.c1 in (select c1 from B). The result of select c1 from B will be buffer sorted.
  2. If sql query get transformed specially in case of star transformation, thus sub-query comes
    into picture and you will see buffer sort.
  3. If abnormal join takes place, then you will see buffer sort. An example of this abnormal join is joining of two table where parallel degree of one table is one and that of another table is high then buffer sort appears. The less degree table is buffered then joined with high degree table. This is more discussed in BUFFER SORT and Parallel query. In this case it cause long execution of query. Hence it should be removed.
For case 1 & 2, it is unavoidable in most of the case if your query is OK. In first two cases, the main problem of buffer sort is it uses sort_area of PGA. As usual when PGA is filled up it takes place in temp tablespace. In first case, you can remove it by using result_cache hints to save sort_area. E.g. select * from A where A.c1 in (select /*+result_cache*/ c1 from B). Alternatively, if you want to increase sort_area_size, you can set the following
two parameters -

alter session set sort_area_size=100000;

alter session set workarea_size_policy = 'MANUAL';