A very common situation in Data Integration is that a well-performing ETL load starts to slow down. A job that has been running and loading data just fine suddenly becomes too slow and turns into a bottleneck.
There are many possible technical explanations. Just to name a few: stale statistics, outdated hardware or software, additional load on the server from other components and, last but not least, rapid growth of data in the target table.
In a data warehousing environment, growth of target tables such as fact or dimension tables is normal and intentional. However, due to a lack of proactive planning around data volume, a slow-running ETL often becomes a huge bottleneck, especially in projects that are in maintenance mode.
One way to overcome the challenge of loading data into a very large target table is by leveraging Oracle's Partition Exchange Load (PEL) feature. The beauty of this approach is that it can reduce loading time from several hours to just a few minutes, and I am going to explain the 'how to' aspect of it.
Earlier on this blog, we covered the basic concept of Partition Exchange Load; for your reference, here is the link:
An Excellent Use Case For Partition Exchange Load
In this post, I am going to share one such use case and show how Partition Exchange Load can cut the time taken by ETL mappings from several hours to just a few minutes, based on my experience of making it happen for one of my clients.
The ETL Performance Bottleneck Challenge
As the diagram above shows, data comes in frequently (read 'on a daily basis') from multiple source systems. This incoming data is processed and loaded into a large Oracle table hosted in an Oracle 11g environment. The target is loaded on a 30-day rolling basis based on transaction date, using Oracle Warehouse Builder (OWB) as the ETL tool.
The target warehouse table was not partitioned, although it was indexed for queries by analytical/reporting tools. Over years of data accumulation, it got bloated to the point where loads slowed down gradually, until the ETL mappings loading data from each source system generally took about 6 to 10 hours, and many times way longer than that.
Analysis of Poor Oracle Database Performance
The ETL logic was designed to use a MERGE statement on the target table, which would UPDATE/INSERT a rolling one day's worth of data into it.
At the time the ETL logic was written, the target table must not have been so huge, so it would have worked fine for a few years. But as the table grew in size, this MERGE statement ended up consuming all the CPU cycles and almost all system memory.
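For context, the original pattern looked roughly like this minimal sketch; the table and column names here are illustrative assumptions, not the client's actual schema:

```sql
-- Illustrative MERGE of one rolling day of source data into the big target.
-- WAREHOUSE_TABLE, STAGING_TABLE and all columns are assumed names.
MERGE INTO warehouse_table tgt
USING (SELECT txn_id, source_system_code, txn_date, amount
         FROM staging_table
        WHERE txn_date >= TRUNC(SYSDATE) - 1) src
   ON (tgt.txn_id = src.txn_id
       AND tgt.source_system_code = src.source_system_code)
 WHEN MATCHED THEN
   UPDATE SET tgt.txn_date = src.txn_date,
              tgt.amount   = src.amount
 WHEN NOT MATCHED THEN
   INSERT (txn_id, source_system_code, txn_date, amount)
   VALUES (src.txn_id, src.source_system_code, src.txn_date, src.amount);
```

Against a very large, indexed, non-partitioned target, every run of a statement like this has to probe and maintain the full table and its indexes, which is exactly where the hours went.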
At this point, the Partition Exchange Load feature came to mind, as it would give us the flexibility to load data without requiring massive DML statements on the target table. So, here is how it all worked out.
Oracle’s Partition Exchange Load (PEL) – The Perfect Solution
Before I explain the logic and the method behind the implementation of the Partition Exchange Load (PEL) mechanism, let me tell you that I was able to achieve a dramatic improvement in ETL execution time. Just as significantly, the load would no longer slow down no matter how large the table grew in the future.
So, here is what I did to implement the Partition Exchange Load mechanism effectively:
1. Create Composite Partitioning On The Existing Non-Partitioned Table
You can't add partitions to a table that already contains data and wasn't partitioned at design time without doing some additional work.
Ideally, for tables that have foreign keys and many constraints involved, you would use a more sophisticated approach based on the DBMS_REDEFINITION package. You can follow the steps involved in adding partitions to an existing table with this method in this beautifully written article:
However, I knew that I didn't have any foreign key constraints defined at the database level. So, I decided to take the easy route and manage with table renaming after loading data into a separate table.
The logic in this PEL implementation is to partition the large target table using a composite partitioning technique: the top-level partitioning is based on a key date field, and then by the Source System Code (which tells us which source system the data came from). That means each main partition holds one month's data, within which there are subpartitions based on the source system code field.
So, here is what you can do if you are in a similar situation:
i> Create a new partitioned table with the same structure as the existing table, in the same schema, using a “CREATE TABLE <table name> AS” command as shown below:
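The original DDL isn't reproduced here, so the following is a hedged sketch of what such a composite RANGE-LIST table could look like; the column names, source system codes and partition names are assumptions for illustration:

```sql
-- Sketch: composite-partitioned copy of the large table (structure only).
-- RANGE by month on the key date field, LIST subpartitions by source code.
CREATE TABLE new_partitioned_table
PARTITION BY RANGE (txn_date)
SUBPARTITION BY LIST (source_system_code)
SUBPARTITION TEMPLATE (
  SUBPARTITION sp_sys_a VALUES ('SYS_A'),
  SUBPARTITION sp_sys_b VALUES ('SYS_B'),
  SUBPARTITION sp_other VALUES (DEFAULT)
)
( PARTITION p_initial VALUES LESS THAN (DATE '2010-02-01') )
AS
SELECT * FROM original_large_table
 WHERE 1 = 0;  -- copy the structure only; data is loaded in a later step
```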
Please note that the names of the fields, their data types and their sequence in this new table need to exactly match those of the original large table on which we plan to implement the PEL mechanism.
Also, as you can see from the table definition above, I created a subpartition template in the table definition itself, so that Oracle automatically adds those subpartitions as soon as a new partition is added.
ii> The next step is to create partitions and subpartitions starting from the MIN value of the key date field in the target table up to SYSDATE. Instead of doing this manually, I wrote a procedure to do the job, and if you like the idea, feel free to use it:
Call this procedure from the SQL prompt, passing the name of the table and the date of the first month for which you want to add a partition, and the procedure will do the rest automatically.
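That procedure isn't shown here, so here is a hypothetical reconstruction of the idea: loop month by month from a given start date up to SYSDATE, adding one partition per month. The P_YYYYMM partition naming convention is an assumption:

```sql
CREATE OR REPLACE PROCEDURE add_monthly_partitions (
  p_table_name  IN VARCHAR2,
  p_start_month IN DATE
) AS
  v_month DATE := TRUNC(p_start_month, 'MM');
BEGIN
  -- Add one monthly partition per iteration; the subpartitions come free
  -- from the SUBPARTITION TEMPLATE defined on the table.
  WHILE v_month <= TRUNC(SYSDATE, 'MM') LOOP
    EXECUTE IMMEDIATE
      'ALTER TABLE ' || p_table_name ||
      ' ADD PARTITION p_' || TO_CHAR(v_month, 'YYYYMM') ||
      ' VALUES LESS THAN (DATE ''' ||
      TO_CHAR(ADD_MONTHS(v_month, 1), 'YYYY-MM-DD') || ''')';
    v_month := ADD_MONTHS(v_month, 1);
  END LOOP;
END add_monthly_partitions;
/
```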
Now the new table is ready with its structure, partitions and subpartitions, but it is still empty at this point. The next task is to copy data from the original large table into it, which can be done with a simple insert statement like this:
INSERT INTO NEW_PARTITIONED_TABLE
SELECT * FROM <ORIGINAL_LARGE_TABLE>; -- Replace ORIGINAL_LARGE_TABLE with your table name.
The statement will take time depending on the capacity of your server and the number of rows in the table. For me, it took about 6 hours.
iii> Finally, create the same indexes on the new table as on the original table, drop the indexes on the original table, rename the original table to something else, and then rename the new table to become the main target table. For this example test case, say I renamed my target table to “WAREHOUSE_TABLE”.
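The swap itself is plain DDL; a sketch with assumed index and table names:

```sql
-- Recreate the index(es) on the new table as LOCAL so they align with
-- the partitions (index and column names assumed).
CREATE INDEX idx_new_txn_date ON new_partitioned_table (txn_date) LOCAL;

-- Retire the old table and promote the new one.
DROP INDEX idx_orig_txn_date;
ALTER TABLE original_large_table  RENAME TO original_large_table_old;
ALTER TABLE new_partitioned_table RENAME TO warehouse_table;
```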
Once the existing table is partitioned, the major work is done. Now, let's move to the next step:
2. Create A New Table To Hold Current Month's Data For Each Source
Since the target table is partitioned monthly, it made sense to create a table for each data source (with the same structure as the main table) to be used for the Partition Exchange. This table holds only the current month's data coming from a particular source system.
The purpose of this table is to perform the MERGE operation in the ETL process on a smaller set of rows for faster execution. Later, this table is used in the Partition Exchange Load process once the ETL completes.
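Creating such a table per source is a one-liner; here is a sketch assuming a hypothetical CURRENT_MONTH_<source> naming convention:

```sql
-- Empty clone of the target, one per source system (names assumed).
CREATE TABLE current_month_sys_a AS
SELECT * FROM warehouse_table WHERE 1 = 0;
```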
3. Create A Custom Procedure To Exchange Partitions With The Current Month Table
The idea behind this custom procedure was to have a reusable routine that would be called to exchange partitions each time data from a source system is loaded. The Partition Exchange always refreshes the current month's data in the WAREHOUSE_TABLE table at the end.
The definition of this procedure is as below:
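The original code isn't reproduced here, so the following is a hedged reconstruction of what such a procedure could look like. The P_YYYYMM_SP_<source> subpartition naming and the CURRENT_MONTH_<source> table prefix are assumptions:

```sql
CREATE OR REPLACE PROCEDURE exchange_current_month (
  p_source_code     IN VARCHAR2,  -- e.g. 'SYS_A', in capital letters
  p_rebuild_indexes IN NUMBER DEFAULT 0
) AS
  v_subpart VARCHAR2(61) :=
    'P_' || TO_CHAR(SYSDATE, 'YYYYMM') || '_SP_' || p_source_code;
BEGIN
  -- Swap the current month's subpartition for this source with the
  -- freshly loaded CURRENT_MONTH_<source> table; this is a fast,
  -- metadata-only operation regardless of row counts.
  EXECUTE IMMEDIATE
    'ALTER TABLE warehouse_table EXCHANGE SUBPARTITION ' || v_subpart ||
    ' WITH TABLE current_month_' || p_source_code ||
    ' INCLUDING INDEXES WITHOUT VALIDATION';

  -- Global (non-partitioned) indexes go UNUSABLE after an exchange;
  -- optionally rebuild them when the caller passes 1.
  IF p_rebuild_indexes = 1 THEN
    FOR idx IN (SELECT index_name
                  FROM user_indexes
                 WHERE table_name = 'WAREHOUSE_TABLE'
                   AND partitioned = 'NO'
                   AND status = 'UNUSABLE') LOOP
      EXECUTE IMMEDIATE 'ALTER INDEX ' || idx.index_name || ' REBUILD';
    END LOOP;
  END IF;
END exchange_current_month;
/
```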
The procedure above can be called after the final current month's data is loaded into the current month table for a given source system, to move the data into the final table WAREHOUSE_TABLE. The procedure takes two parameters:
- The code for the source system (in capital letters, to match the name of the partition and the current month table prefix correctly).
- A value of 1 if we want to rebuild the indexes of the tables after exchanging the partitions.
4. Make ETL Mappings MERGE Source Data Into The CURRENT_MONTH Table
For this use case, this isn't major new development. It is rather a step needed to reduce the load on the server, so that the mappings loading source data toward the main WAREHOUSE_TABLE do not have to merge it into a table with millions of rows.
To make this happen, I created a new mapping for each source system that executed the MERGE statement on the CURRENT_MONTH table instead of the main large table (WAREHOUSE_TABLE), keeping the rest of the logic of the original mappings unchanged.
Because of this step, the ETL mappings which used to run for 10 hours started to finish in 20-30 minutes.
5. Putting It All Together – Overall Workflow With Partition Exchange Load
Once all the mappings, tables and procedures are in place, the next step is to put everything in sequence so that the Partition Exchange Load (PEL) can do its job correctly. Here is a diagram showing the sequence of steps scheduled to be executed daily for this load process:
This step ensures that the latest data (the current month's rolling data) for the source system in question is reflected in WAREHOUSE_TABLE through the Partition Exchange Load mechanism. This process flow is executed every day, reducing the overall load time by at least 80% compared to the original execution time.
6. Schedule A Monthly Job To Create A New Partition For The Next Month
To make the process take care of itself as we move from one month to the next, one more task needs to be executed once a month: making sure a new partition is created in WAREHOUSE_TABLE for the coming month.
Oracle 11g provides interval partitioning, which automatically adds a new partition at the end of each interval. However, I decided not to use that feature and instead wrote my own logic to create a new partition each month, because I wanted control over the name each partition gets.
Here is a procedure I created for that purpose and again, please feel free to use it if it fits your situation:
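As with the earlier procedures, the original isn't reproduced here; this sketch assumes the same hypothetical P_YYYYMM naming convention:

```sql
CREATE OR REPLACE PROCEDURE add_next_month_partition AS
  v_next DATE := ADD_MONTHS(TRUNC(SYSDATE, 'MM'), 1);
BEGIN
  -- Add next month's partition with a predictable name; subpartitions
  -- are created automatically from the table's SUBPARTITION TEMPLATE.
  EXECUTE IMMEDIATE
    'ALTER TABLE warehouse_table ADD PARTITION p_' ||
    TO_CHAR(v_next, 'YYYYMM') ||
    ' VALUES LESS THAN (DATE ''' ||
    TO_CHAR(ADD_MONTHS(v_next, 1), 'YYYY-MM-DD') || ''')';
END add_next_month_partition;
/
```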
This procedure (above) is to be scheduled to execute every single month, for example on the first day of the month. You can just as well schedule it for the 20th or the 25th of each month; it simply has to run once a month, and the day of the month is immaterial.
Your Turn, Share Your Thoughts
I understand that my solution may not be the most optimized solution in the world. However, it helped me solve a problem that was a major headache for my client.
If you have experience working with Partition Exchange Loads and have some suggestions for me, please share them in the comments; I would love to learn from you.
If this is something you have never done and you landed on this site looking for practical examples of Partition Exchange Load implementations, please let me know whether this post added value for you.
Thank you kindly!