Many of the tables in your databases contain dimensional data – descriptive information about objects that can be grouped and organized at a higher level than an individual transaction. And most of these dimension tables are sure to fit one of the definitions of a slowly changing dimension. When loading data into these tables using SSIS, you’ve likely used the slowly changing dimension (SCD) transform SSIS provides to handle at least a few of these data flow tasks. After all, you know you’re dealing with a SCD table and SSIS provides a SCD transform, so why not? Well, there are a few reasons why an alternative approach to maintaining your SCD tables may be to your benefit.
Issues with the SSIS SCD Transform
Performance is one area where your use of the SCD transform should be scrutinized. If you have a SSIS package that’s running longer than expected and you use the SCD transform in it, pay attention to the time it takes your data to get all the way through the transform. There’s a lot going on under the hood of the SCD component, and it can slow down the flow of your data considerably.
Maintenance of the SCD component can also be an issue. To create the SCD transform you have to walk through a wizard. If you’re a developer opening up a SSIS package for the first time that was created by someone else, the details of how a SCD transform was set up can be a bit of a mystery. Even if you know how it was set up, updating the transform to handle a change in your SCD logic or to accomodate a new data column involves rerunning the wizard and potentially losing some of the properties you had set up previously.
Alternatives to the SSIS SCD Transform
One way to avoid using the SCD transform is with simple lookups. Lookups are fast, especially if you have enough memory available to be able to do a full lookup. Lookups are also easy to maintain. You can open a lookup, see exactly what fields are being matched on, and make all the modifications you want.
If you’re working with a type 1 SCD, where all you’re worried about is inserting new records and updating existing records that have changed, two lookups can be all you need to replace your SCD transform. The first lookup checks your natural or business key to handle your new inserts. The second lookup takes a thorough look at the columns where you care if they change to determine if anything requires an update.
A type 2 SCD can be implemented similarly with lookups, only this time we require 3 lookups. The first lookup is used the same way, to check your natural or business key to handle your new inserts. The second lookup checks against your SCD2 columns, the columns that you want to retain history on if they change, and if one of these columns have changed you will update the original record to indicate it’s an old record and insert a new current record. The third lookup checks against your SCD1 columns, columns that you want to keep updated but that according to your business rules should not trigger an SCD2 update that tracks history, and update anything here that has changed.
The SSIS Dimension Merge SCD Component
Another alternative to the SSIS SCD transform is to use the free, open source, third party SSIS Dimension Merge SCD Component. They claim their transform delivers a 100x speed boost over the standard component, and while I can’t vouch for that number, I can say that its speed improvement is significant. It also has a nice interface, where you can clearly see how it has been set up and modify everything that is going on inside the component without having to suffer through a wizard.
This SSIS Dimension Merge SCD Component requires its input data to be sorted by the natural key. When possible, try to sort your input at the SQL level rather than with a sort transformation. The SSIS Sort transformation is slow and is a fully blocking operation, so it won’t start outputting rows until it has processed every row. It can also eat up a lot of memory. I’ve seen a Sort transform that normally would complete in seconds take over an hour on a memory starved system where the sort was competing with Sql Server for the system’s memory resources. SQL Server is good and fast at sorting data, so if you can do your sorting there instead you’ll get better performance. Then, in the advanced editor of your data source or wherever your sorted data is coming from, you can set the IsSorted property to True for the required columns.
I think the SSIS Dimension Merge SCD Component works quite well, but one thing that may turn you away from it is that there currently (as of July 2013) is not a version available for SSIS 2012. As it is an open source project, a few people in the project’s forums have had some success in compiling it to run on 2012, but none without quirks. A company called Pragmatic Works offers a supported version of this component in their Task Factory suite of SSIS tools, and their version is 2012 compatible, but you will have to pay for this supported version.
There you have it, a couple of solid ways to handle your slowly changing dimensions without having to use the SSIS SCD transform. I’ve had success with both of these methods at speeding up lethargic data flows. I also find these methods easier to maintain when changes have to be made. Try out one of these alternate SCD methods if you’re looking to remove some performance bottlenecks from your SSIS packages, and let me know how they worked for you.