The Merge Join Transformation and a SSIS Sorting Tip
My previous blog post was about the SSIS Lookup task and how it really works. Now that I have shown that the Lookup task shouldn’t be used for one-to-many or many-to-many joins, let’s take a look at the Merge Join transformation task. If you follow along with this blog, you will learn a little tip that will eliminate the requirement for you to add a SORT transformation task within your data flow task.
Previously, we isolated our results sets down to one employee in the AdventureWorks database and joined two tables together. I’m going to do the same thing here. This time, I am going to introduce a third table into the join logic. The three tables are listed below:
- Employee Department History
- Person
- Department
Here is what it would look like via SSMS with T-SQL:
Let’s see how we can mimic this result set in SSIS without using T-SQL to join the three tables. First, I want to say it is not always going to be the best option not to use T-SQL instead of individual tasks in SSIS. I have learned over time that it is easier to write the join logic directly in you data flow source task sometimes. However, this is for demonstration purposes.
Let’s say you received a request to extract a result set, order the results set, and load it to another location. Here is what your package would look like in SSIS using the Merge Join transformation task:
Here are our results:
Notice, I used the SORT transformation task in the example above. I used this to depict what has to occur in a step by step approach:
- Extracted data from the Person and Employee Department History tables
- Sorted each result set
- Merged the two results into one using inner join logic
- Extracted data from the Departement table
- Sorted the first Joined result set and the Department result set
- Merge the Joined result set from Persons and Employee History with the Department table
Let’s talk about best practice for this example. This is where the Sort tip is introduced. Since we need an ordered result set per the request, we are using the merge transformation instead of the Union All task. Additionally, we used the Sort task. The Sort task can heavily impact the performance of an SSIS package, particularly when you have larger result sets than what we are going to extract from the AdventureWorks database.
Best practice is to bring in an ordered result set at the source and then merge your record sets. Well, how do you do that? Let’s walk through ordering your result set at the source and configuring your source to define the sorted column for merging your record sets.
First, we open the task and add our ORDER BY clause to our source.
Next, close the source task, right click on the same source task, and choose the Show Advanced Editor option.
There are two specifications in the Advanced Editor that need to be defined in order to make this work:
- Click on the Input and Output Properties tab
- Click on the OLE DB Source Outputs
- Change the IsSorted parameter to “True”
- Drill down into the OLE DB Source Output to Output columns.
- Click on your columns that you used in your ORDER BY clause.
- Change your SortKeyPosition parameter from “0” to “1”.
The desired results should look similar to those below:
Next, you can remove each sort task that directly follows your OLE DB Source task by repeating the steps above to reconfigure each source editor. Now, my data flow task looks like this:
We get back the same results:
In case you are wondering why I got rid of all of the Sort tasks except for the one that follows the first merge join, I’ll explain. There are two reasons for this. My second join is on DepartmentID and, most importantly, the merge transformation task is not considered a data flow source task and does not come with the functionality to define the sorted order.
To conclude my second blog post of this series, the Merge Join transformation task can be used to merge columns from two different tables using Join logic similar to the Joins that can be used in T-SQL. We have looked at a step by step break down of what has to occur to implement a Merge Join transformation task as well as discussing some tips and best practice in regards to using the Sort task in SSIS.
I hope this blog post has been informative and that you look forward to reading my third post soon.