OK, so quite a strange one, this.
A client has a database with a number of large history tables.
As they were running out of space on the current drive, the developers decided that the only solution was to rename the old tables and re-create them on another filegroup on another LUN.
Not the best solution, but not a total disaster. However, the application also needs to read data from the current year, and this change was made midway through the year. So the developers came up with a brilliant (or so they thought) solution, which involved copying the current year's data back out of the now-archived tables into the new tables.
OK, so if we now consider that they also want to use these tables for BI, things start to get interesting.
The archived table contains close to 500 million rows, as does the new version of the table, including around 100 million duplicate rows.
Ultimately, I will partition the table, but the first step is to load all of the data into one table without duplicates.
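Before loading anything, it is worth quantifying the overlap. A rough sketch of such a check, using the same simplified table and column names as the query later in this post (so just an illustration, not the exact production code), counts the archived rows that already exist in the new table :-

SELECT COUNT(*)
FROM [dbo].[OldHistoryTable] OLD
WHERE EXISTS
    (SELECT 1
     FROM [dbo].[HistoryTable] SRC
     WHERE SRC.[PrimaryKeyField_1] = OLD.[PrimaryKeyField_1]
       AND SRC.[PrimaryKeyField_2] = OLD.[PrimaryKeyField_2]);

On tables this size you would want an index covering both key columns on each side before running it, otherwise it will be an expensive scan.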
After playing around with various methods, I settled on using the T-SQL MERGE statement, which first appeared in SQL Server 2008 :-
http://msdn.microsoft.com/en-us/library/bb510625.aspx
Basically, I used it to compare the two tables: if a row in the archive table did not appear in the new table, that row was inserted into the new table, resulting in one table containing all of the rows without any duplicates.
A simplified version of the query can be found below :-
MERGE [dbo].[HistoryTable] AS SRC
USING [dbo].[OldHistoryTable] AS OLD
    ON  SRC.[PrimaryKeyField_1] = OLD.[PrimaryKeyField_1]
    AND SRC.[PrimaryKeyField_2] = OLD.[PrimaryKeyField_2]
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([PrimaryKeyField_1],
            [PrimaryKeyField_2],
            [Field_1],
            [Field_2])
    VALUES (OLD.[PrimaryKeyField_1],
            OLD.[PrimaryKeyField_2],
            OLD.[Field_1],
            OLD.[Field_2]);
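Once the MERGE completes, it is worth confirming that every archived row now exists in the combined table. A sketch of that check, again using the simplified names from above, is :-

SELECT COUNT(*)
FROM [dbo].[OldHistoryTable] OLD
WHERE NOT EXISTS
    (SELECT 1
     FROM [dbo].[HistoryTable] SRC
     WHERE SRC.[PrimaryKeyField_1] = OLD.[PrimaryKeyField_1]
       AND SRC.[PrimaryKeyField_2] = OLD.[PrimaryKeyField_2]);

Assuming nothing else has touched the tables in the meantime, this should come back as zero, at which point the old table can be dropped and the partitioning work can begin.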