Once you have migrated your data from SQL Server to Amazon SQL Server Relational Database Services as described in Migrate your SQL Server database to Amazon RDS in 3 easy steps, you need to compare your source and destination database and make sure that all the data is identical.
There are two ways to do this. You can use a “counting technique” where you count the tables in each database and then count each row in each table and compare them. The danger with this approach is that it doesn’t tell you anything about the content of each record. It is not uncommon for certain records to change when they are exported to a flat file and then imported into a new database. For example, when there are collation differences between the databases or due to character encoding differences.
Tablediff.exe to the rescue!
The best way to validate your SQL Server to RDS migration is to use a utility called tablediff.exe. This is an executable that is shipped out-of-the box with every SQL Server installation. It is commonly used to validate SQL Server replication but it is really a great utility for comparing any two tables or databases. Tablediff.exe is located in the COM directory of your SQL Server installation path:
C:\Program Files\Microsoft SQL Server\90\COM (SQL Server 2005)
C:\Program Files (x86)\Microsoft SQL Server\100\COM (SQL Server 2008R2)
C:\Program Files\Microsoft SQL Server\110\COM (SQL Server 2012)
Make sure you find out where tablediff.exe is located on your source server because you will need to enter it in the script below.
Step # 1: Create the script
I am assuming that you followed my instructions for migrating SQL Server to Amazon RDS and that you have created all the tables in the destination database. We are going to use the same technique as in the migration post to generate a script that can be run from a batch file. Run this on your source server:
select '"C:\Program Files\Microsoft SQL Server\110\COM\tablediff.exe" -sourceserver [] -sourcedatabase [] -sourcepassword -sourceuser -sourcetable [' + name + '] -sourceschema [dbo] -sourcelocked [TABLOCK] -destinationserver [] -destinationdatabase []-destinationtable [' + name + '] -destinationschema [dbo] -destinationlocked [TABLOCK] -destinationpassword -destinationuser -o \TableDiff.txt -c' from sys.tables where name not in ('')
Make sure that you replace everything between < > with the correct values. Don’t delete the square [ ] brackets but do delete the < >. Here is an example of what this looks like after you plug in the values:
select '"C:\Program Files\Microsoft SQL Server\110\COM\tablediff.exe" -sourceserver [10.37.29.666] -sourcedatabase [Navision] -sourcepassword mypwdisnotcomplexenough -sourceuser sa -sourcetable [' + name + '] -sourceschema [dbo] -sourcelocked [TABLOCK] -destinationserver [CC90Navision.cmsq1l7xxxnh.us-east-1.rds.amazonaws.com] -destinationdatabase [Navision] -destinationtable [' + name + '] -destinationschema [dbo] -destinationlocked [TABLOCK] -destinationpassword IsThisC0mp1exEnough? -destinationuser admin -o c:\RDS\Navision_TableDiff.txt -c' from sys.tables where name not in ('dtproperties')
A few attention points about this script:
- Execute this script on one line. I have used carriage returns above for ease of reading.
- Replace the path to tablediff.exe in the script with the location of tablediff.exe on your server. This example is from SQL Server 2012.
- The script assumes that all your objects exist in the “dbo” schema. You would need to update it if you are using a different schema.
After you run this on your SQL Server you should get an output similar to this:
output
Now copy the entire resultset from SSMS into a text file. At the bottom of the file enter a carriage return and write “PAUSE”. This will cause the CMD window to wait when the batch file finishes executing so that you can see if there are any errors.
Then save your text file and rename the extension to “.bat” so that you can run it as a batch file and copy this batch file to the directory that you entered for <Working Directory> above.
Step # 2: Compare your tables
Now run the batch file by double-clicking on it. As it starts running, you should see the output file appear in your working directory and it will contain the output of the comparisons. Note that this can take some time for large databases because the script is comparing table per table and row-per row. So go get a coffee while this runs but don’t get a beer yet because you’ll still need your brain cells if you find some differences :).
Step # 3: Check the tablediff.exe output
Once the batch file has run, open the output file and you will see any differences listed. If the file contains only lines that say the following that means that there is no difference:
Table [Navision].[dbo].[TB_Screen] on 10.37.29.666 and Table [Navision].[dbo].[TB_Screen] on CC90Navision.cmsq1l7xxxnh.us-east-1.rds.amazonaws.com are identical.
If there is a difference you will see something like this:
Table [Navision].[dbo].[TB_Screen] on 10.37.29.666 and Table [Navision].[dbo].[TB_Screen] on CC90Navision.cmsq1l7xxxnh.us-east-1.rds.amazonaws.com have 32 differences.
Then this means that something went wrong during the migration and the tables contents are not the same. See point 4 below for information on syncing different tables. Tablediff.exe will also tell you if the tables are not the same, for example, if the table in your source has fewer fields than it does in your destination.
Troubleshooting Tips
- If you have tables without a primary key (PK) then tablediff.exe will not be able to compare them. There are two workarounds to this:
- You could create a PK temporarily and drop it after the comparison.
- If the tables are small you can do a manual compare by copying the data into Excel.
- Try to avoid passwords with single or double quotes because it can cause errors when the tablediff.exe command is generated.
- When you run the generation script in SSMS, make sure it is all on one line with no carriage returns as I have already mentioned above.
- Tablediff.exe has a lot of options which can be seen in the documentation. For example, it can generate a SQL script to synchronize table differences or you can change the timeout setting which is sometimes needed if you are connecting across a slow link.
I would highly recommend that you run a baseline on your SQL Server before migrating to RDS. Performance problems in the cloud are not uncommon because of the shared infrastructure. If you run a baseline beforehand you will have a reference point and will be able to find the root cause of performance problems much more quickly.