Improving Linked Server Performance With OPENQUERY

Linked Server enables SQL Server Database Engine to execute commands against OLE DB data sources outside of the instance of SQL Server. In the following figure you can see that application can execute SQL statements against MS Access database through SQL Server.

Linked Server & SQL Server

Using Linked Server is something you might have to use in order to integrate with legacy systems such as MS Access database. Usually you can just connect to database by using address and instance name. This is not the case with MS Access. The connection is made to a physical file (e.g. C:\Databases\db.accdb). This means creating connection through the network would require mapping of network drive and opening the file. There are quite many things that can go wrong and you end up with corrupted database.

Using Linked Server means that you need to specify the server name in the SQL

-- Normal INSERT statement
INSERT INTO Table1(Column1, Column2) VALUES('Value1', 'Value2')
-- Statement against linked server
INSERT INTO MSACCESS...Table1(Column1, Column2) VALUES('Value1', 'Value2')

Note the usage of server name MSACCESS and three dots in the INSERT statement.

Performance Issues

Linked Server usage can be extremely slow because SQL Server might try to perform part of the logic locally. E.g. if you have WHERE statement in the query SQL Server might just send SELECT * FROM to the remote server and filter the results locally. Also single query will cause at least two connections to remote server: one to get the statistics for query analysis and another one to get the actual data. Depending on the table some queries might take several seconds to complete.

OPENQUERY to The Rescue

If you are stuck with Linked Servers one way to improve things is to start using OPENQUERY. The difference is that when using OPENQUERY SQL Server doesn’t try to process the query locally. It will just send it to the remote server as is. You can think of OPENQUERY as a pass-through query.

The syntax of the query is a bit different.

-- Normal INSERT against Linked Server
INSERT INTO MSACCESS...Table1(Column1, Column2) VALUES('Value1', 'Value2')
-- OPENQUERY INSERT against Linked Server
INSERT OPENQUERY (MSACCESS, 'SELECT Value1, Value2 FROM Table1') VALUES ('Value1', 'Value2')

Performance Measurement

  • The execution times presented are averages
  • Times are in milliseconds but the absolute values should not be looked at because there are many factors that affect those.

The performance has been measured with Stopwatch.

var watch = Stopwatch.StartNew();
using(var cmd = new SqlCommand(sql, connection))
{
cmd.CommandTimeout = commandTimeout;
cmd.ExecuteNonQuery();
}
watch.Stop();
Log.Write("Executing statement took {0} ms", watch.ElapsedMilliseconds)

DELETE Statement

I wasn’t expecting DELETE statements to be much faster so I was surprised to see almost 25% improvement in the execution time.

DELETE statement execution time. Lower is better.

INSERT Statement

The INSERT statements were executed almost 30% (28.5) faster with OPENQUERY.

INSERT statement execution time. Lower is better.

UPDATE Statements

The slowest statements were UPDATE statements so I had high hopes for the performance gains. The first UPDATE statement is 35% faster when OPENQUERY is used.

UPDATE statement execution time. Lower is better.

Another UPDATE statement with 81% decrease in execution time.

UPDATE statement execution time. Lower is better.

Finally, the slowest UPDATE statement. This statement took on average over 8 seconds to execute. With OPENQUERY the execution time was cut down to 100 milliseconds! That is 82 times faster and almost 99% reduction in execution time.

UPDATE statement execution time. Lower is better.

Notes and Conclusion

In my limited experience you can expect 25-35% decrease in execution times in normal scenarios. If you have statements that take several seconds to complete you might get really good results with OPENQUERY. Unfortunately, it is difficult to know how much of an improvement you get. Start by measuring the statements and convert those that are the slowest.

If you are planning to move to OPENQUERY I highly recommend having configuration option to switch back to normal SQL if needed. The OPENQUERY is pickier about the SQL it accepts. Being able to switch back to slower but working solution will save your ass.

Migrating from TFS to Git

Sometimes the existing tools are making it difficult to do the things you need to do. In our case we needed a way to do multiple things at the same time:

  • Implement and test features in isolation
  • Fix things in production
  • Have code reviews
  • Have continuous integration for all branches
  • Test and review the next release(s)

TFS’s way of having physical folder per branch and merges happening on server side wasn’t just working for us and it reminded me of the following CommitStrip.

Git, SVN or...?

So this fall I migrated the team from TFS to Git. The actual migration included the following:

  • Migrating source code (4 TFS projects)
  • Setting up continuous integration (TeamCity)
  • Setting up automated deployments (Octopus, not done by me)
  • Short training and instructions

The source code was already hosted in Visual Studio Team Services (VSTS) so there was no need to migrate other project related data such as work items.

Plan

Before jumping in I wrote down all the things I needed to do in order to successfully implement the migration:

  • Write short instructions on how to use basic Git features (Git was new to most people)
  • Test the migration, continuous integration and automated deployments before the Big Day
  • Give training on Git (basic usage from the command line and Visual Studio)
  • Make sure everyone can clone test repository, create pull requests etc.
  • Configure server side (access rights, policies, branches, pull requests)
  • Migrate every branch from every TFS project with complete version history
  • Support Git usage via Skype/face-to-face

When it comes to actual development model I decided to go with a successful Git branching model with the addition of using pull requests. It is a proven model and quite simple. Developers do most of the work in feature branches and merge them into main development branch. Every now and then fixes must be merged into multiple branches but that is about as complicated as it gets.

Tools

As the plan was to migrate full version history I needed a tool to do the heavy lifting. Google gave me Git-TF and git-tfs. The Git-TF project hadn’t been active for a while and it seems it is dead.

Unfortunately, we do not have any distinct plans regarding the new official git-tf release. Sorry for inconvenience.

This left me with only one choice. Luckily the git-tfs project was actively developed and it looked to have the features I would need.

Testing

As I started test migration I noticed that git-tfs wasn’t able to traverse the branch trees without problems. Every now and then it would stop and ask me to manually input the changeset id:

warning: git-tfs was unable to find the root changeset (ie the last common commit) between
the branch '$/Project/QA'
and its parent branch '$/Project/Production'.
(Any help to add support of this special case is welcomed!
Open an issue on https://github.com/git-tfs/git-tfs/issues )
To be able to continue to fetch the changesets from Tfs,
please enter the root changeset id between the branch '$/Project/QA'
and its parent branch '$/Project/Production' by analysing the Tfs history...
Please specify the root changeset id (or 'exit' to stop the process):

Finding the correct changeset between two branches isn’t straight forward. If you write the wrong id, then the end result of the migration is wrong for some branches. After running into this issue more than few times I started to look into the branches. I found out that there have been lot of merges between different branches. It wasn’t just from develop to master and back.

Theory vs. Reality

The migration would sometimes go into infinite loop because of all the strange merges back and forth that had been done. At this point I decided that I would migrate the whole version history from the main development branch and manually take the latest changes from the other important branches. After all the other branches didn’t contain changes that were not in the main development branch.

Continuous Integration and Deployment

We were already using TeamCity and we decided to continue to do so. Setting up the builds was easy. Even if you are not familiar with TeamCity all you have to do is to create project, configure the version control connection (tell TeamCity address of Git repository) and setup build steps. Automated deployment was done with Octopus.

Instructions and Training

I didn’t want to end up in a situation where developers are asking me why they cannot clone the repository or how to do branches etc. so I did the following:

  1. Write instructions on how to install Git, configure it etc.
  2. Write short tutorial how to do daily things (branches, pull requests etc.)
  3. Create test repository and let people try themselves

When the decision to start using Git was made, I already knew I would be giving training using command line. No GUIs. Just written commands. But as time passed and I was solving other issues I started to second guess my decision. I came into the conclusion that using only command line in the training would look like the following xkcd comic:

Git

The team had a lot of people who had no experience on Git. Most had used only version control via GUI. What would be the point of showing command line usage in one-hour training? After all the goal was to have everyone working after the migration. So I changed my mind and I used Visual Studio’s Git integration because it had the features developers use every day. There is plenty of time to learn neat command line tricks and Git philosopfy later.

Migration

Note that creating 1:1 copy of your TFS project might take a long time for large projects. I started the migration by exporting the main development branch into Git repository. This will fetch every changeset from TFS and create equal commit history into the Git repository.

git tfs clone https://company.visualstudio.com:443 $/TFSProject/BranchName . --branches=none --export

If you do this in Windows you might run into issue where the path to repository is changed to an absolute path. In that case prefix the command with MSYS_NO_PATHCONV=1 like this (see Unable to clone repository after upgrading to Git v2.5.0).

MSYS_NO_PATHCONV=1 git tfs clone https://company.visualstudio.com:443 $/TFSProject/BranchName . --branches=none --export

Next I wanted to clean up the version history by removing the git-tfs metadata

git filter-branch -f --msg-filter "sed 's/^git-tfs-id:.*$//g'" -- --all

For some reason the NuGet packages had been stored in the version history. I decided to remove them to make the repository smaller.

#Remove packages folder from the history
git filter-branch -f --index-filter 'git rm --cached --ignore-unmatch -r ./packages/*'
#Remove nuget folder from the history
git filter-branch -f --index-filter 'git rm --cached --ignore-unmatch -r ./.nuget/*'

I also had to remove TFS binding from the solution by removing the GlobalSection(TeamFoundationVersionControl) section from the solution (.sln) file. Finally I removed the branches that I didn’t need.

git branch -a
master
remotes/tfs/default
git branch -rd tfs/default

At this point I had a pretty clean Git repository. I created empty repository into VSTS and pushed the local repository into VSTS.

Configuration

On VSTS I configured repositories to use pull requests for develop and master branches and disabled push --force.

Use pull requests Disable push --force

First day after migration

First day the team run into following problems:

  • Not being able to see repositories (user connected to wrong TFS project)
  • Cannot push changes (user didn’t use pull request)
  • Fetch, Push, Pull confusion
  • VS UI confusion (creating branches, publishing changes)

Using Skype and screen sharing it was easy to help people to solve the minor problems they faced on the first day. It was great to see how team was able to transition from TFS to Git so easily!

Few weeks after migration

Few weeks after migration things are running smoothly. Sure there are times when someone makes changes to wrong branch or forgets to merge hotfix back to develop but this is expected. We are only humans. The team has already solved merge conflicts, released new versions, made hotfixes into production and used cherry pick!

Only change we made into the way we work is that DevOps doesn’t have to use pull requests. This is mainly because DevOps will create releases by merging code from release preparation branch into master branch. That code has already been reviewed.

In conclusion I would say the migration was successful and well worth the effort!