INTERNET APPLICATION DEVELOPMENT
MID MARKET ERP DEVELOPMENT
by Derek Du
Recently, we were asked to improve the performance of some SQL queries that run for 15 seconds and 30 seconds. At first, we were not sure whether the performance could be improved, given that there were about 80 million data pieces in the database. I have to admit that SQL Server is really powerful as eventually we reduced the time to 2 seconds and 6 seconds!
During the process, I had a few interesting findings about SQL Server and learned a lot about troubleshooting SQL Server performance issues. I want to share a few tips on optimizing SQL statements I learned from the experience.
We sometimes jump directly into analyzing SQL queries, but the answer to the performance issue may lie on the indexing itself, so always check whether necessary indexes are present. A good way to find out what indexes you need is to use Dynamic Management Views to monitor system health and create necessary indexes. With appropriate indexes in place, performance increase is substantial.
There are tons of articles about Clustered and Non-clustered index and their benefits. Knowing the difference between a clustered and non-clustered index helps troubleshooting performance issue for sure. Here is a nice article explaining Clustered and Non-clustered indexes.
Here is a metaphor I liked that helped me understand Clustered and Non-clustered index – Clustered index is like the page numbers of a book, where the logic order of the pages is the same as the physical order of them in the book, e.g. page 1 is at the first and page 100 is the last page for a 100-page book. On the other hand, Non-clustered index is like the Index page of a book where the logic order of the terms on the index page is not the same as their actual locations in the book. For instance, the word Apple is listed at the beginning of the index page but it may firstly appear on the last page of the book.
To really find out why a certain query is running slow, more information is needed.
Execution plans can be quite helpful to figure out which parts of a complex query take time and how SQL server optimizes the query. Though it does not provide a solution directly, it does provide some hints or directions on troubleshooting performance issues. SQL Management Studio or even Visual Studio now provide the ability to view an execution plan. A tool I find quite useful is SQL Sentry Plan Explorer, which displays the execution plan diagram with much more information. For example, it displays the number of records processed above each line, which is very convenient.
Another quite handy tool, Statistics IO, displays statistics information for your query. Set Statistics IO to ON and information such as the number of scans will be displayed.
However, the execution plan may not be very straightforward sometimes, especially for a complex query with many nested sub-queries. The large query will be optimized by SQL Server, which makes the execution plan harder to understand. Instead of trying to figure out what SQL Server is doing, a faster way for me is to just break the large query into smaller ones and test the performance of each of them. It is possible that SQL Server is not picking the best route to run the query and the "optimized" query is actually running much slower, which is exactly what happened in my case. My query should only take 2 seconds to run whereas the "optimized"query by SQL Server takes 15 seconds. Why? Optimizing query also takes time and SQL Server simply may not find the best optimization in time.
If SQL Server is selecting a slower way to execute the query, there are some workarounds:
Another quick and easy approach is try different ways of writing the same query. Swap IN with JOIN, change INNER JOIN to LEFT JOIN to EXIST, or change Non-Unique Non-clustered index to Clustered index. Some changes take just a few seconds to make and they may work. Once it works, you can research a bit on why or may just choose to forget about it if not in the mood. Sometimes, I'm just happy I found a solution, not necessarily why it worked.
This blog post originally appeared on Derek's blog, Stuff.