It’s been a while since I have posted, so I wanted to make sure I had something good to speak about. My co-worker(Senior DBA) has been out on leave since the end of November. Since then, I have been working on ALL SQL issues in my department. We have over 800 clients, with more than 1500 installations of SQL. Not to mention our own internal systems, so its been pretty hectic.
The main thing I have been working on though constantly is performance issues. I am not much of a performance guy, I am more of the maintenance and keeping the instances healthy and in tip top shape. But, I’ve had to get more hands on experience since I am running solo. I wanted to find out from folks, what things do they look at when troubleshooting performance? I did a bunch of google searches, and pretty much, alot of them say to start with the query found here: http://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/. So I run the query in there, and document the results. I have been basically troubleshooting the top 3 wait types for each client complaining of issues.
The top one I have worked with is parallelism, which is shown by seeing CXPACKET waits. When I see this I know to check the following settings and see if any tweaking should be made: MAX degree of parallelism, and Cost Threshold for Parallelism. I generally do not like to see it with the default values of 0, and 5. This means that it can use as many threads of CPU it likes to run the query, and the cost is 5, which isnt much. I generally have the MAX setting for half the cpu cores(IE. if they have 8, I set for 4). And I normally set the cost threshold to 50. Now, some more tweaking can take place if needed, but that is generally safe enough for our applications.
The one I saw last night, I had never encountered before: SOS_Scheduler_YIELD. So after doing some research in our internal tickets, as long as google, I found that the client was having issues related to spinlocks. I then remembered that a previous client of ours had issues on SQL 2012 with the same issue and they applied the latest service pack to resolve it. I made the same recommendation to this client, but suggested it by saying that they were not on the latest SP, and they should update due to all of the bug fixes. We are still awaiting confirmation, but I felt pretty confident in the answer.
The last one was backupio, which after doing some research its just due to a slow medium of where they back up their transaction logs to.
So, I am still very green and new to all of this, and was wondering if anybody else had different steps they take for troubleshooting performance conditions? Any and ALL feedback is welcomed and appreciated. Thanks everybody!