Bad statistics are a fairly common issue in SQL Server. In a nutshell it means the statistics that SQL uses during Query Processing are out of date and not reflecting true data in terms of number of rows and distribution.

Bad statistics can result in bad decisions being made by the optimiser as it uses them to determine how to process data in the execution plan, if it chooses incorrectly this can mean a sub-optimal query being executed.

There’s already a stack of information out there on statistics and how to manage them, here’s a post from Erin Stellato but in this post I wanted to demonstrate how to make statistics bad on purpose.

It’s very simple, but first let’s see what accurate statistics look like in an execution plan. I’m going to run the following query against the Wide World Importers sample database:

SELECT InvoiceID, Description FROM Sales.InvoiceLines

Very straightforward, and by turning on actual execution plan in Management Studio I can see a very simple query plan containing a clustered index scan. If we hover over the scan operator we can see its properties as shown below. The two highlighted properties in green are the actual and estimated number of rows that the scan operator has processed:

Both values are identical, 228265, which is good because this time SQL has got its row estimation absolutely right and overall is exactly how we’d want our queries to be executed. We can also hover over the arrow between the scan and select operators where we can see our estimated and actual row counts but also the estimated data size of the index scan too; in this case a relatively small 25MB.

This is all well and good but what if we wanted to deliberately make our statistics completely incorrect?

To do this we’ll use an undocumented option of the UPDATE STATISTICS command WITH ROWCOUNT by running the following:

WITH ROWCOUNT = 777777777

And that’s it, we’ve now forced a much higher row count on this particular table which will be used by the optimiser. Before this change does get used though we need to ensure the optimiser does not reuse our previous query from cache. As I’m on a test system I’ll run DBCC FREEPROCCACHE and now repeat our query from earlier, here’s the clustered index scan properties this time:

This time our estimated and actual rows are way different; SQL Server expected 777778000 (its rounded them up slightly) rows in the table but of course that’s not true and the actual rows have come back with a very different but real value of 228265.

We can also check if this has affected the data size estimation and we can now see that our value for this has increased from 25MB up to 83GB.

These “bad” estimations can cause easily significant query performance problems; incorrect join types, bad memory grants and unintended parallelism are just some of the things that can result from having out of date statistics.

That’s the bad, but being able to force row counts does have its uses as well. By altering these values we can test changes in our query plans should our data grow (or shrink for that matter) which can be really useful if you don’t have large sets of data available in test systems.

The final part as with any forced modification such as this is to undo our changes so we’ll reissue the update statistics command this time using WITH ROWCOUNT with the correct value that we started with.


Creating Bad Statistics

Leave a Reply

Your email address will not be published. Required fields are marked *