Channel: Performance – Kendra Little's Blog

Did My Query Eliminate Table Partitions in SQL Server?


Working with table partitioning can be puzzling. Table partitioning isn’t always a slam dunk for performance: heavy testing is needed. But even getting started with the testing can be a bit tricky!

Here’s a (relatively) simple example that walks you through setting up a partitioned table, running a query, and checking if it was able to get partition elimination.

In this post we’ll step through:

  • How to set up the table partitioning example yourself
  • How to examine an actual execution plan to see partition elimination and which partitions were accessed. Spoiler: you can see exactly which partitions were used / eliminated in an actual execution plan.
  • Limits of the information in cached execution plans, and how this is related to plan-reuse
  • A wrap-up summarizing facts we prove along the way. (Short on time? Scroll to the bottom!)

How to Get the Sample Database

We’re using the FactOnlineSales table in Microsoft’s free ContosoRetailDW sample database. The table isn’t very large. Let’s check it with this query:

SELECT 
    index_id, 
    row_count, 
    reserved_page_count*8./1024. as reserved_mb
FROM sys.dm_db_partition_stats
WHERE object_id=OBJECT_ID('FactOnlineSales');
GO

Here are the results:

1_rows_size

The table has 12.6 million rows and only takes up 363 MB. That’s really not very large. We probably wouldn’t partition this table in the real world, and if we did we would probably use a much more sophisticated partition scheme than we’re using below.

But this post is just about grasping concepts, so we’re going to keep it super-simple. We’re going to partition this table by year.

First, Create the Partition Function

Your partition function is an algorithm. It defines the intervals you’re going to partition something on. When we create this function, we aren’t partitioning anything yet — we’re just laying the groundwork.

CREATE PARTITION FUNCTION pf_years ( datetime )
    AS RANGE RIGHT
    FOR VALUES ('2006-01-01', '2007-01-01', '2008-01-01', '2009-01-01', '2010-01-01');
GO

Unpacking this a bit…

DATETIME data type: I haven’t said what column (or even table) I’m partitioning yet — that comes later. But I did have to pick the data type of the columns that can use this partitioning scheme. I’ll be partitioning FactOnlineSales on the DateKey column, and it’s an old DateTime type.

RANGE RIGHT: You can pick range left or range right when defining a partition function. By picking range right, I’m saying that each boundary point I listed here (the dates) will “go with” the data in the partition to its right.

This means that the boundary point ‘2007-01-01’ will be included in the partition with the dates above it. That’s the rest of the dates for 2007.

Usually with date related boundary points, you want RANGE RIGHT. (We don’t usually want the first instant of the month, day, or year to be with the prior year’s data.)

VALUES: Why doesn’t the partition function go to present day? Well, the Contoso team apparently decided to use some other database after the end of 2009. That’s the latest data we have.
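
If you want to double-check the RANGE RIGHT behavior, the $PARTITION function reports which partition number a value maps to, and it works as soon as the partition function exists. Here's a quick sketch:

SELECT
    $PARTITION.pf_years('2006-12-31') AS last_day_of_2006,  /* partition 2 */
    $PARTITION.pf_years('2007-01-01') AS boundary_point;    /* partition 3, with the rest of 2007 */
GO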

Second, Create the Partition Scheme and Map it to the Function

A partition scheme tells SQL Server where to physically place the partitions mapped out by the partition function. Let’s create that now:

CREATE PARTITION SCHEME ps_years 
    AS PARTITION pf_years
    ALL TO ([PRIMARY])
GO

Let’s talk about “ALL TO ([PRIMARY])”. I’ve done something kind of awful here. I told SQL Server to put all the partitions in my primary filegroup.

You don’t always have to use a fleet of different filegroups on a partitioned table, but typically partitioned tables are quite large. Dumping everything in your primary filegroup doesn’t give you very many options for a restore sequence.
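
Just for reference, a scheme that spreads the partitions across dedicated filegroups might look something like this (the filegroup names here are hypothetical and would have to exist in the database already; six are needed because five boundary points define six partitions):

CREATE PARTITION SCHEME ps_years_by_filegroup
    AS PARTITION pf_years
    TO (fg_2005_and_earlier, fg_2006, fg_2007, fg_2008, fg_2009, fg_2010_and_later);
GO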

But we’re keeping it simple.

Now Partition the Table on the Partition Scheme

This is where it gets real. Everything up to this point has been metadata only.

Currently, the FactSales table has a clustered Primary Key on the SalesKey column and no nonclustered indexes. We’re going to partition the table by the DateKey column. The first step is to drop the clustered PK, like this:

ALTER TABLE dbo.FactSales 
  DROP CONSTRAINT PK_FactSales_SalesKey;
GO

Now partition the table by creating a unique clustered index on the partition scheme, like this:

CREATE UNIQUE CLUSTERED INDEX cx_FactSales
  on dbo.FactSales (SalesKey, DateKey)
ON [ps_years] (DateKey)
GO

We made a couple of important changes. The table used to have a clustered PK on SalesKey, but we replaced this with a unique clustered index on TWO columns: SalesKey, DateKey. There’s a reason for this: if we’re partitioning on DateKey and we try to create a unique clustered index on just SalesKey, we’ll get this message:

Msg 1908, Level 16, State 1, Line 31
Column 'DateKey' is partitioning column of the index 'cx_FactSales'. Partition columns for a unique index must be a subset of the index key.

DateKey is elbowing its way into that clustered index, whether I like it or not.
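
If you want to confirm where the rows landed after the rebuild, here's a quick check of the row count in each partition of the new clustered index:

/* Row counts per partition for the newly partitioned clustered index */
SELECT
    p.partition_number,
    p.rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('dbo.FactSales')
    and p.index_id = 1
ORDER BY p.partition_number;
GO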

All right, now that we have a partitioned table, we can run some queries and see if we get partition elimination!

Query the Partitioned Table and Look at the Actual Execution Plan

Our example query is this stored procedure:

CREATE PROCEDURE dbo.count_rows_by_date_range
  @s datetime,
  @e datetime 
AS
  SELECT COUNT(*)
  FROM dbo.FactSales
  WHERE DateKey between @s and @e;
GO

exec dbo.count_rows_by_date_range '2008-01-01', '2008-01-02';
GO

If we run that call to dbo.count_rows_by_date_range with “Actual Execution Plans” enabled, we get the following graphical execution plan:

It’s a clustered index scan, but don’t jump to conclusions.

We have a clustered index scan operator on the fact sales table. That looks like it’s scanning the whole thing– but wait, we might be getting partition elimination! This is an actual execution plan, so we can check.

Hovering over the Clustered Index Scan operator on Fact Sales, a tooltip appears!

3_tooltip_partitions

Partitioned = True!

It knows the FactSales table is partitioned, and “Actual Partition Count” is 1. That’s telling us that it only accessed a single partition. But which partition?

To tell that, we need to right click on the Clustered Index Scan operator and select “properties”:

4_clustered_index_scan_properties 5_partitions_accessed

 

Decoding this: The clustered index scan accessed only one partition. This was partition #4.
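
We can cross-check that number against the partition function itself. With our RANGE RIGHT boundaries, dates in early 2008 belong to partition #4:

SELECT $PARTITION.pf_years('2008-01-01') AS partition_number;  /* returns 4 */
GO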

Let’s re-run our query to make it access more than one partition! We’re partitioning by year, so this should touch two partitions:

exec dbo.count_rows_by_date_range '2007-12-31', '2008-01-02';
GO

Running this query with actual execution plans on, right clicking the Clustered Index Scan, and looking at properties, this time we see it accessing two partitions, partition #3 and partition #4:

6_multiple_partitions_accessed

Just because you see “Clustered Index Scan” doesn’t mean you didn’t get partition elimination. However, even if you did get partition elimination, it may have needed to read from multiple partitions.

Can You See Partition Elimination in the Cached Execution Plan?

So far we’ve been looking at Actual Execution plans, where I’ve run the query in my session. What if this code was being run by my application, and I wanted to check if it was getting partition elimination?

If the execution plan was cached, I could find information on its execution and cached plan with this query:

SELECT 
  eqs.execution_count,
  CAST((1.)*eqs.total_worker_time/eqs.execution_count AS NUMERIC(10,1)) AS avg_worker_time,
  eqs.last_worker_time,
  CAST((1.)*eqs.total_logical_reads/eqs.execution_count AS NUMERIC(10,1)) AS avg_logical_reads,
  eqs.last_logical_reads,
    (SELECT TOP 1 SUBSTRING(est.text,statement_start_offset / 2+1 , 
    ((CASE WHEN statement_end_offset = -1 
      THEN (LEN(CONVERT(nvarchar(max),est.text)) * 2) 
      ELSE statement_end_offset END)  
      - statement_start_offset) / 2+1))  
    AS sql_statement,
  qp.query_plan
FROM sys.dm_exec_query_stats AS eqs
CROSS APPLY sys.dm_exec_sql_text (eqs.sql_handle) AS est 
JOIN sys.dm_exec_cached_plans cp on 
  eqs.plan_handle=cp.plan_handle
CROSS APPLY sys.dm_exec_query_plan (cp.plan_handle) AS qp
WHERE est.text like '%FROM dbo.FactSales%'
OPTION (RECOMPILE);
GO

Here are the results for our query:

7_query_stats_and_plan

Sys.dm_exec_query_stats has great info!  The difference between the average logical reads and the last logical reads shows us that sometimes this query reads more than others– that’s because the first time we ran it, it had to scan one partition. The second time we ran it, it had to query two. If it was always scanning the whole table, we’d have the same number of logical reads for the average and the last.

We can also see that the same execution plan was reused for both queries. Clicking on the cached query plan to open it up, we see something similar… but it doesn’t have all the same info.

8_cached_execution_plan

The clustered index scan is the same…

9_no_partitions_accessed

But in the properties, we can only see that it knows the table is partitioned.

The cached execution plan does not contain information on the number of partitions accessed or which ones were accessed. We can only see that in the Actual Execution plan.

TLDR; (Too long, didn’t eliminate partitions)

Here’s a quick rundown of what we did and saw:

  • We partitioned the FactSales table by creating a partition function and partition scheme, then put a unique Clustered Index on the SalesKey and DateKey columns
  • When we ran our query with actual execution plans enabled, we could see how many partitions were accessed and the partition number
  • When we looked at the cached execution plan, we could see that the same execution plan was able to be re-used across multiple runs, even though:
    • It was a parameterized stored procedure
    • The query accessed a different number of partitions on each run (one partition on the first run, two partitions on the second run)
  • The cached execution plan did not contain the number of partitions accessed. (Makes sense, given the plan re-use!)
  • We could see the average and last number of logical reads from sys.dm_exec_query_stats, which could give us a clue as to whether partition elimination was occurring

Super simple, right? 🙂

If you liked this post and you’re ready for something more challenging, head on over to Paul White’s blog and read about a time when partition elimination didn’t work.


Does OPTION (RECOMPILE) Prevent Query Store from Saving an Execution Plan?


Recompile hints have been tough to love in SQL Server for a long time. Sometimes it’s very tempting to use these hints to tell the optimizer to generate a fresh execution plan for a query, but there can be downsides:

  • This can drive up CPU usage for frequently run queries
  • This limits the information SQL Server keeps in its execution plan cache and related statistics in sys.dm_exec_query_stats and sys.dm_exec_procedure_stats
  • We’ve had some alarming bugs where recompile hints can cause incorrect results. (Oops! and Whoops!)
  • Some queries take a long time to compile (sometimes up to many seconds), and figuring out that this is happening can be extremely tricky when RECOMPILE hints are in place

The new SQL Server 2016 feature, Query Store, may help alleviate at least some of these issues. One of my first questions about Query Store was whether recompile hints would have the same limitations as in the execution plan cache, and how easy it might be to see compile duration and information.

Let’s Turn on Query Store

I’m running SQL Server 2016 CTP3. To enable Query Store, I click on the database properties, and there’s a Query Store tab to enable the feature. I choose “Read Write” as my new operation mode so that it starts collecting query info and writing it to disk:

Query Store: ACTIVATE!

If you script out the TSQL for that, it looks like this:

USE [master]
GO
ALTER DATABASE [ContosoRetailDW] SET QUERY_STORE = ON
GO
ALTER DATABASE [ContosoRetailDW] 
SET QUERY_STORE (OPERATION_MODE = READ_WRITE, 
CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 367), 
DATA_FLUSH_INTERVAL_SECONDS = 900, 
INTERVAL_LENGTH_MINUTES = 60, 
MAX_STORAGE_SIZE_MB = 100, 
QUERY_CAPTURE_MODE = ALL, 
SIZE_BASED_CLEANUP_MODE = AUTO)
GO

And Now Let’s Test Drive that RECOMPILE Hint

Now that Query Store’s on, I make up a few queries with RECOMPILE hints in them and run them– some once, some multiple times. After a little bit of this, I check out what Query Store has recorded about them:

SELECT 
  qsq.query_id,
  qsq.query_hash,
  qsq.count_compiles,
  qrs.count_executions,
  qsq.avg_compile_duration,
  qsq.last_compile_duration,
  qsq.avg_compile_memory_kb,
  qsq.last_compile_memory_kb,
  qrs.avg_logical_io_reads,
  qrs.last_logical_io_reads,
  qsqt.query_sql_text,
  CAST(qsp.query_plan AS XML) AS mah_query_plan
FROM sys.query_store_query qsq
JOIN sys.query_store_query_text qsqt on qsq.query_text_id=qsqt.query_text_id
JOIN sys.query_store_plan qsp on qsq.query_id=qsp.query_id
JOIN sys.query_store_runtime_stats qrs on qsp.plan_id = qrs.plan_id
WHERE qsqt.query_sql_text like '%recompile%';
GO

Note: I’ve kept it simple here and am looking at all rows in sys.query_store_runtime_stats. That means that if I’ve had query store on for a while and have multiple intervals, I may get multiple rows for the same query. You can add qrs.runtime_stats_interval_id to the query to see that.

Here’s a sample of the results:

query store results for recompile queries

(Click to see the beauty of query store in a larger image)

YAY! For all my queries that were run with RECOMPILE hints, I can see information about how many times they were run, execution stats, their query text and plan, and even information about compilation.

And yes, I have the execution plans, too — the “CAST(qsp.query_plan AS XML) AS mah_query_plan” totally works.
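
As noted above, runtime stats are stored per collection interval. If you want to break the numbers out by interval rather than lumping them together, here's a hedged variation that joins to sys.query_store_runtime_stats_interval:

SELECT 
  qsq.query_id,
  qrs.runtime_stats_interval_id,
  qsrsi.start_time,
  qsrsi.end_time,
  qrs.count_executions,
  qrs.avg_logical_io_reads
FROM sys.query_store_query qsq
JOIN sys.query_store_query_text qsqt on qsq.query_text_id=qsqt.query_text_id
JOIN sys.query_store_plan qsp on qsq.query_id=qsp.query_id
JOIN sys.query_store_runtime_stats qrs on qsp.plan_id = qrs.plan_id
JOIN sys.query_store_runtime_stats_interval qsrsi on qrs.runtime_stats_interval_id=qsrsi.runtime_stats_interval_id
WHERE qsqt.query_sql_text like '%recompile%'
ORDER BY qsq.query_id, qrs.runtime_stats_interval_id;
GO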

Want to Learn More about Query Store and Recompile?

In this post, I just talked about observing recompile overhead with Query Store. Grant Fritchey has an excellent post that addresses the question: what if you tell Query Store to freeze a plan for a query with a recompile hint? Will you still pay the price of recompile? Read the answer on Grant’s blog here.

Joins, Predicates, and Statistics in SQL Server


Joins can be tricky. And where you put your ‘where’ clause may mean more than you think!

Take these two queries from the AdventureWorksDW sample database. The queries are both looking for data where SalesTerritoryCountry = ‘NA’ and they have the same joins, but the first query has a predicate on SalesTerritoryCountry while the second has a predicate on SalesTerritoryKey.

/* Query 1: Predicate on SalesTerritoryCountry */
select 
  ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey, CurrencyKey, 
  fis.SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber, OrderQuantity, 
  UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount, ProductStandardCost, 
  TotalProductCost, SalesAmount, TaxAmt, Freight, CarrierTrackingNumber, CustomerPONumber, 
  OrderDate, DueDate, ShipDate
from dbo.FactInternetSales fis
join dbo.DimSalesTerritory st on 
  fis.SalesTerritoryKey=st.SalesTerritoryKey
where st.SalesTerritoryCountry = N'NA'
GO

/* Query 2: Predicate on SalesTerritoryKey (for the exact same country) */
select 
  ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey, CurrencyKey, 
  fis.SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber, OrderQuantity, 
  UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount, ProductStandardCost, 
  TotalProductCost, SalesAmount, TaxAmt, Freight, CarrierTrackingNumber, CustomerPONumber, 
  OrderDate, DueDate, ShipDate
from dbo.FactInternetSales fis
join dbo.DimSalesTerritory st on 
  fis.SalesTerritoryKey=st.SalesTerritoryKey
where st.SalesTerritoryKey = 11;
GO

Take a look at the difference in their estimated execution plans:

1_estimated_plan_differences

Although these queries return the same data, the plans and performance are very different. Query 1 (predicate written against SalesTerritoryCountry) estimates too high and chooses a much larger plan than it needs. It doesn’t have a clue that there are zero rows for SalesTerritoryCountry = ‘NA’.

2_query_comparison

Hash joins aren’t necessarily bad, but we don’t need one for this query. Why do the heavy lifting for no rows?

Where is Query #1 Getting That 6,039.8 Row Estimate?

SQL Server uses statistics for estimates. It’s using them for both of these queries, just in different ways. For the query “where st.SalesTerritoryCountry = N'NA'”, it uses two statistics:

dbo.DimSalesTerritory: This is a small dimension table. SQL Server uses a column statistic on the SalesTerritoryCountry column. It’s able to look the value ‘NA’ up in a detailed histogram that describes the data distribution to see that there’s just one row for that value in the table. Super simple!

3_statistics_dimension
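
If you'd like to peek at that histogram yourself, DBCC SHOW_STATISTICS will display it. The statistic name below is a hypothetical auto-created name; look up the real one in sys.stats first:

SELECT name 
FROM sys.stats 
WHERE object_id = OBJECT_ID('dbo.DimSalesTerritory');
GO

/* Substitute the statistic name that sys.stats returned for the SalesTerritoryCountry column */
DBCC SHOW_STATISTICS ('dbo.DimSalesTerritory', '_WA_Sys_00000004_0EA330E9') WITH HISTOGRAM;
GO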

dbo.FactInternetSales: Things get more complicated here. The FactInternetSales table doesn’t know anything about SalesTerritoryCountry. It only has the column SalesTerritoryKey.

And although it’s joining on the column, it doesn’t understand that SalesTerritoryCountry = ‘NA’ is the same thing as SalesTerritoryKey = 11.

Query optimization has to be fast, and SQL Server has to figure everything out before it begins executing the query. It doesn’t have the ability to go run a query like “SELECT SalesTerritoryKey from dbo.DimSalesTerritory WHERE SalesTerritoryCountry = N'NA'” before it can even optimize the query.

So it needs to make a guess about how many rows an unknown Country has in FactInternetSales.

It does this using a part of the statistics called the “Density Vector”. SQL Server has statistics on an index that I created on the SalesTerritoryKey column in this case. The density vector describes how many rows on average any given SalesTerritoryKey has associated with it in the fact table.

4_statistics_density_fact

The average density is .1 and there are 60398 rows in the table. 60398 * 0.1 = 6039.8 … there’s our row estimate!
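
Here's a hedged sketch of how to see that density vector for yourself (the index name is made up; substitute whatever you named your index on SalesTerritoryKey):

DBCC SHOW_STATISTICS ('dbo.FactInternetSales', ix_FactInternetSales_SalesTerritoryKey) 
    WITH DENSITY_VECTOR;
GO

/* Estimate = average density * rows in the table: 0.1 * 60398 = 6039.8 rows */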

In this case, 6,039.8 rows is enough that SQL Server decides that many nested loop lookups would be a drag. It decides to build some hash tables and figure it out in memory. Honestly, it’s not a terrible choice in this case. Yeah, it needs a memory grant, but it gets the work done in a handful of milliseconds and calls it a day.

If this was just one part of a much larger and more complex plan, it could have much bigger consequences, and make a more significant difference in runtime.

One Cool Thing About Query #2

Notice that on Query #2, I wrote the predicate against the dimension table, not the fact table. It was able to see that I joined on those columns and use that predicate against the fact table itself to get a very specific estimate.

That’s pretty cool!

What Does this Mean for Writing Queries?

Whenever you have a chance to simplify a query, it can be beneficial.

In this case, if we’re writing a predicate against the SalesTerritoryKey column, it’s fair to ask if we need to join the two tables at all. If we have a checked foreign key that ensures that every SalesTerritoryKey has a matching parent row in DimSalesTerritory and we don’t actually want to return any columns from DimSalesTerritory, we don’t even need to do the join.
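
As a sketch, assuming a trusted foreign key on SalesTerritoryKey and no columns needed from DimSalesTerritory, the join could simply be dropped:

select 
  ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey, CurrencyKey, 
  SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber, OrderQuantity, 
  UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount, ProductStandardCost, 
  TotalProductCost, SalesAmount, TaxAmt, Freight, CarrierTrackingNumber, CustomerPONumber, 
  OrderDate, DueDate, ShipDate
from dbo.FactInternetSales
where SalesTerritoryKey = 11;
GO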

In complex situations when performance is important, thinking carefully about how you write queries and where you put predicates can sometimes help you tune.

3 Things I Wish I’d Learned Earlier as a SQL Server DBA

Back in my day, we used a courier for log shipping.

Hindsight is everything. I was lucky to be trained by a great team of DBAs back when I first started with SQL Server. But it’s hard to know exactly what you really need to know, particularly as new tools are becoming available.

Here are the three things I wish I’d caught on to sooner.

3. How to See What’s Running in SQL Server (and How Long It’s Been Running)

I used the built in sp_who2 procedure for a long, long time.

Sp_who2 doesn’t tell you much. It doesn’t tell you exactly which queries are running, how long they’ve been running, or what they’re waiting on. It shows you information on a lot of sessions that are just sitting there, sleeping, doing nothing. And it even orders things funny if you have a lot of sessions.

Now, I’m a huge fan of Adam Machanic’s free sp_WhoIsActive procedure. One of the most popular posts I’ve ever written on this blog is how to log results from sp_WhoIsActive into a table. That’s still cool!

sp_WhoIsActive has less overhead than SQL Server’s Activity Monitor, and I find it easier to understand, too.
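
If you haven't tried it, the simplest call takes no parameters at all. (The @get_plans parameter shown below is optional and adds each active query's execution plan to the output.)

/* Show active requests */
EXEC sp_WhoIsActive;
GO

/* Include execution plans for the active queries */
EXEC sp_WhoIsActive @get_plans = 1;
GO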

2. How to Deal With Wait Stats

There’s two things to know:

  1. Wait stats are critical for performance tuning SQL Server
  2. Learning wait stats isn’t like riding a bike, it’s like learning a foreign language

Much like learning a language, you need to spend some time with it and get used to what’s normal and listen a lot. By sampling wait stats regularly and baselining, you learn what’s normal.

You also need to research individual wait stats and dig into their meaning. You need to learn how different wait stats are related to one another. That’ll take time.
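
As a starting point for that listening, here's a minimal sketch that looks at the top waits since the instance last started. (Many of the top entries are benign background waits that you'll learn to filter out.)

SELECT TOP 10
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    signal_wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;
GO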

A great free tool to get started with wait stats is Brent Ozar’s sp_AskBrent. You can sample wait stats since startup, for a defined interval, or log them to a table. (Bonus: it shows you some wait stats that Activity Monitor hides.)

1. Why the Transaction Log File is So Important

Even after I learned how to set up log backups, I didn’t really understand what the point was. When I finally got the concept I felt very much like a light bulb had come on over my head. Everything about backups suddenly made much more sense.

Here’s the basics:

When modifications happen in SQL Server, they must go to two places, and two places only: memory and the transaction log. This is because SQL Server uses “write-ahead logging”. (AKA “WAL“. Creative acronym, right?)

Lots of committed inserts, updates, and deletes may be recorded only in your SQL Server’s memory or the transaction log, and they may not be in the data files at all. So if you suddenly lose your data files, the ONLY place those modifications might be written to disk is in your transaction log– and that’s why log backups are so critical.

This is also why the transaction log file is even more important to protect on the SQL Server than the data files are. If the data files vanished and the SQL Server was still online, you could back up the tail of the log to avoid data loss.

How to Find Missing Index Requests in Query Store


Query-Store

SQL Server 2016’s new Query Store feature makes it easier than ever for DBAs and developers to identify the most important queries to tune– and perhaps apply a quick fix by pinning an execution plan.

But how does the new Query Store feature work with SQL Server’s existing “missing index” request feature? When the query optimizer generates a plan, it’s frequently able to see a place where an index might make the query faster and flag it as an “index request”. Does that work with Query Store?

I’m using the free SQLIndexWorkbook database to look at how these two features work together.

Enabling Query Store

You can enable Query Store through the GUI just by right clicking on the database and going into its properties. Here’s the same thing being done in TSQL:

USE master
GO
ALTER DATABASE SQLIndexWorkbook SET QUERY_STORE = ON
GO
ALTER DATABASE SQLIndexWorkbook SET QUERY_STORE 
    (OPERATION_MODE = READ_WRITE, CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30), 
    DATA_FLUSH_INTERVAL_SECONDS = 300, INTERVAL_LENGTH_MINUTES = 10)
GO

You have a few things to select when you enable Query Store. I picked an interval length of just 10 minutes for testing purposes.
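
If you want to double-check that the feature really is on and collecting, sys.database_query_store_options will tell you. Run it in the database you just configured:

USE SQLIndexWorkbook;
GO
SELECT 
    actual_state_desc, 
    desired_state_desc, 
    readonly_reason,
    current_storage_size_mb,
    max_storage_size_mb
FROM sys.database_query_store_options;
GO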

Let’s Generate Some Missing Index Requests

I generated some activity in the database for Query Store to see using these queries. The first two queries generate a missing index request; the third does not.

USE SQLIndexWorkbook;
GO

/* I ran this once to create the index */
CREATE NONCLUSTERED INDEX ix_FirstNameByYear_FirstNameId ON agg.FirstNameByYear (FirstNameId ASC);
GO

/* I ran the queries below many times to generate activity */
SELECT 
    fact.ReportYear,
    fact.Gender,
    fact.NameCount
FROM agg.FirstNameByYear AS fact
JOIN ref.FirstName AS dim 
    ON fact.FirstNameId=dim.FirstNameId
WHERE 
    fact.Gender = 'M' AND 
    dim.FirstName = 'Sue';
GO

SELECT 
    COUNT(*)
FROM agg.FirstNameByYear AS fact
JOIN ref.FirstName AS dim 
    ON fact.FirstNameId=dim.FirstNameId
WHERE 
    fact.Gender = 'M';
GO

SELECT 
	fact.ReportYear,
	fact.Gender
FROM agg.FirstNameByYear AS fact
JOIN ref.FirstName AS dim 
	ON fact.FirstNameId=dim.FirstNameId
WHERE 
	fact.Gender = 'M';
GO

I’m not recompiling these queries or causing their execution plans to vary, so I don’t expect to see any “regression” in Query Store. In this case I just want to see how it shows me which queries have an index request and play around with how to find them.

Using the Query Store GUI to see Missing Index Requests

Query Store focuses on the queries that are using the most resources, overall resource consumption, and the queries that have gotten slower with different plans. Query Store makes it easy to see the plans for those types of queries and you can visually identify if there’s an index request in the execution plan for that time period.

Here’s how you can view Query Store’s reports in Object Explorer. Just double-click on the interactive report you want to use:

QueryStore-Reports

Opening “Top Resource Consuming Queries”, it shows me the top 25 plans by duration. It shows detail information for the #1 top plan by default, and colors that bar green.

Plan #1: No Missing Index Request

In my case, there’s no missing index request on the top plan:

Top 25 plans by duration- top plan

How can I tell there’s no index request? In the bottom pane for the graphical execution plan, there’s no green hint appearing!

Even though there’s no index request, this is the query that’s run for the greatest duration. And looking at that plan, I can see that it’s doing an index scan. It’s worth looking at the query to see if it might be able to be improved, or ask what it’s for. I can also tell from the chart in the top right that this query ran 5 times, all in the same time interval. Pretty handy.

Plan #2: A Missing Index Request Appears

Clicking on the second bar in the graph of top resource consumers, I do find a missing index request for the execution plan I’m examining:

Top 25 plans by duration- second plan-missing index

Now the plan on the bottom shows an index request with the green hint.

One little issue: when I right click on the query plan and select “Missing Index Details”, the GUI throws an error. But I’m using a pre-release CTP, so I’m not judging.

I can still right click on the plan and say “Show Execution Plan XML” and see the missing index request details in the execution plan itself. I can also click on a “script” button at the top and script out the TSQL for the query itself if I’d like to test it directly.

Querying the Query Store DMVs Directly

I like the GUI quite a bit for Query Store so far. But sometimes, you just want to run a query.

Here’s a bit of TSQL that looks for the queries that have asked for an index in the last 24 hours (and been seen by QueryStore), and aggregates information about them.

The query shows the most recent plan collected only. This might be a little confusing, because the query could potentially have different plans over that time, and not all of them might have missing index requests. In other words, you may need to do a little more digging.

SELECT
    SUM(qrs.count_executions) * AVG(qrs.avg_logical_io_reads) as est_logical_reads,
    SUM(qrs.count_executions) AS sum_executions,
    AVG(qrs.avg_logical_io_reads) AS avg_avg_logical_io_reads,
    SUM(qsq.count_compiles) AS sum_compiles,
    (SELECT TOP 1 qsqt.query_sql_text FROM sys.query_store_query_text qsqt
        WHERE qsqt.query_text_id = MAX(qsq.query_text_id)) AS query_text,	
    TRY_CONVERT(XML, (SELECT TOP 1 qsp2.query_plan from sys.query_store_plan qsp2
        WHERE qsp2.query_id=qsq.query_id
        ORDER BY qsp2.plan_id DESC)) AS query_plan,
    qsq.query_id,
    qsq.query_hash
FROM sys.query_store_query qsq
JOIN sys.query_store_plan qsp on qsq.query_id=qsp.query_id
CROSS APPLY (SELECT TRY_CONVERT(XML, qsp.query_plan) AS query_plan_xml) AS qpx
JOIN sys.query_store_runtime_stats qrs on qsp.plan_id = qrs.plan_id
JOIN sys.query_store_runtime_stats_interval qsrsi on qrs.runtime_stats_interval_id=qsrsi.runtime_stats_interval_id
WHERE	
    qsp.query_plan like N'%<MissingIndexes>%'
    and qsrsi.start_time >= DATEADD(HH, -24, SYSDATETIME())
GROUP BY qsq.query_id, qsq.query_hash
ORDER BY est_logical_reads DESC
GO

This isn’t a work of art, and may need tuning for large Query Store implementations in busy environments. Look at the end of the query– see that non-sargable query_plan like N’%<MissingIndexes>%’? Yeah, not pretty, but that query_plan column is stored as NVARCHAR(MAX), and we don’t have an “index request” column.

Don’t Shoehorn a New Feature into an Old Process

I really like the way the Query Store feature was designed. It wasn’t just wrapped around the existing index request feature: it shows the requests if they are there, but it doesn’t fixate on them or prioritize by them.

And that’s totally fine, because the older index request feature isn’t perfect. The top duration query in my example above isn’t asking for an index, but it returns a ton of rows. It’s worth examining!

Does Query Store’s “Regression” Always Catch Nasty Parameter Sniffing?


Query-Store-Regressed

SQL Server 2016’s new Query Store feature has an option that looks for “regressed” query plans. Plan choice regression is explained in Books Online this way:

During the regular query execution Query Optimizer may decide to take a different plan because important inputs became different: data cardinality has changed, indexes have been created, altered or dropped, statistics have been updated, etc. For the most part new plan it picks is better or about the same than one was used previously. However, there are cases when new plan is significantly worse – we refer to that situation as plan choice change regression.

Simply put, we’re looking for queries that have gotten slower.

Of course I started wondering exactly which scenarios might still be hard to identify with the new feature. And I wondered– does it have to be two different query execution plans? Or could it flag the same plan when it’s sometimes fast and sometimes slow? In this post I’ll show:

  • An example of parameter sniffing that doesn’t qualify as “regressed”, but is sometimes significantly slower
  • The TSQL for the “Regressed Queries” report
  • There is no restriction that it has to be two different query plans — the same plan could show up as regressed if it was slower (it depends on the patterns in which it’s run)
  • Why looking at “max” in the “Top Resource Consumers” report is important for queries like this

Enter Parameter Sniffing

When you reuse parameterized code in SQL Server, the optimizer “sniffs” the parameters you compile the query with on the first run, then re-uses that plan on subsequent executions. Sometimes you get a case where that first run is fast, but later runs are slow because the execution plan really isn’t great for every set of input values.

Here’s an example of some code that’s particularly troublesome. This uses the free SQLIndexWorkbook database:

CREATE INDEX ix_FirstNameByYear_FirstNameId on 
    agg.FirstNameByYear (FirstNameId);
GO

IF OBJECT_ID('NameCountByGender','P') IS  NULL
    EXEC ('CREATE PROCEDURE dbo.NameCountByGender AS RETURN 0')
GO

ALTER PROCEDURE dbo.NameCountByGender
    @gender CHAR(1),
    @firstnameid INT
AS
    SET NOCOUNT ON;

    SELECT 
        Gender,
        SUM(NameCount) as SumNameCount
    FROM agg.FirstNameByYear AS fact
    WHERE 
        (Gender = @gender or @gender IS NULL)
        AND 
        (FirstNameId = @firstnameid or @firstnameid IS NULL)
    GROUP BY Gender;
GO

I’ve enabled Query Store on the SQLIndexWorkbook database. I clear out any old history to make it easy to see what’s going on in my examples with this code:

ALTER DATABASE SQLIndexWorkbook SET QUERY_STORE CLEAR;
GO

dbo.NameCountByGender is Sometimes Fast, Sometimes Slow

I first run the stored procedure 100 times for males named Sue. This gets a plan into cache that is under 100 milliseconds when run for specific names, but takes more than a second if the @firstnameid parameter is passed in as NULL.

Here’s that first run:

EXEC dbo.NameCountByGender @gender='M', @firstnameid=91864;
GO 100

After running the “fast query” 100 times, I open QueryStore’s “Top Resource Consuming Queries” report and can see it right away. The average duration of the query was 73.5 milliseconds:

Small Plan-QueryStore

Let’s re-run the stored procedure with different values — the slow ones, which look at the values for all men. This will reuse the exact same plan, but it will perform much more slowly. We only run it twice:

EXEC dbo.NameCountByGender @gender='M', @firstnameid=NULL;
GO 2

I happened to run the query in a new “interval” or time period. This means the slow plan now shows up right away in the top resource consumers to the right of the old plan. Hovering over the circle for the plan, I can see that it captured both runs, with an average duration of 1.2 seconds.

Small Plan-QueryStore-Slow-Run

But then, the procedure is run 100 more times for a specific first name, the fast version:

EXEC dbo.NameCountByGender @gender='M', @firstnameid=91864;
GO 100

The Query Store report is showing us the bubbles based on the average duration. This makes the bubble move downward in the graph after those executions:

Small Plan-QueryStore-Average-Moving-Down

 

It’s possible to see that there was a slow run by changing the graph to show the “max duration”, but you have to know to do that.

This Doesn’t Show as a Regressed Query

Here’s what my Regressed Queries report looks like:

query-store-regressed-queries

Nobody Home!

Even looking at the “Max” statistic, this query doesn’t show up.

What Qualifies as a Regressed Query in Query Store?

There’s one easy way to figure this out. I opened SQL Server Profiler, selected the Tuning template, and refreshed the Regressed Query report. (I still use Profiler for stuff like this because I can do that sequence in about three seconds. Hey, I’m lazy.) Here’s the query that the Regressed Query report looks like it’s using (as of 2016 CTP 3.2 at least):

 

exec sp_executesql N'WITH
hist AS
(
    SELECT 
        p.query_id query_id, 
        CONVERT(float, MAX(rs.max_duration)) max_duration, 
        SUM(rs.count_executions) count_executions,
        COUNT(distinct p.plan_id) num_plans 
    FROM sys.query_store_runtime_stats rs
        JOIN sys.query_store_plan p ON p.plan_id = rs.plan_id
    WHERE NOT (rs.first_execution_time > @history_end_time OR rs.last_execution_time < @history_start_time)
    GROUP BY p.query_id
),
recent AS
(
    SELECT 
        p.query_id query_id, 
        CONVERT(float, MAX(rs.max_duration)) max_duration, 
        SUM(rs.count_executions) count_executions,
        COUNT(distinct p.plan_id) num_plans 
    FROM sys.query_store_runtime_stats rs
        JOIN sys.query_store_plan p ON p.plan_id = rs.plan_id
    WHERE NOT (rs.first_execution_time > @recent_end_time OR rs.last_execution_time < @recent_start_time)
    GROUP BY p.query_id
)
SELECT TOP (@results_row_count)
    results.query_id query_id,
    results.query_text query_text,
    results.duration_regr_perc_recent duration_regr_perc_recent,
    results.max_duration_recent max_duration_recent,
    results.max_duration_hist max_duration_hist,
    ISNULL(results.count_executions_recent, 0) count_executions_recent,
    ISNULL(results.count_executions_hist, 0) count_executions_hist,
    queries.num_plans num_plans
FROM
(
    SELECT
        hist.query_id query_id,
        qt.query_sql_text query_text,
        ROUND(CONVERT(float, recent.max_duration-hist.max_duration)/NULLIF(hist.max_duration,0)*100.0, 2) duration_regr_perc_recent,
        ROUND(recent.max_duration, 2) max_duration_recent, 
        ROUND(hist.max_duration, 2) max_duration_hist,
        recent.count_executions count_executions_recent,
        hist.count_executions count_executions_hist   
    FROM hist 
        JOIN recent ON hist.query_id = recent.query_id        
        JOIN sys.query_store_query q ON q.query_id = hist.query_id
        JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
    WHERE
        recent.count_executions >= @min_exec_count
) AS results
JOIN 
(
    SELECT
        p.query_id query_id, 
        COUNT(distinct p.plan_id) num_plans 
    FROM sys.query_store_plan p       
    GROUP BY p.query_id
) AS queries ON queries.query_id = results.query_id
WHERE duration_regr_perc_recent > 0
ORDER BY duration_regr_perc_recent DESC
OPTION (MERGE JOIN)',N'@results_row_count int,@recent_start_time datetimeoffset(7),@recent_end_time datetimeoffset(7),@history_start_time datetimeoffset(7),@history_end_time datetimeoffset(7),@min_exec_count bigint',@results_row_count=25,@recent_start_time='2016-01-18 14:00:42.7253669 -08:00',@recent_end_time='2016-01-18 15:00:42.7253669 -08:00',@history_start_time='2016-01-11 15:00:42.7253669 -08:00',@history_end_time='2016-01-18 15:00:42.7253669 -08:00',@min_exec_count=1

First up, I don’t see anything in that TSQL that insists that a “regressed” query must have different execution plans at different sample times. That’s good! (The Books Online text raised this as a question for me.)

Our query isn’t showing up in the results because of the “WHERE duration_regr_perc_recent > 0” predicate. I dug into the data, and our query (query_id=4) has max_duration_recent = 1231557 and max_duration_hist = 1231557. My ‘recent’ period was defined as the latest hour. Based on the pattern sampled, it sees the regression as zero.

In other words, our query was sometimes fast and sometimes slow in the last hour. It hadn’t run before that, so it wasn’t slower than it had been historically.
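
If you want to see that skew for yourself without the report, here's a hedged sketch that pulls average and max duration per interval for a single query. (query_id = 4 comes from this example; substitute your own.)

SELECT 
    qsp.query_id,
    qrs.runtime_stats_interval_id,
    qrs.count_executions,
    qrs.avg_duration,
    qrs.max_duration
FROM sys.query_store_plan AS qsp
JOIN sys.query_store_runtime_stats AS qrs on qsp.plan_id = qrs.plan_id
WHERE qsp.query_id = 4
ORDER BY qrs.runtime_stats_interval_id;
GO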

The Regressed Query Report Won’t Always Catch Parameter Sniffing

If a frequently executed query is usually fast but occasionally slow, and that pattern happens frequently across sample times, that query may not always show up in the regressed queries pane.

But if it’s one of the most frequent queries on your system, it will still show up in Query Store’s “Top Resource Consumers” report, regardless. We had no problem at all seeing it there.

Takeaway: When Looking at Top Resource Consumers, Look at Max (Not Just Averages)

The Query Store reports have lots of great options. Averages are interesting, but switching to view the max can help identify when you’ve got something causing a high skew in query execution time. I also prefer looking at CPU time (how much time did you spend working) rather than duration, because duration can be extended by things like blocking.

Here’s what the chart in the “Top 25 Resource Consumers” report looks like for Max CPU time for our query for that time period:

query-store-max-cpu-time

That jumps out a little more as having an anomalous long run!

SQL Server’s YEAR() Function and Index Performance


SQL Server’s really clever about a lot of things. It’s not super clever about YEAR() when it comes to indexes, even using SQL Server 2016 — but you can either make your TSQL more clever, or work around it with computed columns.

Short on time? Scroll to the bottom of the post for the summary.

The Problem With YEAR()

I’ve created a table named dbo.FirstNameByBirthDate_2005_2009 in the SQLIndexWorkbook database. I’ve taken the history of names by year, and made them into a fake fact table– as if a row was inserted every time a baby was born. The table looks like this:

FirstNameByBirthDate_2005_2009

I want to count the females born in 2006. The most natural way to write this query is:

SELECT
    COUNT(*)
FROM dbo.FirstNameByBirthDate_2005_2009
WHERE 
    Gender = 'F'
    AND YEAR(FakeBirthDateStamp) = 2006
GO

Pretty simple, right?

This looks like it’d be a really great index for the query:

CREATE INDEX ix_women 
    ON dbo.FirstNameByBirthDate_2005_2009
        (Gender, FakeBirthDateStamp);
GO

All rows are sorted in the index by Gender, so we can immediately seek to the ‘F’ rows. The next column is a datetime2 column, and sorting the rows by date will put all the 2006 rows together. That seems seekable as well. Right? Right.

After creating our index, here’s the actual execution plan. At first, it looks like it worked. There’s a seek at the very right of this plan!

Index-Seek-Execution-Plan

But if we hover over that index seek, we can see in the tooltip that there’s a hidden predicate that is NOT a seek predicate. This is a hidden filter. And because this is SQL Server 2016, we can see “Number of Rows Read” — it had to read 9.3 million rows to count 1.9 million rows. It didn’t realize the 2006 rows were together– it checked all the females and examined the FakeBirthDateStamp column for each row.

Hidden-Predicate-Actual-Rows-Read

Solution 1: Rewrite Our TSQL

We can make this better with a simple query change. Let’s explain to the optimizer, in detail, what we mean by 2006, like this:

SELECT
	COUNT(*)
FROM dbo.FirstNameByBirthDate_2005_2009
WHERE 
	Gender = 'F'
	AND FakeBirthDateStamp >= CAST('1/1/2006' AS DATETIME2(0)) 
		and FakeBirthDateStamp < CAST('1/1/2007' AS DATETIME2(0))
GO

Our actual execution plan looks the same from the outer shape. We still have a seek, but the relative cost of it has gone up from 86% to 89%. Hm. Did it get worse?

Index-Seek-Execution-Plan-Better-Query

Hovering over the index seek, the tooltip tells us that it got much better. We have two seek predicates, and we only needed to read the rows that we actually counted. Way more efficient!

Two-Seek-Predicates-Actual-Rows-Read

Solution 2: Add an Indexed Computed Column

What if you can’t change the code? There’s a really cool optimization with computed columns that can help.

First, I’m going to add a column to my table called BirthYear, which uses the YEAR() function, like this:

ALTER TABLE dbo.FirstNameByBirthDate_2005_2009
    ADD BirthYear AS YEAR(FakeBirthDateStamp);
GO

Then I’m going to index BirthYear and Gender:

CREATE INDEX ix_BirthYear on dbo.FirstNameByBirthDate_2005_2009 (BirthYear, Gender);
GO

Now here’s the really cool part of the trick. I don’t have to change my code at all to take advantage of the BirthYear column. I’m going to run the same old query that uses the year function. (Here it is, just to be clear.)

SELECT
    COUNT(*)
FROM dbo.FirstNameByBirthDate_2005_2009
WHERE 
    Gender = 'F'
    AND YEAR(FakeBirthDateStamp) = 2006
GO

SQL Server auto-magically matches YEAR(FakeBirthDateStamp) to my computed column, and figures out it can use the index. It does a beautiful seek, every bit as efficient as if I’d rewritten the code:

Index-Seek-Execution-Plan-Computed-Column

NitPicker’s Corner: Disclaimers and Notes

When considering indexed computed columns:

This issue isn’t specific to the DATETIME2 data type. It still happens with good old DATETIME as well.

My tests were all run against SQL Server 2016 CTP3.3.

TLDR; Just the Facts Please

There’s three main things to remember here:

  1. A seek isn’t always awesome. Check the tooltip on the seek operator, because there may be hidden predicates in there which are NOT seek predicates.
  2. The optimizer isn’t as smart with YEAR() as you might think, so consider other code constructs.
  3. If you can’t rewrite the code and these queries need optimization, test out indexed computed columns to see if they may help.

Live Query Statistics Don’t Replace Actual Execution Plans


I like SQL Server’s new Live Query Statistics feature a lot for testing and tuning large queries. One of my first questions was whether this could replace using actual execution plans, or if it’s useful to use both during testing.

Spoiler: Both are useful. And both can impact query performance.

Live Query Statistics Gives Insight into Plan Processing

I’m loading a bunch of rows into a partitioned heap. Here’s what the insert looks like in Live Query Statistics when run with parallelism. Note that the progress on the Sort operator remains at 0% done until the Clustered Index Scan and Compute Scalar operators feeding it are 100% complete:

Here’s what that same insert looks like when run at maxdop 1. In this case the operators are ordered differently, and the Sort Operator is able to make progress while the Clustered Index Scan is still running.

This insert runs significantly faster at MAXDOP 1, and Live Query Statistics really helps see why. The single threaded plan doesn’t have to wait — it can stream more through at once.

The single threaded plan uses teamwork, like this:

Live Query Statistics Didn’t Tell Us About the tempdb Spill

When the query completes, Live Query Statistics tells us that everything is 100% done. It doesn’t say anything about a tempdb spill.

To see that this tempdb spill occurred as part of the query, we need to have turned on “Actual Execution Plans”. Then we can click over to the Execution Plan tab and see this:

Actual Execution Plan tempdb spill

Hovering over the warning on the sort operator, we can see more details about the tempdb spill:

spillage

Observation Has a Performance Cost

Getting plan details isn’t free. The amount of impact depends on what the query is doing, but there’s a stiff overhead to collecting actual execution plans and to watching live query statistics.

These tools are great for reproing problems and testing outside of production, but don’t time query performance while you’re using them– you’ll get too much skew.


3 Tricks with STATISTICS IO and STATISTICS TIME in SQL Server


When you need to measure how long a query takes and how many resources it uses, STATISTICS TIME and STATISTICS IO are great tools for interactive testing in SQL Server. I use these settings constantly when tuning indexes and queries.

Here are three tricks that come in really handy to up your STATISTICS game.

1. You can turn both STATISTICS IO and STATISTICS TIME on and off with a single line of code

I learned this trick from Michael J. Swart a while back. Most people do this, because it’s what the documentation shows:

SET STATISTICS IO ON;
GO
SET STATISTICS TIME ON;
GO

But you can just do this, and it works perfectly:

SET STATISTICS IO, TIME ON;
GO

The same trick works for turning the settings off.

This shortcut has probably saved me an hour of typing in the last year alone. (I totally made that metric up, but hooray for saved keystrokes.)

2. When writing demo code, you should remember to turn off the lights

I’m sure he’s not the first person in the universe to do this, but Tim Ford was the first person I noticed who did this consistently in his code samples:

SET STATISTICS IO, TIME ON;
GO

/* Query you want to measure goes here*/
SELECT name
FROM sys.databases;

SET STATISTICS IO, TIME OFF;
GO

This is worth doing, because having output spewing out to the Messages tab when you don’t want to look at it can be distracting, and might slow some queries down.

Yep, this is more keystrokes creeping back in. Look, we already saved up a bunch so we’ve got keystrokes to spare.

When STATISTICS IO gets opinionated

3. You can make someone else format your output

Want to make your output pretty and keep it for yourself offline? Vicky Harp built an Excel parser for STATISTICS IO output.

Want to format your output online in a web browser? Richie Rump built StatisticsParser.com.

Thanks to Michael, Tim, Vicky, and Richie for making query tuning easier and more effective!

Updating Statistics in SQL Server: Maintenance Questions & Answers


I’ve been asked a lot of questions about updating statistics in SQL Server over the years. And I’ve asked a lot of questions myself! Here’s a rundown of all the practical questions that I tend to get about how to maintain these in SQL Server.

I don’t dig into the internals of statistics and optimization in this post. If you’re interested in that, head on over and read the fahhhbulous white paper, Statistics Used by the Query Optimizer in SQL Server 2008. Then wait a couple days and chase it with its charming cousin, Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator.

I’m also not talking about statistics for memory optimized tables in this article. If you’re interested in those, head over here.

Quick Links to Jump Around This Article

General Advice on Statistics Maintenance

Be moderate in thy statistics updates, but not in thy hair styling.

Back when I read philosophy, I found Aristotle a bit annoying because he talked so much about “moderation”. Doing too little is a vice. Doing too much was a vice. Where’s the fun in that?

Unfortunately, Aristotle was right when it comes to statistics maintenance in SQL Server. You can get in trouble by doing too little. And you can get in trouble by doing too much.

⇒ Be a little proactive: If you have millions of rows in some of your tables, you can get burned by doing no statistics maintenance at all if query performance starts to get slow and out-of-date statistics are found to be the cause. This is because you didn’t do any proactive work at all.

⇒ If you’re too proactive, you’ll eventually be sorry: If you set up statistics maintenance too aggressively, your maintenance windows can start to run long. You shouldn’t run statistics maintenance against a database at the same time you’re checking for corruption, rebuilding indexes, or running other IO intensive processes. If you have multiple SQL Servers using shared storage, that maintenance may hit the storage at the same time. And what problem were you solving specifically?

The moderate approach:

  • Start with a SQL Server Agent job that runs statistics updates and index maintenance as part of a single operation/script run. Most folks begin running this once a week. If you have a nightly maintenance window, that can work as well.
  • Only run statistics updates more frequently when it’s clear that you need to, and then customize the more frequent job to only update the specific statistics that are problematic. Only use FULLSCAN on individual stats if you must, and document which queries require it. (A minimal UPDATE STATISTICS sketch follows this list.)
  • If your tables have millions of rows and you run into estimate problems for recent data, consider using Trace Flag 2371 to increase the frequency of statistics updates for these tables instead of manually updating statistics throughout the day. (SQL Server 2008R2 SP1 +, not needed in SQL Server 2016.)
  • If you have a batch process or ETL that loads significant amounts of data, consider having it update statistics on impacted tables at the end of the process. Exception: if the job creates or rebuilds indexes at the end of its run, the statistics related to those indexes are already updated with FULLSCAN and do not require any maintenance.
  • Beware statistics updates that need to run frequently throughout the day: if this is your only option to fix a problem, you are applying a reactive fix which consumes IO and the query is periodically still slow. This is rarely a solid long term solution. Investigate stabilizing execution plans via indexes and query tuning instead.
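
Here's the minimal UPDATE STATISTICS sketch mentioned above. Table, index, and statistic names are made up; substitute your own.

/* Update every statistic on one table, letting SQL Server pick the sample size */
UPDATE STATISTICS dbo.MyBigTable;
GO

/* Update one problematic statistic, with FULLSCAN only if you've proven you need it */
UPDATE STATISTICS dbo.MyBigTable ix_MyBigTable_SomeColumn WITH FULLSCAN;
GO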

Which Free Third Party Tools Help Update Statistics?

One widely used free script is Ola Hallengren’s SQL Server Index and Statistics Maintenance script.

If you’re managing lots of SQL Server instances and want ultimate customization, there is a free version of Minion Reindex.

What are Statistics?

Statistics are small, lightweight objects that describe the distribution of data in a SQL Server table. The SQL Server query optimizer uses statistics to estimate how many rows will be returned by parts of your query. This heavily influences how it runs your queries.

For example, imagine you have a table named agg.FirstNameByYear and you run this query:

SELECT-BY-FirstName-Id
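
The query in that screenshot is along these lines (reconstructed from the description here, so the exact text may differ):

SELECT 
    Gender,
    SUM(NameCount) AS SumNameCount
FROM agg.FirstNameByYear
WHERE FirstNameId = 74846
GROUP BY Gender;
GO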

SQL Server needs to estimate how many rows will come back for FirstNameId=74846. Is it 10 million rows, or 2? And how are the rows for Gender distributed?

The answers to both of these questions impact what it does to GROUP those rows and SUM the NameCount column.

Statistics are lightweight little pieces of information that SQL Server keeps on tables and indexes to help the optimizer do a good job.

What Creates Column Statistics?

If the agg.FirstNameByYear table was freshly created when we ran our query, it would have no column statistics.

By default, the SQL Server optimizer will see that no statistics exist, and wait while two column statistics are created on the FirstNameId and Gender columns. Statistics are small, and are created super fast– my query isn’t measurably any faster when I run it a second time.

Here’s what the statistics look like on the table in Object Explorer. Notice the artisanally crafted names.

Auto-Created-Statistics

If you want to verify which column is in each auto-created statistic, you can do that with this query:

SELECT s.name,
    s.auto_created,
    s.user_created,
    c.name as colname
FROM sys.stats AS s
JOIN sys.stats_columns as sc 
    on s.stats_id=sc.stats_id
    and s.object_id=sc.object_id
JOIN sys.columns as c on 
    sc.object_id=c.object_id
    and sc.column_id=c.column_id
WHERE s.object_id=OBJECT_ID('agg.FirstNameByYear')
    and s.auto_created = 1
ORDER BY sc.stats_column_id;
GO

Sure enough, here are our statistics, and they are on Gender and FirstNameId. These are not considered ‘user created’ even though our user query was the cause of them being auto-generated. (“User created” means someone ran a CREATE STATISTICS command.)

Decoding-Statistics-Column-Names

SQL Server can now use the statistic on Gender and the statistic on FirstNameId for future queries that run.

What are Index Statistics?

Whenever you create an index in SQL Server, it creates a statistic associated with that index. The statistic has the same name of the index. Our agg.FirstNameByYear table has a clustered primary key, and here is the statistic that was created along with that index:

Index-Statistic

If columns are important enough to index, SQL Server assumes that it’s also important to estimate how many rows would be returned by that index when you query it. You can’t drop statistics associated with indexes (unless you drop the index).

Do I Need to Create Statistics Manually?

Nope! SQL Server does a really good job creating single-column statistics automatically.

Statistics will continue to be created on single columns when queries run as long as the “Auto Create Statistics” database property remains on.  You can check that setting on your databases with the query:

SELECT is_auto_create_stats_on
FROM sys.databases;
GO

You should leave auto_create_stats_on set to 1 unless an application is specifically designed to manually create its own statistics. (That is pretty much limited to weirdoes like SharePoint.)

In rare situations, manually creating a multi-column statistic or a filtered statistic can improve performance… but keep reading to find out what those are and why it’s rare to require them.
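
For reference, here's a hedged sketch of what those look like. The object and column names are hypothetical:

/* A multi-column statistic */
CREATE STATISTICS st_OrderDate_Status 
    ON dbo.Orders (OrderDate, StatusCode);
GO

/* A filtered statistic, describing only a slice of the table */
CREATE STATISTICS st_OrderDate_Open 
    ON dbo.Orders (OrderDate)
    WHERE StatusCode = 1;
GO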

Do I Need to Worry About Statistics Taking Up Too Much Storage in My Database?

Auto-created statistics are incredibly small, and you only get one per column in a table. Even if you have a statistic on every column in the table, this is a very small amount of overhead.

Statistics take up a negligible amount of space compared to indexes.

How Does Auto-Update Work for Statistics?

With default database settings, the SQL Server optimizer looks at how many changes have occurred for a given column statistic as part of query optimization. If it looks like a significant number of rows in the column have changed, SQL Server updates the statistic, then optimizes the query. Because why optimize a query on bad data estimates?

The thresholds for when auto-update statistics kicks in are a little complicated.

For SQL Server 2005 – SQL Server 2014 (with no trace flags)

  • If the table has 1-500 rows, statistics are considered stale for optimization once 500 rows have changed
  • If the table has 500+ rows, the threshold is 500 rows + 20% of the total rows in the table

Note that the statistics don’t auto-update when the rows are changed. It’s only when a query is being compiled and the statistics would actually be used to make an optimization decision that the update is kicked off. Erin Stellato proves that in her post here.
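If you want to see how close a statistic is to that classic threshold, you can compare its modification counter to the formula above. Here's a minimal sketch using sys.dm_db_stats_properties (available in SQL Server 2008 R2 SP2 / 2012 SP1 and later), pointed at our example table:

-- Compare each statistic's modification counter to the classic (pre-2016)
-- auto-update threshold of 500 rows + 20% of the table's rows.
SELECT
    s.name AS stat_name,
    sp.rows,
    sp.modification_counter,
    sp.last_updated,
    CASE WHEN sp.rows <= 500 THEN 500
         ELSE 500 + (sp.rows * 0.20)
    END AS classic_update_threshold
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('agg.FirstNameByYear');
GO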

Should I Turn On Trace Flag 2371 to Control Automatic Statistics Updates?

Trace Flag 2371 makes the auto-update threshold dynamic for large tables. Once a table has more than 25,000 rows, the percentage of changed rows needed to trigger an automatic statistics update shrinks as the rowcount grows. See a graph of the adjusting threshold in this post from the SAP team. (I think we know which team really felt some pain and wanted this trace flag to exist, because the trace flag was announced on their blog!)

Trace Flag 2371 is available from SQL Server 2008 R2 SP1 through SQL Server 2014.

SQL Server 2016 automatically uses this improved algorithm. Woot! So if you’re using SQL Server 2016, you don’t need to decide. Erik Darling tested out the behavior in 2016 and wrote about it here.

Prior to 2016, here’s a quick rundown of pros and cons of TF2371:

Benefits to turning on Trace Flag 2371

  • This trace flag is documented in KB 2754171
  • This has been used by SAP for a while, and has become the default behavior in SQL Server 2016. That seems like a big vote of confidence.

Risks of turning on Trace Flag 2371

  • The trace flag is instance-wide (“global”). You can’t change this behavior for a specific database.
  • The documentation in KB 2754171 covers its behind a little conspicuously. They advise you that “If you have not observed performance issues due to outdated statistics, there is no need to enable this trace flag.” (Unless you run SAP.)
  • This trace flag didn’t win the popularity contest to make it into the main Books Online list of trace flags.
  • If you have this trace flag on and you need to open a support ticket, you may have to spend time explaining why you have this one and jumping through extra hoops to get a problem figured out.

Overall, this is a low risk trace flag. But in general it does NOT pay off to enable trace flags “just in case” for most people.

Should I Turn on Trace Flag 7471 for Concurrent Statistics Update?

Trace Flag 7471 is a global trace flag released in SQL Server 2014 SP1 Cumulative Update 6.

This trace flag changes the locking behavior associated with updating statistics. With the trace flag on, if you set up jobs to concurrently update statistics on the same table, they won’t block each other on metadata related to statistics updates.

Jonathan Kehayias has a great post on TF 7471 on SQL Performance.com. He demos the trace flag in a video for the SQL Skills Insider Newsletter where he shows the changes in locking behavior this flag introduces. Download the video in WMV format or MOV format to watch.

Benefits to turning on Trace Flag 7471

  • Concurrent statistics update jobs against the same table no longer block each other on statistics metadata, so you can run statistics maintenance in parallel and finish it sooner

Risks of turning on Trace Flag 7471

  • Running concurrent statistics updates against the same table can use more IO and CPU resources. Depending on what you’re doing, that might not be awesome.
  • Running concurrent parallel statistics updates against very large tables can use more workspace memory resources than executing them serially. Running low on workspace memory can cause queries to have to wait for memory grants to even get started (RESOURCE_SEMAPHORE waits). (See more on this in Jonathan Kehayias’ post here.)
  • Microsoft’s testing showed “TF 7471 can increase the possibility of deadlock especially when creation of new statistics and updating of existing statistics are executed simultaneously.” (source)
  • The trace flag is instance-wide (“global”). You can’t change this behavior for a specific database.
  • The documentation in KB 3156157 is pretty minimal.
  • This is a new trace flag introduced in a cumulative update, and it isn’t very widely used yet.
  • This trace flag also didn’t win the popularity contest to make it into the main Books Online list of trace flags.
  • If you have this trace flag on and you need to open a support ticket, you may have to spend time explaining why you have this one and jumping through extra hoops to get a problem figured out.

Any trace flag can have unintended side effects. If I had a really good reason to run concurrent statistics updates against one table after exhausting other options to avoid the maintenance, I'd consider using this. I'd also only turn it on for that specific maintenance window, and turn it off afterward. (Edit: I wrote this before Microsoft blogged about the trace flag, but it turns out that's exactly what they recommend in their blog post due to the deadlocking issue called out in "risks".)
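If you do go that route, scoping it to the maintenance window can be as simple as this sketch:

-- Turn the trace flag on globally (-1) just for the maintenance window,
-- run the concurrent statistics jobs, then turn it back off.
DBCC TRACEON (7471, -1);

/* ...kick off the concurrent UPDATE STATISTICS jobs here... */

DBCC TRACEOFF (7471, -1);
GO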

Should I Update Statistics Asynchronously?

Remember how I said that when SQL Server is optimizing a query, it smells the statistics to see if they’re still fresh, and then waits to update them if they’re smelly?

You can tell it not to wait to update them, and just use the smelly stats. It will then optimize the query. The stats will still be updated for any queries that come later, but boy, I hope those statistics were good enough for that first query!

You control this with the ‘Auto Update Stats Async’ setting. You can query this setting on your databases like this:

SELECT name, is_auto_update_stats_async_on
FROM sys.databases;
GO

Asynchronous statistics updates are usually a bad choice. Here’s why:

  • Statistics update is typically incredibly quick
  • It’s usually a greater risk that the first query will get a bad plan, rather than it having to wait

How Often Should You Manually Update Statistics?

If you know that a lot of data is changing in a table, you may want to manually update statistics. Possibly:

  • A lot of data is changing, but it’s below the 20%+500 rows limit of where auto-update kicks in because the table has hundreds of millions of rows
  • A small amount of data is changing in a large table, but it’s frequently queried

Data Loading: Update Statistics After Each Load

A classic example of a stats problem is “recent” data from an ETL. You have a table with 10 million rows. 500,000 rows of data are loaded for the most recent batch. Important queries are looking at the table for rows with LoadDate > two hours ago.

Statistics won’t update automatically for queries, because < 2,000,500 rows have changed.

Those queries will estimate that there is one row. (To be safe. Wouldn’t wanna estimate zero!)  That’s a huge mis-estimation from 500,000, and you might end up with some very bad query plans. Gail Shaw wrote a great post on this– it’s called the “ascending date” problem.

In this kind of data load situation, it is helpful to run UPDATE STATISTICS against the entire table where the data has loaded at the end of the load. This is typically a lightweight command, but in very large tables, UPDATE STATISTICS may be run against specific indexes or columns that are sensitive to recent data to speed up the process. (Exception: if the job creates or rebuilds indexes at the end of its run, the statistics related to those indexes are already updated with FULLSCAN and do not require any maintenance.)
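Here's a minimal sketch of what that post-load step might look like. The table name (dbo.FactSales) and the statistic name (st_LoadDate) are hypothetical placeholders for your own objects:

-- Refresh every statistic on the freshly loaded table with the default sampling:
UPDATE STATISTICS dbo.FactSales;
GO

-- Or, on a very large table, target just the statistic that covers the
-- "recent data" column (here a hypothetical statistic named st_LoadDate):
UPDATE STATISTICS dbo.FactSales st_LoadDate;
GO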

If you have found optimization problems and you can’t change your data load to manually update statistics, Trace Flag 2371 might help.

What if a Lot of Data Changes Through the Day Based on User Interactions?

Most databases are small. They don’t have millions and millions of rows. And most of those databases are just fine without any manual update of statistics.

If you’ve got tables with millions of rows, and you’ve found that statistics aren’t updating fast enough, you can consider:

  • Turning on Trace Flag 2371 to make auto-update statistics run more frequently on tables with large amounts of rows
  • Using a third party script to handle statistics maintenance along with your index maintenance (see the list above)

Should I Use a Maintenance Plan to Update Statistics?

No. You really shouldn’t.

Maintenance Plans are really dumb when it comes to statistics. If you run UPDATE STATISTICS yourself against a table, index, or statistic, by default it uses a sampled scan – it doesn't look at every row.

Maintenance Plans don’t do that. Instead, they require you to either:

  • Scan every row in the table, for every statistic in the table. So if you’ve got a 100 million row table with 50 column statistics and no nonclustered indexes, it’ll scan the 100 million row table fifty times.
  • Specify an exact percentage of each table to scan. For every table. So if you specify 3%, then 100 row tables would…. yeah.

Update-Statistics-Task-Maintenance-Plan

Neither of these is a good option.

You could use a maintenance plan to kick off a third party script that's smarter about this, but don't use a maintenance plan with an Update Statistics task. I've found cases where that was set up with the default fullscan, and it took many hours to run against a tiny database.

Can Statistics be Updated Using Parallelism?

Since SQL Server 2005, SQL Server may choose a parallel plan when you update statistics with FULLSCAN.

If you create or update statistics with sampling, including the default sampling used in automatic statistics creation, SQL Server may choose a parallel plan as of SQL Server 2016.

When Should You Update Statistics with FULLSCAN?

 As we just covered, if you update statistics on an entire table with FULLSCAN:

  • For any column statistic where there is not an index leading on that column, SQL Server will scan the whole table
  • For any column statistic where there IS an index leading on that column, or for an index statistic, SQL Server will scan the index

As tables grow, updating with FULLSCAN starts taking longer and longer. It uses IO and resources on the instance. You started updating statistics because you were concerned about performance, and now you’re impacting performance.

Only update statistics with FULLSCAN if you are solving a specific problem, and you don’t have a better way to handle it. If you need to do this:

  • Identify the column or index statistic that needs the special treatment, and only update that one with FULLSCAN
  • Run the update as rarely as possible. If you start having to run this throughout the day against multiple statistics, it becomes increasingly hard for you to diagnose performance regressions and manage the instance

Manual statistics update with FULLSCAN can sometimes be used as a temporary measure to contain the pain of a problem while you research what query or index changes may stabilize the execution plans of the problem query, but this solution rarely satisfies users and business owners. Those folks are usually happier if something changes that guarantees the problem query will be consistently fast, even if a bunch of data changes – which usually means changing the indexes, the query, or how its plan is compiled.

The essence of the problem with using manual statistics updates for a performance problem is that it's a reactive measure: it almost never prevents the problem from occurring entirely.

Should I Update Statistics with sp_updatestats?

The built-in sp_updatestats procedure is smarter than the "Update Statistics Task" in maintenance plans. It rolls through every table in the database, is smart enough to skip tables where nothing has changed, and uses the default sampling.

It’s a smaller sledgehammer than the maintenance plan task, but arguably still wastes resources, particularly when it comes to statistics associated with indexes. Consider this scenario:

  • sp_updatestats is run
  • Nightly maintenance runs a script to rebuild all indexes which are more than 70% fragmented, and reorganize all indexes which are between 40% and 70% fragmented

The ALTER INDEX REBUILD command creates a beautiful new index structure. To do this, it has to scan every row in one big operation. And while it’s doing it, it updates statistics with FULLSCAN, meaning based on every row.

So our maintenance did a bunch of work updating stats with the default sampling. Then redid the same work for every index that got rebuilt.

Erin Stellato is not a big fan of sp_updatestats. Read why in her excellent article here.

If you’re going to the trouble to set this up, it’s a better use of your resources to use an indexing script that can handle that statistics update inline for columns and for indexes where REORGANIZE is done, and just skip it for REBUILD.

Why Doesn’t Updating Statistics Help if Queries Use a Linked Server?

Linked servers are special. Especially frustrating.

Prior to SQL Server 2012 SP1, any query using a linked server would not have permission to use statistics on the remote (target) database unless it used an account with db_owner or db_ddladmin permissions, or higher.  In other words, read-only permissions meant that SQL Server couldn’t use statistics to generate a good query plan.

Stated differently: prior to SQL Server 2012 SP1, you had to choose between better security and better performance. You couldn't have both.

Great reason to upgrade! Read more about this issue in Thomas LaRock’s article on linked server performance.

How do Duplicate Statistics Get Created?

Let’s revisit our example table. Two column statistics were created on this table, one on FirstNameId, and one on Gender:

Auto-Created-Statistics

Let’s say that later on we create an index on Gender named ix_Gender. It will have index statistics created for it! I now have a column statistic on Gender, and an index statistic on Gender.

Someone could also manually create another column statistic on Gender using the CREATE STATISTICS statement. That rarely happens, but never say never. These statistics are technically ‘duplicates’.

Do I Need to Drop Duplicate Statistics?

I’ve never found a case where dropping duplicate statistics improved performance in a measurable way. It is true that you don’t need them, but statistics are so small and lightweight that I wouldn’t bother writing and testing the code to clean them up unless I had a super good reason.

What are Multi-Column Statistics?

Multi-Column statistics are only sort of what the name sounds like. These can get created in a few different ways:

  • When you create an index with multiple columns in the key, the associated statistic is a multi-column statistic (based on the keys)
  • You can create a multi-column statistic with the CREATE STATISTICS statement
  • You can run the Database Tuning Advisor, which you can tell to apply a bunch of those CREATE STATISTICS statements. It seems to love them. A lot.

But multi-column statistics don’t contain complete information for all the columns. Instead, they contain some general information about the selectivity of the columns combined in what’s called the “Density Vector” of the index.

And then they contain a more detailed estimation of distribution of data for the first column in the key. Just like a single column statistic.
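You can see both pieces with DBCC SHOW_STATISTICS. This sketch assumes a statistic named pk_FirstNameByYear on our example table – substitute whatever statistic name you actually have:

-- The density vector: overall selectivity info for the column combinations.
DBCC SHOW_STATISTICS ('agg.FirstNameByYear', 'pk_FirstNameByYear')
    WITH DENSITY_VECTOR;
GO

-- The histogram: detailed distribution, but only for the leading column.
DBCC SHOW_STATISTICS ('agg.FirstNameByYear', 'pk_FirstNameByYear')
    WITH HISTOGRAM;
GO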

What Can Go Wrong if I Create Multi-Column Statistics?

It’s quite possible that nothing will change. Multi-column statistics don’t always change optimization.

It’s also possible that the query you’re trying to tune could slow down. The information in the density vector doesn’t guarantee better optimization.

You might even make a schema change fail later on, because someone tries to modify a column and you created a statistic on it manually. (This is true for any user created statistic, whether or not it’s multi-column.)

It’s pretty rare to make a query faster simply by creating a multi-column statistic, and there’s a very simple reason: if a multi-column statistic is critically important, an index probably is even more critical to speed up execution. So usually the big win comes in creating or changing an index. (And yes, that also creates the multi-column statistic.)

What are Filtered Statistics, and Do I Need Them?

Filtered statistics are statistics with a “where” clause. They allow you to create a very granular statistic. They can be powerful because the first column in the statistic can be different from the column or columns used in the filter.

Filtered statistics are automatically created when you create a filtered index. That’s most commonly where you’ll find a filtered statistic: helping a filtered index.

It’s rare to need to create filtered statistics manually, and beware: if your queries use parameters, the optimizer may not trust the statistics when it optimizes plans — because the plan might be reused for parameters that go outside of the filtered. So you could potentially create a bunch of filtered statistics for nothing.

For loads of details on filtered stats, watch Kimberly Tripp’s free video, Skewed Data, Poor Cardinality Estimates, and Plans Gone Bad.

How Can I Tell How Many Times a Statistic Has Been Used to Compile a Query?

There is no dynamic management view that lists this. You can’t look at an instance and know which statistics are useful and which aren’t.

For an individual query, there are some undocumented trace flags you can use against a test system to see which statistics it’s considering. Read how to do it in Paul White’s blog here (see the comments for a note on the flag to use with the new cardinality estimator).

Can I Fake Out/ Change the Content of Statistics without Changing the Data?

You can, but don’t tell anyone I told you, OK?

Read how to do this undocumented / bad idea / remarkably interesting hack on Thomas Kejser’s blog.

What are Temporary Statistics, and When Would I Need Them?

Temporary statistics are an improvement added in SQL Server 2012 for read-only databases. When a database is read-only, queries can’t create statistics in the database– because those require writes. As of SQL Server 2012, temporary statistics can be created in tempdb to help optimization of queries.

This is incredibly useful for:

  • Readable secondaries in an AlwaysOn Availability Group
  • Readable logshipping secondary databases
  • A point in time database snapshot which is queried (whether against a database mirror or live database)
  • Any other read-only database that has queries run against it

Prior to SQL Server 2012, if you use logshipping for reporting and the same workload does not run against the logshipping primary, consider manually creating column level statistics. (Or upgrading SQL Server.)

Disambiguation: temporary statistics are unrelated to statistics on temporary tables. Words are confusing!

Do I Need to Update Statistics on Temporary Tables?

Table variables don’t have statistics, but temporary tables do. This is useful, but it turns out to be incredibly weird… because statistics on temporary tables can actually be reused by subsequent executions of the same code by different sessions.

Yeah, I know, it sounds like I’m delusional. How could different sessions use the same statistics on a (non-global) temporary table? Start by reading Paul White’s bug, “UPDATE STATISTICS Does Not Cause Recompilation for a Cached Temporary Table,” then read his post, “Temporary Tables in Stored Procedures.”

How Can I Tell if Statistics are Making a Query Slow?

If you run into problems with slow queries, you can test the slow query and see if updating statistics makes the query run faster by running UPDATE STATISTICS against a column or index statistic involved in the query.

This is trickier than it sounds. Don't just update the statistics first, because you can't go back!

  • First, get the execution plan for the 'slow' query and save it off. Make sure that if the query was run with parameters, you test each run of the query with the same parameters used to compile it the first time. Otherwise you may just prove that recompiling the query with different parameters makes it faster.
  • Do not run the UPDATE STATISTICS command against the entire table. The table likely has many statistics, and you want to narrow down which statistic is central to the estimation problem.

In many cases, queries are slow because they are re-using an execution plan that was compiled for different parameter values. Read more on this in Erland Sommarskog’s excellent whitepaper, Slow in the Application, Fast in SSMS  – start in this section on “Getting information to solve parameter sniffing problems.”

Managing Statistics in SQL Server for DBAs (videos)


Want to learn more about managing statistics updates in SQL Server? Watch my free 27 minute presentation on managing statistics:

You can also watch 12 minutes of audience Q&A on statistics from when I presented this on a live Hangout on Air. Questions include:

  • Should I create single column statistics for non-leading key columns in an index?
  • What is asynchronous statistics update?
  • What are these statistics with “WA_sys” names?
  • I have a bunch of user created statistics and it's blocking a release. Is it safe to drop them?

Got more questions on statistics? Check out my reference post on Updating Statistics in SQL Server, and ask away in the comments of the post!

Why Table Partitioning Doesn’t Speed Up Query Performance (video)


Learn why SQL Server’s table partitioning feature doesn’t make your queries faster– and may even make them slower.

In this 20 minute video, I’ll show you my favorite articles, bugs, and whitepapers online to explain where table partitioning shines and why you might want to implement it, even though it won’t solve your query performance problems.

Articles discussed are by Gail Shaw, Remus Rusanu, and the SQL Customer Advisory Team (SQLCAT). Scroll down below the video for direct links to each resource.

Ready to Dig In? Here are the Links Discussed in the Video

Gail Shaw's SQL Server Howlers – "I Partitioned my Table, but My Queries Aren't Faster."

Remus Rusanu’s Stack Overflow answer – “Partitioning is Never Done for Query Performance.”

SQL CAT Team – Diagnosing and Resolving Latch Contention

Connect Bug – Partition Table Using Min/Max Functions and Top N – Index Selection and Performance

Kendra Little on the Brent Ozar Unlimited blog – “Why is this Partitioned Query Slower?”

Books Online – Using Clustered Columnstore Indexes

The Case of the Blocking Merge Statement (LCK_M_RS_U locks)


Recently I got a fun question about an “upsert” pattern as a “Dear SQL DBA” question. The question is about TSQL, so it lent itself to being answered in a blog post where I can show repro code and screenshots.

Here’s the Scenario in the Anonymized Question

We have a lookup table, which is the parent table in a foreign key relationship with a child table. The child table has lots more rows. When data comes in, we need to:

  • Check the Parent Table (the lookup table) to see if the ParentValue is already present.
  • If ParentValue is not present, insert a row for the new ParentValue into the Parent Table. This happens very rarely, but we need to check for it.
  • Then insert a row into the Child Table, using the ParentId.

The Problem: Blocking against the Parent Table

When run under a few sessions, locking and blocking issues were creeping up fast. A merge statement was being used to check for the values in the Parent Table and insert them when they weren’t present.

Let’s Look At the Code

This creates a database named upsert_test, then creates ParentTable and ChildTable objects in the database. Finally, a Foreign Key is created on ChildTable, referencing the ParentTable.

IF DB_ID('upsert_test') is not null
BEGIN
    USE MASTER;
    ALTER DATABASE upsert_test SET SINGLE_USER WITH ROLLBACK IMMEDIATE
    DROP DATABASE upsert_test;
END
GO

CREATE DATABASE upsert_test;
GO

use upsert_test;
GO

CREATE TABLE dbo.ParentTable
(
    ParentId int IDENTITY(1,1) NOT NULL,
    ParentValue varchar(50) NOT NULL,
    CONSTRAINT pk_ParentTable PRIMARY KEY CLUSTERED (ParentId ASC)
);
GO

CREATE TABLE dbo.ChildTable
(
    ChildId INT IDENTITY(1,1) NOT NULL,
    ParentId INT NOT NULL,
    ChildValue VARCHAR(50) NOT NULL,
    CreatedDate DATE NOT NULL CONSTRAINT DF_Work_created DEFAULT (getdate()),
    CONSTRAINT PK_Work PRIMARY KEY CLUSTERED (ChildId ASC)
);
GO

ALTER TABLE ChildTable ADD CONSTRAINT FK_Work_Source FOREIGN KEY (ParentId) REFERENCES ParentTable (ParentId);
GO

Here’s the Upsert (aka Merge Query)

A stored procedure is used to handle incoming values.  It uses MERGE to look for matching rows in ParentTable, and insert when not matched.

IF OBJECT_ID('dbo.DoStuff') IS NULL
    EXEC ('CREATE PROCEDURE dbo.DoStuff as RETURN 0;');
GO

ALTER PROCEDURE dbo.DoStuff (
    @ParentValue varchar(50),
    @ChildValue varchar(50)
)
AS

    MERGE ParentTable with (HOLDLOCK) as p
    USING (SELECT @ParentValue NewParentValue) as new
        ON p.ParentValue = new.NewParentValue
    WHEN NOT MATCHED THEN
    INSERT (ParentValue) VALUES (new.NewParentValue);

    INSERT INTO ChildTable (ParentId, ChildValue)
    SELECT p.ParentId, @ChildValue
    FROM ParentTable p
    WHERE p.ParentValue=@ParentValue;
GO

Why is that HOLDLOCK Hint in the Merge Query?

My reader quite rightly used this hint in their merge query. Although MERGE looks like a single query, it’s actually just “syntactic sugar”. Behind the scenes, merge can be implemented as a select and an insert in two separate commands. Developers are advised to use HOLDLOCK to avoid race conditions with MERGE.

I asked one clarifying question — was the lock wait type they were seeing “LCK_M_RS_U”?

It was.

This confirmed that HOLDLOCK and merge were slowing them down instead of helping them.

Let’s Populate Some Rows for Testing and Reproduce the Blocking

exec dbo.DoStuff @ParentValue='Stuff', @ChildValue='Things';
GO
exec dbo.DoStuff @ParentValue='MoreStuff', @ChildValue='MoreThings';
GO
exec dbo.DoStuff @ParentValue='EvenMoreStuff', @ChildValue='EvenMoreThings';
GO

exec dbo.DoStuff @ParentValue='EvenMoreStuff', @ChildValue='EvenMoreThings x 2';
GO
exec dbo.DoStuff @ParentValue='EvenMoreStuff', @ChildValue='EvenMoreThings x 3';
GO

/* Create 5000 more ParentValues */
SET NOCOUNT ON;
DECLARE @namevarchar varchar(50), @i int=1
BEGIN TRAN
    WHILE @i <= 5000
    BEGIN
        SET @namevarchar= cast(RAND() AS VARCHAR(50));
        EXEC dbo.DoStuff @ParentValue=@namevarchar, @ChildValue='Whatever';
        SET @i=@i+1;
    END
COMMIT
GO 

To see the blocking issue, just run the following code in three session windows at the same time. Note that we’re running this over and over with the same ParentValue, and the ParentValue of “Stuff” is already in the table. This will not have to insert any rows into ParentTable.

SET NOCOUNT ON;
exec dbo.DoStuff @ParentValue='Stuff', @ChildValue='Things';
GO 1000000

Here’s what the blocking looks like in Adam Machanic’s sp_WhoIsActive:

Range Lock Waits

HOLDLOCK = Serializable Isolation Level = Key Range Locks

The holdlock hint is a way to get serializable isolation level in SQL Server for a specific table, without having to change the isolation level for your entire session. Serializable is the highest isolation level in SQL Server using pessimistic locking.

When you “HOLDLOCK”, you tell SQL Server to protect any rows you read with a range lock– just in case someone comes along and tries to change one or sneak  one in.

That means that even when you’re just reading ParentTable and not inserting a row, you’re taking out a key range lock. You’re willing to fight other users over those rows to protect your statement.

There are two parts to getting around the blocking and making this faster…

Index The Parent Table (Solution Part 1)

Currently, the only index on ParentTable is on ParentId.

Even if ParentTable is tiny, if we’re frequently accessing the table and looking up a ParentValue, we’ll benefit from creating a nonclustered index on that column. We should also allow only unique values into ParentValue for data integrity reasons. A unique nonclustered index is exactly what we need:

CREATE UNIQUE NONCLUSTERED INDEX ix_ParentTable_ParentValue on dbo.ParentTable(ParentValue)
GO

In my simple three session test, this makes the merge statement very efficient, and performance goes way up. You can no longer catch those LCK_M_RS_U waits in sp_WhoIsActive. However, I’m still concerned about them, and would still…

Ditch the Merge (Solution Part 2)

The “merge” command in SQL Server is often a let-down for folks. The syntax is confusing, most folks find out about the race conditions/concurrency issues the hard way, and the biggest problem is that it often seems “better” than other TSQL options because it was introduced as an enhancement in SQL Server 2008… but it isn’t always the better choice.

In this case, ditching the merge gives me more granular control of when I want to use that high level lock on ParentTable. The code is longer mostly because of a lot of comments.

ALTER PROCEDURE dbo.DoStuff (
    @ParentValue varchar(50),
    @ChildValue varchar(50)
)
AS
    DECLARE @ParentId INT;

    /* ParentId is very rarely new, so check for it first with only a shared lock */
    SELECT @ParentId=ParentId
    FROM dbo.ParentTable
    WHERE ParentValue=@ParentValue

    /* Insert the new value if we have to. */
    /* Use the SELECT WITH UPDLOCK in case of race conditions */
    /* Get the new ParentId so we don't have to rejoin back to the table */
    IF @ParentId IS NULL
    BEGIN
        DECLARE @OutputVal TABLE (ParentId INT)

        INSERT dbo.ParentTable (ParentValue) 
        OUTPUT inserted.ParentId INTO @OutputVal(ParentId)
        SELECT x.newval
        FROM (SELECT @ParentValue as newval) as x
        LEFT JOIN dbo.ParentTable as p WITH (UPDLOCK, HOLDLOCK) on 
            x.newval=p.ParentValue
        WHERE p.ParentValue IS NULL;

        /* We are only ever going to have one row in @OutputVal */
        SELECT @ParentId=ParentId
        FROM @OutputVal;

    END

    INSERT INTO dbo.ChildTable (ParentId, ChildValue)
    SELECT @ParentId, @ChildValue;
GO

In our scenario, it’s rare for new ParentValues to come in. So I’ve used a pattern to try to use as many shared locks against ParentTable as possible, stick with the Read Committed Isolation level, and still protect against race conditions. The pattern is essentially this:

  1. Check if ParentValue already exists (shared read locks only)
  2. If this is the rare case that a ParentValue does not exist…
    • Insert the row into ParentTable
    • Protect against race conditions by inserting with a SELECT joining to ParentTable with UPDLOCK, HOLDLOCK in case the same new row happens to come in on two sessions at the same time
    • Use the OUTPUT clause to get the new @ParentId so we don’t have to join to ParentTable again in the next step
  3. Insert into ChildTable

Rough Comparison: Do These Changes Help?

I didn’t do thorough load testing on this. I just ran the call to dbo.DoStuff above in three sessions in a 4 core VM on my MacBook and looked at the BatchRequests/sec performance counter in SQL Server. Here’s what I saw:

Setup | Batch Requests/sec with No Nonclustered Index on dbo.ParentTable | Batch Requests/sec with Nonclustered Index on dbo.ParentTable
Merge | 313 avg | 4000+
Insert with Left Join | 495 avg | 4000+

In this test case, adding the nonclustered index makes a bigger difference than changing the TSQL. But I would still move away from merge, because I want to be able to control when anything tougher than a read lock is being taken out against ParentTable — that’s very attractive since new values come in rarely in this case. The more concurrent sessions that are running this, the more that will help.

Don’t Forget to Handle Errors

Error handling is important! The code in this post doesn’t have it for simplicity reasons. If you need a starter guide for error handling, check out Erland Sommarskog’s excellent whitepaper.

Further Tuning Thoughts

This code can be tuned further, but I’d want to set up a really clean load test using application servers (not SSMS) against a proper instance (maybe NOT on a MacBook). I would look at:

  • Whether validating ParentValue could be done in memory in the client application tier. Avoiding constant data access against dbo.ParentTable is attractive if that’s possible.
  • Wait statistics during execution to point to the next thing to tune.

What Do You Think of Merge?

Do you love or hate merge statements in SQL Server?

Target Recovery Interval and Indirect Checkpoint – New Default of 60 Seconds in SQL Server 2016


Update, 6/21/2016: Be careful using indirect checkpoint with failover clusters if your SQL Server 2014 instance is not fully patched. See KB 3166902. This bug was fixed in SQL Server 2016 prior to RTM.

SQL Server 2016 introduces big new features, but it also includes many small improvements. Many of these are described in the "It Just Runs Faster" series of blog posts by Bob Ward and Bob Dorr.

One article in this series explained that new databases created in SQL Server 2016 will use “Indirect Checkpoint” by default. Indirect checkpoint was added in SQL Server 2012, but has not previously been enabled by default for new databases. The article emphasizes this point:

Indirect checkpoint is the recommended configuration, especially on systems with large memory footprints and default for databases created in SQL Server 2016.

Head over and read the article to learn how indirect checkpoint works.

Indirect Checkpoint for new databases in SQL Server 2016 is set using the model database

When you create a new database in SQL Server 2016, if you use the GUI and click on the ‘Options’ tab, you can see the “Target Recovery Time (seconds)” in the Recovery section is set to 60.

target-recovery-interval-new-database

This value is inherited from the model database, so if you don’t choose to use this as the default for your new databases, you can change it there. You can also turn this on for individual databases in SQL Server 2012, 2014, and databases restored to 2016.
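For an existing database it's a one-line change. Here's a minimal sketch (the database name is a placeholder):

-- Enable indirect checkpoint by setting a nonzero target recovery time.
ALTER DATABASE YourDatabase
    SET TARGET_RECOVERY_TIME = 60 SECONDS;
GO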

Query sys.databases to see your current recovery interval, and if you’re using indirect checkpoint

You can see the settings for existing databases with this query…

SELECT name, target_recovery_time_in_seconds
FROM sys.databases;

If the target recovery time is set to 0, that means the database uses automatic checkpoints (not the newer indirect feature).

Is 60 seconds a big change in recovery interval?

Nope. Not unless you’ve changed “recovery interval (min)” in your server configuration settings. Check your current setting with this query…

SELECT name, value, value_in_use
FROM sys.configurations 
WHERE name = 'recovery interval (min)';

If  your value of ‘recovery interval (min)’ is set to zero, that means automatic checkpoints are typically occurring every minute (source).

Setting Target Recovery Time (Seconds) to 60 at the database level maintains the same checkpoint interval, but uses the indirect checkpoint algorithm.

Are there risks to using indirect checkpoint?

Yes. Things can go wrong with any configuration, and every setting can have bugs.

If you’re using indirect checkpoint with a failover cluster on SQL Server 2014, make sure to test and apply recent cumulative updates. On June 21, 2016, Microsoft released KB 3166902.

This KB is a pretty serious one: “FIX: logs are missing after multiple failovers in a SQL Server 2014 failover cluster”.  When log records are missing, SQL Server can’t recover the database properly — read the error message carefully in the KB and note that it says:

Restore the database from a full backup, or repair the database.

I verified from the team at Microsoft that this bug was fixed in SQL Server 2016 prior to RTM, so no need to wait for a patch.

Extra: Trace Flag 3449 and Indirect Checkpoint

In June 2016, Microsoft released a series of Cumulative Updates for SQL Server 2012 and 2014 that recommend using Trace Flag 3449 and indirect checkpoint on servers with 2+ terabytes of memory to speed up creating new databases. See KB 3158396 for details.

Outside the Big SAN Box: Identifying Storage and SAN Latency in SQL Server (Dear SQL DBA)


Note: This is a “listen-able” 36 minute video. You can also listen to this as a podcast – learn how at littlekendra.com/dearsqldba.

Here’s today’s question: Dear SQL DBA,

What do you say to a SAN admin when you think that the billion dollar SAN *may* be the bottleneck and you just want to look into it. What are the technical things I need to say to make them believe there might be something to my questions?

Sincerely,

Outside the Big SAN Box

This question is near and dear to my heart

I’ve been called in a lot as a consultant with the question, “Is the storage slowing us down? If so, what do we do?” This is really mystifying to a lot of people just because they don’t know where to look, and SQL Server gives out so many metrics.

The good news is that this doesn’t require a whole lot of rocket science. You just need to know where to look.

Your Mission: Show evidence of when disk latency impacts performance, and how much it hurts

Do your homework collecting data from the SQL Server and (sometimes, maybe) Windows.

If it’s not “emergency slow”, look for potential workarounds, such as adding memory to reduce the amount of reads — that’s always cheaper than speeding up storage. Take care of due diligence looking at index fixes, too.

When those won’t do it, perform an analysis of exactly where faster storage would help you the most, document what it will help, and ask for help speeding up the workload.

Make all of your notes and data available to the SAN admin, but write up a short TLDR summary.

Political tips

I like that you're concerned about this already! You know this is a sensitive topic, and that things can go wrong when you bring it up.

Talk about “disk latency” that you’re seeing.

Some people say things like, “The SAN is slow.” That’s like the SAN admin saying, “The SQL Server is dumb.”  Things tend to go badly after you call someone’s baby ugly (whether or not the baby is actually ugly).

In reality, the problem could be part of the N of SAN– the network. Saying “the SAN is slow” is a really general thing, and that’s part of why it’s not really helpful. I’ve had slow storage incidents that were solved by replacing a single cable.

“Emergency Slow” – Look in the SQL Server Error Log for “15 second” warnings

Filter the SQL Server log looking for messages with “longer than 15 seconds” in them.

The full message is like this:

SQL Server has encountered [#] occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [blah blah blah] in database [blah].

This message means SQL Server sent off an I/O request and waited 15 seconds (1…. 2…. 3…. ) before getting anything back.

The # of occurrences is important. Was the SQL Server even doing a lot?

This message will only occur once every 5 minutes, so if you see the messages repeat, the latency may have been continuous.
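One quick way to pull these messages without scrolling the log by hand is the xp_readerrorlog procedure (undocumented, but widely used – the parameters here are log number, log type, and a search string):

-- Search the current error log (0) of the SQL Server log type (1)
-- for the 15-second I/O warnings.
EXEC sys.xp_readerrorlog 0, 1, N'longer than 15 seconds';
GO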

If you see these messages, this is severe latency. Stop and send the times to the storage admins right away. Note that the SQL Server had severe storage latency at these times, and ask if any scheduled maintenance was occurring.

If there was no maintenance, ask for help starting a ticket up to investigate what’s causing this. This is a severe latency problem. For the storage with the problem, start by asking:

  • How many servers / computers share that storage
  • How active was it at that time on the SAN side

Gather read/write latency metrics from SQL Server’s “virtual file stats” dynamic management view

Sys.dm_io_virtual_file_stats is your friend here!

You want to know about latency when SQL Server reads from storage. This isn’t all reads, because when it can read from memory alone, it avoids the trip to storage.

Look at how much is read and written in samples taken when performance is poor.

  • Volume of reads and writes by file (I look at MB or GB because those units make sense to me)
  • Latency – how much are you waiting?

Lots of reads but very low latency is evidence that you're using the storage and things are going swimmingly. And that may be the case – be open to that! The beauty of this DMV is that it can tell you if the problem is elsewhere. (If so, start here.)

This DMV reports on all physical reads, including IO done by:

  • Backups
  • DBCC CHECKDB
  • Index maintenance

That means that the data since start up is diluted– it contains maintenance windows as well as periods of time when just not much was going on. It’s a data point, but what you really want to know is what does the data look like in 5 minute periods when performance was poor and when maintenance wasn’t running (unless your problem is slow maintenance).

There are a bunch of free scripts online to sample this if you don't feel like writing your own, or you can start with a simple point-in-time query like the sketch below.
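The counters in this DMV are cumulative since the instance started, so for real troubleshooting capture two samples a few minutes apart and diff them:

-- Read and write volume plus average latency per database file, cumulative
-- since the instance started.
SELECT
    DB_NAME(vfs.database_id) AS database_name,
    mf.physical_name,
    vfs.num_of_reads,
    vfs.num_of_writes,
    CAST(vfs.num_of_bytes_read / 1024. / 1024. AS DECIMAL(18,1)) AS mb_read,
    CAST(vfs.num_of_bytes_written / 1024. / 1024. AS DECIMAL(18,1)) AS mb_written,
    CASE WHEN vfs.num_of_reads = 0 THEN 0
         ELSE vfs.io_stall_read_ms / vfs.num_of_reads END AS avg_read_latency_ms,
    CASE WHEN vfs.num_of_writes = 0 THEN 0
         ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON vfs.database_id = mf.database_id
    AND vfs.file_id = mf.file_id
ORDER BY avg_read_latency_ms DESC;
GO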

Analysis: Which files have latency, and is it acceptable?

Typically throw out samples where very few reads and writes are done. A tiny amount of reads being slow is usually not an issue.

What’s impacted the most?

  • Reads from user database data files?
  • Writes to user database log files?
  • Reads and writes in tempdb?
  • Something else?

The latency numbers tell you how severe the latency is impacting you

Some latency is acceptable. From Microsoft's "Analyzing I/O Characteristics and Sizing Storage Systems for SQL Server Database Applications":

  • Data file latency: <= 20 ms (OLTP) <=30 ms (DW)
  • Log files: <= 5 ms

These are pretty aggressive targets.

Your storage is not “on fire” if you see periodic 100 ms write latency to your data files.

One point on writes to data files: SQL Server writes to memory and a write-ahead transaction log. Seeing 100ms write latency on a data file does not mean a user waits 100ms when inserting one row.

However, seeing a 100ms read latency from a data file may mean that a user is waiting on that.

Things to think about:

  • Adding memory frequently reduces read latency from storage for data files
    • Data file reads don’t have to be blazing fast when there’s sufficient memory to make reads from storage fairly rare and the workload is largely reads
  • In many environments, 100 ms is tolerable as read latency. It’s all about performance standards.

Heavy write latency on transaction logs can usually only be reduced by speeding up the storage or (sometimes) changing application patterns if you've got a lot of tiny commits.

  • Both of these options are typically somewhat expensive, unfortunately

Helpful: Cover your (developer’s) butt with a sanity check on indexes

Arguably, storage shouldn’t be slowing you down, whether or not you have good indexes

However, it’s polite to do a sanity check, because indexing can dramatically reduce the number of physical reads you do (sometimes)

Note: you shouldn’t have those “15 second” latency warnings in the SQL Server error log, no matter what your indexes are.

If you've never done an indexing health check and can ask a developer to do a quick assessment, that's fine too – you don't have to do it yourself.

Sometimes helpful politically: Gather disk latency counters from Windows

DBAs are usually suspicious of the SAN, and SAN administrators are usually suspicious of the SQL Server.

If this dynamic is a problem in your environment, Windows is kind of a neutral middle ground that helps make the SQL Server info more convincing / less threatening. It will “back up” the virtual file stats data if you need that.

Latency counters on the PhysicalDisk Object:

  • Physical Disk – Avg Disk sec/Read
  • Physical Disk – Avg Disk sec/Write

Recap!

  • Talk about measured storage latency and its impact on the SQL Server. Sticking to the data points helps keep people from getting defensive / taking it personally.
  • Look for “emergency slow” first, and triage those as a critical issue
  • Use samples from sys.dm_io_virtual_file_stats to identify if storage latency impacts you
  • Analyze it to identify which databases are impacted, and reads vs writes
  • Consider whether adding memory is a cheaper way to reduce the latency
  • Remember that backups and maintenance do physical IO and will show in virtual file stats
  • Do an indexing sanity check if possible as part of due diligence
  • Collecting windows physical disk latency counters may help “bridge the gap” for some SAN admins

What if my SAN admin doesn’t know what to do?

  • Your storage vendor would love to help (for a cost) if your SAN admins aren’t sure how to troubleshoot the latency
  • If the SAN is a recent implementation and what you’re seeing doesn’t live up to what was advertised, you may be able to get some of their time based on that difference (but your mileage will vary)


Index Usage Stats Insanity – the oddities of sys.dm db index usage stats (Dear SQL DBA)


SQL Server’s “index usage stats” dynamic management view is incredibly useful– but does it tell you what you THINK it tells you? Kendra explains the quirks of how sys.dm_db_index_usage_stats works, as well as why she thinks the information is so valuable.

This is a "listen-able" 24 minute video. Prefer a podcast instead? Find it at littlekendra.com/DearSQLDBA.

Question: Dear SQL DBA,

Why does the sys.dm_db_index_usage_stats dynamic management view increment the user_updates value even when you have a where clause on a given index that would result in no change to indexed values?

Sincerely,

Going Insane with Index Usage Stats

A quick overview of the “index usage stats” DMV

sys.dm_db_index_usage_stats is a dynamic management view that reports the number of times an index is used by queries for reads or writes.

The “user_updates” column is described in books online as:

Number of updates by user queries. This includes Insert, Delete and Updates representing number of operations done not the actual rows affected. For example, if you delete 1000 rows in one statement, this count will increment by 1

Source: https://msdn.microsoft.com/en-us/library/ms188755.aspx

There are also columns for:

  • user_seeks
  • user_scans
  • user_lookups

These are all named “user” to indicate that they aren’t behind the scenes system queries, which have their own columns in the view.

“Going Insane with Index Usage Stats” has noticed that the devil is in the details

Books online said, "For example, if you delete 1000 rows in one statement, [the count in the user_updates column] will increment by 1."

“Going Insane” noticed that if your delete query deletes zero rows, it will still increment the user_updates column value by 1.

This is totally true, and it’s easy to reproduce like this:

  • Create a simple table with an identity column named “i” and make it the clustered index.
  • Insert 10 rows with default values, so you have 10 rows, where i goes from 1 to 10
  • Query sys.dm_db_index_usage_stats for that table, and you'll see that user_updates reads as "10" (the 10 insert statements you ran)
  • Run a delete statement against the table where it deletes the row with i = 1000. That doesn’t exist, so it’ll delete 0 rows.
  • Look at sys.dm_db_index_usage_stats again– it’ll show 11 for user_updates, even though you didn’t actually delete anything.
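Here's a minimal sketch of those steps in TSQL (the table name is made up for the demo):

CREATE TABLE dbo.UsageStatsDemo (i INT IDENTITY(1,1) PRIMARY KEY CLUSTERED);
GO

INSERT dbo.UsageStatsDemo DEFAULT VALUES;
GO 10

-- user_updates now shows 10: one per insert statement.
SELECT index_id, user_updates, user_seeks, user_scans, user_lookups
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID()
    AND object_id = OBJECT_ID('dbo.UsageStatsDemo');
GO

-- Deletes zero rows, but user_updates still ticks up to 11.
DELETE dbo.UsageStatsDemo WHERE i = 1000;
GO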

It gets weirder: seek operators that don’t get used will update the user_seeks column in index usage stats

Sometimes operators in query plans don’t actually get used at runtime.

For example, the nested loop join operator looks up a value in an "inner" table/index for every row that comes back from the "outer" table/index in the join. If zero rows come back from the "outer" table/index, SQL Server will never go and access the "inner" table/index.

But user_seeks will still be incremented by 1, even if the index wasn’t actually used at runtime by that query.

Even more weirdness: user_scans doesn’t necessarily indicate that SQL Server scanned the whole index

When we learn about seeks vs scans, we tend to think of seeks as being super efficient and looking at just a few rows, and scans as reading all the rows in the object.

It turns out this isn’t true. You can have a seek operator that reads all the rows in an index for a query. And you can have a scan operator that reads just a few rows.

Let’s say your query uses a TOP. You may get a scan operator in your plan that feeds into the top. And the query may be so efficient that it quickly finds enough rows to satisfy the TOP requirement, and at that point the scan can just stop.

It may only read a tiny number of pages in a huge index, but still user_scans is incremented by 1.

And finally, rollbacks don't decrement / undo any values in the index usage stats DMV

Let’s say you do run a single query that updates 1,000 rows. Like books online says, user_updates will increase by 1.

If you roll back that transaction, the value stays as is. It shows the same value in user_updates as if the transaction committed successfully.

Here’s the secret: “usage” means “this index appeared in a query execution plan”

The misunderstanding here is about what’s doing the “using”. I totally get this, I had the same misunderstanding myself.

Our inclination is to think the DMV answers the question, “Did I use the rows in this index?”

Instead, it answers the question, “Did I run a query with an operator that could do something with this index?”

It’s counting the number of times an operator shows up in a query plan, and categorizing it by type.

It’s not checking if the operator is actually executed on individual runs, if it does a “full” scan, or if the transaction is rolled back.

sys.dm_db_index_usage_stats is still incredibly useful

Although the DMV is higher level than it appears, it still points to the indexes that get used the most

It can also point you to indexes that haven’t been used at all– at least since the last time usage stats were reset for that index

The DMV is really useful for quick, high level insight into what indexes are most popular, and which you should put on a list to monitor, and perhaps eventually drop

Sample uses:

  • Reclaiming storage space/ maintenance time: which indexes aren’t used?
  • Adding a column or modifying an index: how popular is it? (Quick assessment of risk)
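For example, here's a minimal sketch of a query that lists nonclustered indexes in the current database with no recorded reads. Treat the output as a watch list rather than a drop list, since the counters reset more often than you might expect (see the next section):

SELECT
    OBJECT_NAME(i.object_id) AS table_name,
    i.name AS index_name,
    ius.user_seeks, ius.user_scans, ius.user_lookups, ius.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS ius
    ON ius.object_id = i.object_id
    AND ius.index_id = i.index_id
    AND ius.database_id = DB_ID()
WHERE i.type_desc = 'NONCLUSTERED'
    AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
    AND ISNULL(ius.user_seeks, 0) + ISNULL(ius.user_scans, 0)
        + ISNULL(ius.user_lookups, 0) = 0
ORDER BY table_name, index_name;
GO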

When does index usage stats get reset?

Index usage stats is always reset when a database goes offline (that includes restarting the SQL Server or failing the database over)

Dropping an index or CREATE with DROP_EXISTING will also reset usage stats

In SQL Server 2012, a bug occurred where index rebuilds started resetting index usage stats

The issue is specific to ALTER INDEX REBUILD (not a reorganize command)

That has been fixed in:

  • SQL Server 2012 SP3 + CU3
  • SQL Server 2014 SP2 (planned as of this recording)
  • SQL Server 2016

More detail: http://littlekendra.com/2016/03/07/sql-server-2016-rc0-fixes-index-usage-stats-bug-missing-indexes-still-broken/

Getting back to the question about user_updates – how can we solve that?

If you want specific data about how many writes occur in an index, you have a few options:

1) Tracing with SQLTrace or Extended events: more impact on the instance, but more complete info

2) Looking at sys.dm_exec_query_stats for queries that reference the index: lightweight, but may miss information for queries whose plans aren’t in cache

3) Tracking activity with QueryStore and looking for queries that modify the index: SQL Server 2016 only.

What about Operational Stats?

There is a DMV, sys.dm_db_index_operational_stats that includes more granular information about writes

It’s sensitive to memory pressure, though. Data goes away when metadata for that index is no longer in memory

“How many writes since when?” is super hard to answer with this DMV

If you’re putting in a lot of time on answering the question, you’re better off with sys.dm_exec_query_stats and friends, in my experience

Index tuning can be complex, but it has a great payoff

This is worth your time investment! Indexes are critical to performance.

Stick with it and don’t sweat the small stuff.

Max Degree of Confusion (Dear SQL DBA Episode 8)


Learn how to configure the Max Degree of Parallelism and Cost Threshold for Parallelism settings in SQL Server – and how SQL Server 2014 SP2 and SQL Server 2016 change the way that SQL Server automatically configures some SQL Servers with lots of cores.

This is a "listen-able" 20 minute video. Prefer a podcast instead? Find it at littlekendra.com/dearsqldba. Show notes with clickable links are below the video.

Dear SQL DBA…

I am completely confused as to how to set Max Degree of Parallelism for an OLTP workload. Having looked at three recommendations recently and applied it to my own machine I get 3 different values. My machine has 1 physical CPU with 4 cores, 4 visible schedulers and a hyperthreading ratio of 4. However I’ve got recommendations to set either to 1, 2 or 4. What should it be?

Sincerely,

Max Degree of Confusion

I don’t blame you for being confused– this is a tough one!

The good news is that for Max Degree of Confusion’s specific question, I’ve got a clear recommendation for a default setting for “Max Degree of Parallelism” and “Cost Threshold for Parallelism”. I think you need to set both, and I’ll explain why.

But for people who have a lot more cores in their servers, things are a little more interesting– especially if you’re running SQL Server 2014 SP2+ or SQL Server 2016.

Let’s break this down and talk about how to figure out the setting, then we’ll circle back to our 4 core example.

Settings: Max Degree of Parallelism (“MAXDOP”) and Cost Threshold for Parallelism

When you run a query, SQL Server estimates how “expensive” it is in a fake costing unit, let’s call it Estimated QueryBucks.

If a query’s Estimated QueryBucks is over the “Cost Threshold for Parallelism” setting in SQL Server, it qualifies to potentially use multiple processors to run the query.

The number of processors it can use is defined by the instance level “Max Degree of Parallelism” setting.

When writing TSQL, you can specify maxdop for individual statements as a query hint, to say that if that query qualifies to go parallel, it should use the number of processors specified in the hint and ignore the server level setting. (You could use this to make it use more processors, or to never go parallel.)
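A statement-level hint looks like this sketch (the table name is hypothetical):

-- This query can use up to 4 schedulers if its cost qualifies it for
-- parallelism, regardless of the instance-level Max Degree of Parallelism.
SELECT COUNT(*)
FROM dbo.BigTable
OPTION (MAXDOP 4);
GO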

KB 2806535 helps determine Max Degree of Parallelism

Hooray, Microsoft has published some guidance on this!

KB 2806535 is titled Recommendations and guidelines for the "max degree of parallelism" configuration option in SQL Server

An acronym: NUMA nodes

KB 2806535 explains that you need to determine two things about your hardware

  • How many NUMA nodes it has
  • How many logical processors are in each NUMA node

NUMA – simpler than it sounds

NUMA means “Non-Uniform Memory Access.” (That doesn’t really explain much of anything, I know, but if I didn’t tell you what it stands for it would be weird.)

When you buy a modern server, typically each physical CPU has many logical processors. Let’s say we buy a server with 1 physical CPU and 10 logical processors, and the server has 256GB of RAM. That 1 physical CPU is snuggled up right next to all the memory chips, and it’s really fast for all 10 logical processors to access that 256GB of RAM. Our server has one NUMA node.

But what if we bought a server with 2 physical CPUs and 10 logical processors each, and 512GB of RAM? We would then have 2 NUMA nodes, because a NUMA node is just a physical CPU and its local memory. Each NUMA node would have 10 logical processors and 256GB of RAM.

Logical processors can access all of the memory in the server. It’s just faster for a processor to access the memory that’s hooked up to its own “NUMA node”.

This is important to SQL Server, because it wants queries to be fast.

If a query goes parallel, you want it to use processors from the same NUMA node and access memory local to that node (ideally).

8 is a magic number

The guidance in KB 2806535 is basically this:

  • Figure out how many logical processors you have in a NUMA node
  • If you have 8 or more logical processors in a NUMA node, generally you’ll get the best performance at maxdop 8 or lower
  • If you have less than 8 logical processors per NUMA node, generally you’ll get the best performance setting maxdop to the number of logical processors or lower

Why 8?

It’s not a law or anything– sometimes you can get better performance for a query with a maxdop higher than 8. And if that works out well for your workload, that’s cool!

But in general, using more cores = more overhead to pull everything back together.

8 may be less magical in SQL Server 2014 SP2 and SQL Server 2016 because of “Automatic Soft NUMA”

Hardware manufacturers are packing more and more cores in processors. SQL Server’s making some changes to scale with this.

SQL Server 2014 SP2 and SQL Server 2016 have a feature called “Automatic Soft NUMA”…

  • This feature is on by default in SQL Server 2016, but can be disabled using ALTER SERVER CONFIGURATION with the SET SOFTNUMA argument
  • In SQL Server 2014 SP2, you can enable Automatic Soft NUMA configuration by turning on Trace Flag 8079 at the server level
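On SQL Server 2016, turning the feature off looks something like this. Treat it as a sketch: the setting is read at startup, so the change doesn't take effect until the instance is restarted (and on SQL Server 2014 SP2 you'd use startup trace flag 8079 instead of this command).

ALTER SERVER CONFIGURATION SET SOFTNUMA OFF; /* takes effect at the next instance restart */
GO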

When this feature is enabled, if you have more than 8 logical processors in a NUMA node, Soft NUMA will be configured when SQL Server starts up.

Messages are written to the SQL Server Error Log when this occurs, so it's easy to check the log from the most recent startup to see what happened. You can also query the sys.dm_os_sys_info and sys.dm_os_nodes dynamic management views for configuration information.
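On SQL Server 2016, a quick check looks roughly like this. Column availability varies by version, so treat it as a sketch rather than a copy-paste guarantee:

SELECT softnuma_configuration_desc, numa_node_count
FROM sys.dm_os_sys_info;
GO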

Bob Dorr explains more about Automatic Soft NUMA configuration in his blog post, “SQL 2016 – It Just Runs Faster: Automatic Soft NUMA” on the “SQL Server According to Bob” blog.

Bob gives an example of a workload running on 2016 where a 30% gain in query performance was obtained by using Soft NUMA with “max degree of parallelism” set to the number of physical cores in a socket– which was 12 in that case.

Fine tuning MAXDOP and Cost Threshold require a repeatable workload

If you really care about performance, you need a repeatable benchmark for your workload. You also need to be able to run that benchmark repeatedly on the production hardware with different settings.

This is one of the many reasons that performance-critical environments buy identical hardware for staging environments.

So what to do with 1 NUMA node and 4 logical processors?

OK, so back to Max Degree of Confusion’s question.

We know that there is 1 physical CPU. That's one NUMA node. It has 4 logical processors. So we want 4 or lower.

Max Degree of Confusion said that this is an OLTP workload, which means we can have concurrent queries running. That's a good argument for not using 4: one long-running query using all 4 logical processors isn't going to be a nice day for lots of chatty little queries.

Really, the question in this situation is whether we want to go with maxdop 1 and effectively disable parallelism, or go with maxdop 2 and have some parallelism.

I would personally start with:

  • Max Degree of Parallelism set to 2
  • Cost Threshold for Parallelism set to 50
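If it helps to see it as code, here's a minimal sketch of setting both values with sp_configure. Both are advanced options, and both take effect as soon as RECONFIGURE runs — no restart needed.

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 2;
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;
GO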

Wait a second, the KB doesn’t talk about Cost Threshold for Parallelism!

I know, that’s what I’d change about the KB.

Remember, there’s two parts to going parallel:

  1. Is the query's estimated cost over the "Cost Threshold for Parallelism"?
  2. If so, how many logical processors is it allowed to use, based on the "Max Degree of Parallelism"?

SQL Server’s default “Cost Threshold for Parallelism” is 5. A cost of 5 QueryBucks is a super low bar these days.

This default was set back in days when processor power was a LOT MORE SCARCE. Processors have gotten way faster and you can eat a lot of data pretty quickly with a single logical processor these days.

When I was trained as a DBA back on SQL Server 2005, our standard was to raise Cost Threshold to 50 on every server.

11 years later, that has only become less risky. I think it’s a pretty safe default now.

This isn’t a law any more than the magic 8 was a law. It’s just a generalization based on observation.

Would you ever set Max Degree of Parallelism to 1 and disable parallelism?

Sure, if the application was carefully crafted to NEVER need parallelism unless a query hints it higher, maxdop 1 is the way to do that at the server level. SharePoint is famous for this architecture.

But generally parallelism is a good thing, and you want to allow parallelism for the queries that need it and benefit from it.

“Cost threshold for parallelism” is your setting for determining which queries “need” it, based on their estimated cost.

Want to learn more? Bookmark these resources:

  1. Brent Ozar talks more about these two settings and brings in SQL Server wait stats with his post on CXPACKET
  2. Paul White gave an excellent presentation on parallel query execution at the PASS Summit in 2013. It's still just as good as when he first presented it. Watch the hour-long session here.
  3. Don’t forget to check out the SQL Server According to Bob blog, by Bob Dorr and Bob Ward of Microsoft. They’ve got that article on Automatic Soft NUMA Configuration and much more cool stuff.

 

Free Poster: Game of Performance Tuning


Here’s a teeny tiny preview. Click the link in the post to download the full sized poster.

Performance tuning is much easier when you have dragons.

When I think about speeding up SQL Server, I think of three things:

  • Index tuning
  • Query tuning
  • Server and application architecture

Capture all three in one free poster.

Want more posters? I’ve got download links for all my free posters right here.

Teach Yourself SQL Server Performance Tuning (Dear SQL DBA Episode 12)


You’d love to have a job tuning SQL Servers, but you don’t have an environment to practice in. Here’s how to teach yourself performance tuning and prepare yourself to land and succeed in job interviews.

This is a "listen-able" 20 minute video. Prefer a podcast instead? Find it at littlekendra.com/dearsqldba.

A written version of the discussion with clickable links is just under this video.

 

Dear SQL DBA,

Is there a way I can gain SQL performance tuning experience if I don't have access to a live production environment? I read lots of blogs and attend classes and conferences where I can, but I don't feel confident.

I know real experience is the best, but I’d like to do whatever I can, and I’d like to get a job tuning performance.

Yours truly,

Junior Tuner

You’re right to ask this question, because job interviews focus heavily on experience

It's tough to get a job without direct experience.

But there’s a bright side with performance tuning: not a lot of people have direct experience.

If you follow what I outline in this post, you'll be able to talk about what you've done to learn and the problems you've reverse-engineered and solved. That will give you a real advantage in those interviews.

You’re going to need a sample dataset to work with

There are lots of options for sample databases. Anything can work.

With any dataset, you may need to write code to enlarge tables or change the data around to demonstrate specific problems. That’s a normal part of the challenge — it’s really a feature in a way.

Here’s just a few of the sample databases out there:

If you have enough space to keep multiple of these databases on your instance, there’s no reason to only use one of them as a learner.

If you’re planning to take your experience and teach a class, you may want to focus on just one sample database, though — and also make sure you have the rights to share it with students. (Switching around between databases in a class can be confusing.)

Start writing queries that demonstrate TSQL anti patterns – and make them slow

You know how people say that the best way to learn something is to teach it?

The best way to learn to speed up queries is to write slow ones.

The best way to get a job speeding up queries is to write a blog about the queries you’ve sped up.

The hardest part is going to be writing slow queries properly. You wouldn’t think that it takes talent to write truly crappy TSQL, but it takes me quite a long time to write terrible queries that demonstrate an anti-pattern against a sample dataset.

Two articles will get you started on anti-patterns:

These articles will include sample code. Use that as inspiration.

If you really want to learn performance tuning outside of a production environment, writing your own slow code and then speeding it up is the most effective approach. 

For each anti pattern you create, understand the execution plan and how to measure the query

For each slow query you write, test different solutions and compare them. To do this well, you’ll need to:

  • Research operators in the execution plans when the query is slow and fast
  • Learn how to measure performance using tools like STATISTICS TIME and STATISTICS IO

I find that the easiest way to do this is to make lots of notes in my TSQL scripts as I go, to remind myself of the performance at different points in the script.
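Here's the general shape of those notes in a script. The query and the numbers are made up; the point is the habit of measuring with STATISTICS TIME and STATISTICS IO and writing down what you saw at each step.

SET STATISTICS TIME ON;
SET STATISTICS IO ON;
GO

SELECT COUNT(*)
FROM dbo.SalesDetail; /* hypothetical anti-pattern query */
GO
/* Baseline: logical reads 1,520,000; CPU time 4,200 ms; elapsed time 9,800 ms (made-up numbers) */
/* After adding a nonclustered index: logical reads 3,400; CPU time 150 ms; elapsed time 210 ms */

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
GO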

Use the queries to make an anti-pattern environment

Once you have a bunch of slow queries, you can create an environment of bad queries.

One easy way to do this is to set up SQL Server Agent jobs that run the queries in a loop or on a scheduled basis.

You’ll learn quickly that you do have to meter them out in a way, because just running a ton of stuff in a tight loop is going to completely overwhelm your CPUs.
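One simple way to meter things out is to build a delay into the loop itself, whether it runs from an Agent job step or a spare SSMS window. A minimal sketch, where dbo.SlowAntiPatternQuery is a hypothetical procedure standing in for one of your bad queries:

DECLARE @i int = 1;
WHILE @i <= 100
BEGIN
    EXEC dbo.SlowAntiPatternQuery; /* hypothetical: one of your slow queries */
    WAITFOR DELAY '00:00:05'; /* pause 5 seconds so the loop doesn't swamp the CPUs */
    SET @i += 1;
END;
GO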

Some options for running a bunch of queries:

  • Michael J Swart wrote a great post on generating concurrent activity that lists out a bunch of tools which can help.
  • One note is that the SQL Query Stress tool originally written by Adam Machanic is now maintained on GitHub by Erik Ejlskov Jensen.

Practice finding the worst queries and diagnosing a solution

Some of your bad queries are going to be worse for your instance than others.

But which are the worst of the worst?

And what’s the most efficient way to fix the top three queries with the least amount of work?

After automating your queries, you can now practice:

Finally, add in database and server level anti-patterns

You can take this even farther, and challenge yourself to:

  • Simulate tempdb contention
  • Turn on database auto-shrink, and see if you can identify from the server exactly what it slows down and by how much
  • Change server settings related to parallelism, measure how they impact performance, and figure out how you would detect and tune those settings. (I did an episode on parallelism called Max Degree of Confusion.)
  • Lower your memory settings so not all data fits in cache, and measure how that impacts performance
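The auto-shrink and memory experiments, for instance, look something like this. These are placeholder names and numbers; pick a database and a memory ceiling that make sense on your own test instance.

ALTER DATABASE YourSampleDatabase SET AUTO_SHRINK ON; /* hypothetical database name; then measure what gets slower */
GO
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 2048; /* placeholder value: small enough that not all data fits in cache */
RECONFIGURE;
GO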

Remember what I said about the blog?

Blogging about this process as you go through it serves a few purposes:

  • Writing about it helps you remember things
  • Writing about it under your name will act as an online resume
  • Bonus: you’re helping others as you go

You’re going to need to be persistent about this project to make it work. And it’s going to take a lot of time.

Blogging as you go is extra work, but if your goal is to get a job, it’s incredibly valuable – because if you do this once a week for a year, that link at the top of your resume is going to be almost as awesome as your confidence about what you’ve learned.

Collect and Baseline Wait Statistics (Dear SQL DBA Episode 14)


What are the best tools to collect and baseline wait statistics? Should you write your own? Watch the 18 minute video or read the episode transcript below.

Prefer a podcast?

 

Dear SQL DBA…

I am getting into performance troubleshooting on SQL Server. I hear you talk about wait stats a lot, and how important they are to the process of troubleshooting.

What ways are there to check the wait stats for a given time? How would you go about creating a baseline for a system you have just taken over?

Sincerely,

Waiting on Stats

I do love wait stats!

If you listened to the performance tuning methodology I outlined in an earlier episode, you saw how important I think wait stats are for troubleshooting performance.

If you missed that episode, it’s called Lost in Performance Tuning. (I’ve got an outline of the discussion in the blog post, as always.)


If I’m going to manage the system for a long time, I would buy a vendor tool to baseline wait stats

SQL Server is a mature database. There’s a lot of vendors out there who have tapped into the need to track and baseline wait stats.

They’ve honed tools to:

  • Collect the wait stats in a lightweight manner
  • Store them in a repository and groom the data over time, so it doesn’t explode
  • Build reports for you to see big picture data
  • Build fancy UIs for you to zoom in on a point in time
  • Find queries that were running when the waits were occurring

Example vendors – I’m listing three that I’ve used before to solve problems:

SQL Sentry Performance Advisor, Idera Diagnostic Manager, Dell Software (formerly Quest) Spotlight on SQL Server Enterprise

I haven’t listed these in order of preference. I know people who swear by each of them.

Since monitoring systems for SQL Server are pretty mature, the differences are in the details.

Details can be very important, of course– research and trials will help you find which one is the best fit for your team, processes, and applications.

Should DBAs write their own tools?

There are some people out there who think you should roll your own tools. That it makes you more legitimate.

I’ve written a lot of my own tools. It takes a lot of time.

To get feature parity with what vendors are offering, we’re talking years of investment.

It’s really easy to negatively impact performance with your tools. Tool vendors work very hard to avoid this, and it even happens to them sometimes.

The difference is that the vendor has a bunch of engineers who can quickly fix the issue and release a new version.

It’s only worth it to write your own tools when nobody offers a solution that fits you.

It’s a little bit like monitoring your heart rate for your own health

I wear a heart rate monitor to help me estimate how active I am during the day, and how hard I work during my workouts. Heart rate monitors are pretty affordable, and you can choose between wearing them on your wrist and wearing a chest strap. Some are more accurate than others, and they have different reporting tools.

I could learn to take my own heart rate and sample and record it myself. I could probably build some reports off it. But I’m really happy having spent $150 for a device that does it for me.

This leaves me free to spend my time interpreting the heart rate and calorie burn data it gives me, and customizing my activity to fit my health plan.

How to get budget for a performance monitoring tool

Do two things:

  • Outline the business cases that a performance monitoring tool will help with. Link to specific customer incidents that it would help resolve.
  • Pick the top 2 or 3 vendor tools you’d like to test, and map their features to the business cases.

Bam, your request is looking a lot more legitimate.

Test them one at a time. Start with a non-production server.

Your best bet is to write some code to reproduce performance problems against that server.

Ideally these map to your business cases.

Other ideas:

  • Find sample code with searches to simulate blocking and deadlocks, if you’d like to start there.
  • Modify my sample code for testing inserts for race conditions with Microsoft’s free ostress tool for more fun (here it is)
  • Write some queries that read a lot of data and possibly run them from an Agent job (maybe it calls ostress)
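If you want a quick homemade blocking repro to go with that first idea, something like this works across two query windows. It's a sketch: dbo.SalesDetail and its columns are hypothetical, and it assumes the default READ COMMITTED isolation level rather than read committed snapshot.

/* Window 1: take a lock and hold it by leaving the transaction open */
BEGIN TRAN;
UPDATE dbo.SalesDetail SET SalesQuantity = SalesQuantity WHERE SalesKey = 1;
/* don't commit yet */

/* Window 2: this select blocks behind window 1 until it commits or rolls back */
SELECT SalesQuantity FROM dbo.SalesDetail WHERE SalesKey = 1;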

Review how your use cases all look in the tool you’re testing.

Are the wait stats recorded and displayed well? Are they useful to you?

How easy is it for you to find the queries related to the wait stats?

Reach out to the vendor during your trial if you're having problems. Sometimes the tools are smart in ways that aren't obvious. This also gives you some insight into their support processes.

Tip: check whether the tool you're testing sends monitoring data to the cloud. If so, make sure you get that approved by management before putting the tool into production. In sensitive environments, get that approved before you test it, too.

If I’m troubleshooting a system for a short time, or if there’s no budget, I’ll use and contribute to an open source tool

Sometimes there’s good reasons for budgetary limitations– maybe you work for a non-profit and that money is literally feeding children.

Or maybe you’re doing a short term analysis and you just need to collect information over a couple of days, and there’s no time to test and deploy a more robust tool.

In that case, I’d start with sp_BlitzFirst from Brent Ozar Unlimited:

  • It’s free
  • It’s open source
  • It’s got some documentation to get you started
  • It's already wired up to support different sample lengths and to write results to tables
  • It looks at running queries as well as some system metrics to help point out critical information related to the wait stats
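For example, a call like the one below takes a 30-second sample and logs the results to a table. The parameter names here are from memory, so double-check them against the current documentation before relying on this; it's a sketch, not gospel, and the logging database name is a placeholder.

EXEC dbo.sp_BlitzFirst
    @Seconds = 30, /* length of the wait stats sample */
    @OutputDatabaseName = 'DBAtools', /* hypothetical logging database */
    @OutputSchemaName = 'dbo',
    @OutputTableName = 'BlitzFirstResults';
GO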

You can start with what others have built, and slowly contribute on your own as well. Much nicer than starting from scratch.
