In the beginning of my IT career, I was a developer. I wrote a lot of bad code that didn’t perform well. As more data entered my tables, my stored procedures got slower and slower. More data, more problems! I started my career in performance tuning by making my crappy code a little less crappy. That is why I wanted to share an extended event added in SQL Server 2022 that helps you identify bad coding practices in your code, also known as query anti-patterns.

Query Anti-Patterns Video

In the following video, you can learn how to capture query anti-patterns via Extended Events. Once you have found them, you can aggregate them to find the most occurring offenders or the offenders causing the most harm!

Query Anti-Pattern Demos

To start this demo, you will need to download a copy of the WideWorldImporters sample database.

You will need SQL Server 2022 as the query anti-pattern extended event is new in SQL Server 2022.

You will also need SQL Server Management Studio if you don’t already have it.

Demo Setup

Create Index

First, we will need to create an index we will use for our implicit conversion demo. If this index already exists, we will drop it.

USE [WideWorldImporters]
GO

DROP INDEX IF EXISTS [idx_Invoices_PurchaseOrder] ON [Sales].[Invoices]
GO

/****** Create Index for Anti-Pattern Implicit Conversion Demo ******/
CREATE NONCLUSTERED INDEX [idx_Invoices_PurchaseOrder] ON [Sales].[Invoices]
(
	[CustomerPurchaseOrderNumber] ASC
)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, 
DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, 
OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [USERDATA]
GO

Create Query Anti-Pattern Extended Event

Next, we will create our extended event used to capture the query anti-pattern. If it exists, we will drop it before creating the extended event.

/***** Create Extended Event *****/
IF EXISTS (SELECT name FROM sys.server_event_sessions WHERE name = 'query_antipattern')
DROP EVENT SESSION [query_antipattern] ON SERVER
GO

CREATE EVENT SESSION [query_antipattern] ON SERVER 
ADD EVENT sqlserver.query_antipattern(
    ACTION(sqlserver.session_id, sqlserver.client_app_name,sqlserver.client_hostname
	,sqlserver.database_name,sqlserver.query_hash,sqlserver.query_plan_hash,
	sqlserver.sql_text,sqlserver.username))
ADD TARGET package0.ring_buffer(SET max_events_limit=(0),max_memory=(51200))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,
MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,
TRACK_CAUSALITY=OFF,STARTUP_STATE=ON)
GO
-- Start Extended Event
ALTER EVENT SESSION [query_antipattern] ON SERVER STATE = START;
GO

QDS and Plan Cache Cleanup

Before we start, we will want to clear out our Query Store data and the execution plans in cache. NOTE: Only do this in non-production. We shouldn’t be testing demos in production.

/* Start the Demo by purging plans and QDS */
USE [WideWorldImporters]
GO
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;
ALTER DATABASE [WideWorldImporters] SET QUERY_STORE CLEAR
GO

What are query anti-patterns?

What are query anti-patterns? Great question. I like Andreas Schörghuber’s definition: “Anti-patterns are common solutions to ineffective problems and cause more problems than they solve.” [1] In simpler terms, an anti-pattern query returns the correct data but consumes more reads, CPU, or duration than a rewritten query that gets the same results in a way that lets the query optimizer do less work.

SQL Server’s Extended Events map values also let you know which anti-patterns can be captured by the query_antipattern event.

/* What types of query_antipatterns are included? */
SELECT map_value 
FROM sys.dm_xe_map_values 
WHERE name = N'query_antipattern_type'; 


Below are the query anti-patterns included in the extended event as of SQL Server 2022 CU 10.

query anti-pattern extended event map values.

Start Query Anti-Pattern Demos

Finally, we can use the index we created to walk through our query anti-pattern examples.

Implicit Conversion AKA ConvertPreventingSeek

In this example, we have a simple query. As you can see in the video above, the column is NVARCHAR(20) but the predicate filter is an INTEGER. The following image should help you visualize the data type mismatch.

The query has to convert the data in the CustomerPurchaseOrderNumber column to an integer to apply the filter.

/* 
Examine what an implicit conversion is and 
how it will hurt performance! 
*/
SELECT InvoiceID
FROM [Sales].[Invoices]
WHERE [CustomerPurchaseOrderNumber]=17913


If you look at the live data of the extended event we added, you will see the following data collected.

Fix Implicit Conversion Query Anti-Pattern

This fix isn’t magic. It’s actually quite simple. You just have to make the data type of the predicate filter match the data type of the column.

/* How to fix an implicit conversion? 
Change the data type to match the column.
*/
SELECT InvoiceID
FROM [Sales].[Invoices]
WHERE [CustomerPurchaseOrderNumber]=N'17913'

Large Number of IN Predicate

For our next query anti-pattern we will focus on an IN statement that has too many values.

/* Long List of IN Statements.
	Do we get alerts for both IN statements and Convert? */
 SELECT *
 FROM Sales.Invoices
 WHERE CustomerPurchaseOrderNumber IN (12126,15342,12211,17129,10369,
 13383,17913,14518,17577,18030,17639,10941,16374,17521,11911,10690,
 10255,13260,16825,12227,16070,16793,12467,13700,11503,13916,17783,
 13042,10411,15084,14159,11389,12559,19457,18468,14486,19057,18036,
 13079,19402,15267,12781,17334,16645,17591,14618,19857,17849,10947,
 11610,14853,10821,15491,14474,10079,11612,11991,19321,10210,14970,
 17468,11643,16411,19525,10996,16474,15832,12134,13630,14902,15212,
 19609,19809,13551,12805,18797,11553,11156,18583,19122,11615,12262,
 10310,19254,13954,13960,14630,18585,18834,18451,19842,18893,13004,
 19890,12941,13508,15403,19391,17938,19140,16362,18795,16388,18492,
 15272,14938,13868,18901,15236,19782,11134,10182,18651,14169,18514,
 18108,17328,13949,12727,17486,13783,15165,17627,13215,18992,10815,
 17466,12153,16275,10695,14744,12253,13328,18450,12232,19473,14441,
 19357,19208,14194,14468,16930,14270,13096,19458,13731,14609,17253,
 13190,16905)

A good question you might be wondering is which query anti-pattern this query will return. Is it the previous TypeConvertPreventingSeek or LargeNumberOfOrINPredicate?

If you cleared the Live Data Viewer for your extended event, you will notice that it’s actually both anti-patterns. In the screenshot below I put a box around the query_hash so you know it’s the same unique query text. I also included the session_id and timestamp.

Fix Large Number of IN Predicate

Now, there are multiple ways to convert this massive IN clause into a join against a temp table, a table variable, a table-valued parameter, etc. I will leave which one is best for another day. Today, we are going to use a table-valued parameter.

NOTE: I am creating and dropping the table type in the demo code because this is a demo. Usually, you would have a table type for each common data type you filter on.

SECOND NOTE: Hopefully, you noticed we also converted the list of integers into a list of NVARCHAR values. This also removes the implicit conversion query anti-pattern.

/*  How to fix this anti-pattern?  */
DROP TYPE IF EXISTS KeyList;
GO
CREATE TYPE KeyList AS TABLE ( 
	KeyValue NVARCHAR(50)
	)
GO
	DECLARE @Values KeyList
	INSERT INTO @Values VALUES (N'12126'),(N'15342'),(N'12211'),(N'17129'),(N'10369'),(N'13383'),(N'17913'),(N'14518'),(N'17577'),(N'18030'),(N'17639'),(N'10941'),(N'16374'),(N'17521'),(N'11911'),(N'10690'),(N'10255'),(N'13260'),(N'16825'),(N'12227'),(N'16070'),(N'16793'),(N'12467'),(N'13700'),(N'11503'),(N'13916'),(N'17783'),(N'13042'),(N'10411'),(N'15084'),(N'14159'),(N'11389'),(N'12559'),(N'19457'),(N'18468'),(N'14486'),(N'19057'),(N'18036'),(N'13079'),(N'19402'),(N'15267'),(N'12781'),(N'17334'),(N'16645'),(N'17591'),(N'14618'),(N'19857'),(N'17849'),(N'10947'),(N'11610'),(N'14853'),(N'10821'),(N'15491'),(N'14474'),(N'10079'),(N'11612'),(N'11991'),(N'19321'),(N'10210'),(N'14970'),(N'17468'),(N'11643'),(N'16411'),(N'19525'),(N'10996'),(N'16474'),(N'15832'),(N'12134'),(N'13630'),(N'14902'),(N'15212'),(N'19609'),(N'19809'),(N'13551'),(N'12805'),(N'18797'),(N'11553'),(N'11156'),(N'18583'),(N'19122'),(N'11615'),(N'12262'),(N'10310'),(N'19254'),(N'13954'),(N'13960'),(N'14630'),(N'18585'),(N'18834'),(N'18451'),(N'19842'),(N'18893'),(N'13004'),(N'19890'),(N'12941'),(N'13508'),(N'15403'),(N'19391'),(N'17938'),(N'19140'),(N'16362'),(N'18795'),(N'16388'),(N'18492'),(N'15272'),(N'14938'),(N'13868'),(N'18901'),(N'15236'),(N'19782'),(N'11134'),(N'10182'),(N'18651'),(N'14169'),(N'18514'),(N'18108'),(N'17328'),(N'13949'),(N'12727'),(N'17486'),(N'13783'),(N'15165'),(N'17627'),(N'13215'),(N'18992'),(N'10815'),(N'17466'),(N'12153'),(N'16275'),(N'10695'),(N'14744'),(N'12253'),(N'13328'),(N'18450'),(N'12232'),(N'19473'),(N'14441'),(N'19357'),(N'19208'),(N'14194'),(N'14468'),(N'16930'),(N'14270'),(N'13096'),(N'19458'),(N'13731'),(N'14609'),(N'17253'),(N'13190'),(N'16905')
	select * 
	FROM Sales.Invoices
	JOIN @Values Tst on (Tst.KeyValue = CustomerPurchaseOrderNumber)   	
GO

Query Anti-Pattern Wrap Up

It’s now easier than ever to find your top offending query anti-patterns. Just create the extended event and aggregate the results.
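Once the session has been running for a while, a minimal sketch of that aggregation might look like the following. It assumes the ring_buffer target defined above; the antipattern_type field name is an assumption, so verify the event’s column names in sys.dm_xe_object_columns before relying on it.

/* Hedged sketch: shred the ring_buffer target and rank anti-patterns by query_hash.
   The data field name is an assumption based on the session defined above. */
;WITH target_data AS
(
    SELECT CAST(t.target_data AS XML) AS target_xml
    FROM sys.dm_xe_sessions s
    JOIN sys.dm_xe_session_targets t
        ON s.address = t.event_session_address
    WHERE s.name = N'query_antipattern'
      AND t.target_name = N'ring_buffer'
), events AS
(
    SELECT
        e.value('(action[@name="query_hash"]/value)[1]', 'nvarchar(32)')     AS query_hash,
        e.value('(data[@name="antipattern_type"]/text)[1]', 'nvarchar(100)') AS antipattern_type, -- assumed field name
        e.value('(action[@name="sql_text"]/value)[1]', 'nvarchar(max)')      AS sql_text
    FROM target_data
    CROSS APPLY target_xml.nodes('RingBufferTarget/event') AS x(e)
)
SELECT query_hash, antipattern_type, COUNT(*) AS occurrences, MAX(sql_text) AS sample_sql_text
FROM events
GROUP BY query_hash, antipattern_type
ORDER BY occurrences DESC;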

If you would like some help making your queries go faster contact us. We also do health checks where you get results. We guarantee value provided or you get a 100% refund!

Also, if you liked this tip join our newsletter so you can get more free videos and tips.

Tuning select statements is less complicated than tuning update statements because your code changes do not change the data in your tables. Tuning update statements, on the other hand, can be tricky as you are changing data while you are testing. Below are multiple ways to streamline tuning update statements so you can quickly get back to your starting point and make sure you are comparing apples to apples.

You are out doing your scheduled sweeps of the environment and you notice a huge I/O consumer is one of your update procedures. Your initial thought may be to panic, because how can you test data that is changing? You have several methods for tuning queries that change data while still allowing the data to revert between tests.

NOTE: We always recommend tuning all queries in non-production until we have our final tested solution to implement in production.

Tuning Update Statement with Rollback Transaction

One of the more basic methods for executing queries without keeping the data changes is to wrap the statements you are running in a BEGIN TRANSACTION and a ROLLBACK TRANSACTION. This allows everything to execute and give you the performance statistics you are looking for, while leaving the data intact for repeat testing.

Tuning Update Statements using Explicit Transactions. This is BEGIN TRANSACTION and ROLLBACK or COMMIT TRANSACTION.
Using rollback to revert a change made to data
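Here is a minimal sketch of that pattern; the table name and predicate are hypothetical placeholders, and the statistics output gives you the reads and CPU to compare between attempts while the rollback resets the data for the next run.

/* Hedged sketch: measure an update without keeping the change (dbo.SalesOrders is hypothetical) */
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

BEGIN TRANSACTION;

	UPDATE dbo.SalesOrders            -- hypothetical table
	SET Status = 'Shipped'
	WHERE OrderDate < '2021-01-01';

	-- optionally inspect the changed rows here before undoing the work

ROLLBACK TRANSACTION;                 -- revert the data so the test is repeatable

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;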

Tuning Update Statements using Database Backups

You have a couple of options with your standard SQL Server backups to assist in reverting changes made to the testing environment, and the same concepts apply to your basic DR strategy. If the test is short, you can take a full backup before making a change and restore that backup to restart the test.

Tuning Update Statements with Restoring Full Backups. You can use a full backup to revert a change while tuning.
Using a full backup to revert a data change
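A minimal sketch of the short-test approach, assuming a hypothetical database name and backup path:

/* Hedged sketch: full backup before the test, restore to reset (names and paths are hypothetical) */
BACKUP DATABASE [TuningSandbox]
TO DISK = N'D:\Backups\TuningSandbox_PreTest.bak'
WITH INIT, COMPRESSION;

-- ... run the update statements being tuned and capture your numbers ...

USE [master];
RESTORE DATABASE [TuningSandbox]
FROM DISK = N'D:\Backups\TuningSandbox_PreTest.bak'
WITH REPLACE;   -- overwrite the modified database to restart the test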

If you are making a series of changes over a period of time and don’t want to completely restart your test, you can utilize point-in-time recovery. Once your database is in the FULL recovery model, you can start with a full backup and take log backups at set intervals or before different pieces of your change. This allows you to restore to a specific point in your test.

Tuning Update Statements using point-in-time recovery with full and log backups for tuning
Using a full backup with transaction log backups for Point in Time Recovery
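A sketch of the point-in-time approach with the same hypothetical names; the database must be in the FULL recovery model, and the STOPAT time is whatever point in the test you want to rewind to.

/* Hedged sketch: full backup plus log backups, then restore to a point in the test */
BACKUP DATABASE [TuningSandbox] TO DISK = N'D:\Backups\TuningSandbox_Full.bak' WITH INIT;

-- ... first piece of the change ...
BACKUP LOG [TuningSandbox] TO DISK = N'D:\Backups\TuningSandbox_Log1.trn';

-- ... second piece of the change ...
BACKUP LOG [TuningSandbox] TO DISK = N'D:\Backups\TuningSandbox_Log2.trn';

-- Rewind to a specific point between the two pieces
USE [master];
RESTORE DATABASE [TuningSandbox] FROM DISK = N'D:\Backups\TuningSandbox_Full.bak' WITH NORECOVERY, REPLACE;
RESTORE LOG [TuningSandbox] FROM DISK = N'D:\Backups\TuningSandbox_Log1.trn' WITH NORECOVERY;
RESTORE LOG [TuningSandbox] FROM DISK = N'D:\Backups\TuningSandbox_Log2.trn'
	WITH STOPAT = N'2021-11-01T10:30:00', RECOVERY;   -- hypothetical timestamp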

Tuning Update Statements with Database Snapshot

Database Snapshots are a great way to quickly revert large databases to an initial state. A snapshot keeps a copy of database pages as they change in the source, so only those changed pages need to be reverted instead of restoring the whole database. Because it is not a full copy of the database, a snapshot requires its source database: the original versions of changed pages are stored in a sparse file that grows as the source changes after the snapshot was taken. The source database must be available for the snapshot to be usable, which makes snapshots a poor DR solution but great for analysis, auditing, and tuning situations.

Tuning Update Statements while using snapshots to revert a data change for tuning
Using 2016 Database Snapshots to revert a change
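A minimal sketch, again with hypothetical names; NAME must match the logical data file name of the source database.

/* Hedged sketch: snapshot, test, then revert the source database to the snapshot */
CREATE DATABASE [TuningSandbox_Snap]
ON ( NAME = N'TuningSandbox', FILENAME = N'D:\Snapshots\TuningSandbox_Snap.ss' )
AS SNAPSHOT OF [TuningSandbox];

-- ... run the update statements being tuned and capture your numbers ...

USE [master];
RESTORE DATABASE [TuningSandbox]
FROM DATABASE_SNAPSHOT = N'TuningSandbox_Snap';   -- reverts the source to the snapshot point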

Summary

There are many ways to quickly get back to your original state to allow for quick, repetitive testing of queries that change data. A huge benefit of the methods I listed today is that they also let you document where transactions and backups occur, so you can open a .sql file and have all the needed steps in one document. Do you like other ways to revert changes? Maybe temporal tables? I’d be happy to cover that in a post in the near future. Tell me all about your methods in the comments below!


SQL Server 2022 was announced yesterday at Microsoft’s Ignite Conference. Here are some of the items that caught our attention, including SQL Server 2022 announcements and other news from around the Microsoft Data Platform.

Bob Ward blessed the community with this presentation covering his three favorite features.

SQL Server 2022 First Thoughts

The following is a summary of some of my favorite announcements with some initial thoughts.

Bi-directional HA/DR to Azure SQL – With SQL Server 2022, you can use Distributed Availability Groups to have an HA/DR replica in Azure SQL Managed Instance. You can fail over your read/write traffic. To come back from the Azure SQL Managed Instance, you have to do a backup and a restore into SQL Server 2022. Hopefully, Distributed Availability Group failovers will be fully bi-directional in the near future.

Query Store Enhancements (QDS) – SQL Server 2022 allows you to have separate Query Store data for your readable replicas. QDS will now be on by default for new databases. QDS also provides the ability to use hints to easily shape query plans with no code changes.

Intelligent Query Processing Improvements (IQP) – IQP has new enhancements to memory grant feedback that persist the feedback instead of only applying it when queries are executed frequently. IQP also has more improvements that require no code changes, including DOP (MAXDOP) Feedback and CE Feedback. The engine can detect and change how many threads are used, so large tables can get more threads as needed. Saving the best for last: progress toward smoothing out parameter sniffing will also be included in SQL Server 2022. I am looking forward to seeing how it compares to Automatic Plan Correction (Automatic Tuning) introduced in SQL Server 2017.

Free SQL Assessments for Azure VMs – If you are using an Azure VM to host an instance of SQL Server you might be interested in the new free assessment offering coming out of Ignite.

New Class Ebdsv5 Virtual Machines – Now 30% better price-performance compared to the previous best option. Remote I/O throughput goes up to 2,500 MB/s, so less memory could be required due to the higher read throughput.

SQL Server 2022 integrates with Azure Synapse Link and Azure Purview – Being able to manage data discovery both on-premises and in the cloud is a great win! The same can be said for breaking down silos to help you get the most value out of your data.

SQL Server Ledger – It looks like Azure SQL Database Ledger, which was announced in May of this year, is coming to SQL Server 2022. Microsoft is bringing the same blockchain capabilities to SQL Server. Always excited to see the most secure relational database platform increase its security features.

SQL Server 2022 What Is Next?

If you want to get an early jump on SQL Server 2022, you can apply to enter the Early Adoption Program (EAP). I also recommend signing up for the PASS Member Summit conference that starts next week. I will be giving a 101 fundamentals talk on performance tuning. It’s 100% FREE to attend.

What is your favorite feature of SQL Server 2022? What do you wish was added to the feature list? Let us know in the comments.

What is a heap?

Heaps are unorganized just like this deck of cards

A heap was actually named pretty accurately. It is a “pile” of unsorted/unorganized data. A heap will exist on any table that lacks a clustered index. Heaps can exist even if you have non clustered indexes. The main advantage of a heap is the speed at which you can insert data into your tables, as inserts don’t need to follow a logical order like inserting identity values into the last page of a table.

What is a clustered index?

Clustered Indexes are sorted and organized, unlike table heaps

A clustered index is the main method of logically sorting data in a table. It can be defined on a key value in the table, but it doesn’t require an actual primary key to exist. Clustered indexes should be created on the most common column(s) used in the predicates of your most frequently executed statements. All data will be sorted to match the index, and you can only have one clustered index per table. The main advantage of having a clustered index is increased speed on data reads.

How should I identify when a clustered index should be used instead of a heap?

As noted above, the main advantage of a clustered index is an increase in the speed of your reads. To get an idea of where you might need to increase read speed, you should first identify all of your tables that do not have a clustered index. I found it very helpful to also place lower limits on the number of rows in a table and the number of reads against that table (by checking the dynamic management views). The query below is a great start for looking at those things.

DECLARE @type_of_index INT = 0 /*0 = heap, 1 = clustered, 2 = non clustered*/ 
	, @min_num_of_reads INT = 10
	, @min_num_of_rows INT = 100

SELECT OBJECT_NAME(IX.OBJECT_ID) AS Table_Name
   ,SCH.name AS SchemaName
   ,IX.name AS Index_Name
   ,IX.type_desc AS Index_Type
   ,PS.row_count AS TableRowCount
   ,SUM(PS.[used_page_count]) * 8 IndexSizeKB
   ,IXUS.user_seeks AS NumOfSeeks
   ,IXUS.user_scans AS NumOfScans
   ,IXUS.user_seeks + user_scans AS NumOfReads
   ,IXUS.user_lookups AS NumOfLookups
   ,IXUS.user_updates AS NumOfUpdates
   ,IXUS.last_user_seek AS LastSeek
   ,IXUS.last_user_scan AS LastScan
   ,IXUS.last_user_lookup AS LastLookup
   ,IXUS.last_user_update AS LastUpdate
FROM sys.indexes IX
	INNER JOIN sys.objects O 
		ON ix.object_id = O.object_id
	INNER JOIN sys.schemas SCH 
		ON SCH.schema_id = O.schema_id
	INNER JOIN sys.dm_db_index_usage_stats IXUS 
		ON IXUS.index_id = IX.index_id 
		AND IXUS.OBJECT_ID = IX.OBJECT_ID
	INNER JOIN sys.dm_db_partition_stats PS 
		ON PS.object_id = IX.object_id  
		AND PS.index_id = IX.index_id
WHERE OBJECTPROPERTY(IX.OBJECT_ID,'IsUserTable') = 1
	AND IX.index_id = @type_of_index 
	AND PS.index_id = @type_of_index 
	AND IXUS.index_id = @type_of_index
	AND IXUS.database_id = (DB_ID())
	AND IXUS.user_scans + IXUS.user_seeks > @min_num_of_reads
	AND PS.row_count > @min_num_of_rows
GROUP BY OBJECT_NAME(IX.OBJECT_ID), IX.name, IX.type_desc, IXUS.user_seeks, IXUS.user_scans, IXUS.user_lookups, IXUS.user_updates, IXUS.last_user_seek, IXUS.last_user_scan, IXUS.last_user_lookup, IXUS.last_user_update, SCH.name, PS.row_count
ORDER BY TableRowCount DESC, NumOfReads DESC

Now that you see which tables are on the larger side and also have a lot of reads, we can dive deeper into the SQL Server dynamic management views to see how these tables are being used by looking at query stats and plans. Enter your table name in the variable and you can see the number of times a plan has been used, the text associated with that plan, and the plan itself to validate whether the query is doing a table scan due to the heap or if a non clustered index is being used instead. The second result set will give you object names if the table is used in a SQL object like a stored procedure or function.

DECLARE @TableName NVARCHAR(50) = 'ExampleTableName'

SELECT TOP 10 databases.name AS DBName
	, dm_exec_sql_text.text AS TSQL_Text
	, dm_exec_query_stats.creation_time
	, dm_exec_query_stats.execution_count
	, dm_exec_query_stats.total_worker_time AS total_cpu_time
	, dm_exec_query_stats.total_elapsed_time
	, dm_exec_query_stats.total_logical_reads
	, dm_exec_query_stats.total_physical_reads
	, dm_exec_query_plan.query_plan
FROM sys.dm_exec_query_stats 
	CROSS APPLY sys.dm_exec_sql_text(dm_exec_query_stats.plan_handle)
	CROSS APPLY sys.dm_exec_query_plan(dm_exec_query_stats.plan_handle)
	INNER JOIN sys.databases
		ON dm_exec_sql_text.dbid = databases.database_id
WHERE dm_exec_sql_text.text LIKE '%'+@TableName+'%'
ORDER BY dm_exec_query_stats.total_logical_reads DESC;

SELECT * 
FROM sys.sql_modules M 
	INNER JOIN sys.objects O 
		ON M.object_id = O.object_id
WHERE M.definition LIKE '%'+@TableName+'%'

After reviewing the use cases and their query plans, it’s time to decide if the table in question meets the above recommendations for a clustered index or if it’s better served as a heap, and what columns to include in the index. If the table has multiple use cases but they all typically share the same column(s), create the clustered index on that table with those column(s), as shown in the sketch below.
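For example, converting a heap to a clustered index on its most commonly filtered column might look like this hedged sketch; the table, column, and options are hypothetical and should be tested in your environment.

/* Hedged sketch: add a clustered index to a heap on its most frequently filtered column */
CREATE CLUSTERED INDEX [CIX_OrderDetail_OrderID]
ON [dbo].[OrderDetail] ([OrderID] ASC)      -- hypothetical table and column
WITH (ONLINE = ON,                          -- Enterprise-only option; remove on other editions
      SORT_IN_TEMPDB = ON);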

Now don’t go throwing clustered indexes around like Oprah to the studio audience

The clustered index is helpful in a lot of situations, but there are some instances where it is not necessary or can actually do some harm to your performance.

The main example of this would be logging tables. These tables typically have a lot of insert operations, but not necessarily a lot of reads or even updates, as they are just logging everything that is happening and are referenced infrequently. Putting a clustered index on a table like that can cause hot latches: because the key is typically sequential, inserts all target the last available page and end up waiting on each other. The only exception to this rule would be if the key column of the index is a GUID, as a GUID is random and not sequential.

Watch out for making your clustered index on columns or data that is not static. If you update the key values frequently, you will likely cause performance issues. The clustered index is used as a lookup point for all non clustered indexes, and updating the key values can cause page splits. Page splits will require index maintenance to fix the problem and return performance to where it belongs.

Another common mistake is making the clustered index with too many columns. A clustered index defines the default sort of all data in the table. Too many columns in the index can slow down data operations by requiring resorting for additional use cases, and it will also increase the size of all non clustered indexes on that table (there is also a limit to how wide an index key can be).

Finally, never forget the most important part of adding indexes: TEST, TEST, and TEST.

Post a comment below if you have any other tips or tricks when it comes to fixing your heap woes!

Replay in the Cloud like a RockStar
Replay like a Rockstar!

Want to save money, validate performance, and make sure you don’t have errors while migrating to Azure SQL Database, Azure SQL Managed Instance, or SQL Server on Amazon RDS (AWS)? In this video, you will learn how to use the Database Experimentation Assistant to replay and compare your on-premises or cloud SQL Server workloads on-demand. First, you will learn how to capture a workload on-premises or in the cloud. Next, you will master replaying your workload on-demand as needed. Finally, you will analyze and compare a baseline workload against another workload that includes your changes.

Replay in the Cloud Video

Workload Replay is the Secret Weapon for a Successful Migration to the Cloud

Slidedeck

Recommended Reads

If you liked this video, check out the following other resources.

Today, I want to focus on a free tool that everyone can use when benchmarking or performance tuning their queries. This tool is Plan Explorer by SentryOne (I think I will still always know them as SQLSentry).

Automatic Tuning: Execution Plan Correction
Plan Explorer is another great free tool that should be in every performance tuner’s toolbelt.

Most people use Management Studio (SSMS) when they are benchmarking and improving the performance of their queries. It makes sense because this is the same tool most people use to develop their queries. I have found it priceless to have a history log of all my changes. That way, I can quickly see how each change impacts the logical reads, duration, CPU, and the execution plan across all of my code changes for the query.

If you are new to benchmarking and improving queries, go ahead and start the video from the beginning. I show you how to get the actual execution plan, duration, CPU, and reads in SSMS. If you want to see how I track changes via the history feature of Plan Explorer, go ahead and skip to the 3:36 mark in the eight-minute video.
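For reference, a common way to surface those same numbers in SSMS is to turn on the statistics output around the query you are tuning; this sketch reuses the WideWorldImporters query from earlier in this post.

/* Hedged sketch: capture reads, CPU, and duration for a single tuning iteration */
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT InvoiceID
FROM Sales.Invoices
WHERE CustomerPurchaseOrderNumber = N'17913';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
-- The Messages tab shows logical reads per table plus CPU and elapsed time;
-- "Include Actual Execution Plan" (Ctrl+M) adds the plan for each run.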

Quickly track the performance changes of your code tuning with Plan Explorer’s history tracking feature.

If you enjoyed this tip go ahead and join our newsletter for free tips and videos.

Query Store for Workload Replays
Query Store for Workload Replays

This month’s T-SQL Tuesday is hosted by Tracy Boggiano. Tracy invites us all to write about adopting Query Store. Today, I wanted to share my favorite, and very unique, way that I use the Query Store: for workload replays.

You can read more about the invite in detail by clicking on the T-SQL Tuesday logo in this post.

Today, I wanted to talk about my least favorite part of replaying workloads. It’s having an extended event or server-side trace running during a workload replay only so we can compare the results at a query level when the replay is finished. Now, this might seem like a trivial thing, but when you have workloads over 10k batch requests/sec this can consume terabytes of data quickly. The worst part is waiting to read all the data and then slice and dice it for analysis.

Starting with SQL Server 2016 there is a better and faster way to go! You can replace your extended event or server-side trace with Query Store captured data. Today, I will show you how to use the Query Store for the same purpose.

Different Settings

Keep in mind our goal here is very different from the typical use case for using the Query Store. We want to capture metrics for all the queries executed during a workload replay. Nothing more and nothing less.

If we have the runtime results for multiple replays we can then easily compare the workload performance between the workload replays.

Most of our changes from the regular Query Store best practices are shown below:

  • Max Size (MB) – Need to make sure there is enough space to capture your whole workload. This size will vary by how much workload is being replayed.
  • Query Store Capture Mode set to All – Normally not ideal, but remember we want to capture metrics for our whole workload being replayed.
  • Size Based Cleanup Mode set to Off – Yup, we don’t want to lose any of the workload data we capture until we persist it in our ideal form. More on this later.

The Capture Process

Now, this is where you would use the Database Experimentation Assistant (DEA), Distributed Replay, or some other process to replay a consistent workload in an isolated non-production environment. We will cover that subject in a future post. For now, we will just have two replays called “Baseline” and “Change”. This simulates a baseline replay with no schema changes and then another replay with a change introduced in the schema.

To capture our workload, we just enable the Query Store with the settings mentioned above and clear out the Query Store right before our workload replay starts to help ensure we are only capturing our workload.

USE [master]
GO
ALTER DATABASE [YourDatabase] SET QUERY_STORE = ON
GO
ALTER DATABASE [YourDatabase] SET QUERY_STORE (OPERATION_MODE = READ_WRITE, MAX_STORAGE_SIZE_MB = 10000, QUERY_CAPTURE_MODE = ALL, SIZE_BASED_CLEANUP_MODE = OFF)
GO
ALTER DATABASE [YourDatabase] SET QUERY_STORE CLEAR
GO

Stop Capturing Query Store Data

Once your replay is finished, we will want to stop the Query Store from writing any more data. We want the least amount of non-workload data inside the Query Store when we are using it for the sole purpose of comparing workloads.

USE [master]
GO
ALTER DATABASE [YourDatabase] SET QUERY_STORE (OPERATION_MODE = READ_ONLY)
GO

Prepare Query Store Data for Long-Term Analysis

Now, for smaller workloads, one might be happy with utilizing DBCC CLONEDATABASE to get a schema-only copy of their workload database with the Query Store data persisted. This is perfectly fine. With bigger workloads, I have noticed there are ways to improve the performance of analyzing the Query Store data. For example, clustered columnstore indexes can be very helpful for performance and capacity. Therefore, I like to have a schema for each replay and import the data. The following is a quick example of setting up a schema for a “Baseline” replay and a “Change” replay.

CREATE DATABASE [DBA]
GO
use [DBA]
GO
CREATE SCHEMA Baseline;
GO
CREATE SCHEMA Compare;
GO
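As an aside, the DBCC CLONEDATABASE route mentioned above for smaller workloads is essentially a one-liner; a hedged sketch, with a hypothetical clone name:

/* Hedged sketch: schema-only clone that carries the Query Store data but no user data */
DBCC CLONEDATABASE (YourDatabase, YourDatabase_BaselineClone);
-- The clone is created read-only and, on SQL Server 2016 SP1 and later,
-- includes Query Store data unless NO_QUERYSTORE is specified.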

Next, we will import our captured data from our baseline replay that’s in our read-only query store database. I also like to have a baked-in aggregate of metrics for reads, writes, duration and CPU at the query level.

use [YourDatabase]
GO
/* Load Data */
SELECT * INTO DBA.Baseline.query_store_runtime_stats FROM sys.query_store_runtime_stats;
SELECT * INTO DBA.Baseline.query_store_runtime_stats_interval FROM sys.query_store_runtime_stats_interval;
SELECT * INTO DBA.Baseline.query_store_plan FROM sys.query_store_plan;
SELECT * INTO DBA.Baseline.query_store_query FROM sys.query_store_query;
SELECT * INTO DBA.Baseline.query_store_query_text FROM sys.query_store_query_text;
/* Addition for SQL 2017 */
SELECT * INTO DBA.Baseline.query_store_wait_stats FROM sys.query_store_wait_stats;
GO

USE [DBA]
GO
SELECT SUM(count_executions) AS TotalExecutions,
	SUM(count_executions*avg_duration) AS TotalDuration,
	SUM(count_executions*avg_logical_io_reads) AS TotalReads,
	SUM(count_executions*avg_logical_io_writes) AS TotalWrites,
	SUM(count_executions*avg_cpu_time) AS TotalCPU,
	q.query_hash
INTO Baseline.QueryResults
FROM Baseline.query_store_runtime_stats rs
JOIN Baseline.query_store_plan p ON rs.plan_id = p.plan_id
JOIN Baseline.query_store_query q ON p.query_id = q.query_id
GROUP BY q.query_hash
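If the imported tables are large, this is where the clustered columnstore indexes mentioned earlier can help with both speed and compression; a minimal sketch for one of the baseline tables (repeat for the other imported tables as needed):

/* Hedged sketch: compress the imported runtime stats for faster analysis */
CREATE CLUSTERED COLUMNSTORE INDEX [CCI_Baseline_runtime_stats]
ON DBA.Baseline.query_store_runtime_stats;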

Next, we would reset the database to our starting position, apply the Query Store settings mentioned above in this blog post, and replay our workload again. This time, we would dump our data into the Compare schema.

use [YourDatabase]
GO
/* Load Data */
SELECT * INTO DBA.Compare.query_store_runtime_stats FROM sys.query_store_runtime_stats;
SELECT * INTO DBA.Compare.query_store_runtime_stats_interval FROM sys.query_store_runtime_stats_interval;
SELECT * INTO DBA.Compare.query_store_plan FROM sys.query_store_plan;
SELECT * INTO DBA.Compare.query_store_query FROM sys.query_store_query;
SELECT * INTO DBA.Compare.query_store_query_text FROM sys.query_store_query_text;
SELECT * INTO DBA.Compare.query_store_wait_stats FROM sys.query_store_wait_stats;
GO

USE [DBA]
GO
SELECT SUM(count_executions) AS TotalExecutions,
	SUM(count_executions*avg_duration) AS TotalDuration,
	SUM(count_executions*avg_logical_io_reads) AS TotalReads,
	SUM(count_executions*avg_logical_io_writes) AS TotalWrites,
	SUM(count_executions*avg_cpu_time) AS TotalCPU,
	q.query_hash
INTO Compare.QueryResults
FROM Compare.query_store_runtime_stats rs
JOIN Compare.query_store_plan p ON rs.plan_id = p.plan_id
JOIN Compare.query_store_query q ON p.query_id = q.query_id
GROUP BY q.query_hash

Comparing Workload Results

Now that we have our two workloads imported, we can compare them to see how the workload changed per query. I will break this down into two quick steps. First, get deltas per query. Second, get totals for how many query_ids share the same query text in the Query Store. More on this a little later in the post.

/* Query Store Results */
use [DBA]
GO
SELECT DISTINCT c.TotalExecutions - b.TotalExecutions AS ExecutionDelta,
	c.TotalExecutions AS CompareExecutions,
	b.TotalExecutions AS BaselineExecutions,
	c.TotalDuration - b.TotalDuration AS DurationDelta,
	c.TotalCPU - b.TotalCPU AS CPUDelta,
	c.TotalReads - b.TotalReads AS ReadDelta,
	c.TotalWrites - b.TotalWrites AS WriteDelta,
	c.TotalReads AS CompareReads,
	b.TotalReads AS BaselineReads,
	c.TotalCPU AS CompareCPU,
	b.TotalCPU AS BaselineCPU,
	c.TotalDuration AS CompareDuration,
	b.TotalDuration AS BaselineDuration,
	c.query_hash
	--q.query_sql_text
INTO #CTE
FROM Baseline.QueryResults b
JOIN Compare.QueryResults c ON b.query_hash = c.query_hash;

SELECT COUNT(query_sql_text) AS QueryCount,
	MAX(query_sql_text) AS query_sql_text,
	MIN(query_id) AS MinQueryID,
	qsq.query_hash
INTO #Compare
FROM Compare.query_store_query qsq
JOIN Compare.query_store_query_text q ON qsq.query_text_id = q.query_text_id
WHERE qsq.is_internal_query = 0
GROUP BY qsq.query_hash;

SELECT COUNT(query_sql_text) AS QueryCount,
	MAX(query_sql_text) AS query_sql_text,
	MIN(query_id) AS MinQueryID,
	qsq.query_hash
INTO #Baseline
FROM Baseline.query_store_query qsq
JOIN Baseline.query_store_query_text q ON qsq.query_text_id = q.query_text_id
WHERE qsq.is_internal_query = 0
GROUP BY qsq.query_hash;

SELECT cte.*
	, a.QueryCount AS Compare_QueryCount
	, b.QueryCount AS Baseline_QueryCount
	, a.MinQueryID AS Compare_MinQueryID
	, b.MinQueryID AS Baseline_MinQueryID
	, a.query_sql_text
FROM #CTE cte
JOIN #Compare a ON cte.query_hash = a.query_hash
JOIN #Baseline b ON cte.query_hash = b.query_hash
WHERE 1=1
	AND ExecutionDelta = 0
ORDER BY ReadDelta ASC

Query Store for Workload Replays

Query Store for Workload Replay gives you performance metrics to the query level.
Workload Replays compared down to the query execution level is priceless!

Lessons Learned Along the Way!

Initially, working with the Query Store, I thought query_id was going to be my best friend. I quickly learned that my old friend query_hash is more helpful for multiple reasons. One, I can easily compare queries between different replays. That’s right: not all workload replays give you the same query_id even when the exact same workload is being replayed. Two, I can compare queries across different databases as well. Finally, query_hash is very helpful with ad-hoc workloads, as I can aggregate all the different query_ids that share the same query hash.

Need Performance Tuning Help?

Let us help you! Schedule a free 30-minute chat to see if we can point you in the right direction. If you liked this blog post, sign up for our free newsletter! Also, take advantage of our free training videos!


Microsoft made SQL Server 2019 generally available this week, so we want to share some videos and code examples of our favorite new features. Most of these will make your code go faster without any code changes!

We have been testing SQL Server 2019 for months and hope you enjoy these features as much as we do!


While not all implicit conversions are equal, implicit conversions matter. For example, just this week I ran into a trivial code change that caused a severe performance regression. One simple filter change caused a query to consume 227 million reads compared to 212 reads. What did the filter do? Well, it forced an implicit conversion.

In this blog post you will learn what an implicit conversion is, how to identify it, and how to find the implicit conversion causing the highest number of reads in the Query Store.

What is a SQL Server Implicit Conversion?

This is a great question. An implicit conversion happens when the SQL Server optimizer has to convert the data in a column to match the data type you are using in your query. For example, say you had a column that is an integer and you decided to filter off of a string (varchar) variable. ORMs like Entity Framework are notorious for this. You would end up forcing an implicit conversion. This forces the optimizer to read every value in the column and cast it so the column can be filtered based on your variable, or on another column that doesn’t share the same data type.

Let’s look at an example. Yes, this is very similar to the example mentioned in the first paragraph that consumed 227 million reads. We are just going to simplify it down for you with a copy of the AdventureWorks modified database.

The SalesOrderID column is an INT, not a VARCHAR. We are going to filter with a VARCHAR data type using the LIKE operator as shown below.

SELECT [SalesOrderID]
      ,[SalesOrderDetailID]
      ,[CarrierTrackingNumber]
      ,[OrderQty]
      ,[ProductID]
      ,[SpecialOfferID]
      ,[UnitPrice]
      ,[UnitPriceDiscount]
      ,[LineTotal]
      ,[rowguid]
      ,[ModifiedDate]
  FROM [AdventureWorks2012].[Sales].[SalesOrderDetailEnlarged]
  WHERE SalesOrderID LIKE '43659'

This will produce the following execution plan and statistics, which are not ideal at all. We get a warning for the implicit conversion and also go parallel, scanning the whole table due to this conversion.

Implicit Conversion Forces Table Scan
A SQL Server Implicit Conversion Forces a Table Scan
Implicit Conversions hurt performance
SQL Server Implicit Conversions causing the whole table to be read.
statistics for our implicit conversion
The following is statistics for our implicit conversion

How to fix Implicit Conversions in SQL Server

Now that we know how to identify an implicit conversion, I bet you might be wondering how to resolve them. They usually come in two flavors. First, the columns do not match data types on a join. Second, the parameter or literal data type does not match the column. Today, we will focus on the second flavor in the code below.

The only thing we change in our code is the type of filter.

SELECT [SalesOrderID]
      ,[SalesOrderDetailID]
      ,[CarrierTrackingNumber]
      ,[OrderQty]
      ,[ProductID]
      ,[SpecialOfferID]
      ,[UnitPrice]
      ,[UnitPriceDiscount]
      ,[LineTotal]
      ,[rowguid]
      ,[ModifiedDate]
  FROM [AdventureWorks2012].[Sales].[SalesOrderDetailEnlarged]
  WHERE SalesOrderID = 43659
Parameter matches column data type
The literal parameter matches with the column data type.
Gone from 76k reads to 3!
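For completeness, the first flavor, mismatched join columns, looks like this hypothetical sketch; the preferred fix is to alter one side so both columns share a data type instead of letting the optimizer cast in the join.

/* Hedged sketch: a join between mismatched data types (both tables are hypothetical) */
SELECT o.OrderNumber, a.AuditNote
FROM dbo.Orders o                        -- OrderNumber INT
JOIN dbo.OrderAudit a                    -- OrderNumber VARCHAR(20)
	ON o.OrderNumber = a.OrderNumber;    -- forces an implicit conversion on one side

/* Preferred fix: align the column data types (only if every existing value converts cleanly) */
ALTER TABLE dbo.OrderAudit ALTER COLUMN OrderNumber INT NOT NULL;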

How do I get Implicit Conversions from the Query Store?

Now that you understand implicit conversions, you might wonder how to find them. Well, hopefully you are on SQL Server 2016 or higher and can query the execution plans in the Query Store. If not, check out Jonathan Kehayias’ example to find implicit conversions in the plan cache.

Disclaimer: The plan cache will only have execution plans that are still in memory, and the Query Store will only have execution plans that are in the Query Store. Also, I added variables for the start date, end date, and top offenders. Performance mileage will vary based on how your Query Store is configured and used. Run the query below at your own risk!

DROP TABLE IF EXISTS #plans
DROP TABLE IF EXISTS #tmp1
DROP TABLE IF EXISTS #tmp2
DROP TABLE IF EXISTS #finalplan
GO

DECLARE @StartDate datetime = dateadd(d,-1, getdate()),
 @EndDate datetime = getdate(), 
 @TopStmts INT = 1000

SELECT TOP (@TopStmts)
	SUM(qrs.count_executions) * AVG(qrs.avg_logical_io_reads) as est_logical_reads,
    SUM(qrs.count_executions) AS sum_executions,
    AVG(qrs.avg_logical_io_reads) AS avg_logical_io_reads,
    SUM(qsq.count_compiles) AS sum_compiles,
   qsp.plan_id
INTO #tmp1
FROM sys.query_store_query qsq
JOIN sys.query_store_plan qsp on qsq.query_id=qsp.query_id
JOIN sys.query_store_runtime_stats qrs on qsp.plan_id = qrs.plan_id
WHERE qrs.first_execution_time >= @StartDate 
AND qrs.last_execution_time <= @EndDate
group by qsp.plan_id
ORDER BY est_logical_reads DESC 


SELECT cte.*, TRY_CAST (qsp.query_plan AS XML) AS xml_query_plan 
INTO #tmp2
FROM #tmp1 cte
JOIN sys.query_store_plan qsp ON cte.plan_id = qsp.plan_id


SELECT #tmp2.* 
INTO #plans
FROM #tmp2;

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
DECLARE @dbname SYSNAME 
SET @dbname = QUOTENAME(DB_NAME());

;WITH XMLNAMESPACES 
   (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan') 
SELECT p.est_logical_reads, p.sum_executions, p.avg_logical_io_reads, p.plan_id
,   stmt.value('(@StatementText)[1]', 'varchar(max)') AS SQLStmt, 
   t.value('(ScalarOperator/Identifier/ColumnReference/@Schema)[1]', 'varchar(128)') AS SchemaName, 
   t.value('(ScalarOperator/Identifier/ColumnReference/@Table)[1]', 'varchar(128)') AS TableName, 
   t.value('(ScalarOperator/Identifier/ColumnReference/@Column)[1]', 'varchar(128)') AS ColumnName, 
 ic.DATA_TYPE AS ConvertFrom, 
 ic.CHARACTER_MAXIMUM_LENGTH AS ConvertFromLength, 
   t.value('(@DataType)[1]', 'varchar(128)') AS ConvertTo, 
   t.value('(@Length)[1]', 'int') AS ConvertToLength,
   p.xml_query_plan
   INTO #finalplan
   FROM #plans p
CROSS APPLY xml_query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS batch(stmt) 
CROSS APPLY stmt.nodes('.//Convert[@Implicit="1"]') AS n(t) 
CROSS APPLY stmt.nodes('.//PlanAffectingConvert[@ConvertIssue="Cardinality Estimate"]') AS ce(pt) 
JOIN INFORMATION_SCHEMA.COLUMNS AS ic 
ON QUOTENAME(ic.TABLE_SCHEMA) = t.value('(ScalarOperator/Identifier/ColumnReference/@Schema)[1]', 'varchar(128)') 
   AND QUOTENAME(ic.TABLE_NAME) = t.value('(ScalarOperator/Identifier/ColumnReference/@Table)[1]', 'varchar(128)') 
   AND ic.COLUMN_NAME = t.value('(ScalarOperator/Identifier/ColumnReference/@Column)[1]', 'varchar(128)') 
WHERE t.exist('ScalarOperator/Identifier/ColumnReference[@Database=sql:variable("@dbname")][@Schema!="[sys]"]') = 1


SELECT fp.est_logical_reads, fp.sum_executions, fp.avg_logical_io_reads, fp.plan_id, 
qsq.query_id, qsq.query_hash, fp.SQLStmt, fp.SchemaName, fp.TableName, 
fp.ColumnName, fp.ConvertFrom, fp.ConvertTo, fp.ConvertFromLength, fp.ConvertToLength, fp.xml_query_plan
FROM #finalplan fp
JOIN sys.query_store_plan qsp ON fp.plan_id = qsp.plan_id
JOIN sys.query_store_query qsq ON qsp.query_id = qsq.query_id
ORDER BY fp.est_logical_reads DESC