In the beginning of my IT career, I was a developer. I wrote a lot of bad code that didn’t perform well. As more data entered my tables, my stored procedures got slower and slower. More data, more problems! I started my career in performance tuning by making my crappy code a little less crappy. With that in mind, I want to share an extended event added in SQL Server 2022 that helps you identify bad coding practices in your code, also known as query anti-patterns.
Query Anti-Pattern’s Video
In the following video, you can learn how to capture query anti-patterns via Extended Events. Once you have found them, you can aggregate them to find the most frequent offenders or the offenders causing the most harm!
Query Anti-Pattern Demos
To start this demo, you will need to download a copy of the WideWorldImporters sample database.
You will need SQL Server 2022 as the query anti-pattern extended event is new in SQL Server 2022.
You will also need SQL Server Management Studio if you don’t already have it.
Demo Setup
Create Index
First, we will need to create an index we will use for our implicit conversion demo. If this index already exists, we will drop it.
USE [WideWorldImporters]
GO
DROP INDEX IF EXISTS [idx_Invoices_PurchaseOrder] ON [Sales].[Invoices]
GO
/****** Create Index for Anti-Pattern Implicit Conversion Demo ******/
CREATE NONCLUSTERED INDEX [idx_Invoices_PurchaseOrder] ON [Sales].[Invoices]
(
[CustomerPurchaseOrderNumber] ASC
)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON,
OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [USERDATA]
GO
Create Query Anti-Pattern Extended Event
Next, we will create our extended event used to capture the query anti-pattern. If it exists, we will drop it before creating the extended event.
/***** Create Extended Event *****/
IF EXISTS (SELECT name FROM sys.server_event_sessions WHERE name = 'query_antipattern')
DROP EVENT SESSION [query_antipattern] ON SERVER
GO
CREATE EVENT SESSION [query_antipattern] ON SERVER
ADD EVENT sqlserver.query_antipattern(
ACTION(sqlserver.session_id, sqlserver.client_app_name,sqlserver.client_hostname
,sqlserver.database_name,sqlserver.query_hash,sqlserver.query_plan_hash,
sqlserver.sql_text,sqlserver.username))
ADD TARGET package0.ring_buffer(SET max_events_limit=(0),max_memory=(51200))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,
MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,
TRACK_CAUSALITY=OFF,STARTUP_STATE=ON)
GO
-- Start Extended Event
ALTER EVENT SESSION [query_antipattern] ON SERVER STATE = START;
GO
QDS and Plan Cache Cleanup
Before we start, we will want to clear out our Query Store data and the execution plans in cache. NOTE: Only do this in non-production. We shouldn’t be testing demos in production.
/* Start the demo by purging plans and QDS */
USE [WideWorldImporters]
GO
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;
ALTER DATABASE [WideWorldImporters] SET QUERY_STORE CLEAR
GO
What are query anti-patterns?
What are query anti-patterns? Great question. I like Andreas Schörghuber’s definition: “Anti-patterns are common solutions to ineffective problems and cause more problems than they solve.” [1] In simpler terms, an anti-pattern query returns the correct data but consumes more reads, CPU, or duration than a rewrite that returns the same results in a way the query optimizer can satisfy with fewer reads, less CPU, and less duration.
SQL Server’s Extended Events map values also let you know which anti-patterns can be captured by the query_antipattern event.
/* What types of query_antipatterns are included? */
SELECT map_value
FROM sys.dm_xe_map_values
WHERE name = N'query_antipattern_type';
Running the query above lists the query anti-patterns included in the extended event as of SQL Server 2022 CU10.
Start Query Anti-Pattern’s Demo
Finally, we can now utilize that index we created to go over our query anti-pattern example.
Implicit Conversion AKA ConvertPreventingSeek
In this example, we have a simple query, as you can see in the video above. The column is NVARCHAR(20), but the predicate filter uses the INTEGER data type. The following image should help you visualize the data type mismatch.
The query has to convert the data in the CustomerPurchaseOrderNumber column to an integer before it can apply the filter.
/*
Examine what an implicit conversion is and
how it will hurt performance!
*/
SELECT InvoiceID
FROM [Sales].[Invoices]
WHERE [CustomerPurchaseOrderNumber]=17913
If you look at the live data of the extended event we added, you will see the following data collected.
Fix Implicit Conversion Query Anti-Pattern
This fix isn’t magic; it’s actually quite simple. You just have to make the data type of the predicate filter match the data type of the column.
/* How to fix an implicit conversion?
Change the data type to match the column.
*/
SELECT InvoiceID
FROM [Sales].[Invoices]
WHERE [CustomerPurchaseOrderNumber]=N'17913'
Large Number of IN Predicate
For our next query anti-pattern, we will focus on an IN statement that has too many values.
/* Long List of IN Statements.
Do we get alerts for both IN statements and Convert? */
SELECT *
FROM Sales.Invoices
WHERE CustomerPurchaseOrderNumber IN (12126,15342,12211,17129,10369,
13383,17913,14518,17577,18030,17639,10941,16374,17521,11911,10690,
10255,13260,16825,12227,16070,16793,12467,13700,11503,13916,17783,
13042,10411,15084,14159,11389,12559,19457,18468,14486,19057,18036,
13079,19402,15267,12781,17334,16645,17591,14618,19857,17849,10947,
11610,14853,10821,15491,14474,10079,11612,11991,19321,10210,14970,
17468,11643,16411,19525,10996,16474,15832,12134,13630,14902,15212,
19609,19809,13551,12805,18797,11553,11156,18583,19122,11615,12262,
10310,19254,13954,13960,14630,18585,18834,18451,19842,18893,13004,
19890,12941,13508,15403,19391,17938,19140,16362,18795,16388,18492,
15272,14938,13868,18901,15236,19782,11134,10182,18651,14169,18514,
18108,17328,13949,12727,17486,13783,15165,17627,13215,18992,10815,
17466,12153,16275,10695,14744,12253,13328,18450,12232,19473,14441,
19357,19208,14194,14468,16930,14270,13096,19458,13731,14609,17253,
13190,16905)
A good question you might be wondering: which query anti-pattern will this query trigger? Is it the previous TypeConvertPreventingSeek or LargeNumberOfOrINPredicate?
If you cleared the Live Data Viewer for your extended event, you will notice that it’s actually both anti-patterns. In the screenshot below, I put a box around the query_hash so you know it’s the same unique query text. I also include the session_id and timestamp as well.
Fix Large Number of In Predicate
Now, there are multiple ways you can convert this massive IN clause, such as joining to a temp table, a table variable, or a table-valued parameter. I will leave which one is best for another day. Today, we are going to use the table-valued parameter.
NOTE: I am creating and dropping the table type in the demo code because this is a demo. Usually, you might have one table type per common data type you filter on.
SECOND NOTE: Hopefully, you noticed we also converted the list of integers into a list of NVARCHAR values. This also removes the implicit conversion query anti-pattern.
/* How to fix this anti-pattern? */
DROP TYPE IF EXISTS KeyList;
GO
CREATE TYPE KeyList AS TABLE (
KeyValue NVARCHAR(50)
)
GO
DECLARE @Values KeyList
INSERT INTO @Values VALUES (N'12126'),(N'15342'),(N'12211'),(N'17129'),(N'10369'),(N'13383'),(N'17913'),(N'14518'),(N'17577'),(N'18030'),(N'17639'),(N'10941'),(N'16374'),(N'17521'),(N'11911'),(N'10690'),(N'10255'),(N'13260'),(N'16825'),(N'12227'),(N'16070'),(N'16793'),(N'12467'),(N'13700'),(N'11503'),(N'13916'),(N'17783'),(N'13042'),(N'10411'),(N'15084'),(N'14159'),(N'11389'),(N'12559'),(N'19457'),(N'18468'),(N'14486'),(N'19057'),(N'18036'),(N'13079'),(N'19402'),(N'15267'),(N'12781'),(N'17334'),(N'16645'),(N'17591'),(N'14618'),(N'19857'),(N'17849'),(N'10947'),(N'11610'),(N'14853'),(N'10821'),(N'15491'),(N'14474'),(N'10079'),(N'11612'),(N'11991'),(N'19321'),(N'10210'),(N'14970'),(N'17468'),(N'11643'),(N'16411'),(N'19525'),(N'10996'),(N'16474'),(N'15832'),(N'12134'),(N'13630'),(N'14902'),(N'15212'),(N'19609'),(N'19809'),(N'13551'),(N'12805'),(N'18797'),(N'11553'),(N'11156'),(N'18583'),(N'19122'),(N'11615'),(N'12262'),(N'10310'),(N'19254'),(N'13954'),(N'13960'),(N'14630'),(N'18585'),(N'18834'),(N'18451'),(N'19842'),(N'18893'),(N'13004'),(N'19890'),(N'12941'),(N'13508'),(N'15403'),(N'19391'),(N'17938'),(N'19140'),(N'16362'),(N'18795'),(N'16388'),(N'18492'),(N'15272'),(N'14938'),(N'13868'),(N'18901'),(N'15236'),(N'19782'),(N'11134'),(N'10182'),(N'18651'),(N'14169'),(N'18514'),(N'18108'),(N'17328'),(N'13949'),(N'12727'),(N'17486'),(N'13783'),(N'15165'),(N'17627'),(N'13215'),(N'18992'),(N'10815'),(N'17466'),(N'12153'),(N'16275'),(N'10695'),(N'14744'),(N'12253'),(N'13328'),(N'18450'),(N'12232'),(N'19473'),(N'14441'),(N'19357'),(N'19208'),(N'14194'),(N'14468'),(N'16930'),(N'14270'),(N'13096'),(N'19458'),(N'13731'),(N'14609'),(N'17253'),(N'13190'),(N'16905')
select *
FROM Sales.Invoices
JOIN @Values Tst on (Tst.KeyValue = CustomerPurchaseOrderNumber)
GO
Query Anti-Pattern Wrap Up
It’s now easier than before to find your top offending anti-patterns. Just create the extended event and aggregate the results, as shown in the sketch below.
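If you want a starting point for that aggregation, here is a rough sketch that shreds the ring_buffer target of the session we created above and groups by query_hash. The data field name antipattern_type is my assumption; confirm the actual column name for the query_antipattern event in sys.dm_xe_object_columns before relying on it.

;WITH xe AS
(
    SELECT CAST(t.target_data AS XML) AS target_data
    FROM sys.dm_xe_sessions s
    JOIN sys.dm_xe_session_targets t ON s.address = t.event_session_address
    WHERE s.name = 'query_antipattern'
      AND t.target_name = 'ring_buffer'
),
events AS
(
    /* Shred each captured event; "antipattern_type" is an assumed field name */
    SELECT e.value('(data[@name="antipattern_type"]/text)[1]', 'varchar(128)') AS antipattern_type,
           e.value('(action[@name="query_hash"]/value)[1]', 'varchar(64)')     AS query_hash,
           e.value('(action[@name="sql_text"]/value)[1]', 'nvarchar(max)')     AS sql_text
    FROM xe
    CROSS APPLY xe.target_data.nodes('//RingBufferTarget/event') AS x(e)
)
SELECT antipattern_type,
       query_hash,
       COUNT(*)      AS occurrences,
       MAX(sql_text) AS sample_sql_text
FROM events
GROUP BY antipattern_type, query_hash
ORDER BY occurrences DESC;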
If you would like some help making your queries go faster contact us. We also do health checks where you get results. We guarantee value provided or you get a 100% refund!
Also, if you liked this tip join our newsletter so you can get more free videos and tips.
Tuning SELECT statements is less complicated than tuning UPDATE statements because your code changes do not change data in your tables. Tuning UPDATE statements, on the other hand, can be tricky because you are changing data while you are testing. Below are multiple ways to streamline tuning UPDATE statements so you can quickly get back to your starting point and make sure you are comparing apples to apples.
You are out doing your scheduled sweeps of the environment and you notice a huge I/O consumer is one of your update procedures. Your initial thought may be to panic: how can you test data that is changing? You have several methods for tuning queries that modify data while still allowing the data to revert between tests.
NOTE: We always recommend tuning all queries in non-production until we have our final tested solution to implement in production.
Tuning Update Statement with Rollback Transaction
One of the more basic methods for executing queries without permanently changing data is to wrap the statements you are running in a BEGIN TRANSACTION and ROLLBACK TRANSACTION. This allows everything to execute and gives you the performance statistics you are looking for, while leaving the data intact for repeat testing.
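Here is a minimal sketch of the pattern; the table name, column, and predicate are hypothetical placeholders.

SET STATISTICS IO, TIME ON;   -- capture reads, CPU, and duration for each test run
GO
BEGIN TRANSACTION;

    UPDATE dbo.YourTable              -- hypothetical table and filter
    SET    SomeColumn = 'NewValue'
    WHERE  SomeKey = 42;

    /* Review the actual execution plan and the statistics output here */

ROLLBACK TRANSACTION;                 -- put the data back so the test can be repeated
GO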
Tuning Update Statements using Database Backups
You have a couple of options with your standard SQL Server backups to help revert changes made to the testing environment, and the same concepts apply to your basic DR strategy. If the test is short, you can take a full backup before making a change and restore that backup to restart a test.
If you are making a series of changes over a period of time and don’t want to completely restart your test, you can utilize point-in-time recovery. Once your database is in the full recovery model, you can start with a full backup and take log backups at set intervals or before different pieces of your change. This allows you to restore to a specific point in your test, as sketched below.
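A rough sketch of that flow, assuming a database named YourDatabase in the FULL recovery model and a local backup folder; adjust names and paths for your environment.

/* Baseline before the test */
BACKUP DATABASE [YourDatabase]
TO DISK = N'C:\Backups\YourDatabase_Full.bak' WITH INIT;

/* ...run the first piece of your change, then mark the point with a log backup... */
BACKUP LOG [YourDatabase]
TO DISK = N'C:\Backups\YourDatabase_Log1.trn' WITH INIT;

/* Restart the test from the baseline plus the first log backup */
RESTORE DATABASE [YourDatabase]
FROM DISK = N'C:\Backups\YourDatabase_Full.bak' WITH NORECOVERY, REPLACE;
RESTORE LOG [YourDatabase]
FROM DISK = N'C:\Backups\YourDatabase_Log1.trn' WITH RECOVERY;   -- add STOPAT = '...' for a specific point in time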
Tuning Update Statements with Database Snapshot
Database Snapshots are a great way to quickly revert large databases to an initial state. A snapshot keeps a copy of database pages as they change in the source database, so only the changed extents need to be reverted instead of restoring the whole database. Because it is not a full copy of the database, a snapshot requires its associated source database. A snapshot consists of sparse files that hold the original version of pages changed since the snapshot was taken, and the source database must be available for the snapshot to be usable. This makes snapshots a poor DR solution, but great for analysis in auditing and in tuning situations.
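A quick sketch of the snapshot approach, assuming a single data file whose logical name is YourDatabase; the snapshot name and path are placeholders.

/* Create the snapshot before the test */
CREATE DATABASE [YourDatabase_Snap]
ON ( NAME = N'YourDatabase', FILENAME = N'C:\Snapshots\YourDatabase_Snap.ss' )
AS SNAPSHOT OF [YourDatabase];

/* ...run your update test... */

/* Revert the source database to the snapshot's state
   (requires exclusive access and that this is the only snapshot on the database) */
USE [master];
RESTORE DATABASE [YourDatabase] FROM DATABASE_SNAPSHOT = N'YourDatabase_Snap';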
Summary
There are many ways to quickly get back to your original state to allow quick, repetitive testing of queries that change data. A huge benefit of the methods I listed today is that they let you document where transactions and backups occur, so you can open a .sql file and have all the needed steps in one document. Do you prefer other ways to revert changes? Maybe temporal tables? I’d be happy to cover that in a post in the near future. Tell me all about your methods in the comments below!
SQL Server 2022 was announced yesterday at Microsoft’s Ignite Conference. Here are some of the items that caught our attention. This includes SQL Server 2022 Announcements and other announcements around the Microsoft Data Platform.
SQL Server 2022 First Thoughts
The following is a summary of some of my favorite announcements with some initial thoughts.
Bi-directional HA/DR to Azure SQL – With SQL Server 2022, you can use Distributed Availability Groups to have HA/DR replica in Azure Managed Instance. You can failover your Read/Write Traffic. You have to do a backup and a restore into SQL Server 2022 to go back from the Azure Managed Instance. Hopefully, Distributed Availability Group failovers will be bi-directional in the near future.
Query Store Enhancements (QDS) – SQL Server 2022 lets readable secondary replicas keep their own Query Store data. QDS will now be on by default for new databases. QDS also provides the ability to use hints to easily shape query plans with no code changes (see the sketch after this list).
Intelligent Query Processing Improvements (IQP) – IQP has new enhancements to memory grant feedback that persist the feedback instead of only applying when queries are executed frequently. IQP has more improvements that don’t require changing your code, including changes with MAXDOP and CE feedback: the query can detect and change how many threads are used, so large tables can have more threads as needed. Saving the best for last, progress to help smooth out parameter sniffing will also be included in SQL Server 2022. I’m looking forward to seeing how this compares to the Automatic Plan Correction feature of Automatic Tuning introduced in SQL Server 2017.
Free SQL Assessments for Azure VMs – If you are using an Azure VM to host an instance of SQL Server you might be interested in the new free assessment offering coming out of Ignite.
New Ebdsv5 Virtual Machine Series – Now up to 30% better price-performance compared to the previous best option. Remote I/O throughput goes up to 2,500 MB/s, so less memory could be required thanks to the higher read throughput.
SQL Server 2022 integrates with Azure Synapse Link and Azure Purview – Being able to manage data discovery both on-premises and in the cloud is a great win! The same could be said for breaking down silos to help you get the most value out of your data.
SQL Server Ledger – Looks like the Azure SQL Database ledger, which was announced in May of this year, is coming to SQL Server 2022; Microsoft is bringing the same blockchain capabilities to SQL Server. Always excited to see the most secure relational database platform increase its security features.
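As a quick illustration of the Query Store hints item above, here is a sketch using sys.sp_query_store_set_hints; the query_id value is made up, and you would look up the real one in sys.query_store_query first.

/* Attach a hint to an existing Query Store query with no code change.
   @query_id = 42 is a hypothetical value taken from sys.query_store_query. */
EXEC sys.sp_query_store_set_hints
     @query_id    = 42,
     @query_hints = N'OPTION (MAXDOP 1)';

/* Remove the hint once it is no longer needed */
EXEC sys.sp_query_store_clear_hints @query_id = 42;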
SQL Server 2022 What Is Next?
If you want to get an early jump on SQL Server 2022, you can apply to enter the Early Adoption Program (EAP). I also recommend signing up for the PASS Member Summit conference that starts next week, where I will be giving a 101 fundamentals talk on performance tuning. It’s 100% FREE to attend.
What is your favorite feature of SQL Server 2022? What do you wish was added to the feature list? Let us know in the comments.
What is a heap?
Heap was actually named pretty accurately: it is a “pile” of unsorted/unorganized data. A heap exists on any table that lacks a clustered index, and a heap can exist even if the table has nonclustered indexes. The main advantage of a heap is the speed at which you can insert data into your tables, as inserts don’t need to follow a logical order such as placing identity values on the last page of the table.
What is a clustered index?
A clustered index is the main method of logically sorting data in a table. It can be created on a key value defined in the table, but it doesn’t require an actual primary key to exist. Clustered indexes should be created on the most common column(s) used in the predicates of your most frequently executed statements. All data will be sorted to match the index, and you can only have one clustered index per table. The main advantage of a clustered index is increased speed on data reads. The sketch below shows both states.
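A small sketch with a hypothetical table: created without a clustered index it is a heap, and adding the clustered index defines the logical sort order.

/* Created with no clustered index, dbo.OrderEvents is a heap */
CREATE TABLE dbo.OrderEvents
(
    OrderEventID INT IDENTITY(1,1) NOT NULL,
    OrderDate    DATETIME2 NOT NULL,
    Payload      NVARCHAR(200) NULL
);

/* Logically sort the table on the column most used in your predicates */
CREATE CLUSTERED INDEX CIX_OrderEvents_OrderEventID
    ON dbo.OrderEvents (OrderEventID);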
How should I identify when a clustered index should be used instead of a heap?
As noted above, the main advantage of a clustered index is an increase in the speed of your reads. To get an idea of where you might need to increase read speed, you should first identify all of your tables that do not have a clustered index. I found it very helpful to also place lower limits on the number of rows in a table and the number of reads against that table (by checking dynamic management views). The query below is a great start for looking at those things.
DECLARE @type_of_index INT = 0 /* 0 = heap, 1 = clustered, 2 = nonclustered */
      , @min_num_of_reads INT = 10
      , @min_num_of_rows INT = 100

SELECT OBJECT_NAME(IX.OBJECT_ID) AS Table_Name
      ,SCH.name AS SchemaName
      ,IX.name AS Index_Name
      ,IX.type_desc AS Index_Type
      ,PS.row_count AS TableRowCount
      ,SUM(PS.[used_page_count]) * 8 AS IndexSizeKB
      ,IXUS.user_seeks AS NumOfSeeks
      ,IXUS.user_scans AS NumOfScans
      ,IXUS.user_seeks + user_scans AS NumOfReads
      ,IXUS.user_lookups AS NumOfLookups
      ,IXUS.user_updates AS NumOfUpdates
      ,IXUS.last_user_seek AS LastSeek
      ,IXUS.last_user_scan AS LastScan
      ,IXUS.last_user_lookup AS LastLookup
      ,IXUS.last_user_update AS LastUpdate
FROM sys.indexes IX
INNER JOIN sys.objects O ON IX.object_id = O.object_id
INNER JOIN sys.schemas SCH ON SCH.schema_id = O.schema_id
INNER JOIN sys.dm_db_index_usage_stats IXUS ON IXUS.index_id = IX.index_id AND IXUS.OBJECT_ID = IX.OBJECT_ID
INNER JOIN sys.dm_db_partition_stats PS ON PS.object_id = IX.object_id AND PS.index_id = IX.index_id
WHERE OBJECTPROPERTY(IX.OBJECT_ID,'IsUserTable') = 1
  AND IX.index_id = @type_of_index
  AND PS.index_id = @type_of_index
  AND IXUS.index_id = @type_of_index
  AND IXUS.database_id = DB_ID()
  AND IXUS.user_scans + IXUS.user_seeks > @min_num_of_reads
  AND PS.row_count > @min_num_of_rows
GROUP BY OBJECT_NAME(IX.OBJECT_ID), IX.name, IX.type_desc, IXUS.user_seeks, IXUS.user_scans,
         IXUS.user_lookups, IXUS.user_updates, IXUS.last_user_seek, IXUS.last_user_scan,
         IXUS.last_user_lookup, IXUS.last_user_update, SCH.name, PS.row_count
ORDER BY TableRowCount DESC, NumOfReads DESC
Now that you can see which tables are on the larger side and also have a lot of reads, we can dive deeper into the SQL Server dynamic management views to see how these tables are being used by looking at query stats and plans. Enter your table name in the variable and you can see the number of times a plan has been used, the text associated with that plan, and the plan itself, to validate whether the query is doing a table scan due to the heap or using a nonclustered index instead. The second result set will give you object names if the table is being used in a SQL object like a stored procedure or function.
DECLARE @TableName NVARCHAR(50) = 'ExampleTableName'

SELECT TOP 10
       databases.name AS DBName
     , dm_exec_sql_text.text AS TSQL_Text
     , dm_exec_query_stats.creation_time
     , dm_exec_query_stats.execution_count
     , dm_exec_query_stats.total_worker_time AS total_cpu_time
     , dm_exec_query_stats.total_elapsed_time
     , dm_exec_query_stats.total_logical_reads
     , dm_exec_query_stats.total_physical_reads
     , dm_exec_query_plan.query_plan
FROM sys.dm_exec_query_stats
CROSS APPLY sys.dm_exec_sql_text(dm_exec_query_stats.plan_handle)
CROSS APPLY sys.dm_exec_query_plan(dm_exec_query_stats.plan_handle)
INNER JOIN sys.databases ON dm_exec_sql_text.dbid = databases.database_id
WHERE dm_exec_sql_text.text LIKE '%' + @TableName + '%'
ORDER BY dm_exec_query_stats.total_logical_reads DESC;  -- ordering by reads; adjust to taste

SELECT *
FROM sys.sql_modules M
INNER JOIN sys.objects O ON M.object_id = O.object_id
WHERE M.definition LIKE '%' + @TableName + '%'
After reviewing the use cases and their query plans, it’s time to decide whether the table in question meets the above recommendations for a clustered index or is better served as a heap, and what column(s) to include in the index. If the table has multiple use cases but they all typically share the same column(s), create the clustered index on that table with those column(s).
Now don’t go throwing clustered indexes around like Oprah to the studio audience
The clustered index is helpful in a lot of situations, but there are some instances where it is not necessary or can actually do some harm to your performance.
The main example of this would be logging tables. These tables typically see a lot of insert operations but not many reads or updates; they just log everything that is happening and are referenced infrequently. Putting a clustered index on a table like that can cause hot latches: because the key is typically sequential, inserts all pile up waiting on the last available page. The only exception to this rule would be if your index key column is a GUID, as a GUID is random and not sequential.
Watch out for building your clustered index on columns/data that is not static. If you update the index key values frequently, you will likely cause performance issues: the clustered index is used as the lookup point for all nonclustered indexes, and updating key values can cause page splits. Page splits then require index maintenance to fix the problem and return performance to where it belongs.
Another common mistake is making the clustered index with too many columns. A clustered index defines the default sort of all data in the table. Too many columns in the index can slow down data operations by requiring re-sorting for additional use cases, and it will also increase the size of all nonclustered indexes on that table (and there is a limit to how wide an index key can be).
Finally, never forget the most important part of adding indexes: TEST TEST and TEST.
Post a comment below if you have any other tips or tricks when it comes to fixing your heap woes!
Want to save money, validate performance, and make sure you don’t have errors while migrating to Azure SQL Database, Azure SQL Managed Instance, or SQL Server on Amazon RDS? In this video, you will learn how to use the Database Experimentation Assistant to replay and compare your on-premises or cloud SQL Server workloads on demand. First, you will learn how to capture a workload on-premises or in the cloud. Next, you will master replaying your workload on demand as needed. Finally, you will analyze and compare a baseline workload against another workload that includes your changes.
Replay in the Cloud Video
Slidedeck
Recommended Reads
If you liked this video, check out the following other resources.
Today, I want to focus on a free tool that everyone can use when benchmarking queries or performance tuning their queries. This tool is Plan Explorer by SentryOne (I think I will still always know them as SQLSentry).
Most people use Management Studio (SSMS) when they are benchmarking queries and improving their performance, which makes sense because it is the same tool most people use to develop their queries. I have found Plan Explorer’s history log of all my changes to be priceless: I can quickly see how each change impacts the logical reads, duration, CPU, and the execution plan across all of my code changes for a query.
If you are new to benchmarking and improving queries, go ahead and start the video from the beginning. I show you how to get the actual execution plan, duration, CPU, and reads in SSMS. If you want to see how I track changes via the history feature of Plan Explorer, skip to the 3:36 mark of the eight-minute video.
Quickly track the performance changes of your code tuning with Plan Explorer’s history tracking feature.
If you enjoyed this tip go ahead and join our newsletter for free tips and videos.
This month’s T-SQL Tuesday is hosted by Tracy Boggiano. Tracy invites us all to write about adopting Query Store. Today, I wanted to share my favorite, and a very unique, way I use the Query Store: for workload replays.
You can read more about the invite in detail by clicking on the T-SQL Tuesday logo in this post.
Today, I wanted to talk about my least favorite part of replaying workloads: having an extended event or server-side trace running during a workload replay only so we can compare the results at a query level when the replay is finished. This might seem like a trivial thing, but when you have workloads over 10k batch requests/sec this can consume terabytes of data quickly. The worst part is waiting to read all the data and then slice and dice it for analysis.
Starting with SQL Server 2016 there is a better and faster way to go! You can replace your extended event or server-side trace with Query Store captured data. Today, I will show you how to use the Query Store for the same purpose.
Different Settings
Keep in mind our goal here is very different from the typical use case for using the Query Store. We want to capture metrics for all the queries executed during a workload replay. Nothing more and nothing less.
If we have the runtime results for multiple replays we can then easily compare the workload performance between the workload replays.
Most of our changes from the regular Query Store best practices are shown below:
- Max Size (MB) – Need to make sure there is enough space to capture your whole workload. This size will vary by how much workload is being replayed.
- Query Store Capture Mode set to All. Normally, not ideal, but remember we want to capture metrics for our whole workload being replayed.
- Size Based Cleanup Mode set to Off – Yup, we don’t want to lose any of the workload data that is captured until we persist it in our ideal form. More on this later.
The Capture Process
Now, this is where you would use the Database Experimentation Assistant (DEA), Distributed Replay, or some other process to replay a consistent workload in an isolated non-production environment. We will cover that subject in a future post. For now, we will just have two replays called “Baseline” and “Change”. This simulates a baseline replay with no schema changes and then another replay with a change introduced to the schema.
To capture our workload, we just enable the Query Store with the settings mentioned above and clear out the Query Store right before the workload replay starts, to help ensure we are capturing only our workload.
USE [master]
GO
ALTER DATABASE [YourDatabase] SET QUERY_STORE = ON
GO
ALTER DATABASE [YourDatabase] SET QUERY_STORE (OPERATION_MODE = READ_WRITE, MAX_STORAGE_SIZE_MB = 10000, QUERY_CAPTURE_MODE = ALL, SIZE_BASED_CLEANUP_MODE = OFF)
GO
ALTER DATABASE [YourDatabase] SET QUERY_STORE CLEAR
GO
Stop Capturing Query Store Data
Once your replay is finished, we will want to stop the Query Store from writing data. We want the least amount of non-workload data inside the Query Store when we are using it for the sole purpose of comparing workloads.
USE [master]
GO
ALTER DATABASE [YourDatabase] SET QUERY_STORE (OPERATION_MODE = READ_ONLY)
GO
Prepare Query Store Data for Long-Term Analysis
Now, for smaller workloads, one might be happy with utilizing DBCC CLONEDATABASE to have a schema-only copy of their workload with the Query Store data persisted. This is perfectly fine. With bigger workloads, I have noticed there are ways to improve performance when analyzing the Query Store data; for example, clustered columnstore indexes can be very helpful for both performance and space savings. Therefore, I like to have a schema for each replay and import the data. A sketch of the DBCC CLONEDATABASE option is shown below, followed by a quick example of setting up a schema for a “baseline” replay and a “change” replay.
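This is roughly what the clone option looks like; YourDatabase is a placeholder and the clone name is made up.

/* Schema-only clone of the replay database; by default the clone carries the
   Query Store data along with the schema and statistics. Names are placeholders. */
DBCC CLONEDATABASE (YourDatabase, YourDatabase_BaselineClone);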
CREATE DATABASE [DBA]
GO
use [DBA]
GO
CREATE SCHEMA Baseline;
GO
CREATE SCHEMA Compare;
GO
Next, we will import our captured data from our baseline replay that’s in our read-only query store database. I also like to have a baked-in aggregate of metrics for reads, writes, duration and CPU at the query level.
use [YourDatabase]
GO
/* Load Data */
SELECT * INTO DBA.Baseline.query_store_runtime_stats FROM sys.query_store_runtime_stats;
SELECT * INTO DBA.Baseline.query_store_runtime_stats_interval FROM sys.query_store_runtime_stats_interval;
SELECT * INTO DBA.Baseline.query_store_plan FROM sys.query_store_plan;
SELECT * INTO DBA.Baseline.query_store_query FROM sys.query_store_query;
SELECT * INTO DBA.Baseline.query_store_query_text FROM sys.query_store_query_text;

/* Addition for SQL 2017 */
SELECT * INTO DBA.Baseline.query_store_wait_stats FROM sys.query_store_wait_stats;

USE [DBA]
GO

SELECT SUM(Count_executions) AS TotalExecutions,
SUM(Count_executions*avg_duration) AS TotalDuration,
SUM(Count_executions*avg_logical_io_reads) AS TotalReads,
SUM(Count_executions*avg_logical_io_writes) AS TotalWrites,
SUM(count_executions*avg_cpu_time) AS TotalCPU,
query_hash
INTO Baseline.QueryResults
FROM Baseline.query_store_runtime_stats rs
JOIN Baseline.query_store_plan p ON rs.plan_id = p.plan_id
JOIN Baseline.query_store_query q ON p.query_id = q.query_id
GROUP BY q.query_hash
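If the imported replay is large, one option (not part of the original demo) is to compress the imported runtime stats with a clustered columnstore index before running aggregations like the one above; the index name here is made up.

/* Optional: compress the biggest imported table for faster analysis */
CREATE CLUSTERED COLUMNSTORE INDEX CCI_Baseline_runtime_stats
    ON Baseline.query_store_runtime_stats;
GO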
Next, we would reset the database to our starting position, apply the Query Store settings mentioned above in this blog post, and replay our workload again. This time, we would dump our data into the “Compare” schema.
use [YourDatabase]
GO
/* Load Data */
SELECT * INTO DBA.Compare.query_store_runtime_stats FROM sys.query_store_runtime_stats;
SELECT * INTO DBA.Compare.query_store_runtime_stats_interval FROM sys.query_store_runtime_stats_interval;
SELECT * INTO DBA.Compare.query_store_plan FROM sys.query_store_plan;
SELECT * INTO DBA.Compare.query_store_query FROM sys.query_store_query;
SELECT * INTO DBA.Compare.query_store_query_text FROM sys.query_store_query_text;
SELECT * INTO DBA.Compare.query_store_wait_stats FROM sys.query_store_wait_stats;

USE [DBA]
GO

SELECT SUM(Count_executions) AS TotalExecutions,
SUM(Count_executions*avg_duration) AS TotalDuration,
SUM(Count_executions*avg_logical_io_reads) AS TotalReads,
SUM(Count_executions*avg_logical_io_writes) AS TotalWrites,
SUM(count_executions*avg_cpu_time) AS TotalCPU,
query_hash
INTO Compare.QueryResults
FROM Compare.query_store_runtime_stats rs
JOIN Compare.query_store_plan p ON rs.plan_id = p.plan_id
JOIN Compare.query_store_query q ON p.query_id = q.query_id
GROUP BY q.query_hash
Comparing Workload Results
Now that we have our two workloads imported, we can compare how the workload changed per query. I will break this down into two quick steps. First, get deltas per query. Second, get totals for how many times a query might appear in the Query Store. More on this a little later in the post.
/* Query Store Results */
use [DBA]
GO

SELECT DISTINCT
       c.TotalExecutions - b.TotalExecutions AS ExecutionDelta,
c.TotalExecutions AS CompareExecutions,
b.TotalExecutions AS BaselineExecutions,
c.TotalDuration - b.TotalDuration AS DurationDelta,
c.TotalCPU - b.TotalCPU AS CPUDelta,
c.TotalReads - b.TotalReads AS ReadDelta,
c.TotalWrites - b.TotalWrites AS WriteDelta,
c.TotalReads AS CompareReads,
b.TotalReads AS BaselineReads,
c.TotalCPU AS CompareCPU,
b.TotalCPU AS BaselineCPU,
c.TotalDuration AS CompareDuration,
b.TotalDuration AS BaselineDuration,
c.query_hash
--q.query_sql_text
INTO #CTE
FROM Baseline.QueryResults b
JOIN Compare.QueryResults c ON b.query_hash = c.query_hash;

SELECT COUNT(query_sql_text) AS QueryCount,
       MAX(query_sql_text) AS query_sql_text,
       MIN(query_id) AS MinQueryID,
       qsq.query_hash
INTO #Compare
FROM Compare.query_store_query qsq
JOIN Compare.query_store_query_text q ON qsq.query_text_id = q.query_text_id
WHERE qsq.is_internal_query = 0
GROUP BY query_hash;

SELECT COUNT(query_sql_text) AS QueryCount,
       MAX(query_sql_text) AS query_sql_text,
       MIN(query_id) AS MinQueryID,
       qsq.query_hash
INTO #Baseline
FROM Baseline.query_store_query qsq
JOIN Baseline.query_store_query_text q ON qsq.query_text_id = q.query_text_id
WHERE qsq.is_internal_query = 0
GROUP BY query_hash;

SELECT cte.*
, a.QueryCount AS Compare_QueryCount
, b.QueryCount AS Baseline_QueryCount
, a.MinQueryID AS Compare_MinQueryID
, b.MinQueryID AS Baseline_MinQueryID
, a.query_sql_text
FROM #CTE cte JOIN #Compare a on cte.query_hash = a.query_hash
JOIN #Baseline b on cte.query_hash = b.query_hash
WHERE 1=1
AND ExecutionDelta = 0
ORDER BY ReadDelta ASC
Query Store for Workload Replays
Lessons Learned Along the Way!
Initially, when working with the Query Store, I thought query_id was going to be my best friend. I quickly learned that my old friend query_hash is more helpful, for multiple reasons. One, I can easily compare queries between different replays. That’s right: not all workload replays give you the same query_id, even when the exact same workload is being replayed. Two, I can compare across different databases as well. Finally, query_hash is very helpful with ad-hoc workloads, as I can aggregate all the different query_ids that share the same query hash.
Need Performance Tuning Help?
Let us help you! Schedule a free 30-minute chat to see if we can point you in the right direction. If you liked this blog post, sign up for our free newsletter! Also, take advantage of our free training videos!
Microsoft made SQL Server 2019 generally available this week, so we want to share some videos and code examples of our favorite new features. Most of these will make your code go faster without any code changes!
We have been testing SQL Server 2019 for months and hope you enjoy these features as much as we do!
While not all implicit conversions are equal, implicit conversions matter. For example, just this week I ran into a trivial code change that caused a severe regression in performance. One simple filter change caused a query to consume 227 million reads compared to 212 reads. What did the filter do? Well, it forced an implicit conversion.
In this blog post you will learn what an implicit conversion is, how to identify it, and how to find the implicit conversions leading to the highest amount of reads in the Query Store.
What is a SQL Server Implicit Conversion?
This is a great question. An implicit conversion happens when the SQL Server optimizer has to convert the data in a column to match the type you are using in your query. For example, say you had a column that was an integer and you decided to filter on a string (varchar) variable; ORMs like Entity Framework are notoriously known for this. You would end up forcing an implicit conversion, which makes the optimizer read every value in the column and cast it so the column can be filtered based on your variable (or on another column that doesn’t match the same data type).
Let’s look at an example. Yes, this is very similar to the example mentioned in the first paragraph that consumed 227 million reads. We are just going to simplify it for you with a modified copy of the AdventureWorks database.
The SalesOrderID column is an INT, not a VARCHAR. We are going to filter with a VARCHAR data type using the LIKE operator, as shown below.
SELECT [SalesOrderID]
      ,[SalesOrderDetailID]
      ,[CarrierTrackingNumber]
      ,[OrderQty]
      ,[ProductID]
      ,[SpecialOfferID]
      ,[UnitPrice]
      ,[UnitPriceDiscount]
      ,[LineTotal]
      ,[rowguid]
      ,[ModifiedDate]
FROM [AdventureWorks2012].[Sales].[SalesOrderDetailEnlarged]
WHERE SalesOrderID LIKE '43659'
This will produce the following execution plan and statistics, which are not ideal at all. We get a warning for the implicit conversion and also go parallel, scanning the whole table due to this conversion.
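If you want to see those numbers yourself, a minimal way is to turn on the runtime statistics before running the query above and enable the actual execution plan in SSMS, then look for the CONVERT_IMPLICIT warning on the plan.

/* Turn on runtime statistics before running the query above;
   in SSMS also enable "Include Actual Execution Plan" (Ctrl+M). */
SET STATISTICS IO, TIME ON;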
How to fix Implicit Conversions in SQL Server
Now that we know how to identify an implicit conversion, I bet you might be wondering how to resolve them. They usually come in two flavors. First, the columns do not match data types on a join. Second, the data type of a parameter or literal does not match the column. Today, we will focus on the second type in the code below.
The only thing we change in our code is the type of filter.
SELECT [SalesOrderID]
      ,[SalesOrderDetailID]
      ,[CarrierTrackingNumber]
      ,[OrderQty]
      ,[ProductID]
      ,[SpecialOfferID]
      ,[UnitPrice]
      ,[UnitPriceDiscount]
      ,[LineTotal]
      ,[rowguid]
      ,[ModifiedDate]
FROM [AdventureWorks2012].[Sales].[SalesOrderDetailEnlarged]
WHERE SalesOrderID = 43659
How do I get Implicit Conversions from the Query Store?
Now that you understand implicit conversions, you might wonder how to find them. Well, hopefully you are on SQL Server 2016 or higher and can query the execution plans in the Query Store. If not, check out Jonathan Kehayias’ example of finding implicit conversions in the plan cache.
Disclaimer: The plan cache will only have execution plans that are still in memory, and the Query Store will only have execution plans that are in the Query Store. Also, I added variables for the start date, end date, and number of top offenders. Performance mileage will vary based on how your Query Store is configured and used. Run the query below at your own risk!
DROP TABLE IF EXISTS #plans
DROP TABLE IF EXISTS #tmp1
DROP TABLE IF EXISTS #tmp2
DROP TABLE IF EXISTS #finalplan
GO

DECLARE @StartDate datetime = dateadd(d,-1, getdate()),
        @EndDate datetime = getdate(),
        @TopStmts INT = 1000

SELECT TOP (@TopStmts)
       SUM(qrs.count_executions) * AVG(qrs.avg_logical_io_reads) AS est_logical_reads,
       SUM(qrs.count_executions) AS sum_executions,
       AVG(qrs.avg_logical_io_reads) AS avg_logical_io_reads,
       SUM(qsq.count_compiles) AS sum_compiles,
       qsp.plan_id
INTO #tmp1
FROM sys.query_store_query qsq
JOIN sys.query_store_plan qsp ON qsq.query_id = qsp.query_id
JOIN sys.query_store_runtime_stats qrs ON qsp.plan_id = qrs.plan_id
WHERE qrs.first_execution_time >= @StartDate
  AND qrs.last_execution_time <= @EndDate
GROUP BY qsp.plan_id
ORDER BY est_logical_reads DESC

SELECT cte.*, TRY_CAST (qsp.query_plan AS XML) AS xml_query_plan
INTO #tmp2
FROM #tmp1 cte
JOIN sys.query_store_plan qsp ON cte.plan_id = qsp.plan_id

SELECT #tmp2.* INTO #plans FROM #tmp2;

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

DECLARE @dbname SYSNAME
SET @dbname = QUOTENAME(DB_NAME());

;WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT p.est_logical_reads, p.sum_executions, p.avg_logical_io_reads, p.plan_id,
       stmt.value('(@StatementText)[1]', 'varchar(max)') AS SQLStmt,
       t.value('(ScalarOperator/Identifier/ColumnReference/@Schema)[1]', 'varchar(128)') AS SchemaName,
       t.value('(ScalarOperator/Identifier/ColumnReference/@Table)[1]', 'varchar(128)') AS TableName,
       t.value('(ScalarOperator/Identifier/ColumnReference/@Column)[1]', 'varchar(128)') AS ColumnName,
       ic.DATA_TYPE AS ConvertFrom,
       ic.CHARACTER_MAXIMUM_LENGTH AS ConvertFromLength,
       t.value('(@DataType)[1]', 'varchar(128)') AS ConvertTo,
       t.value('(@Length)[1]', 'int') AS ConvertToLength,
       p.xml_query_plan
INTO #finalplan
FROM #plans p
CROSS APPLY xml_query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS batch(stmt)
CROSS APPLY stmt.nodes('.//Convert[@Implicit="1"]') AS n(t)
CROSS APPLY stmt.nodes('.//PlanAffectingConvert[@ConvertIssue="Cardinality Estimate"]') AS ce(pt)
JOIN INFORMATION_SCHEMA.COLUMNS AS ic
  ON QUOTENAME(ic.TABLE_SCHEMA) = t.value('(ScalarOperator/Identifier/ColumnReference/@Schema)[1]', 'varchar(128)')
 AND QUOTENAME(ic.TABLE_NAME) = t.value('(ScalarOperator/Identifier/ColumnReference/@Table)[1]', 'varchar(128)')
 AND ic.COLUMN_NAME = t.value('(ScalarOperator/Identifier/ColumnReference/@Column)[1]', 'varchar(128)')
WHERE t.exist('ScalarOperator/Identifier/ColumnReference[@Database=sql:variable("@dbname")][@Schema!="[sys]"]') = 1

SELECT fp.est_logical_reads, fp.sum_executions, fp.avg_logical_io_reads, fp.plan_id,
       qsq.query_id, qsq.query_hash,
       fp.SQLStmt, fp.SchemaName, fp.TableName, fp.ColumnName,
       fp.ConvertFrom, fp.ConvertTo, fp.ConvertFromLength, fp.ConvertToLength,
       fp.xml_query_plan
FROM #finalplan fp
JOIN sys.query_store_plan qsp ON fp.plan_id = qsp.plan_id
JOIN sys.query_store_query qsq ON qsp.query_id = qsq.query_id
ORDER BY fp.est_logical_reads DESC