Monday, September 16, 2013

Performance Tuning with Compression

One lesser-known trick for performance tuning is compression. Wait, what? Isn't compression all about saving space? Yes, but it also tends to have another pleasant side effect.

SQL Server is typically an IO-bound application, meaning IO is almost always your constraining factor. Whenever I am troubleshooting a system, IO is one of the first areas I look at.

Enabling compression reduces the amount of IO SQL Server needs to satisfy SELECT queries. Since more rows are stored on each page, it takes fewer pages to complete a query.
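
If you want to see this effect for yourself, you can check how many pages a table occupies before and after a rebuild. Here's a quick sketch using sys.dm_db_partition_stats; the table name comes from the demo below.
-- check how many pages the table occupies
use AdventureWorksDW2012;

select OBJECT_NAME(object_id) as TableName,
 sum(in_row_data_page_count) as DataPages,
 sum(row_count) as TotalRows
from sys.dm_db_partition_stats
where object_id = OBJECT_ID('dbo.FactInternetSales')
 and index_id in (0, 1) -- heap or clustered index only
group by object_id;
go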

A Quick Demo

We'll use the following simple query as our baseline. Run it, then take a look at the number of IOs on the Messages tab.
-- example query
use AdventureWorksDW2012;

set statistics io on;

select ProductKey, 
 count(*) AS QuantitySold,
 sum(SalesAmount) AS TotalSales
from dbo.FactInternetSales
group by ProductKey
order by ProductKey;
go

Results

IO Baseline
Here's our baseline: 1,240 reads on the FactInternetSales table. Now let's enable Row Compression and re-run the query.
-- turn on row compression
use AdventureWorksDW2012;

alter table dbo.FactInternetSales rebuild partition = all
 with (data_compression = row);
go

Results

Row Compression
Here you can see the IOs were nearly cut in half: 656 reads from FactInternetSales. Last, turn on Page Compression and run the query one more time.
-- turn on page compression
use AdventureWorksDW2012;

alter table dbo.FactInternetSales rebuild partition = all
 with (data_compression = page);
go

Results

Page Compression
Now we're down to less than a quarter of the original IOs: only 292 reads from FactInternetSales. Looks good to me.
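
If you want to confirm which compression setting actually took effect, sys.partitions will tell you. A quick check:
-- verify the compression setting on each partition
use AdventureWorksDW2012;

select OBJECT_NAME(p.object_id) as TableName,
 p.index_id,
 p.partition_number,
 p.data_compression_desc as CompressionType
from sys.partitions p
where p.object_id = OBJECT_ID('dbo.FactInternetSales');
go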

There's No Such Thing as a Free Lunch

One thing to keep in mind is that compression will increase your CPU usage. In practice, I have usually found this to be in the range of one to three percent. That said, if you are currently experiencing CPU issues with your server, it would behoove you to address that first. Oddly enough, quite often I find that CPU problems are being driven by IO problems. So be sure to tune your queries and check your indexes.
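
If you're not sure whether your server is CPU bound or IO bound, wait stats give you a rough first pass. Here's a minimal sketch; the interpretation is my rule of thumb, not gospel: a high signal wait percentage points at CPU pressure, while heavy PAGEIOLATCH waits point at IO.
-- signal waits as a percentage of total waits (rough CPU pressure indicator)
select cast(100.0 * sum(signal_wait_time_ms) / sum(wait_time_ms)
  as numeric(10, 2)) as SignalWaitPercent
from sys.dm_os_wait_stats
where wait_time_ms > 0;

-- top IO-related waits
select top 5 wait_type, wait_time_ms
from sys.dm_os_wait_stats
where wait_type like 'PAGEIOLATCH%'
order by wait_time_ms desc;
go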

Row Compression versus Page Compression

There are two types of compression available in SQL Server: Row Compression and Page Compression.

Row Compression stores fixed-length data types in a variable-length format. Page Compression builds on that by adding Prefix and Dictionary Compression. Page Compression works very well when you have lots of repeating values in your tables. Like a Data Warehouse...

Generally speaking, I recommend using Row Compression with OLTP databases, and using Page Compression with Data Warehouses.
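
Keep in mind that compression is set per table, per index, and even per partition, so you can mix and match. For example, something like this would page-compress a single nonclustered index (the index name here is hypothetical):
-- page-compress one index instead of the whole table
-- note: IX_FactInternetSales_ProductKey is a hypothetical index name
alter index IX_FactInternetSales_ProductKey
 on dbo.FactInternetSales
 rebuild with (data_compression = page);
go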

Now this doesn't mean you should blindly enable compression for all tables and indexes on all of your databases. Do a little analysis first and start small.

Focus on your largest tables first, the ones that are causing you pain. Run some checks and see if those tables would benefit from having compression enabled. Pick your top ten.
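
The check itself is easy: SQL Server provides a system procedure, sp_estimate_data_compression_savings, that estimates the savings for one table at a time. For example:
-- estimate page compression savings for a single table
exec sp_estimate_data_compression_savings
 @schema_name = 'dbo',
 @object_name = 'FactInternetSales',
 @index_id = null,         -- null = all indexes
 @partition_number = null, -- null = all partitions
 @data_compression = 'page';
go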

The best candidates for compression are tables that are not being updated frequently. So if you have a table that is getting 25% of its rows updated every day, that may not be the best table to compress. As always, you will need to test your servers and workload to see what combination works best for your environment.
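
How do you know how often a table is updated? One rough gauge is sys.dm_db_index_usage_stats, which tracks reads and writes since the last restart. A sketch along these lines:
-- rough read/write breakdown per table since the last restart
select OBJECT_NAME(s.object_id) as TableName,
 sum(s.user_seeks + s.user_scans + s.user_lookups) as Reads,
 sum(s.user_updates) as Writes
from sys.dm_db_index_usage_stats s
where s.database_id = DB_ID()
group by s.object_id
order by Writes desc;
go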

Show me the T-SQL

The script below will check all of your tables and indexes. It reports the current size, the compression method currently in use, and an estimate of the space savings you could achieve with either Row Compression or Page Compression. It runs on SQL 2008, SQL 2012, and SQL 2014 CTP1.
-- Steven Ormrod
-- 7/11/13
-- estimates row and page compression savings on all tables in a database
-- Version: 2.1
-- Source: http://stevenormrod.com/
-- License: Creative Commons. Attribution-NonCommercial CC BY-NC
-- http://creativecommons.org/licenses/by-nc/3.0/

set nocount on;

set transaction isolation level read uncommitted;

declare @currenttable numeric(10, 2);
declare @tablecount  numeric(10, 2);

-- var to hold cursor values
declare @table  varchar(255);
declare @schema  varchar(255);

declare @sql  nvarchar(max); -- needs room for long schema and table names

-- temp table for row compression information
create table #rowcompression (

 TableName   varchar(255),
 SchemaName   varchar(255),
 IndexID    int,
 PartitionNumber  int,
 CurrentSizeKB  bigint,
 RequestedSizeKB  bigint,
 CurrentSampleKB  bigint,
 RequestedSampleKB bigint

);

-- temp table for page compression information
create table #pagecompression (

 TableName   varchar(255),
 SchemaName   varchar(255),
 IndexID    int,
 PartitionNumber  int,
 CurrentSizeKB  bigint,
 RequestedSizeKB  bigint,
 CurrentSampleKB  bigint,
 RequestedSampleKB bigint

);

-- current compression information
select s.name as SchemaName,
 t.name as TableName,
 i.name as IndexName,
 i.index_id,
 p.data_compression_desc as CompressionType
into #currentcompression
from sys.schemas s
 join sys.tables t
  on s.schema_id = t.schema_id
 join sys.indexes i
  on t.object_id = i.object_id
 join sys.partitions p
  on i.object_id = p.object_id
   and i.index_id = p.index_id
order by s.name, t.name, i.name;

select @tablecount = count(*) from sys.tables;

set @currenttable = 0;

-- declare variable for the cursor
declare curTables cursor for
-- sql statement for the cursor
select t.name as TableName, 
 SCHEMA_NAME(t.schema_id) as SchemaName
from sys.tables t
order by SchemaName, TableName;

-- open 'er up
open curTables;

-- load the first row
fetch next from curTables into @table, @schema;

-- loop through the cursor
while @@fetch_status = 0 begin

 -- do some work
 print 'Estimating row and page compression for: ' + @schema + '.' + @table + '...';

 set @sql = 'exec sp_estimate_data_compression_savings ''' + @schema + ''', ''' + @table + ''', null, null, row';

-- print @sql

 insert into #rowcompression
 execute sp_executesql @sql;

 -- estimate page compression
 set @sql = 'exec sp_estimate_data_compression_savings ''' + @schema + ''', ''' + @table + ''', null, null, page';

-- print @sql

 insert into #pagecompression
 execute sp_executesql @sql;

 -- executive update
 set @currenttable = @currenttable + 1;
 print char(9) + 'Percent Complete: ' + cast(cast(@currenttable / @tablecount * 100.0 as numeric(10, 2)) as varchar(255));

 -- advance to the next row
 fetch next from curTables into @table, @schema;

end;

-- build the executive summary
select r.SchemaName, r.TableName, 
-- r.IndexID,
 i.name as IndexName,
 r.CurrentSizeKB, 
 c.CompressionType as CurrentCompression,
 r.RequestedSizeKB as RowCompressionKB,
 case r.CurrentSizeKB
  when 0
   then 0
  else
   cast((1.0 - r.RequestedSizeKB / (r.CurrentSizeKB * 1.0)) * 100 as numeric(10, 2))
 end as RowDecreasePercent,
 p.RequestedSizeKB as PageCompressionKB,
 case r.CurrentSizeKB
  when 0
   then 0
  else
   cast((1.0 - p.RequestedSizeKB / (r.CurrentSizeKB * 1.0)) * 100 as numeric(10, 2))
 end as PageDecreasePercent
into #executivesummary
from #currentcompression c
 join #rowcompression r
  on c.SchemaName = r.SchemaName
   and c.TableName = r.TableName
    and c.index_id = r.IndexID
 join #pagecompression p
  on r.TableName = p.TableName
   and r.SchemaName = p.SchemaName
    and r.IndexID = p.IndexID
 join sys.indexes i
  on i.object_id = OBJECT_ID(quotename(r.SchemaName) + '.' + quotename(r.TableName)) -- match on schema + table, not just name
   and r.IndexID = i.index_id
order by r.SchemaName, r.TableName, IndexName;

-- show everything
select * from #executivesummary;

/*

-- if you want to change how reporting is done
-- experiment with the following queries
-- be sure to comment out the original one, above

-- focus on the largest tables
select top 10           -- focus on the biggest bang for the buck
* 
from #executivesummary
where RowDecreasePercent + PageDecreasePercent > 50  -- peel off the bottom layer
order by CurrentSizeKB desc;       -- focus on the larger tables and indexes

-- focus on the largest compression
select top 10           -- focus on the biggest bang for the buck
* 
from #executivesummary
where CurrentSizeKB > 1024
order by RowDecreasePercent + PageDecreasePercent desc;

*/

-- doing the dishes
close curTables;
deallocate curTables;

drop table #currentcompression;
drop table #rowcompression;
drop table #pagecompression;
drop table #executivesummary;

go



Results

Compression Estimates

Compression Estimates Percent Complete
The default is to display everything sorted by Table Name and Index Name. I've included a few other queries in the comments that will let you change the display to focus on the largest tables, or to show the tables that should see the highest percentage of compression.

Enjoy!

3 comments:

  1. In the script, where it reads

    cast((1.0 - r.RequestedSizeKB / (r.CurrentSizeKB * 1.0)) * 100 as numeric(10, 2))

    it should be

    cast((1.0 - p.RequestedSizeKB / (r.CurrentSizeKB * 1.0)) * 100 as numeric(10, 2))

  2. Hi Mariano, good catch! I'll update the script.
