Tuesday, September 24, 2013

An Overview of SQL Server 2014 In-Memory OLTP Hekaton

So you've heard of Hekaton, but what is it, and why do you want it?

Hekaton, or In-Memory OLTP, is an entirely new set of data structures for tables and indexes, optimized for data that lives in memory rather than on disk.

Hekaton is the code name for In-Memory OLTP, so I will use these two terms interchangeably.

Why is Microsoft doing this? Short version: memory and servers are much cheaper now than they were when SQL Server first launched.

At this point, it is feasible to have enough memory on your server to house the entire database. Even large, one-terabyte databases.

However, the query optimizer and its costing rules haven't changed along with this. So, even if you have tons of memory, SQL Server still assumes it will be reading data off of the disk.

Basic Glossary of Terms

  • Cross-Container Transactions - transactions that use both disk-based tables and memory-optimized tables

  • Disk-Based Tables - plain old normal tables, what you have right now, 8k pages

  • Hekaton - codename for In-Memory OLTP

  • In-Memory OLTP - new architecture and data structures using memory for data storage instead of disks

  • Interop - interpreted TSQL queries and stored procedures that access memory-optimized tables

  • Memory-Optimized Tables - tables using new memory data structures to store their data

  • Natively Compiled Stored Procedures - compiled machine code instead of interpreted TSQL, still written in TSQL but with some restrictions


Databases

In order to make use of In-Memory OLTP, you need a database that supports it. It's fairly easy to do this. When you create the database, you need a special filegroup with the CONTAINS MEMORY_OPTIMIZED_DATA clause. Additionally, you need to use a Windows BIN2 collation for indexed character columns. This can be set at the database, column, or expression level.
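Here's roughly what creating a Hekaton-ready database looks like. This is just a sketch; the database name, file paths, and collation are placeholders, so adjust them for your environment.
-- create a database with a memory-optimized filegroup (names and paths are placeholders)
create database HekatonDemo
on primary (
 name = 'HekatonDemo_data',
 filename = 'C:\Data\HekatonDemo_data.mdf'
),
filegroup HekatonDemo_mod contains memory_optimized_data (
 name = 'HekatonDemo_mod',
 filename = 'C:\Data\HekatonDemo_mod'
)
log on (
 name = 'HekatonDemo_log',
 filename = 'C:\Data\HekatonDemo_log.ldf'
)
collate Latin1_General_100_BIN2;
go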

Tables

To create a Memory-Optimized Table you use the MEMORY_OPTIMIZED = ON clause. There are several restrictions on column types, but in simple terms: no LOB data types, no whatever(max), and no CLR.

Rows are limited to 8060 bytes with nothing stored off row. The size limitation is enforced at creation, so all of your column sizes must fit within this limit.

DML triggers are not allowed, neither are foreign key or check constraints. Love GUIDs? I hope so, because identity columns are out, too.

There are two basic types of Memory-Optimized Tables, SCHEMA_ONLY and SCHEMA_AND_DATA.

SCHEMA_ONLY tables are non-durable. You can put data in there, but in the event of a restart or crash, the table is recreated but your data is gone. This could be useful for storing application session state or for staging tables in a data warehouse.
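Here's a sketch of what a Memory-Optimized Table might look like, using a made-up session state table as an example. The hash index syntax and BUCKET_COUNT are covered in the next section.
-- a hypothetical non-durable table for session state
create table dbo.SessionState (
 SessionID int not null
  primary key nonclustered hash with (bucket_count = 1000000),
 UserName varchar(100) collate Latin1_General_100_BIN2 not null,
 LastTouched datetime2 not null
)
with (memory_optimized = on, durability = schema_only);
go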

Indexes

Memory-Optimized Tables can have two types of indexes, Hash Indexes and Range Indexes. All tables must have at least one index, and no more than eight. Also, tables that are defined as SCHEMA_AND_DATA must have a primary key. Indexes are rebuilt each time SQL Server starts up.

A Hash Index is an array of pointers, where each element points to a linked list of rows. The number of elements in the array is controlled by the BUCKET_COUNT clause. In general, you want to set the BUCKET_COUNT to at least the number of unique key values in your table.

If you have too few buckets, then multiple key values will share the same linked list, which will mean longer scans to look for your row. If you have too many, then you will be wasting memory with empty buckets.
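If you want to check how your buckets are holding up, there is a DMV for that. I believe sys.dm_db_xtp_hash_index_stats is the one to use, though it may not be available in every CTP build, so treat this as a sketch.
-- check bucket usage and chain lengths for hash indexes
select object_name(s.object_id) as TableName,
 i.name as IndexName,
 s.total_bucket_count,
 s.empty_bucket_count,
 s.avg_chain_length,
 s.max_chain_length
from sys.dm_db_xtp_hash_index_stats s
 join sys.indexes i
  on s.object_id = i.object_id
   and s.index_id = i.index_id;
go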

Range Indexes are good for when you will be searching for a range of values, or if you are not able to properly estimate the BUCKET_COUNT size. However, Range Indexes are not available in CTP1, so we'll have to wait a bit to learn more about those.

Queries and Stored Procedures

There are two basic methods for querying Memory-Optimized Tables: Natively Compiled Stored Procedures, or good old-fashioned TSQL, known as Interop, which also includes regular stored procedures.

Natively Compiled Stored Procedures are going to be the fastest. However, they are only able to access Memory-Optimized Tables. If you want to query regular tables alongside Memory-Optimized Tables, then you will need to use TSQL Interop. There are a variety of restrictions when using TSQL Interop, such as no MERGE, no cross-database queries, no locking hints, and no linked servers.

TSQL Interop allows you to make a gradual migration to In-Memory OLTP. This way you can slowly convert a few objects at a time based on which ones will give you the most performance gain.
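For reference, here's a sketch of what a Natively Compiled Stored Procedure might look like, reusing the hypothetical SessionState table from above. The SCHEMABINDING, EXECUTE AS, and BEGIN ATOMIC options are required for native compilation.
-- a hypothetical natively compiled stored procedure
create procedure dbo.usp_GetSession
 @SessionID int
with native_compilation, schemabinding, execute as owner
as
begin atomic with (transaction isolation level = snapshot, language = N'us_english')

 select SessionID, UserName, LastTouched
 from dbo.SessionState
 where SessionID = @SessionID;

end;
go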

One Big Caveat

One thing to keep in mind is that tables, indexes, and stored procedures cannot be modified in Hekaton. This means that you will need to drop and re-create these objects in order to make changes. Also, statistics have to be updated manually. And then, to take advantage of them, the stored procedures need to be recreated as well.
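As I understand it, updating statistics on a Memory-Optimized Table also requires the FULLSCAN and NORECOMPUTE options, so the manual update looks something like this (SessionState being my hypothetical table from earlier):
-- manual statistics update on a memory-optimized table
update statistics dbo.SessionState
with fullscan, norecompute;
go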

Obviously, this is a fairly major restriction. However, I think I can live with this for a version one product. I hope that by the time SQL Server 2015 comes out, there will be an easier way to add a column to a Memory-Optimized Table.

Concurrency

Hekaton offers an improved versioned optimistic concurrency model for Memory-Optimized Tables that removes waiting for locks and latches. Explicit transactions are supported using the Repeatable Read, Serializable, and Snapshot isolation levels. Read Committed and RCSI are only available with autocommit transactions, and RCSI only if no disk-based tables are involved.
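In an explicit Cross-Container Transaction, the isolation level for the Memory-Optimized Table side is supplied as a table hint. Here's a rough sketch; the SessionAudit disk-based table is hypothetical.
-- a cross-container transaction touching both kinds of tables
begin transaction;

 -- memory-optimized table, isolation supplied as a table hint
 select UserName
 from dbo.SessionState with (snapshot)
 where SessionID = 42;

 -- plain old disk-based table in the same transaction
 update dbo.SessionAudit
 set LastRead = sysdatetime()
 where SessionID = 42;

commit transaction;
go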

High Availability and Disaster Recovery

All the basics such as backups and restores are available. Additionally, AlwaysOn and Log Shipping are fully supported. Unfortunately, Mirroring and Transactional Replication are not. However, this isn't too much of a surprise since Microsoft is definitely pushing AlwaysOn as the new HA/DR solution.

Migration Assistance

The AMR Tool (Analyze, Migrate, Report) will identify unsupported data types and constraints in tables. It will also recommend which tables and stored procedures should see the most performance improvement from converting to In-Memory OLTP and Memory-Optimized Tables.

Management Data Warehouse offers the Transaction Performance Collection Set, which will help you gather the necessary data in order to let the AMR Tool work its magic.

Wednesday, September 18, 2013

SQL Server 2014 AMR Tool for In-Memory OLTP

SQL Server 2014 has many new features and improvements over SQL Server 2012. One feature that a lot of people are interested in is In-Memory OLTP. Not knowing where or how to take advantage of this feature can hold people back from playing with it.

The AMR Tool (Analyze, Migrate, Report) helps you simplify migrations to SQL Server 2014 In-Memory OLTP.

SQL 2014 AMR Tool
The AMR Tool helps you identify which tables and stored procedures will benefit from In-Memory OLTP. If you already have some migration plans, then the AMR Tool can help validate your plans. It will evaluate what needs to be done to migrate your tables and stored procedures.

In order to take advantage of the AMR Tool you will need the following three items:
  • A target database that you want to migrate to SQL Server 2014. This needs to be SQL Server 2008 or higher. So no old-school databases here.

  • A copy of SQL Server 2014 CTP1 Management Studio installed. Note, you do not need a SQL Server 2014 instance or database, just the tools.

  • And last, a Management Data Warehouse with the Transaction Performance Collection Set installed.
Once you have these items set up, you are ready to begin using the AMR Tool to generate recommendations based on the access characteristics of your workload, contention statistics, and the CPU usage of stored procedures.

Resources

Benjamin Nevarez has a nice tutorial on using the AMR Tool. Another good resource is the Hekaton whitepaper by Kalen Delaney. If you don't already have SQL Server 2014 CTP1, you can download it here.

Monday, September 16, 2013

Performance Tuning with Compression

One lesser known trick for performance tuning is compression. Wait, what? Isn't compression all about saving space? Yes, but it also tends to have another pleasant side effect.

SQL Server is typically an IO bound application. That means, IO is almost always your constraining factor. Whenever I am troubleshooting a system, IO is one of the areas that I always take a look at.

Enabling compression will reduce the amount of IO that SQL Server uses to satisfy SELECT queries. Since more data is stored on each page, it takes fewer pages to complete a query.

A Quick Demo

We'll use the following simple query as our baseline. Run the query and then take a look at the number of IOs. To see this, click on the Messages tab after you run the query.
-- example query
use AdventureWorksDW2012;

set statistics io on;

select ProductKey, 
 count(*) AS QuantitySold,
 sum(SalesAmount) AS TotalSales
from dbo.FactInternetSales
group by ProductKey
order by ProductKey;
go

Results

IO Baseline
Here's our baseline. We had 1,240 reads on the FactInternetSales table. Now, let's enable Row Compression and re-run the query.
-- turn on row compression
use AdventureWorksDW2012;

alter table dbo.FactInternetSales rebuild partition = all
 with (data_compression = row);
go

Results

Row Compression
Here you can see the IOs were cut in half, 656 reads from FactInternetSales. Last, turn on Page Compression and run the query one more time.
-- turn on page compression
use AdventureWorksDW2012;

alter table dbo.FactInternetSales rebuild partition = all
 with (data_compression = page);
go

Results

Page Compression
Now we have less than a quarter of the original IOs. Only 292 reads from FactInternetSales. Looks good to me.

There's No Such Thing as a Free Lunch

One thing to keep in mind is that compression will increase your CPU usage. In practice, I have usually found this to be in the range of one to three percent. That said, if you are currently experiencing CPU issues with your server, it would behoove you to address that first. Oddly enough, quite often I find that CPU problems are being driven by IO problems. So be sure to tune your queries and check your indexes.

Row Compression versus Page Compression

There are two types of data compression available for use with SQL Server: Row Compression and Page Compression.

Row Compression stores fixed data type columns using a variable length format. Page Compression adds to that by incorporating Prefix and Dictionary Compression to the mix. Page Compression works very well when you have lots of repeating values in your tables. Like a Data Warehouse...

Generally speaking, I recommend using Row Compression with OLTP databases, and using Page Compression with Data Warehouses.
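Compression can also be applied per index rather than to the whole table, which lets you mix and match. The index and table names below are hypothetical; a minimal sketch looks like this:
-- row compression on a busy OLTP index (hypothetical names)
alter index IX_Orders_CustomerID on dbo.Orders
 rebuild with (data_compression = row);

-- page compression on a data warehouse fact table (hypothetical names)
alter index PK_FactSales on dbo.FactSales
 rebuild with (data_compression = page);
go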

Now this doesn't mean you should blindly enable compression for all tables and indexes on all of your databases. Do a little analysis first and start small.

Focus on your largest tables first; the ones that are causing you pain. Run some checks and see if those tables would benefit from having compression enabled. Pick your top ten.

The best candidates for compression are tables that are not being updated frequently. So if you have a table that is getting 25% of its rows updated every day, that may not be the best table to compress. As always, you will need to test your servers and workload to see what combination works best for your environment.
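One way to gauge how frequently a table is updated versus scanned is to peek at the operational stats DMV. This is a rough sketch; keep in mind these counters reset when SQL Server restarts, so it is only a point-in-time sample.
-- rough read/write profile for user tables in the current database
select object_name(os.object_id) as TableName,
 i.name as IndexName,
 os.leaf_update_count,
 os.leaf_insert_count,
 os.range_scan_count,
 os.singleton_lookup_count
from sys.dm_db_index_operational_stats(db_id(), null, null, null) os
 join sys.indexes i
  on os.object_id = i.object_id
   and os.index_id = i.index_id
where objectproperty(os.object_id, 'IsUserTable') = 1
order by os.leaf_update_count desc;
go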

Show me the T-SQL

The script below will check all of your tables and indexes. It will report back the current size, current compression methods being used, and an estimation of the space savings you can achieve by using either Row Compression or Page Compression. It runs on SQL 2008, SQL 2012, and SQL 2014 CTP1.
-- Steven Ormrod
-- 7/11/13
-- estimates row and page compression savings on all tables in a database
-- Version: 2.1
-- Source: http://stevenormrod.com/
-- License: Creative Commons. Attribution-NonCommercial CC BY-NC
-- http://creativecommons.org/licenses/by-nc/3.0/

set nocount on;

set transaction isolation level read uncommitted;

declare @currenttable numeric(10, 2);
declare @tablecount  numeric(10, 2);

-- var to hold cursor values
declare @table  varchar(255);
declare @schema  varchar(255);

-- max length so long schema and table names do not truncate the dynamic sql
declare @sql  nvarchar(max);

-- temp table for row compression information
create table #rowcompression (

 TableName   varchar(255),
 SchemaName   varchar(255),
 IndexID    int,
 PartitionNumber  int,
 CurrentSizeKB  bigint,
 RequestedSizeKB  bigint,
 CurrentSampleKB  bigint,
 RequestedSampleKB bigint

);

-- temp table for page compression information
create table #pagecompression (

 TableName   varchar(255),
 SchemaName   varchar(255),
 IndexID    int,
 PartitionNumber  int,
 CurrentSizeKB  bigint,
 RequestedSizeKB  bigint,
 CurrentSampleKB  bigint,
 RequestedSampleKB bigint

);

-- current compression information
select s.name as SchemaName,
 t.name as TableName,
 i.name as IndexName,
 i.index_id,
 p.data_compression_desc as CompressionType
into #currentcompression
from sys.schemas s
 join sys.tables t
  on s.schema_id = t.schema_id
 join sys.indexes i
  on t.object_id = i.object_id
 join sys.partitions p
  on i.object_id = p.object_id
   and i.index_id = p.index_id
order by s.name, t.name, i.name;

select @tablecount = count(*) from sys.tables;

set @currenttable = 0;

-- declare variable for the cursor
declare curTables cursor for
-- sql statement for the cursor
select t.name as TableName, 
 SCHEMA_NAME(t.schema_id) as SchemaName
from sys.tables t
order by SchemaName, TableName;

-- open 'er up
open curTables;

-- load the first row
fetch next from curTables into @table, @schema;

-- loop through the cursor
while @@fetch_status = 0 begin

 -- do some work
 print 'Estimating row and page compression for: ' + @schema + '.' + @table + '...';

 set @sql = 'exec sp_estimate_data_compression_savings ''' + @schema + ''', ''' + @table + ''', null, null, row';

-- print @sql

 insert into #rowcompression
 execute sp_executesql @sql;

 -- estimate page compression
 set @sql = 'exec sp_estimate_data_compression_savings ''' + @schema + ''', ''' + @table + ''', null, null, page';

-- print @sql

 insert into #pagecompression
 execute sp_executesql @sql;

 -- executive update
 set @currenttable = @currenttable + 1;
 print char(9) + 'Percent Complete: ' + cast(cast(@currenttable / @tablecount * 100.0 as numeric(10, 2)) as varchar(255));

 -- advance to the next row
 fetch next from curTables into @table, @schema;

end;

-- build the executive summary
select r.SchemaName, r.TableName, 
-- r.IndexID,
 i.name as IndexName,
 r.CurrentSizeKB, 
 c.CompressionType as CurrentCompression,
 r.RequestedSizeKB as RowCompressionKB,
 case r.CurrentSizeKB
  when 0
   then 0
  else
   cast((1.0 - r.RequestedSizeKB / (r.CurrentSizeKB * 1.0)) * 100 as numeric(10, 2))
 end as RowDecreasePercent,
 p.RequestedSizeKB as PageCompressionKB,
 case r.CurrentSizeKB
  when 0
   then 0
  else
   cast((1.0 - p.RequestedSizeKB / (r.CurrentSizeKB * 1.0)) * 100 as numeric(10, 2))
 end as PageDecreasePercent
into #executivesummary
from #currentcompression c
 join #rowcompression r
  on c.SchemaName = r.SchemaName
   and c.TableName = r.TableName
    and c.index_id = r.IndexID
 join #pagecompression p
  on r.TableName = p.TableName
   and r.SchemaName = p.SchemaName
    and r.IndexID = p.IndexID
 join sys.indexes i
  on r.TableName = OBJECT_NAME(i.object_id)
   and r.IndexID = i.index_id
order by r.SchemaName, r.TableName, IndexName;

-- show everything
select * from #executivesummary;

/*

-- if you want to change how reporting is done
-- experiment with the following queries
-- be sure to comment out the original one, above

-- focus on the largest tables
select top 10           -- focus on the biggest bang for the buck
* 
from #executivesummary
where RowDecreasePercent + PageDecreasePercent > 50  -- peel off the bottom layer
order by CurrentSizeKB desc;       -- focus on the larger tables and indexes

-- focus on the largest compression
select top 10           -- focus on the biggest bang for the buck
* 
from #executivesummary
where CurrentSizeKB > 1024
order by RowDecreasePercent + PageDecreasePercent desc;

*/

-- doing the dishes
close curTables;
deallocate curTables;

drop table #currentcompression;
drop table #rowcompression;
drop table #pagecompression;
drop table #executivesummary;

go



Results

Compression Estimates

Compression Estimates Percent Complete
The default is to display everything sorted by Table Name and Index Name. I've included a few other queries in the comments that will let you modify the display to focus on the largest tables, or to show the tables that should receive the highest percentage of compression.

Enjoy!

Tuesday, September 3, 2013

Master of Puppets

You may have heard about the recent announcement from Microsoft to cancel the Microsoft Certified Masters program. AKA, the MCM and MCSM programs. Well, it wasn't really an announcement. An announcement is when you make a public declaration of some information or event. Think about a wedding announcement.


Instead, this was announced in an email very late on Friday night. Because of time zone differences, mine came in after midnight. In the email, we were informed that all of the MCM exams and the entire program are being retired on October 31st.

A few news sites have already picked up on this and made the non-announcement public.

What's so awful about this is that, within the past few weeks, there have been other, contrasting announcements about expanding the number of testing centers, the upcoming release of a new exam, etc. It's like the proverbial saying about the right hand not knowing what the left hand is doing.

Strange, to say the least.

Anyone who was part-way through the MCM program is being hung out to dry at this point. All that time and money spent for naught. I can only imagine how that must feel. Awful, truly awful.

On the one hand, I'm disappointed. I spent a lot of time, effort, and money to achieve the MCM, only to have it discontinued. Kind of a slap in the face. However, I have to admit, that on the other hand, I'm a little relieved.

Wait, let me explain. As I see it, there are some serious problems with the overall Microsoft Certification Program. Since they've decided to gut the MCM Program, this is a good opportunity to fix everything that is wrong with it.

A Looming Deadline

I finished the MCM a little over a month ago, and I couldn't be more relieved. I had put a lot of time, money, and effort into this. My Significant Other has been very patient and supportive during this journey, but it was time for it to end. Or at least time to rest for a little while.

Nope.

You see, once you have the SQL 2008 MCM, you only have until June of 2014 to complete the SQL 2012 MCM. That's only ten months away! Also remember, there is a 90-day waiting period for retakes at this level.

If you couldn't make that deadline, then you start back at the bottom with all the MCP/MCTS/MCITP/MCSA/MCSE exams to pre-qualify you for the opportunity to try the SQL 2012 MCM exams once again.

And, guess what, they didn't even have the SQL 2012 MCM Lab Exam ready. So you have a deadline ticking away, but not much you can do about it.

I was a little frustrated by that timeline. I had just spent a considerable amount of money, my own money, to complete the MCM program, and now I had to jump right back in and start spending a bunch more money, immediately. Add to that, I've used about half of my PTO (vacation days) for my studies, travel, and test taking along this journey.

So you can see why I'm a little relieved. Now, I don't have to explain my elaborate cover story as to why I'm not going to bother with the SQL 2012 MCM. Instead, I can join the chorus of folks who are screaming bloody murder about the program being canceled.

Maybe now, I can have a little break and attempt to pay down my SQL 2008 MCM costs. Maybe tomorrow, there will be a new announcement of an entirely new certification program that has been in development for months and months. Wink, wink, nudge, nudge.

Don't Change the Brand

One of the problems with the Microsoft Certifications and the MCM Program is the names and acronyms keep changing. Take a page from other, successful companies and don't.

Since the beginning of time, it has been the MCP. My first certification was an MCP in Visual Basic Programming. That should continue to be the basis of everything. Stop changing the names of the certifications every time a new version of the product comes out.

People are just now starting to learn about the MCM Program. Most don't even know what it is, including recruiters and HR, and now you're changing it to MCSM, why? And now, before people have a chance to be confused about the MCSM, it's getting scrapped.

Keep the standard MCP/MCTS/MCITP/MCM naming scheme. Do you see Ford renaming the Mustang to Horse 2.0? No, you don't.

I can't tell you how many times I've seen a job posting or spoken to HR/Recruiting and they ask if I'm an MCDBA for SQL 2008 or even SQL 2014.

If you say 'no, I have the MCITP or MCM for SQL 2008, which is the newer version,' all they hear is 'no' and they move on. So, what you have to say is 'yes, I have the MCDBA for SQL 2012' or whatever stupid crap recruiters are asking for.

TLAs are better than FLAs

But if you were going to change the names of the certificates, at least choose something easy to say, easy to understand, and that is intuitive.

People love three letter acronyms. They roll off the tongue easier, and they just sound so cool.

I would propose the following nomenclature:

  • MCA, MCP, MCE, MCM. Simple, easy, TLAs.

  • Associate, Professional, Expert, Master.

Most people intuitively know how to rank those four levels. You don't need to know anything about the technology in order to understand that an Associate is lower than an Expert, or Master.

Certificates Shouldn't Expire

I'm not saying you shouldn't continue to train, get certified, learn new skills, etc. But the certs you've earned should stay with you, period. Think about how many SQL 2000 installations there are still out there.

If you are an expert on an old piece of technology, and the customer needs that, then you are still the expert.

If a certification is tied to a specific version of technology there is no need to expire it. That person is not diminishing or interfering with new technology or certifications.

If someone only has certification from ten years ago, and nothing more recent, then let the customer decide if that is what they want.

Specialists Specialize

The SQL 2008 Server Certification program had three tracks: DEV, DBA, BI. There were two levels: junior and senior. Now, you have to complete all three tracks to get the entry level certification for SQL 2012.

Think about cars for a minute. Mechanics specialize. You have transmissions, engines, fuel injection systems, etc. Someone who knows how to fix one, rarely knows the others. Or you have an oil-change technician.

Or doctors? Orthopedic surgeon; ear, nose, and throat; endocrinology. Or you have a general practitioner.

Have you perused job descriptions that require you to be an expert in all three: BI, Dev, DBA; yet pay lower than just one? Me too, lots of them. Those are interesting interviews, but they are also jobs to be avoided like the plague.

The official party line seems to be that Dev, DBA, and BI are so intertwined that you have to understand all of them in order to do any of them. Well, the real world doesn't quite work that way. Knowing about other areas certainly makes you better, and should be rewarded. But for an entry level certification that is ridiculous.

And, if you truly believed that, then how come someone can upgrade to the new MCSA with only one of the old MCTS certs? If all three skills were so intertwined, then you would require someone doing the upgrade to hold all three MCTS certifications.

Cost Benefit Analysis

All this leads me to the question of whether I made the best choice in pursuing the SQL 2008 MCM. What is the cost/benefit analysis of all the time, money, effort, PTO, relationship costs, etc. for pursuing the MCM?

With the same money, you could self-fund a trip to the PASS or BA conferences. You could speak at tons of SQL Saturdays. You could take all the SQL 2012 MCSE Certifications. You could go on a SQL Cruise. And you'd still have money left over.

MCM RIP

I do hope Microsoft reconsiders canceling the MCM Program. This was the only certification that was serious and had sufficient rigor. It gave you something to strive for if you wanted to distinguish yourself from your peers.

Please take a moment and register a comment on the Connect site and let Microsoft know how you feel.