Tuesday, September 24, 2013

An Overview of SQL Server 2014 In-Memory OLTP Hekaton

So you've heard of Hekaton, but what is it, and why do you want it?

Hekaton, or In-Memory OLTP, is an entirely new set of data structures for tables and indexes that is optimized for in-memory storage as opposed to disk-based storage.

Hekaton is the code name for In-Memory OLTP, so I will use these two terms interchangeably.

Why is Microsoft doing this? Short version: memory and servers are much cheaper now than they were when SQL Server first launched.

At this point, it is feasible to have enough memory on your server to house the entire database. Even large, one terabyte databases.

However, the query optimizer and its costing rules haven't changed along with this. So, even if you have tons of memory, SQL Server is still making assumptions that it will be reading data off of the disk.

Basic Glossary of Terms

  • Cross-Container Transactions - transactions that use both disk-based tables and memory-optimized tables

  • Disk-Based Tables - plain old normal tables, what you have right now, 8k pages

  • Hekaton - codename for In-Memory OLTP

  • In-Memory OLTP - new architecture and data structures using memory for data storage instead of disks

  • Interop - a TSQL query written against memory-optimized tables

  • Memory-Optimized Tables - tables using new memory data structures to store their data

  • Natively Compiled Stored Procedures - compiled machine code instead of interpreted TSQL, still written in TSQL but with some restrictions


Databases

In order to make use of In-Memory OLTP, you need a database that supports it, and that is fairly easy to set up. When you create the database, you need a special filegroup with the CONTAINS MEMORY_OPTIMIZED_DATA clause. Additionally, you need to use a Windows BIN2 collation. This can be done at the database, column, or query level.
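A minimal sketch of such a database is shown here. The database name, logical file names, and the C:\Data paths are made up for illustration, and setting BIN2 for the whole database is just one of the options mentioned above. Note that the "file" in the memory-optimized filegroup is really a folder.

```sql
-- Hypothetical names and paths; adjust to your environment.
CREATE DATABASE HekatonDemo
ON PRIMARY
    (NAME = HekatonDemo_data, FILENAME = 'C:\Data\HekatonDemo_data.mdf'),
-- The special filegroup that holds memory-optimized data
FILEGROUP HekatonDemo_mod CONTAINS MEMORY_OPTIMIZED_DATA
    (NAME = HekatonDemo_mod, FILENAME = 'C:\Data\HekatonDemo_mod')
LOG ON
    (NAME = HekatonDemo_log, FILENAME = 'C:\Data\HekatonDemo_log.ldf')
-- Optional: make BIN2 the database default instead of setting it per column
COLLATE Latin1_General_100_BIN2;
```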

Tables

To create a Memory-Optimized Table, you use the MEMORY_OPTIMIZED = ON clause. There are several restrictions on column types, but in simple terms no LOB data types are allowed, no whatever(max), and no CLR.

Rows are limited to 8060 bytes with nothing stored off row. The size limitation is enforced at creation, so all of your column sizes must fit within this limitation.

DML triggers are not allowed, neither are foreign key or check constraints. Love GUIDs? I hope so, because identity columns are out, too.

There are two basic types of Memory-Optimized Tables: SCHEMA_ONLY and SCHEMA_AND_DATA. SCHEMA_AND_DATA tables are durable, so both the table definition and the data survive a restart.

SCHEMA_ONLY tables are non-durable. You can put data in there, but in the event of a restart or crash, the table is recreated but your data is gone. This could be useful for storing application session state or for staging tables in a data warehouse.
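Here is a rough sketch of a non-durable session-state table, assuming a database that already has a MEMORY_OPTIMIZED_DATA filegroup. The table, its columns, and the bucket count are hypothetical. The syntax follows what is documented for the SQL Server 2014 release, so details may differ in the CTPs.

```sql
-- Hypothetical session-state table; all names and sizes are illustrative.
CREATE TABLE dbo.SessionState
(
    -- Every memory-optimized table needs at least one index;
    -- here the primary key is a hash index.
    SessionId INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    UserId    INT NOT NULL,
    -- String column with an explicit BIN2 collation
    UserName  NVARCHAR(100) COLLATE Latin1_General_100_BIN2 NOT NULL,
    LastSeen  DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);
```

Swapping DURABILITY = SCHEMA_ONLY for SCHEMA_AND_DATA would make the same table durable.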

Indexes

Memory-Optimized Tables can have two types of indexes, Hash Indexes and Range Indexes. All tables must have at least one index, and no more than eight. Also, tables that are defined as SCHEMA_AND_DATA must have a primary key. Indexes are rebuilt each time SQL Server starts up.

A Hash Index is an array of pointers, where each element points to a linked list of rows. The number of elements in the array is controlled by the BUCKET_COUNT clause. In general, you want to set the BUCKET_COUNT to at least the number of unique key values in your table.

If you have too few buckets, then multiple key values will share the same linked list, which will mean longer scans to look for your row. If you have too many, then you will be wasting memory with empty buckets.
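One way to check bucket sizing after the fact is sketched below. It relies on the sys.dm_db_xtp_hash_index_stats DMV as documented for the SQL Server 2014 release; whether it is present and behaves the same way in CTP1 is an assumption.

```sql
-- Bucket usage for every hash index in the current database.
SELECT OBJECT_NAME(s.object_id) AS table_name,
       i.name                   AS index_name,
       s.total_bucket_count,
       s.empty_bucket_count,
       s.avg_chain_length,
       s.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats AS s
JOIN sys.indexes AS i
    ON i.object_id = s.object_id
   AND i.index_id  = s.index_id
ORDER BY s.avg_chain_length DESC;
```

Long chains combined with few empty buckets point to a BUCKET_COUNT that is too low, while a large share of empty buckets points to wasted memory.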

Range Indexes are good for when you will be searching for a range of values, or if you are not able to properly estimate the BUCKET_COUNT size. However, Range Indexes are not available in CTP1, so we'll have to wait a bit to learn more about those.

Queries and Stored Procedures

There are two basic methods for querying Memory-Optimized Tables: Natively Compiled Stored Procedures or good old-fashioned TSQL, known as Interop, which also includes regular stored procedures.

Natively Compiled Stored Procedures are going to be the fastest. However, they are only able to access Memory-Optimized Tables. If you want to be able to query regular tables along with Memory-Optimized Tables, then you will need to use TSQL Interop. TSQL Interop comes with a variety of restrictions of its own: MERGE, cross-database queries, locking hints, and linked servers are among the features that do not work against Memory-Optimized Tables.
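To make the difference concrete, here is a sketch of both approaches. It assumes a hypothetical memory-optimized table dbo.SessionState (SessionId, UserId, LastSeen) and a hypothetical disk-based table dbo.Users (UserId, Email). The procedure name is made up as well, and the syntax follows the SQL Server 2014 release documentation.

```sql
-- Natively compiled: may only touch memory-optimized tables.
CREATE PROCEDURE dbo.usp_GetSession
    @SessionId INT
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH
    (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')

    SELECT SessionId, UserId, LastSeen
    FROM dbo.SessionState
    WHERE SessionId = @SessionId;
END;
GO

-- Interop: plain TSQL joining a memory-optimized table to a disk-based one.
SELECT s.SessionId, s.LastSeen, u.Email
FROM dbo.SessionState AS s
JOIN dbo.Users AS u
    ON u.UserId = s.UserId;
```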

TSQL Interop allows you to make a gradual migration to In-Memory OLTP. This way you can slowly convert a few objects at a time based on which ones will give you the most performance gain.

One Big Caveat

One thing to keep in mind is that tables, indexes, and stored procedures cannot be modified in Hekaton. This means that you will need to drop and re-create these objects in order to make changes. Also, stats have to be rebuilt manually. And then to take advantage of them, the stored procedures would need to be recreated, as well.
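As a sketch of what that manual routine might look like, assuming a hypothetical memory-optimized table dbo.SessionState and a hypothetical natively compiled procedure dbo.usp_GetSession. The FULLSCAN and NORECOMPUTE options are what the SQL Server 2014 release documentation calls for on memory-optimized tables; CTP behavior may differ.

```sql
-- 1. Refresh statistics by hand.
UPDATE STATISTICS dbo.SessionState WITH FULLSCAN, NORECOMPUTE;
GO

-- 2. Drop the natively compiled procedure. Re-running its original
--    CREATE PROCEDURE script afterwards compiles it against the new stats.
DROP PROCEDURE dbo.usp_GetSession;
GO
```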

Obviously, this is a fairly major restriction. However, I think I can live with this for a version one product. I hope that by the time SQL Server 2015 comes out, there will be an easier way to add a column to a Memory-Optimized Table.

Concurrency

Hekaton offers an improved versioned optimistic concurrency model for Memory-Optimized Tables that removes waiting for locks and latches. Explicit transactions are supported using Repeatable Read, Serializable, and Snapshot isolation levels. Read Committed and RCSI are only available with autocommit transactions, with RCSI only if no disk-based tables are involved.
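In practice, this means an explicit transaction running under the default Read Committed level has to name a supported isolation level for the Memory-Optimized Table, for example with a table hint. A minimal sketch, assuming a hypothetical memory-optimized table dbo.SessionState with SessionId and LastSeen columns:

```sql
BEGIN TRANSACTION;

    -- The SNAPSHOT hint applies only to the memory-optimized table.
    UPDATE dbo.SessionState WITH (SNAPSHOT)
    SET LastSeen = SYSUTCDATETIME()
    WHERE SessionId = 42;

COMMIT TRANSACTION;
```

Because the model is optimistic, a conflicting writer gets an error instead of waiting on a lock, so callers should be prepared to retry the transaction.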

High Availability and Disaster Recovery

All the basics such as backups and restores are available. Additionally, AlwaysOn and Log Shipping are fully supported. Unfortunately, Mirroring and Transactional Replication are not. However, this isn't too much of a surprise since Microsoft is definitely pushing AlwaysOn as the new HA/DR solution.

Migration Assistance

The AMR Tool (Analysis, Migrate, and Report) will identify unsupported data types and constraints in tables. It will also recommend which tables and stored procedures should see the most performance improvement by converting to In-Memory OLTP and Memory-Optimized Tables.

Management Data Warehouse offers the Transaction Performance Collection Set, which will help you gather the necessary data in order to let the AMR Tool work its magic.
