Use SQL Server Sparse columns to optimize storage for columns with mostly NULL values

If your SQL Server database has nullable columns which contain mostly null values consider setting these columns to sparse. Sparse columns are ordinary columns that have been optimized for storing null values. Not all columns can be set to sparse and a number of restrictions apply.

How to set sparse

Columns set to sparse require more storage space for nonnull values (4 additional bytes) than the space required for identical data in a column that is not marked sparse. This means it only makes sense to use sparse when the percentage of nulls is above a certain threshold. The threshold is different for each data type.

An except from Microsofts Estimated Space Savings by Data Type table is below. The red highlight shows how many bytes are needed for normal non sparse columns, this is regardless if they are null or not.

The blue highlight shows bytes required for non null values in sparse columns. Note null values take up 0 space in sparse columns.

The final column shows what percentage of nulls is needed for space saving of 40%. Microsoft recommends using sparse columns when savings are at least 20 – 40%. We can see in yellow that to save 40% space for int columns we need 64% of values to be null, while the corresponding percentage for datetime columns is 52%.

Sparse fixed data types

Determining percentage of null values in a column manually

Pretty simple..

SELECT 100.0 * SUM(CASE WHEN Address4 IS NULL THEN 1 ELSE 0 END) / COUNT(*) from Address AS Address4NullPercent

where Address4 is your column and Address is your table name.

Determining percentage of null values in all columns in all tables dynamically

If you have a large DB and wish to check for sparse candidates across your entire database you can use the script on this page which uses temporary tables and cursors to do this dynamically.

Note the script referenced above might take a while to run so to prevent performance issues for your existing prod processes its best to run it either after hours or on a production clone. It will produce a table similar to below.

sparse-candidates

You’ll get the biggest space savings from using sparse columns with fixed-length columns having a high null percentage in tables with large amounts of records.

Finding which columns are already sparse

SELECT so.name [Table Name], sc.name [Column Name]
FROM sys.columns sc
JOIN sys.objects so
ON so.OBJECT_ID = sc.OBJECT_ID
WHERE is_sparse = 1

 

Leave a Reply

Your email address will not be published. Required fields are marked *