PostData Types and How They Affect HA/DR

I previously wrote about Data Types and How They Affect Database Performance which was a way to get database developers to think about the small things that affect overall performance. As a high availability and disaster recovery (HA/DR) professional myself, I like to think of how data types affect database availability and recoverability. The fact is, all SQL Server HA/DR features are affected by the amount of transaction log records generated both by user and system transactions.

Transaction log records are read during a database’s recovery process.  Whenever the SQL Server service is restarted, the transaction log file is read as part of the recovery phase, committed transactions are persisted to the data file (REDO phase) and uncommitted transactions are rolled back (UNDO phase) to keep the database in a transactionally consistent state. A more comprehensive discussion of the recovery phase is described in the Understanding How Restore and Recovery of Backups Work in SQL Server topic in Books Online. Every SQL Server HA/DR feature runs either just the UNDO phase or the whole recovery process.

  • Backup/restore. After the data copy phase completes as part of restoring a backup, the recovery process kicks in.
  • Failover clustering. Failover is basically the process of intentionally (manual) or accidentally (automatic) stopping the SQL Server service from the current active node to any one of the nodes in a failover cluster.
  • Database mirroring/Availability Group. The transaction log records are sent from the principal/primary to the mirror/secondary where the REDO phase is continuously running. When a failover is initiated, only the UNDO phase runs on the mirror/secondary to make it the new principal/primary.
  • Log shipping. This is an automated backup-copy-restore process of transaction log backups. The REDO phase runs when the latest log backup is restored on the secondary and the UNDO phase runs when the databases are recovered.
  • Replication. The Log Reader Agent reads the transaction log records from the Publisher that are marked for replication and converts them to their corresponding DML statements, copied to the Distributor and applied to the Subscribers.

Bottom line is this: availability and recoverability is affected by the amount of transaction log records generated in the database. Now, you might be asking, “how do data types affect HA/DR?

Suppose you have a table structure similar to the one below. This was actually taken from one of the corrupted databases that I had to recover last week as part of a disaster recovery situation.

[callout]
CREATE TABLE [dbo].[Users](
[userID] [uniqueidentifier] NOT NULL PRIMARY KEY,
[FirstName] [varchar](255) NULL,
[LastName] [varchar](255) NULL,
[UserLogin] [varchar](255) NULL,
[IsActive] [int] NULL
)
GO
CREATE NONCLUSTERED INDEX NCIX_Users
ON Users (Lastname, Firstname)
INCLUDE (UserLogin,IsActive)
GO
[/callout]

Let’s insert a row in this table and measure how much transaction log record gets generated in the process. Note that I will wrap the INSERT statement in an explicit transaction so I can take a look at the amount of transaction log records using the sys.dm_tran_database_transactions DMV.

[callout]BEGIN TRAN;
INSERT INTO Users VALUES (NEWID(), 'James', 'Dylaine', 'jdylaine', 1);
INSERT INTO Users VALUES (NEWID(), 'Carol', 'Smith', 'csmith', 1);
INSERT INTO Users VALUES (NEWID(), 'Ryan', 'Anker', 'ranker', 1);
INSERT INTO Users VALUES (NEWID(), 'Bobby', 'Reston', 'breston', 1);
GO

SELECT [database_transaction_log_bytes_used], [database_transaction_log_bytes_used_system]
FROM sys.dm_tran_database_transactions
WHERE [database_id] = DB_ID (N’testDB’);
GO

COMMIT TRAN
GO
[/callout]

LogGenerated

Inserting four (4) records in the table generated 1440 bytes. That’s about 1/8 the size of a data page. Imagine the amount of transaction log records generated if this was a highly transactional table. Now, if you look at the table and index structure, the userID column – a 16-byte column – is assigned as a PRIMARY KEY column. By default, this also becomes the clustered index key. And because the clustered index key is also included in the non-clustered indexes,  we are duplicating a 16-byte column for every non-clustered index we create.

What if we make certain modifications on this table based on the column data types? Let’s say that we’ll change the userID column data type from uniqueidentifier to a bigint data type of size 8 bytes and make it an identity column. Note that I am only using this as a simple example as the data types should still meet your business requirements. Let’s also change the IsActive column from an int to a bit data type of size 1 byte since there are only two possible values for this column.

[callout]
CREATE TABLE [dbo].[Users](
[userID] [bigint] IDENTITY (1,1) NOT NULL PRIMARY KEY,
[FirstName] [varchar](255) NULL,
[LastName] [varchar](255) NULL,
[UserLogin] [varchar](255) NULL,
[IsActive] [bit] NULL DEFAULT 0
)
GO

CREATE NONCLUSTERED INDEX NCIX_Users
ON Users (Lastname, Firstname)
INCLUDE (UserLogin,IsActive)
GO
[/callout]

By doing this, we’ve reduced the row size by 11 bytes. Now, you might be thinking, “11 bytes is almost negligible with today’s disk and memory capacity.” Let’s see. I’ll insert the same four (4) records on the new table structure and measure the amount of transaction log records generated in the process. The corresponding INSERT statements need to be updated as well to reflect the change.

[callout]
BEGIN TRAN;
INSERT INTO Users (Firstname, Lastname, UserLogin, IsActive) VALUES ('James', 'Dylaine', 'jdylaine', 1);
INSERT INTO Users (Firstname, Lastname, UserLogin, IsActive) VALUES ('Carol', 'Smith', 'csmith', 1);
INSERT INTO Users (Firstname, Lastname, UserLogin, IsActive) VALUES ('Ryan', 'Anker', 'ranker', 1);
INSERT INTO Users (Firstname, Lastname, UserLogin, IsActive) VALUES ('Bobby', 'Reston', 'breston', 1);
GO
SELECT [database_transaction_log_bytes_used]
FROM sys.dm_tran_database_transactions
WHERE [database_id] = DB_ID (N'testDB');
GO
COMMIT TRAN
GO

[/callout]

LogGenerated2

The difference in the amount of transaction log records generated between the two is a mere 88 bytes. You might be wondering, “Didn’t we just reduce the row size by 11 bytes for a total of 44 bytes for all 4 rows? How come we’ve reduced it by 88 bytes?” That’s because the size reduction not just applies to the row itself but to all of the non-clustered indexes since they include the clustered index key. If you do the math, considering a table with 10,000 rows and four (4) non-clustered indexes, you’ve basically reduced the size by 550,000 bytes (550 KB.) And this number just adds up depending on the number of tables, rows and indexes that your database has.

If we want to achieve higher availability by reducing downtime when SQL Server is restarted, failover process initiated or when a database backup is restored, using the smallest yet appropriate sized data type for your columns is just one of many ways to do it. Because we’re reducing the amount of transaction log records generated, the REDO and UNDO phases of the recovery process will definitely run faster (not to mention the amount of log records sent to the mirror/secondary replica will also be reduced.) What I really like about this approach is the fact that doing so also improves your query performance.

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.

Close