I recently had to design a relational database in which content had to be versioned. Changes to more of less every record in every table in the database must be tracked. For just about any record, a user should be able to retrieve a report detailing the list of changes made to it, when they were made, and by whom.
I considered a few strategies for how to implement these requirements:
I decided to go with the third option. To make it concrete, say I have a Contacts table. Besides the primary key, I store the name of the contact and possibly a comment:
CREATE TABLE [dbo].[Contacts]( [Id] [int] IDENTITY(1,1) NOT NULL, [Name] [nvarchar](255) NOT NULL, [Comment] [nvarchar](max) NULL, [CreatedBy] [nvarchar](30) NOT NULL, [CreatedAt] [datetime] NOT NULL, [ModifiedBy] [nvarchar](30) NOT NULL, [ModifiedAt] [datetime] NOT NULL, [UniqueRowGuid] [uniqueidentifier] NOT NULL, [Active] [bit] NOT NULL)
The remaining fields are system fields that I add to almost every table in the database. You could argue that the created and modified columns should really be part of the history table, but I prefer for the source table to be self-contained. UniqueRowGuid provides a unique identity of any row in any table and is useful for creating weak relationships between tables in the same database or across databases.
If it weren’t for the fact that a primary key of type Guid makes for bad clustered index performance, I could’ve made the primary key in every table be the UniqueRowGuid — MS SQL Server even has a special type of sequential Guid for this purpose. But I stuck with the traditional integer-based key. The Active column is used to denote if a row has been deleted. I want to keep every row for some period of time, even though to the user it appears to have been deleted.
Jumping straight to the history schema, here’s what I decided to store for each change.
CREATE TABLE [dbo].[History]( [Id] [int] IDENTITY(1,1) NOT NULL, [ChangeCorrelationGuid] [uniqueidentifier] NOT NULL, [UniqueRowGuid] [uniqueidentifier] NOT NULL, [ColumnName] [nvarchar](255) NOT NULL, [BeforeValue] [nvarchar](max) NOT NULL, [AfterValue] [nvarchar](max) NOT NULL, [CreatedAt] [datetime] NOT NULL, [Comment] [nvarchar](max) NULL)
The ChangeCorrelationGuid allows for a grouping of related history entries. Suppose in one transaction you modified both the Name and Comment of a contact. Using this ChangeCorrelationGuid, you effectively create different versions of the row. Next comes the UniqueRowGuid which contains the row-value from the source table, resulting in a loose coupling between History and Contacts. Think of it as identifying the source row whereas ColumnName identifying the source column.
Strictly speaking, I don’t need to store both the BeforeValue and an AfterValue, since the AfterValue of one change becomes the Before version of the next. But this piece of controlled redundency makes querying for the summary easier and less costly.