This article shows a simple way to give applications with multi-tenant architectures the ability to add custom fields while keeping the performance, type safety and reporting capabilities of the relational model and SQL language.
SaaS Takes Over the ASP Model
In the late 90’s application service providers started leveraging the availability of abundant bandwidth and the popularity of the web to deliver applications in an ASP model. Essentially the application would be hosted on a shared server and managed by the ASP. While this was a step in the right direction, there were still significant challenges with this approach. These models evolved into what is now knows as Software as a Service or SaaS. SaaS architectures introduced multi-tenancy which allowed pools of high powered resources to be shared in a different way. Rather than customers having their own web server and database, a shared infrastructure served dozens or sometimes thousands of customers.
The Customization Requirement
Businesses generally need some level of customization for all their software. The most common need is to store information that is specific to that business, but may not be something every single customer needs to have. For instance, we are a Salesforce.com customer. For all our leads, we track the download date so when know when a lead has downloaded a trial of our self-service reporting platform for evaluation purposes. Since most Salesforce.com customers do not have downloadable trials, it would not make sense for them add this just for us. If you did that in a multi-tenant environment, you’d quickly end up with a data architecture that is unwieldy from a maintenance perspective.
Instead, Salesforce lets us add custom fields that are relevant to us. They’ve done a great job of engineering the system so the custom fields work pretty much like a stock field that the system ships with. If you are working on a SaaS application, you will likely run into the same scenario.
Inevitably, customers needed the ability to extend the applications. Since there was a shared code base and schema, vendors needed to find a way to deliver this functionality without changing the database schema which was shared across all customers. Two main approaches became popular.
Two Popular Methods
|Entity Attribute Value(EAV)||This method uses name-value pairs||Single universal schema across all tenants.||Eliminates all advantages of using a relational database.
Special Code must be written to access the data.
Significant performance and scalability problems with large data sets or objects with a large number of fields.
Reporting becomes nearly impossible
|Custom Joined Table||This method adds a secondary table per tenant and which will be joined.||Relatively high performance
Retains the benefits of SQL technology
|Does not scale to a large number of tenants.
The number of custom tables becomes impossible to manage and administer.
The application schema must change at run time.
There Is A Better Way
One way to retain all the benefits of the relational model without adding lots of customer-specific tables is to use a single, generic table to store the data and a second table to map it to customer specific fields. Since databases optimize types differently, we will create a set of columns for each time. Note that under this approach we do have a limit to the number of maximum fields there are. For extremely large systems, you may want to break each type up into a separate table.
Create script for custom fields table.
CREATE TABLE [dbo].[CustomFields]( [TenantID] [int], [TableName] [varchar](50), [EntityID] [int], [Int1] [int] NULL, [Int2] [int] NULL, [Int3] [int] NULL, [String1] [varchar](50) NULL, [String2] [varchar](50) NULL, [String3] [varchar](50) NULL, [DateTime1] [datetime] NULL, [DateTime2] [datetime] NULL, [DateTime3] [datetime] NULL ) ON [PRIMARY]
This table will store the actual data. The TenantID field locks the data to that particular account or customer in a multi-tenant system with a shared database. TableName represents the name of the table where the entity is stored. The EntityID is the primary key of whatever object this links to. The rest of the fields store actual data based on the type of the field. So if we just created a Lead in our CRM system and needed to record a download date as a custom field our insert would look something like this.
INSERT INTO CustomFields( TenantID, TableName, EntityID, DateTime1) VALUES (1,'Leads',1,'08/18/2010')
To retrieve the data we would do a LEFT JOIN. This lets us retrieve any custom fields.
SELECT * FROM Leads LEFT JOIN CustomFields ON Leads.TenantID=CustomFields.TenantID AND TableName='Leads' AND EntityID=LeadID
Next we need the ability to map or alias the name into a more meaningful label. We would do this in the user interface or reporting tool dynamically. To do so we’d need to store the “metadata” whenever a custom field is created. Here’s what our tracking table looks like.
CREATE TABLE [dbo].[CustomFieldNames]( [TenantID] [int] NULL, [TableName] [varchar](50) NULL, [FieldIndex] [int] NULL, [Type] [varchar](50) NULL, [Label] [varchar](50) NULL ) ON [PRIMARY]
To setup our “Download Date” field, we would use an insert like this the first time the field is setup.
INSERT INTO CustomFieldNames( TenantID, TableName, FieldIndex, [Type], Label) VALUES (1,'Leads',1,'datetime','Download Date')
Essentially this would map to DateTime1 and the application UI or reporting tool would alias the field name dynamically based on which tenant the user belonged to.
We now have the entire data architecture capable of storing custom fields for our multi-tenant application. Each customer can have their own set of custom fields. Best of all, we keep all the benefits of the relational model and can plug this directly into a reporting tool like Izenda. The only overhead is a single join and adding an index will remove most of that.