We have a rather uniquely designed database tier in my current organization, and it causes some interesting problems when designing new components.

Our data tier currently consists of n servers running Microsoft SQL Server 2008, each running m discrete databases. Each customer has their own database, with its own schema which can differ based on each customer’s configuration, as we literally ALTER TABLE based on both in-app configuration and what we observe when syncing custom fields from the customer’s CRM via its API. For example, Customer A may use our default Contacts table with no additional custom columns, and that would have, say, 50 columns total. Customer B might have synced down all of their custom fields from Salesforce, resulting in a whopping 800 columns. It’s utterly unpredictable, and would have been a perfect use case for a NoSQL solution or, my personal preference, Postgres with hstore.

We’ve been moving toward a Django-based API and processing tier for some time now, and had successfully run Django's inspectdb to reverse engineer our default schema. Just getting the Django ORM to work with MSSQL had its own caveats, however. Considering that we generally do our development on the Python side of things with Macs and deploy to Linux servers in Amazon Web Services, we needed a Django database backend for SQL Server that didn't rely on Windows-specific implementations (I'm looking at you, ADO.NET).

On a side note, when you do a Google search for 'ADO', the first result is the definition of the English word 'ado', which is, rather hilariously:

a·do (/əˈdo͞o/), Noun: trouble or difficulty.

So, without further ado, let's drill into implementation.

First things first, we need a way to distinguish at runtime which database we want to query via Django's ORM. Normally, you would have something like the following in your settings.py -- a DATABASES dict that contains configuration for all of the databases that your Django application needs to access.

DATABASES = {  
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
    },
}

However, this proves problematic if you don't know about your databases until runtime. Our architecture here revolves around having a central 'controller' database that contains a table of customers and their respective MSSQL connection information.

So, rather than dropping a static DATABASES object into our configuration, we need something that can obtain a particular database's connection details at runtime. In order to accomodate this database-level multi-tenancy, we first wrote up this DatabasesShim class.

class DatabasesShim(dict):  
    def __init__(self, *arg, **kw):
        super(DatabasesShim, self).__init__(*arg, **kw)

        # These regexes extract database name, hostname, and credentials from an MSSQL
        # connection string.
        self.mssql_db_name_pattern = re.compile(r"database=(?P<dbname>[^$;\s]*)", re.I)
        self.mssql_db_host_pattern = re.compile(r"server=(?P<dbhost>[^$;\s]*)", re.I)
        self.mssql_user_pattern = re.compile(r"uid=(?P<uid>[^$;\s]*)", re.I)
        self.mssql_pass_pattern = re.compile(r"pwd=(?P<pwd>[^$;\s]*)", re.I)

    def __getitem__(self, key):
        # Overloads the default dict `__getitem__` function.
        # When a dict attribute of the DatabasesShim is accessed, we try to obtain a database connection for the key
        try:
            item = super(DatabasesShim, self).__getitem__(key)

        except KeyError:
            from sfdb.sfcontroller.models import Customer

            template = self.get('template').copy()

            try:
                cid = key

                # We can access a database's connection by using either the customer's primary key, or their domain name, which is unique to their account.
                try:
                    cid = int(key)
                    customer = Customer.objects.using("sfcontroller").get(customer_id=cid)
                except Exception, e:
                    customer = Customer.objects.using("sfcontroller").get(domainname=cid)

                connection_string = getattr(customer, 'sqlconnectstring')

                dbname_matches = self.mssql_db_name_pattern.search(connection_string)
                dbhost_matches = self.mssql_db_host_pattern.search(connection_string)
                pass_matches = self.mssql_pass_pattern.search(connection_string)
                user_matches = self.mssql_user_pattern.search(connection_string)

                template['NAME'] = dbname_matches.group('dbname')
                template['HOST'] = dbhost_matches.group('dbhost')
                template['USER'] = user_matches.group('uid')
                template['PASSWORD'] = pass_matches.group('pwd')

                return template
            except ObjectDoesNotExist:
                return None
        return item

This way, we can instead instantiate our DATABASES setting as follows:

DATABASES = DatabasesShim(  
    template={
        'ENGINE': 'sqlserver',
        'CONN_MAX_AGE': 6000,
        'AUTOCOMMIT': True,
        'TEST_MIRROR': 'default',
        'OPTIONS': {
            'host_is_server': True,
            'login_timeout': 1800,
            'use_legacy_date_fields': True,
        },
    },
    sfcontroller={
        'NAME': 'Controller',
        'ENGINE': 'sqlserver',
        'HOST': 'xxxxxxxxx',
        'USER': 'xxxxxxxxx',
        'PASSWORD': 'xxxxxxxxx',
        'TEST_MIRROR': 'default',
        'OPTIONS': {
            'host_is_server': True,
            'use_legacy_date_fields': True,
        },
    },
    default={
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
    }
)

With these in place, we can easily access any database via Django's Model.objects.using(db_name) syntax.

In part 2, I'll go into our implementation of a Django authentication backend that authenticates users across databases and how we handled the tables with dynamic, unpredictable schemata.