Joins in DynamoDB - Single-Table Design

While DynamoDB does not directly support join operations like a relational database, we are still able to model relations and to perform complex queries on them via a single-table design.

Intro

DynamoDB is just a simple key-value store

You can’t do joins with DynamoDB

For anything with relations you need to use a SQL database because in DynamoDB you would either have multiple round-trips or you need to scan the table

Do these quotes sound familiar to you? Chances are you might have said one of them yourself. Well, I definitely won’t blame you, but the thing is, they are all wrong.

DynamoDB can be extremely powerful if you know how to use it and it seems to me that this information hasn’t been spread enough in the developer community yet. Therefore, let me welcome you to the fancy new The power of DynamoDB series where we’ll go through some of the amazing things you can do with DynamoDB.

To keep things interesting, we’ll try to stick to a real-world example everybody should be able to relate to. Let’s quickly introduce the example application before we move on to the basics of DynamoDB and our first unexpected usage of DynamoDB.

Example application - cowments

Have you ever seen one of those textboxes at the bottom of a page, where people can shout at the author of an article and get into fights with other readers?

Our example will all relate to a more or less fictitious comments backend, similar to disqus, which we’ll name cowments (because naming is hard). Since we are pretty sure that cowments will take over the world of commenting, we can’t just do with a SQL database, we need something super fast and something that scales. We choose DynamoDB. But will it really support everything we need to do?

Well, we’ll figure it out in this series, but first, let’s start with a short section about some DynamoDB fundaments, otherwise, all the magic we are going to use won’t make any sense.

The Basics

DynamoDB supports two data models: key-value and wide-column. In the case of a key-value model, we are always accessing our items with the full key and use DynamoDB like a hash table. This is typically the one thing people think about when they hear DynamoDB, and it is definitely a good use case for this database. However, many are not aware of how we can use it as a wide-column store to allow for more complex querying mechanisms.

In the wide-column data model, we are still dealing with a form of hash-table due to our hash keys, but the additional range key transforms the entries in this table to B-trees.

The hash key describes the partition, we can also view this as a hashtable.

hashtable-dynamo.svg

Range/Sort key specifies the item in a partition. We often see this mechanism used with dates as the sort key. Query is then used to retrieve the last x items.

table-with-created.svg

We can still retrieve a single comment via its sort and hash key:

const dynamoRes = await dynamoDb.get({
        TableName: 'CommentsTable',
        Key: {
					URL: 'www.a.com',
					CreatedAt: 6543
				}
    }).promise()

And more interestingly, we are also able to select an arbitrary number of the most recent comments:

const dynamoRes = await dynamoDb.query({
        TableName: 'CommentsTable',
        KeyConditionExpression: 'URL = :url',
        ExpressionAttributeValues: {
            ':url': 'www.a.com'
        },
        ScanIndexForward: false,
				Limit: 2
    }).promise()

However, this is not where the fun stops. The query operation actually allows us to select on begins_with, between and the comparison operators on the sort key, which makes perfect sense if we keep in mind that we are operating on a B-tree here.

In the following section, we will see how we can combine our Pages and Comments tables into a single table to perform joins on both the pages and the comments.

Challenge: Fetch a Page and all its Comments

It is easy to imagine that a client will need the general information about the page and (some of) its corresponding comments. Luckily we don’t have to perform two separate calls to DynamoDB to achieve this. Instead, we will move both types of entities to a single table.

Since the table is now a generic table, not bound to a specific entity type, we rename our hash- and sort-key fields to the generic names pk and sk, additionally we encode the type of the entity in the key itself.

Let’s have a look at what this would look like:

single-table.svg

Our table is now sparse and only certain items (rows in DynamoDB) use certain fields.

The interesting part about this approach is that we are still able to fetch a page’s data

const dynamoRes = await dynamoDb.query({
        TableName: 'Table',
        KeyConditionExpression: 'pk = :pk sk = :sk',
        ExpressionAttributeValues: {
            ':pk': 'PAGE#www.a.com',
            ':sk': 'PAGE#www.a.com',
        },
        ScanIndexForward: false,
				Limit: 2
    }).promise()

The last n comments:

const dynamoRes = await dynamoDb.query({
        TableName: 'Table',
        KeyConditionExpression: 'pk = :pk and begins_with(sk, :sk)',
        ExpressionAttributeValues: {
            ':pk': 'PAGE#www.a.com',
            ':sk': 'COMMENT#',
        },
    }).promise()

But we are now also able to select the page and its comments in a single query that returns the page information and n-1 (since we are using one item for the page information) comments:

const dynamoRes = await dynamoDb.query({
        TableName: 'Table',
        KeyConditionExpression: 'pk = :pk',
        ExpressionAttributeValues: {
            ':pk': 'PAGE#www.a.com'
        },
        ScanIndexForward: false,
				Limit: 3
    }).promise()

The big advantages of this approach are the single query and database call we use and the fact that query, as opposed to a scan on DynamoDB, is extremely fast and cheap.

This is of course just a very basic example and should only provide an introduction to the underlying principles of a single-table design in DynamoDB.

Often we will have additional queries that we need to support in our table and to do so we need to use additional hash- & sortkey pairs on the same table via Global Secondary Indizes, which additionally allows us to query items based on conditions.

In general, single-table design can be a useful tool to unleash the full power of DynamoDB, but using such a design definitely increases the complexity of interacting and designing the database. Often it is best to only move to a single-table design after the specific query patterns are already well known.

Summary

In this post we learned how DynamoDB is basically just a hashmap with B-trees as it’s values. The querying mechanism on the B-tree allows us to use DynamoDB for far more than just simple key-value pair lookups.

Since the topic is quite fascinating there might be more posts on DynamoDB and more advanced concepts to come.