In this post, we'll look at a resource-intensive application that was initially written as a monolith, how it was migrated to a cloud-native architecture, and what benefits that brought.
What I needed was a system to analyze web pages: given a URL, it should run multiple checks on that page. These would typically include a full Google Lighthouse check, a test for broken outgoing links, a check for missing images, and many other possible future use cases.
The original system ran as a monolithic application, which kept things simple in the beginning. Because some of these tasks are resource-intensive, the system required a ton of effort in throttling and scheduling tasks, and still needed a lot of over-provisioning to stay stable under unpredictable workloads.
A better solution was needed and it turned out to be serverless.
To solve these issues, each task that needs to be executed on a given page was moved to its own Lambda function.
The huge advantage of Lambdas here is that they can be scaled easily and quickly. Creating a Lighthouse report for one page would typically peak at about 1 GB of RAM and take about 20 seconds. In some extreme cases, these numbers might get far worse.
To keep the system maintainable and extensible, a publish-subscribe model using the SNS fan-out pattern turned out to be a great choice.
There's one central SNS topic to which pages that need to be analyzed are published. For each task that needs to be performed on such a page, there is one subscriber: an SQS queue subscribed to the topic, which acts as the event source for the Lambda running the relevant analyzer.
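To make the flow above a bit more concrete, here is a minimal sketch in Python of both ends of one branch. It is not the actual code of the system. The payload shape (`{"url": ...}`), the function names and the placeholder check are assumptions for illustration. It also assumes raw message delivery is disabled on the subscription, so that each SQS record body is the SNS notification envelope with the payload in its `Message` field. `sns_client` is expected to be a boto3 SNS client that the caller passes in.

```python
import json


def publish_page(sns_client, topic_arn, url):
    """Publish a page to the central topic.

    sns_client is assumed to be a boto3 SNS client; the {"url": ...}
    payload shape is a hypothetical choice for this sketch.
    """
    return sns_client.publish(TopicArn=topic_arn, Message=json.dumps({"url": url}))


def extract_urls(event):
    """Pull the page URLs out of an SQS-triggered Lambda event."""
    urls = []
    for record in event.get("Records", []):
        # Without raw message delivery, the SQS body is the SNS envelope.
        envelope = json.loads(record["body"])
        # The envelope's "Message" field holds what publish_page sent.
        message = json.loads(envelope["Message"])
        urls.append(message["url"])
    return urls


def check_broken_links(url):
    """Placeholder analyzer; a real one would fetch the page and test its links."""
    return {"url": url, "check": "broken-links", "status": "not-implemented"}


def handler(event, context):
    """Lambda entry point for one analyzer, fed by its own SQS queue."""
    return [check_broken_links(url) for url in extract_urls(event)]
```

Every analyzer would follow the same shape, with only the check function differing, which is what keeps the individual Lambdas small.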
As stated in the beginning, the initial, monolithic application required a combination of over-provisioning and throttling for task execution. With Lambdas, however, it was possible to completely isolate tasks from each other and to scale them individually on demand.
While such a distributed architecture might seem more complex at first sight, it is actually far easier to build, because there is no special handling required to deal with resource constraints. Additionally, the system became more stable and reliable, since those tasks run completely isolated from each other.
The publish-subscribe model via an SNS fan-out makes the system easy to extend. Each additional analysis task on a given page simply becomes another combination of subscription, SQS queue, Lambda and result sink (DynamoDB, RDS, S3, etc.).
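As a rough sketch of what adding such a combination could look like with boto3, the function below wires a new queue to the existing topic and to an already deployed Lambda. The function name `add_analyzer`, the queue naming scheme, the 180-second visibility timeout and the batch size of 1 are assumptions made for this example, not values from the original system. The clients are passed in by the caller (for instance `boto3.client("sns")`, `boto3.client("sqs")` and `boto3.client("lambda")`), and the result sink is left out because it depends on the analyzer.

```python
import json


def add_analyzer(sns, sqs, lambda_client, topic_arn, name, function_name):
    """Attach a new analyzer branch (subscription, queue, Lambda) to the topic."""
    # The visibility timeout should comfortably exceed the Lambda timeout;
    # 180 seconds is an assumed value for this sketch.
    queue_url = sqs.create_queue(
        QueueName=f"{name}-queue",
        Attributes={"VisibilityTimeout": "180"},
    )["QueueUrl"]
    queue_arn = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]

    # Allow the central topic, and only that topic, to send to this queue.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    }
    sqs.set_queue_attributes(
        QueueUrl=queue_url, Attributes={"Policy": json.dumps(policy)}
    )

    # Fan-out: every page published to the topic now also lands in this queue.
    sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)

    # Let the queue trigger the analyzer Lambda, one page per invocation.
    lambda_client.create_event_source_mapping(
        EventSourceArn=queue_arn, FunctionName=function_name, BatchSize=1
    )
    return queue_arn
```

In practice the same wiring is often declared in an infrastructure-as-code template instead of being scripted, but the moving parts are the same. Existing analyzers are not touched when a new one is added.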