Serverless computing is a cloud computing execution model in which the cloud provider dynamically manages the allocation of machine resources. Currently the most popular serverless offerings are from AWS, Azure, Google Cloud and IBM Cloud.
We started our journey with a traditional architecture, using PHP, RabbitMQ, MySQL and servers on EC2 to develop the first version of the software, and then decided to re-architect it using serverless.
Why did we choose serverless as a solution?
Our PaaS model envisages multiple clients using our solution simultaneously, each able to quickly demo a proof-of-concept IoT solution to their internal stakeholders.
We have seen straightforward cost gains on hardware (infrastructure sharing, no dedicated EC2 instances, pay as you use), manpower (no need for an infrastructure engineer to handle scaling and load balancers), software (reduced development costs, faster development, use of third-party services like IAM/Cognito and CloudFoundry) and finally security (WAF, Amazon Inspector and GuardDuty).
Reduced Hardware Costs
In V1 of our platform (the traditional server platform) we had to deploy the whole infrastructure for each client using CloudFormation scripts: all the EC2 instances, RDS for MySQL, load balancers and so on. The operation took time to deploy and test, and it made inefficient use of the provisioned infrastructure because sharing was not possible. Our scenario of letting clients sign up quickly, run a PoC and demo it to internal stakeholders also meant that usage patterns were inconsistent. With the serverless approach we no longer make capacity decisions ourselves and instead depend on AWS. Another major benefit is that we pay only for the compute our Lambda functions actually use. Since our traffic profile is inconsistent, this was a major cost saving for us.
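To make the pay-per-use point concrete, here is a back-of-envelope estimate. The prices below are illustrative (based on published Lambda pricing at the time of writing, not an official quote), and the traffic numbers are hypothetical:

```python
# Back-of-envelope Lambda cost estimate. Prices are illustrative --
# always check current AWS pricing before relying on them.
PRICE_PER_MILLION_REQUESTS = 0.20    # USD
PRICE_PER_GB_SECOND = 0.0000166667   # USD

def monthly_lambda_cost(invocations, avg_duration_ms, memory_mb):
    """Rough monthly cost for a single Lambda function."""
    request_cost = invocations / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute_cost = gb_seconds * PRICE_PER_GB_SECOND
    return request_cost + compute_cost

# One million requests a month at 200 ms and 512 MB comes to roughly
# two dollars, versus an always-on EC2 instance billed around the clock.
print(round(monthly_lambda_cost(1_000_000, 200, 512), 2))
```

With a bursty, demo-driven traffic profile like ours, most hours see little or no traffic, which is exactly where this pricing model wins.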
Infra Manpower costs
Having fewer decisions to make means we also do not need an infrastructure engineer to help us with configuration and management. It also reduces the need for support engineers on standby for hardware issues, backups and so on, and we can offer simple SLAs to the client riding on the AWS SLAs.
Similarly, since scaling is performed by AWS, we no longer have to think about concurrent requests, memory, performance and so on at the infrastructure level. This was one of the most difficult things to get right in all the web applications we had built.
Reduced development cost
In version 1 of the software, we had to hand-craft a lot of functionality (reusing self-written components from older projects) and create the linkages between the various components of the system, such as MQ to DB, on our own. With BaaS, we simply use the managed services. Although it would have been possible to use these services earlier, the architecture was not based on microservices, so the system architecture had to be reworked anyway.
Moving authentication from self-written code to AWS Cognito (for web and mobile) turned out to be a huge advantage for us.
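One wrinkle worth noting with Cognito: when an app client is created with a client secret, API calls such as SignUp and InitiateAuth expect a SECRET_HASH parameter. It is just an HMAC and can be computed with the standard library; the example values below are placeholders, not real credentials:

```python
import base64
import hashlib
import hmac

def cognito_secret_hash(username: str, client_id: str, client_secret: str) -> str:
    """SECRET_HASH expected by Cognito calls (SignUp, InitiateAuth, ...)
    when the app client has a client secret:
    Base64(HMAC-SHA256(key=client_secret, msg=username + client_id))."""
    digest = hmac.new(
        client_secret.encode("utf-8"),
        (username + client_id).encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return base64.b64encode(digest).decode()

# Placeholder values for illustration only.
print(cognito_secret_hash("alice", "example-client-id", "example-secret"))
```

The hash is then passed alongside the username in the relevant Cognito API call.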
Another benefit was moving from RabbitMQ to AWS IoT and shifting from the HTTP protocol to MQTT. Doing all of this with the AWS SDKs reduced development time and testing time, and saved power on the mobile devices.
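As a sketch of what the MQTT side looks like, the topic scheme and payload below are illustrative assumptions, not our real schema; the actual publish would go through an MQTT client such as the AWS IoT Device SDK:

```python
import json
import time

def telemetry_message(client_id: str, sensor: str, value: float):
    """Build an MQTT topic and a compact JSON payload for one reading.
    Topic layout (devices/<id>/telemetry/<sensor>) is a made-up example."""
    topic = f"devices/{client_id}/telemetry/{sensor}"
    payload = json.dumps({
        "ts": int(time.time()),
        "sensor": sensor,
        "value": value,
    })
    return topic, payload

topic, payload = telemetry_message("pump-01", "temperature", 21.5)
# A small JSON payload over a persistent MQTT connection keeps radio
# time (and battery drain) low compared to repeated HTTP round trips.
```

The payload stays small and the connection stays open, which is where the power saving over HTTP polling comes from.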
Reduced deployment complexity
Using serverless.com, we also reduced the deployment and packaging complexity of our platform. It was much simpler and more automated than deploying complete servers and installing software and patches on them.
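For a flavour of what this looks like, a Serverless Framework service definition takes the place of the old provisioning scripts. The service and handler names below are placeholders, not our actual configuration:

```yaml
# serverless.yml -- placeholder names, not our real service
service: iot-poc-api

provider:
  name: aws
  runtime: python3.6
  region: us-east-1

functions:
  ingest:
    handler: handlers.ingest   # handlers.py: def ingest(event, context)
    events:
      - http:
          path: telemetry
          method: post
```

A single `serverless deploy` then packages the functions and creates the API Gateway endpoints and supporting resources, with no servers to patch.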
Our team uses lean and agile processes, and we love to experiment with new features quickly. Simpler deployment, combined with the microservices architecture, allowed us to introduce these with low friction and minimal cost.
Vendor lock in
Almost all our components and services use AWS as the primary vendor, so without major refactoring it would be difficult to escape the vendor lock-in. As such, we depend on AWS for downtime, account limits and throttles, pricing, and version upgrades of the underlying software.
The more vendor services we use, the more control we give up over the execution environment. In a server environment the problem is similar when using third-party components, but there we are generally in control of the design and development of the execution environment.
Sometimes it is better to control the optimizations ourselves and tune our servers and software to meet a particular goal. In a serverless environment, the available parameters are limited to those exposed by the service provider. On our own servers it is also quicker to apply a patch, or upgrade to a newer version of the software/OS, to use a feature as soon as it becomes available.
We have occasionally run into limits imposed by AWS, such as the concurrent Lambda execution limit (default 1,000) or ENI creation limits. At our current loads these were mainly caused by incorrectly written code, but depending on the final environment and needs, they could become a real bottleneck.
Another limitation is the maximum execution time for a Lambda function, which is currently 300 seconds. We had a couple of batch jobs and long-running cron jobs that we had to re-architect.
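One way to rework such jobs (a common checkpoint-and-continue pattern, sketched here rather than our exact code) is to process in chunks, watch the remaining time via the Lambda context's real get_remaining_time_in_millis() method, and hand leftovers to a follow-up invocation. A stubbed context makes the sketch runnable locally:

```python
TIME_BUFFER_MS = 10_000  # stop early, leaving headroom to re-dispatch

def process_batch(items, context):
    """Process as many items as time allows; return any leftovers so the
    caller can re-invoke the function with the remainder."""
    done = []
    for i, item in enumerate(items):
        if context.get_remaining_time_in_millis() < TIME_BUFFER_MS:
            return done, items[i:]   # unfinished work for the next run
        done.append(item * 2)        # placeholder "work"
    return done, []

# Stub standing in for the Lambda context object during local testing.
class FakeContext:
    def __init__(self, remaining_ms):
        self.remaining_ms = remaining_ms
    def get_remaining_time_in_millis(self):
        return self.remaining_ms

done, leftover = process_batch([1, 2, 3], FakeContext(remaining_ms=60_000))
```

The re-invocation itself could be a self-invoke via the Lambda API or a Step Functions loop; either way, no single execution needs to fit in 300 seconds.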
Since each concurrent request is handled by its own Lambda instance, a burst of requests can force new execution environments to be spun up (cold starts), each of which adds latency. Combined with the latency of the API/microservices layer, this becomes quite significant when the user expects a responsive UI.
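A common mitigation (a general pattern, not something specific to our stack) is a scheduled warm-up ping, for example from a CloudWatch scheduled rule, that keeps execution environments alive; the handler short-circuits when it sees the ping. The event shape here is an assumption:

```python
import json

def handler(event, context):
    # A scheduled rule sends {"warmup": true} every few minutes;
    # returning early keeps the container warm without real work.
    if isinstance(event, dict) and event.get("warmup"):
        return {"statusCode": 200, "body": "warm"}
    # ... normal request handling would go below (placeholder) ...
    return {"statusCode": 200, "body": json.dumps({"ok": True})}
```

This does not eliminate cold starts under a genuine traffic burst, but it helps the common case of a single user expecting a snappy UI after a quiet period.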
Debugging and Testing
Debugging failing Lambdas in a microservices or FaaS environment is difficult and non-trivial. We have been experimenting with the X-Ray feature, but some problems are extremely hard to reproduce and debug.
All testing needs to happen on the AWS infrastructure, so any kind of load testing or stress testing carries the associated costs. We are exploring the possibility of testing on AWS Greengrass.
Similar to the configuration and packaging points, there are no well-defined patterns for service discovery across FaaS functions. While some of this is by no means FaaS-specific, the problem is exacerbated by the granular nature of FaaS functions and the lack of application/versioning definitions.
What components are we using for our platform?