This would be a trivial task if all the parameters involved were known in advance: the number of images, the number of customers generating them, the frequency of generation, the number of simultaneous customers, etc. Our project, however, was a newly launched SaaS platform that would be aggressively developed and promoted, so there was no way to define these parameters up front. The only thing that was clear from the start was that the number of users, and with it the workload, would increase over time.
Given that our client’s entire infrastructure was based in Amazon Web Services (AWS), we were limited to the following options to complete the task:
- To use a single EC2 instance;
- To create an Auto Scaling Group (ASG) using EC2 instances;
- To use a serverless Lambda function.
Let's quickly go through these three options and shed some light on the choice we made:
- If we chose a single EC2 instance, we would have to create a really powerful one, because we don’t know what the workload will be. This would result in significant bills for our client and could still cause problems if the workload became too high for the instance. In that case we would have to stop the instance in order to upgrade to a more powerful one, which interrupts the service and leads to additional costs. Conversely, the load may drop at any time, but the instance keeps running, and being billed, constantly. Furthermore, problems might arise with the instance itself, with the connection in a given area, etc.
- The ASG approach seems a lot better: we can start with one small (and cheap) EC2 instance, and the group will launch additional instances when necessary. Unfortunately, this option also has significant disadvantages. For example, when the group scales in, we don’t know exactly which instances AWS will stop. An instance might therefore be shut down while it is generating an image, which means the same e-flyer is unnecessarily generated twice and the dev team has extra work handling such situations. And although this option is not pricey, at least one instance is running all the time, which generates costs regardless of whether it’s really needed or not.
- The Lambda function. If you’re not familiar with it, this service allows you to build fairly complex functionality without maintaining any virtual machines yourself. It is designed for relatively small tasks, just like the one we have. Deploying functions doesn’t generate costs by itself: you can deploy as many Lambda functions as you want, and you are charged only for the actual usage of your functions, part of which is covered by the AWS free tier.
Needless to say, price is an important factor here, and reducing our clients’ costs is of utmost importance to us. That’s why we decided to calculate and compare the prices for different sample loads and decide what the best approach would be:
Bear in mind that:
- These calculations are approximate;
- They are based on an average e-flyer generation time of 5 minutes and 1024 MB of required memory;
- The prices don’t cover any fees for Elastic Block Store (EBS), which will further increase the cost of the two options relying on EC2 instances;
- Configuring more complex infrastructure and further changes to it generates additional costs (this mainly concerns EC2 and ASG).
| Requests per month | Requests per minute | EC2 instance type |
|---|---|---|
| 1 000 | ~0.023 | t2.small (2 instances on average) |
| 10 000 | ~0.23 | m5.large (1.5 instances on average) |
| 100 000 | ~2.3 | m5.large (10 instances on average) |
The price differences between the three options speak for themselves, even if we don’t include the infrastructure configuration costs. So the obvious choice here is the Lambda function.
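For readers who want to reproduce the Lambda side of such a comparison, here is a rough sketch of the arithmetic under the assumptions listed above (5 minutes per e-flyer, 1024 MB of memory). The rates and free-tier allowances in the constants are assumptions based on AWS's publicly listed x86 pricing at one point in time. They vary by region and change over time, so they are not necessarily the figures behind the comparison above.

```python
# Rough monthly AWS Lambda cost estimate.
# All rates below are ASSUMED list prices, not values taken from this article.
PRICE_PER_GB_SECOND = 0.0000166667   # USD, assumed
PRICE_PER_MILLION_REQUESTS = 0.20    # USD, assumed
FREE_GB_SECONDS = 400_000            # assumed monthly free-tier allowance
FREE_REQUESTS = 1_000_000            # assumed monthly free-tier allowance


def lambda_monthly_cost(requests, seconds_per_request=300, memory_mb=1024):
    """Return the estimated monthly cost in USD for the given load."""
    gb_seconds = requests * seconds_per_request * (memory_mb / 1024)
    billable_gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
    billable_requests = max(0, requests - FREE_REQUESTS)
    compute_cost = billable_gb_seconds * PRICE_PER_GB_SECOND
    request_cost = billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return round(compute_cost + request_cost, 2)


if __name__ == "__main__":
    for monthly_requests in (1_000, 10_000, 100_000):
        print(monthly_requests, lambda_monthly_cost(monthly_requests))
```

With these assumed rates, 1 000 requests per month use 300 000 GB-seconds and therefore stay inside the assumed free compute allowance of 400 000 GB-seconds.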
* In order to calculate these prices, we used the information from the following websites:
So, we have chosen the infrastructure for our task. The next step is to decide how tasks should reach the function from the front-end, where the users upload their images and start e-flyer generation processes. Surely, most of you would say that the obvious solution is Amazon Simple Queue Service (SQS). Generally speaking, this service is indeed perfect for the purpose. However, due to the specifics of our project, we chose another approach.
Our front-end uses Laravel, which in turn uses cron. The latter runs various tasks every minute, i.e. we have a resource that is working anyway. So we decided to simply add one more task to the existing cron job, which directly invokes the Lambda function with the required parameters. This may seem like the wrong architectural decision to some of you, but we have a different opinion. Here are some of the reasons why we chose this approach for the project:
- Using the Lambda function with SQS would prolong both the application development and the process of building and configuring the infrastructure, while the end result would be the same.
- When using cron to directly invoke the Lambda function, Laravel performs only one task, without caring what happens next, which generally means less development time.
- Generally speaking, the Lambda function shouldn’t have access to the database, but in our case this isn’t a problem. This function works on a single project, so it’s not intended to work with other databases, nor to be used with other external dependencies. So, in our case, this is a perfectly acceptable compromise.
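The direct invocation described above can be sketched as follows. In a Laravel application the call would be made from PHP; the sketch below shows the same idea in Python with boto3, for consistency with the rest of the article. The function name `eflyer-generator` and the payload field `flyer_id` are hypothetical.

```python
import json


def invoke_generator(flyer_id, function_name="eflyer-generator", client=None):
    """Fire-and-forget invocation of the generator Lambda function.

    `function_name` and the `flyer_id` payload field are hypothetical.
    """
    if client is None:
        import boto3  # imported lazily so the sketch can be tried offline
        client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # asynchronous: do not wait for the result
        Payload=json.dumps({"flyer_id": flyer_id}).encode("utf-8"),
    )
    # An accepted asynchronous invocation is answered with HTTP status 202.
    return response["StatusCode"] == 202
```

With `InvocationType="Event"` the scheduled task returns as soon as Lambda accepts the request, which matches the idea of starting one task without caring what happens next.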
So far, so good – we have chosen the infrastructure and the way of communication between the systems. We only have to write the Lambda function itself. We decided that the most appropriate way to do it is with Python. Why? Because there are great Python image processing libraries, the code is easy to write and maintain, and also easy to debug.
So from now on, the easiest way to proceed is to simply put in the code we need and let the magic happen. It sounds as easy as pie, but we had several issues to solve first:
- some of the libraries used by the generator will also be used by other Lambda functions;
- we had to provide a connection between the Lambda function and the MySQL server, which is located on a separate EC2 instance, without any external access to it;
- we had to find a way to make the Lambda function use the images uploaded by the users, and to save them in a way which allows the front-end to access them.
We solved the first issue by creating a Lambda layer for the packages we need. This isn’t as simple as it seems, because libraries that depend on the operating system they run on must be compiled on that specific OS. And as Lambda functions run on Amazon Linux, there are two ways to compile such libraries:
Another thing worth noting is that when you use Lambda layers, Python may not find the libraries you need. In that case, add a PYTHONPATH variable with the value /opt/ to the function’s environment variables.
With regard to accessing the database: if your database runs in an Amazon Virtual Private Cloud (VPC) with no access from the Internet, you will also need to configure access to your VPC in the settings of the Lambda function (see https://docs.aws.amazon.com/lambda/latest/dg/configuration-vpc.html for more details on how to do this).
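If you are unsure whether the function can see the packages from a layer, a quick diagnostic such as the one below may help. It is only a sketch: the package names passed in are examples, and it uses nothing but the Python standard library.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def layer_report(names):
    """Summarise the search-path entries under /opt and any missing packages."""
    return {
        "opt_entries": [path for path in sys.path if path.startswith("/opt")],
        "missing": missing_packages(names),
    }
```

Returning `layer_report(["pymysql"])` from a temporary test handler shows at a glance whether the layer directory is on the search path and whether the package can be imported.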
So, the only thing that’s left is the issue with the access to the images, and we decided to use Amazon Simple Storage Service (S3). On the one hand, this would allow serving the images directly to the users’ browsers without causing any load on our servers, and on the other, we would have access to them from different applications without having to complicate the application code unnecessarily.
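The S3 round trip can be sketched as follows: download the user's image, render the e-flyer, upload the result. This is a minimal illustration rather than our production code. The object keys and the `render` callback are hypothetical, while `download_file` and `upload_file` are standard boto3 S3 client methods.

```python
import os
import tempfile


def generate_flyer(s3, bucket, source_key, target_key, render):
    """Fetch an image from S3, render a flyer from it and upload the result.

    `s3` is a boto3 S3 client; `render(src_path, dst_path)` is a hypothetical
    callback that does the actual image processing.
    """
    # Work in a temporary directory; on Lambda this normally lives under /tmp,
    # the function's writable local storage.
    with tempfile.TemporaryDirectory() as workdir:
        src_path = os.path.join(workdir, "source")
        dst_path = os.path.join(workdir, "flyer")
        s3.download_file(bucket, source_key, src_path)
        render(src_path, dst_path)
        s3.upload_file(dst_path, bucket, target_key)
    return target_key
```

Because the result ends up in the bucket under `target_key`, the front-end only needs that key to serve the finished image.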
And finally, here are the exact steps to create a Lambda function with the AWS web console:
1. First, go to Lambda menu -> Functions -> Create a Function
2. Choose the programming language and the name of your function. If you wish to use a specific role for the function, select ‘Choose or create an execution role’.
3. Add the respective variables which will be used by the Lambda function code:
Choose edit environment variables from the page of the function
Add the respective variables from the page which will load
4. In order to connect to instances in one of your virtual private clouds (VPCs), add the VPC in the function’s settings and follow the instructions on the webpage.
5. Then you may use code similar to the following to connect to the database and to AWS resources. To this end, we use the Python packages pymysql and boto3.
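import os
import boto3
import pymysql
# Settings are read from the environment variables configured in step 3.
MYSQL_HOST = str(os.getenv('MYSQL_HOST'))
MYSQL_USERNAME = str(os.getenv('MYSQL_USERNAME'))
MYSQL_PASSWORD = str(os.getenv('MYSQL_PASSWORD'))
MYSQL_DATABASE = str(os.getenv('MYSQL_DATABASE'))
MYSQL_PORT = int(os.getenv('MYSQL_PORT', '3306'))  # fall back to the MySQL default port
AWS_ACCESS_KEY_ID = str(os.getenv('CUSTOM_AWS_ACCESS_KEY_ID'))
AWS_SECRET_ACCESS_KEY = str(os.getenv('CUSTOM_AWS_SECRET_ACCESS_KEY'))
AWS_DEFAULT_REGION = str(os.getenv('CUSTOM_AWS_DEFAULT_REGION'))
AWS_BUCKET = str(os.getenv('AWS_BUCKET'))
# The argument lists below are assumed from the variables defined above.
mysql = pymysql.connect(host=MYSQL_HOST, user=MYSQL_USERNAME, password=MYSQL_PASSWORD,
                        database=MYSQL_DATABASE, port=MYSQL_PORT)
s3 = boto3.client('s3', aws_access_key_id=AWS_ACCESS_KEY_ID,
                  aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
                  region_name=AWS_DEFAULT_REGION)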
6. Adding a Layer
Whereas the boto3 package doesn’t require any further action, pymysql has to be installed. So, if you are going to use it in more than one function, you may configure it as a Layer. In order to do that, use the ‘Create layer’ button under Lambda menu -> Layers.
Then, on the page that loads, fill in the name of the Layer, the runtime in which it will be used (in our case, Python 3.7), and optionally a description and a license. Bear in mind that packages larger than 50 MB have to be uploaded to S3 beforehand.
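For Python runtimes, Lambda expects the packages inside the layer archive to sit under a top-level `python/` directory. As a rough sketch, the archive could be assembled like this, assuming the packages were first installed into a local folder (for example with `pip install pymysql -t build/`). The folder and file names are placeholders.

```python
import os
import zipfile


def build_layer_zip(packages_dir, zip_path):
    """Zip everything in `packages_dir` under the `python/` prefix.

    `zip_path` should be located outside `packages_dir`.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(packages_dir):
            for filename in files:
                full_path = os.path.join(root, filename)
                relative = os.path.relpath(full_path, packages_dir)
                # Use forward slashes inside the archive on every platform.
                arcname = "python/" + relative.replace(os.sep, "/")
                archive.write(full_path, arcname)
    return zip_path
```

Calling `build_layer_zip("build", "pymysql-layer.zip")` would produce an archive that can be uploaded on the ‘Create layer’ page. Libraries with OS-specific binaries still have to be built for the Lambda environment first, as noted earlier.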
Then, from the function you configured, use the Layers button to add the layer you’ve just created:
7. And finally, here is what the new Lambda function looks like:
The example presented above is of course simplified and incomplete – after all, it’s intended to provide you with a general idea of how to generate images with AWS Lambda and the steps you need to follow.
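To make the overall flow more concrete, here is a hypothetical outline of what the core of such a handler could look like. It is not the code of our function: the `flyers` table, its columns, the `flyer_id` event field and the `generate` callback are all invented for illustration. `connection` is expected to behave like a pymysql connection.

```python
import json


def handle_event(event, connection, generate):
    """Look up the source image, generate the flyer and store the result key.

    `generate(image_key)` is a hypothetical callback returning the S3 key of
    the finished flyer; table and column names are made up for this sketch.
    """
    flyer_id = int(event["flyer_id"])
    with connection.cursor() as cursor:
        cursor.execute("SELECT image_key FROM flyers WHERE id = %s", (flyer_id,))
        row = cursor.fetchone()
    if row is None:
        return {"statusCode": 404, "body": json.dumps({"error": "unknown flyer"})}

    result_key = generate(row[0])

    with connection.cursor() as cursor:
        cursor.execute(
            "UPDATE flyers SET result_key = %s WHERE id = %s",
            (result_key, flyer_id),
        )
    connection.commit()
    return {"statusCode": 200, "body": json.dumps({"result_key": result_key})}
```

The actual entry point, `lambda_handler(event, context)`, would then only need to pass in the connection created in step 5 and a function that performs the image processing.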
In order to use your new function, you’ll need its ARN (Amazon Resource Name), which is shown at the top of the page:
We should also note that the way to call the function will vary, depending on the environment in which you will use it.
Most certainly, there are other ways to execute this task. In our view, however, the way we did it is an optimal and stable solution that meets the requirements of our project and minimizes the infrastructure costs. We have been using it in production for more than four months now, and we haven’t encountered any problems, even though the workload is constantly increasing, as we expected.
Last but not least, we haven’t yet reached the free tier limits of the AWS services we use, so the generation of e-flyers in our project is still free.