Building a microsite generation and management system

What would you do if you had to develop a number of microsites for a client? We faced this question as part of a larger project for a real estate portal, which consists of multiple modules. One of the modules is an administrative panel through which real estate agents enter information about the properties they offer – text, location, video and images, as well as contact information.

Using this data, we had to generate a number of personal websites for the real estate agents, where they can present the properties they offer along with their contact information. We were required to do that in a way that lets the agents use their own domains and, if they don't have one, provides them with free subdomain URLs. From this point on, we had to decide on the following matters:

  • the general architecture of the app;
  • the design options;
  • how to manage the domains and the certificates of the websites;
  • how to connect all the above-mentioned parts of the project;
  • Amazon Lambda and how it was useful with all these tasks.

Some of the solutions to these issues were obvious, others required a great deal of research and testing, and in the end we reached the following decisions:

1. Website generation

We'll start off with the website generation, as almost all the other steps depend on it. We had two options here: using a programming language to create dynamic sites, or generating static web pages. Each of these approaches has its pros and cons, some of which we have summarized in the table below:

                                             Dynamic website     Static website
Immediate visualization of changes           Yes                 No
Server load                                  High                Low
Development resources                        Low                 Medium
Flexibility                                  High                High, but more difficult for implementation
Price of hosting service                     High                Low
Need to regenerate the site on every change  No                  Yes
Server configuration difficulty              Medium              Low
Programming language version dependence      High                Low
Complexity for end users                     Low                 Low

Looking at the table above, the two approaches seem nearly equally good. Yet we decided to use static web pages, which fit our client's needs completely and are more cost-effective. We will discuss this in more detail in the hosting part below; here we will only mention that the costs associated with serving the static sites are significantly lower.

The next step was choosing the static website generator. Of course, we could have developed it ourselves, but this would unnecessarily increase the project’s costs. So, we decided to find a ready-made solution.

We needed a generator that would meet two requirements: high generation speed with minimal resource usage, and ease of use and flexibility. Because of the first requirement, we settled on two options – Jekyll and Hugo. JavaScript-based generators are also great, but unfortunately they don't offer high generation speed with minimal resource usage.

So, after carefully going through the documentation of these two options and testing them, we came to the conclusion that the two platforms are actually very similar in terms of functionality. Still, we chose Hugo because it uses fewer resources and is faster. Hugo has one more notable advantage – it's written in Go, a language we have good experience with and enjoy using.

2. System architecture

After we had a clear idea of how to generate the websites, it was time to think about the system architecture. Given that we weren't restricted to any specific database or programming language for processing and serving the sites, the situation wasn't that complicated.

So we had an Amazon EC2-based system consisting of several servers (test and production) hosting the main marketing website and the real estate agents' administration panel, as well as separate instances (again test and production) serving the databases.

The most logical step from here was to create separate servers for the generated websites, as we didn't know their number and workload in advance. Some of you might ask why we didn't place the generated sites directly in Amazon S3. This approach might indeed seem better, but it involves some complications that prevented us from adopting it at that stage. Here are some of the problems we faced with S3:

  • If we use only S3 as a hosting service, we cannot use HTTPS/SSL;
  • In order to support HTTPS/SSL, we would need to use the Amazon CloudFront service. The issue here is the generation of certificates, and more specifically the fact that part of the domains are owned by clients of the platform. To make it as easy as possible for them, we only require users to point their domain to the IP address of the server hosting the generated site. However, if we wanted to generate certificates for them via CloudFront, they would have to add further records to their DNS configurations, which might cause problems;
  • For each live website we have a corresponding test one, protected by a username and password known only to our client. This could also be achieved with CloudFront, but it requires additional DevOps work.

That said, we should note that S3 is a perfectly adequate solution here, which has its own advantages and can be adopted in the future if justified. For the S3 solution to pay off technically and financially, the generated static websites would have to start receiving traffic heavy enough to exceed the current machine's resources.

3. Web designs

As already mentioned, we decided to use Hugo, which lets us apply many different themes when generating the websites. Hugo also handles CSS and JS compilation, which greatly facilitates the work of our front-end team. The only requirement for our front-end experts was to make the designs work with static data. We managed to add certain dynamic features, such as basic filters in the real estate listings, but there are limitations when working with larger data volumes.

4. Domain management

As we already noted, the users of the platform are in charge of the domain management, because they actually own the domains. We did our best to make it as easy as possible for them, and the only thing they have to do is point the domain to the IP address of the generated site.

The SSL certificate generation, on the other hand, is a bit more complicated. Before we can issue a certificate for a domain, that domain has to be pointed to our server, which we verify through its DNS records. It's also possible that a domain which was pointed to the system at some point is later pointed away or expires; we have to be able to detect such cases and delete the SSL certificate for the domain in question. With all these conditions in mind, we created a system for managing the domains and their SSL certificates, which is discussed in detail below.
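
To make the domain check concrete, here is a minimal Python sketch of the kind of verification we run before issuing or keeping a certificate. The server IP and the function name are illustrative placeholders, not our exact production code:

```python
import socket

# IP address of the EC2 instance serving the generated sites
# (placeholder value used for illustration only).
SERVER_IP = "203.0.113.10"

def domain_points_to_us(domain: str) -> bool:
    """Return True if the domain currently resolves to our server.

    A failed lookup (an expired or misconfigured domain) is treated the
    same way as a domain pointed elsewhere: its certificate should not
    be kept.
    """
    try:
        records = socket.getaddrinfo(domain, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return False
    resolved_ips = {info[4][0] for info in records}
    return SERVER_IP in resolved_ips

if __name__ == "__main__":
    for d in ("agent-example.com", "www.agent-example.com"):
        status = "pointed to us" if domain_points_to_us(d) else "not pointed"
        print(d, "->", status)
```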

5. Connecting the system components

After planning out all the components of our system and how they should operate, the only thing left was to make them work together. We found an easy and effective way to do that by creating an application that manages the site generation, watches for domains being pointed (or re-pointed) to the platform, and handles the generation of their certificates. The individual components of this application run as cron tasks at different time intervals, along the lines of the sketch below.
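
As a rough illustration, the cron-driven layout can be pictured as a hypothetical dispatcher like the one below; the module name, subcommand names and crontab entries are assumptions that simply mirror the intervals described further down, not our literal setup:

```python
# manage.py - hypothetical single entry point invoked by cron.
# Example crontab (illustrative, matching the intervals described later):
#   * * * * *    /usr/bin/python3 /opt/sitegen/manage.py generate
#   */5 * * * *  /usr/bin/python3 /opt/sitegen/manage.py recheck-domains
#   0 3 * * *    /usr/bin/python3 /opt/sitegen/manage.py daily-domain-audit
import argparse

def generate_pending_sites():
    """Look for new generation requests and build the corresponding sites."""
    ...

def recheck_unpointed_domains():
    """Retry domains that were not pointed to the platform at the last check."""
    ...

def daily_domain_audit():
    """Verify all known domains and drop certificates for ones pointed away."""
    ...

COMMANDS = {
    "generate": generate_pending_sites,
    "recheck-domains": recheck_unpointed_domains,
    "daily-domain-audit": daily_domain_audit,
}

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Microsite management tasks")
    parser.add_argument("command", choices=COMMANDS)
    args = parser.parse_args()
    COMMANDS[args.command]()
```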

When a user makes changes that require site regeneration, the real estate agents' administration system records the necessary entries in the database. The site generation application checks whether such entries exist and, if so, generates the respective website. Domain tracking is a separate task that is not directly related to website generation; the only relevant outcome is whether the site will be served over HTTP or HTTPS/SSL.

6. Amazon Lambda

We have managed to build a fully functional static site so far, but there is one more important thing left: the contact form, which isn't exactly static. Our original idea was to provide a secure and flexible solution that doesn't rely on a dynamic programming language – a static form that sends its data elsewhere for processing. We considered different possibilities and in the end decided to use Amazon Lambda, so that we could ensure reliable operation and flexible scaling. The website would send data to the Lambda function, which would process it and save it to the database. Of course, to protect users from the spam that would inevitably arrive, we added Google reCAPTCHA.

This approach worked and we used it without any problems. It occurred to us, however, that if the traffic and the use of the contact form grew significantly over time, the Lambda usage would drive up our Amazon bill. With this in mind, and although we hadn't experienced any problems with its functionality, we decided to connect the static form directly to the backend of the platform, where the necessary information is submitted. So we stopped using Lambda, which will cut a lot of unnecessary costs in the long run.
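
For reference, the Lambda-based version of the contact form was conceptually along the lines of this hedged Python sketch, assuming the form posts through API Gateway; the environment variable, field names and the database call are placeholders:

```python
# Hypothetical AWS Lambda handler for the contact form (Python runtime).
import json
import os
import urllib.parse
import urllib.request

RECAPTCHA_VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"

def verify_recaptcha(token: str) -> bool:
    """Ask Google to validate the reCAPTCHA token submitted with the form."""
    payload = urllib.parse.urlencode({
        "secret": os.environ["RECAPTCHA_SECRET"],  # placeholder env variable
        "response": token,
    }).encode()
    with urllib.request.urlopen(RECAPTCHA_VERIFY_URL, data=payload) as resp:
        return json.load(resp).get("success", False)

def save_message(site_id, name, email, message):
    """Persist the enquiry; in the real system this wrote to the database."""
    ...

def lambda_handler(event, context):
    body = json.loads(event.get("body") or "{}")
    if not verify_recaptcha(body.get("recaptcha_token", "")):
        return {"statusCode": 400, "body": json.dumps({"error": "captcha failed"})}
    save_message(body.get("site_id"), body.get("name"),
                 body.get("email"), body.get("message"))
    return {"statusCode": 200, "body": json.dumps({"status": "ok"})}
```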

This was an overview of the project's architecture, our choice of applications and how they are connected. We will now go a little deeper into the technical details of the project and the whole process, but before that we are going to give you a short introduction to the different modules of the system:

Let's start with the central unit of the system, namely the application we created. It monitors requests for generating sites, generates them where necessary, and also oversees the domains: it generates, renews and, where necessary, deletes their certificates.

Before we started with the application development, we had to decide what its architecture would be, which in turn depended on the programming language it would be written in. As already mentioned, we were looking for a high-speed solution with minimal resource usage. In this case we also had to take into account two more factors – ease of debugging and ease of extending the code. Perhaps the most logical choice would have been Go, but we decided that the speed we would gain would not compensate for the slower development process and the more difficult maintenance afterwards. That's why we chose another popular language that is widely used for process management tools – Python. As mentioned before, we divided the application into several modules:

In brief, the application checks for new site requests every minute. If there is one, it creates a unique directory for the specific user, extracts and processes the data for the respective site, and finally generates a TOML configuration file used by Hugo, as well as all the necessary Markdown files from which Hugo will create the required pages. While producing these files we of course take into account the site theme chosen by the user. Once all files are ready, Hugo generates the new site and saves it in the server's file system, then control is passed to a module which performs a domain status check. Depending on the status (whether the domain is pointed to us or not), a new certificate and the corresponding nginx configuration are generated.
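
A simplified sketch of that generation step might look like this; the paths, the configuration template, the field names and the assumption that the chosen theme already sits under the source directory are all illustrative:

```python
import pathlib
import subprocess

SITES_ROOT = pathlib.Path("/var/www/microsites")   # placeholder location

# Minimal Hugo configuration; the real one carries more settings.
CONFIG_TEMPLATE = """\
baseURL = "{base_url}"
title = "{title}"
theme = "{theme}"
"""

def generate_site(site: dict) -> None:
    """Write the Hugo config and content for one agent, then build the site."""
    source_dir = SITES_ROOT / "src" / site["slug"]      # unique per-user directory
    output_dir = SITES_ROOT / "public" / site["slug"]   # directory nginx will serve
    (source_dir / "content" / "properties").mkdir(parents=True, exist_ok=True)

    # config.toml consumed by Hugo, filled in with the theme chosen by the user
    # (the theme itself is assumed to be available under source_dir / "themes")
    (source_dir / "config.toml").write_text(CONFIG_TEMPLATE.format(
        base_url=site["base_url"], title=site["title"], theme=site["theme"]))

    # one Markdown file per property listing: TOML front matter plus body text
    for prop in site["properties"]:
        md = '+++\ntitle = "{title}"\n+++\n\n{body}\n'.format(
            title=prop["title"], body=prop["description"])
        (source_dir / "content" / "properties" / f"{prop['slug']}.md").write_text(md)

    # let Hugo build the static pages straight into the served directory
    subprocess.run(
        ["hugo", "--source", str(source_dir), "--destination", str(output_dir)],
        check=True)
```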

We obtain the SSL certificates from Let's Encrypt, the automated certificate authority, at no additional cost to the customers. To make things even more convenient, we also use the ready-made Let's Encrypt client that integrates with nginx, which shortens the development time.
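
In practice this boils down to a single call to the nginx-aware Let's Encrypt client (most likely recognisable as certbot with its nginx plugin); the sketch below uses certbot's standard non-interactive flags, with the e-mail address as a placeholder:

```python
import subprocess

def issue_certificate(domain: str, email: str = "ops@example.com") -> None:
    """Obtain a Let's Encrypt certificate and let the nginx plugin
    wire it into the matching server block."""
    subprocess.run(
        [
            "certbot", "--nginx",
            "-d", domain,
            "--non-interactive",
            "--agree-tos",
            "-m", email,
        ],
        check=True,
    )
```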

Meanwhile, every five minutes we recheck the already generated sites whose domains had not been pointed to the platform at the last check. If a domain is now properly pointed, we generate the respective certificate and regenerate the site so that its links are correct, then update the nginx configuration. In addition, all existing domains are checked daily to make sure they are still routed correctly. If for some reason they are not, we delete the old certificates to prevent any further issues and obtain new ones from Let's Encrypt when needed.
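
Put together with the helpers sketched earlier (domain_points_to_us and issue_certificate), the daily audit is essentially the loop below; the remaining functions are placeholders standing in for the regeneration and nginx modules:

```python
# Illustrative audit loop; the stubbed functions stand in for the real modules.
def delete_certificate(domain: str) -> None: ...
def regenerate_site_for(site: dict) -> None: ...
def reload_nginx() -> None: ...

def audit_domains(sites) -> None:
    """sites: records carrying at least a 'domain' and a 'has_certificate' flag."""
    for site in sites:
        pointed = domain_points_to_us(site["domain"])
        if pointed and not site["has_certificate"]:
            # newly pointed domain: obtain a certificate and rebuild the site
            # so that its internal links switch to https
            issue_certificate(site["domain"])
            regenerate_site_for(site)
            reload_nginx()
        elif not pointed and site["has_certificate"]:
            # pointed away or expired: drop the stale certificate
            delete_certificate(site["domain"])
            reload_nginx()
```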

Every operation is of course recorded in the corresponding application log, and if an error occurs, our team receives a direct message.
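
A minimal sketch of what such logging with an alert hook can look like; the log path and the webhook that delivers the direct message are placeholders for whatever channel a team actually uses:

```python
import json
import logging
import urllib.request

ALERT_WEBHOOK = "https://chat.example.com/hooks/placeholder"  # placeholder URL

class AlertHandler(logging.Handler):
    """Forward ERROR-level records to the team's chat webhook."""
    def emit(self, record: logging.LogRecord) -> None:
        payload = json.dumps({"text": self.format(record)}).encode()
        req = urllib.request.Request(ALERT_WEBHOOK, data=payload,
                                     headers={"Content-Type": "application/json"})
        try:
            urllib.request.urlopen(req, timeout=5)
        except OSError:
            pass  # never let alerting break the main task

logger = logging.getLogger("sitegen")
logger.setLevel(logging.INFO)
logger.addHandler(logging.FileHandler("/var/log/sitegen/app.log"))
logger.addHandler(AlertHandler(level=logging.ERROR))
```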

We figured that the server we use for site creation also has the capacity to serve the generated websites. Moreover, we carefully configured resource caching. To keep the architecture cleaner, large resources such as images and videos are stored on AWS S3. This way we get better server performance and the ability to handle heavy loads.
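
Offloading the heavy assets is straightforward with the AWS SDK for Python; a hedged sketch, where the bucket name, key layout and returned URL format are placeholders:

```python
# Sketch of pushing large media to S3 so nginx only serves the light static pages.
import mimetypes
import pathlib
import boto3

s3 = boto3.client("s3")
MEDIA_BUCKET = "example-microsites-media"  # placeholder bucket name

def upload_media(site_slug: str, local_path: str) -> str:
    """Upload an image or video and return the URL the generated pages embed."""
    path = pathlib.Path(local_path)
    key = f"{site_slug}/{path.name}"
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    s3.upload_file(str(path), MEDIA_BUCKET, key,
                   ExtraArgs={"ContentType": content_type})
    return f"https://{MEDIA_BUCKET}.s3.amazonaws.com/{key}"
```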

This is the main application we use for site creation. As already mentioned, we rely on nginx as the web server, and we believe this is a very good choice, especially considering that the served files are static. These services run on an Amazon EC2 instance whose workload is constantly monitored so that we can react immediately and change its parameters if needed. That said, the system has been operating for several months now and no changes have been required so far. We use a MySQL database, and the real estate agents' main backend is written in PHP (using the Laravel framework).

In conclusion, we can say that the system we have built works flawlessly. Its structure is as simple as possible, which makes it easy to maintain and upgrade: should the load on the generated sites increase significantly and the EC2 instance no longer suffice, the application settings can easily be changed so that the generated sites are saved directly to Amazon S3. Such a development would certainly require additional work, but it would relate not to the way our application works, but to the way Amazon CloudFront functions. From an architectural point of view, we would simply replace the module handling the nginx configurations with a new module that performs the necessary checks and configurations in CloudFront.