Deploying and managing web applications

Traditional software is downloaded, installed, and then run. Web applications are different: they are built, then pushed to a remote server, and the interface of the application is presented on the user's screen via web technologies like HTML, CSS and JavaScript (in the past sometimes assisted by Flash, these days sometimes by WASM). Besides that, web applications need databases, storage, caching systems, maybe a search solution, and sometimes more dedicated tools. From the end user's perspective, accessing these tools became a lot easier (although more restrictive), but getting them up and running got harder.

In the old days, web software was deployed by uploading files via FTP into a folder that a web server then read and served to the users of the web application. I've also deployed compiled .war files manually via a Tomcat web interface. Databases and storage were all pretty much managed by hand, sometimes even requiring physically adding a new drive or server rack.

The problem with this approach, though conceptually simple, is that deploying manually is error prone: one forgets files or configurations, puts files in the wrong directory, or forgets to make the database changes needed for the latest release to work.

The first step is hence automating these manual steps. Below I discuss a few high-level dimensions along which solutions can be compared. I'd like to stress that, in practice, these are sliding scales, and approaches can be mixed.

Imperative vs declarative deployment

When commands are executed consecutively, this is called imperative deployment.

At its lowest level, this is error prone: if a command says 'update this configuration file', but the configuration file was moved in a newer version of the software being used, this step would fail.

To mitigate the issue of working with changing environments, commands can be abstracted into higher-level commands, where possible variation between environments is dealt with in a shared code library.
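As a sketch of the difference (all commands, hosts, and paths here are hypothetical), an imperative deployment is a fixed sequence of commands; moving the environment-specific knowledge into a small helper keeps the sequence itself stable when environments differ:

```python
import subprocess

# Hypothetical: the config file moved between versions of the web server,
# so the lookup is centralised instead of hard-coded in every script.
CONFIG_PATHS = {
    "webserver-v1": "/etc/webapp/app.conf",
    "webserver-v2": "/etc/webapp/conf.d/app.conf",
}

def config_path(server_version: str) -> str:
    """Resolve the environment-specific config location in one place."""
    return CONFIG_PATHS[server_version]

def deploy(server_version: str, run=subprocess.run) -> None:
    """Imperative deployment: commands executed one after the other."""
    path = config_path(server_version)
    run(["rsync", "-a", "build/", "deploy@host:/var/www/app/"], check=True)
    run(["scp", "app.conf", f"deploy@host:{path}"], check=True)
    run(["ssh", "deploy@host", "systemctl reload webserver"], check=True)
```

The `run` parameter is injected so the sequence can be exercised without touching a real server; in real tooling the same idea appears as shared deploy libraries or roles.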

The ultimate abstraction is no longer defining what we do to get to a certain desired state, but declaring what we want. This is called 'declarative'. An operating system like NixOS takes this to the max, allowing one to precisely define operating contexts.
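A minimal sketch of the declarative idea (package and version names are invented for illustration): we declare the desired state, and a reconciler computes which actions are needed, whatever the current state happens to be:

```python
# Declarative sketch: we state WHAT we want; a reconciler computes the
# actions needed to converge the observed state onto the desired state.
desired = {"packages": {"nginx", "postgresql"}, "app_version": "2.4.1"}

def reconcile(observed: dict, desired: dict) -> list[str]:
    """Return the actions needed to get from observed to desired."""
    actions = []
    for pkg in desired["packages"] - observed.get("packages", set()):
        actions.append(f"install {pkg}")
    for pkg in observed.get("packages", set()) - desired["packages"]:
        actions.append(f"remove {pkg}")
    if observed.get("app_version") != desired["app_version"]:
        actions.append(f"deploy app {desired['app_version']}")
    return actions
```

Run twice against the same desired state and the second run is a no-op: the declaration, not the command history, defines the system.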

The disadvantage of fully abstracted or declarative systems is reduced flexibility. Additionally, it may be harder to get support, as these systems may be less popular, resulting in more time lost in debugging.

Bare metal versus virtualisation

Virtualisation is both a nice abstraction and a way to contain processes on the same machine, but it comes at a performance cost. If you can utilise the full machine, running operations on bare metal may be optimal. A big advantage of virtualisation, however, is that the extra layer of abstraction makes it easier to replicate a given situation.

Virtualisation may be hosted on your own hardware or on shared hardware. When the hardware is not yours, you may be affected by other parties.

Virtualisation may happen at the hardware level (fast) or the software level (slower). While slower, software-level virtualisation can allow for even higher portability of the software.

A special type of 'virtualisation' is containerisation, popularised by Docker. This is not full virtualisation, as a lot is handled by the host OS, which makes it perform much better than traditional virtualisation in terms of memory and speed. When considering virtualisation, this is these days a popular compromise.

Centrally orchestrated or from a local machine?

One of the risks of automated deployment, especially in a larger organisation, is that the machine deploying the software is compromised, be it because of the operator or external factors, including viruses, malware and unwanted intruders. If this is a risk, deployment can be delegated to a centrally managed system, which then executes the changes in a transparent and reviewable manner. The disadvantage of such a centrally managed system is that when it fails (or, worst case, is compromised), recovery might be even harder. Managing yet another system is not easy, but depending on the threat level it may be worth the effort.

A popular way of centralising is having continuous integration (CI) pipelines build and deploy the software. Both the definition and the execution of these pipelines can be monitored and reviewed.
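A sketch of that idea (step names and commands are illustrative): the pipeline is plain data that can be reviewed like any other change, and every step's outcome is recorded for later inspection:

```python
# A CI pipeline sketch: the pipeline definition is plain, reviewable data,
# and the executor produces an audit log of what ran and how it went.
PIPELINE = [
    ("build", ["make", "build"]),
    ("test", ["make", "test"]),
    ("deploy", ["make", "deploy"]),
]

def run_pipeline(pipeline, runner) -> list[tuple[str, bool]]:
    """Execute steps in order; stop at the first failure. Returns the log."""
    log = []
    for name, cmd in pipeline:
        ok = runner(cmd)
        log.append((name, ok))
        if not ok:
            break  # never deploy when the build or the tests fail
    return log
```

The `runner` is injected here for the same reason CI systems sandbox their executors: the definition stays inspectable, and the execution leaves a trail.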

The software or the entire infrastructure

Automating the application deployment is typically the first step; then comes the OS, and a step higher is the full infrastructure. It used to be strange to think of infrastructure as open to automation, but now that everything is virtualised, even dedicated database servers, firewalls and routing logic can be automated, even scaling dynamically (up and down) with demand.
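Dynamic scaling can be sketched as a rule mapping observed load to a desired number of instances (the target utilisation and bounds below are made-up numbers, purely for illustration):

```python
import math

def desired_replicas(current: int, load_per_replica: float,
                     target: float = 0.5, min_n: int = 2, max_n: int = 10) -> int:
    """Scale the replica count so average load per replica approaches the
    target utilisation, clamped to illustrative lower and upper bounds."""
    needed = math.ceil(current * load_per_replica / target)
    return max(min_n, min(max_n, needed))
```

A control loop would call this periodically and hand the result to the (virtualised) infrastructure, scaling down in quiet periods just as automatically as it scales up.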

Especially with more complicated software, with many moving parts, tooling that allows easy replication of the entire system can be of great support to QA. But perhaps the complexity of the architecture is the actual problem that needs to be toned down.

And: More or less automation

It should be emphasised that not everything needs to be automated. There is a cost to automation, and as ever it is worth weighing that cost against the benefits.

If your server landscape doesn't change, it may not be necessary to invest heavily in orchestrating the entire infrastructure. If manual adjustments cost less than a week per year, but creating the right automation takes more than a month, it may not be worth automating.
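That trade-off is a simple break-even calculation. A sketch, using rough hour counts for the example above (about 160 hours for a month of automation work versus 40 hours of manual work per year):

```python
def break_even_years(automation_hours: float, manual_hours_per_year: float,
                     maintenance_hours_per_year: float = 0.0) -> float:
    """Years before automation pays for itself, counting only time.
    Returns infinity when maintenance alone eats the yearly saving."""
    saved = manual_hours_per_year - maintenance_hours_per_year
    if saved <= 0:
        return float("inf")
    return automation_hours / saved

# break_even_years(160, 40) -> 4.0 years before the time investment pays off
```

Counting only hours is of course too narrow: reproducibility and fewer errors also weigh in, but so do maintenance and lock-in, which this sketch at least makes explicit.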

Benefits of automation:

- fewer manual errors: no forgotten files or misplaced configurations
- reproducibility: the same deployment can be repeated for testing and recovery
- speed: changes reach production faster and more often

Costs of automation:

- the upfront time investment in building and learning the tooling
- maintenance of yet another system
- the risk of vendor lock-in and of losing knowledge of the underlying technologies

Worse than manual deployment is being deeply invested in an automated flow without knowing the underlying technologies, which may offer features not exposed by the tooling, while the automation tool itself is also not well supported. Hence it is important to make well-informed decisions, and to prefer reduced levels of automation and potential scalability over vendor lock-in. You may decide to let go of your Linux system administrator and get a certified AWS solutions architect in return. While the former might be a bit slower to adapt your systems during a surge of new customers, the latter will keep you stuck in a world of proprietary technologies owned by a US company, which additionally puts your customers' data at risk.

Conclusion

Every company has different requirements, and there is no single desired end state. Having everything run on someone else's computers (the cloud) may save headaches about hardware upgrades, but the physical distance might introduce new problems if large amounts of data need to be transported back and forth, or if you want to be able to tweak the hardware configuration. Spending months automating something that used to cost less than a week of manual work per year is probably not worth it, unless other advantages weigh heavier; and remember the costs beyond the upfront time investment. Hence, my approach to these questions is to first write down the main pain points of the existing situation and consider future concerns, but to be wary of thinking there is a single right end state ('everything serverless', 'no code', 'all defined in YAML'). More important is to consider: what do we have, what works, where do we want to go, and stay agile.

Finally, as time is scarce, consider automating monitoring and configuration reviews next to automating deployment; think of uptime, certificates, patch levels, backup and restore, performance, errors... perhaps even before fully automating the deployment, as it is both another way of managing the risks of manual deployment and a way to keep track of how successful your automated deployments actually are.
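As an example of such a monitoring check, the sketch below (hostname and warning threshold are arbitrary choices) flags TLS certificates that are about to expire, using only Python's standard library:

```python
import socket
import ssl
from datetime import datetime, timezone

def cert_days_left(not_after: str, now: datetime) -> int:
    """Days until a certificate expires, given the notAfter field as
    returned by ssl.getpeercert(), e.g. 'Jun 15 12:00:00 2030 GMT'."""
    expiry = datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z")
    expiry = expiry.replace(tzinfo=timezone.utc)
    return (expiry - now).days

def check_host(host: str, warn_days: int = 14) -> bool:
    """Fetch the TLS certificate of `host`; False means: renew soon."""
    ctx = ssl.create_default_context()
    with ctx.wrap_socket(socket.create_connection((host, 443)),
                         server_hostname=host) as s:
        not_after = s.getpeercert()["notAfter"]
    return cert_days_left(not_after, datetime.now(timezone.utc)) > warn_days
```

Run from a scheduler against your own hosts, a handful of checks like this catches the expired certificate or the failed deployment long before a customer does.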
