When it comes to deploying your application, we've been advocating for a NoOps philosophy for some time now. By this we mean encouraging your team to focus its efforts on developing features for your application, rather than configuring containers or fiddling with resource management for the hosted product. But perhaps, for one reason or another, the option to deploy onto a PaaS (platform as a service) is unavailable to you. Maybe your burn rate is tight, and you believe you'll save some money setting up your own box rather than paying for a platform; maybe your team has members already well-versed in operations engineering; or maybe you don't see what the big deal of being able to run git push <remote> master is, since a bot will end up doing the heavy lifting of getting code onto servers anyway.
If you aren't deploying to a PaaS, there are many different parts to worry about. To continue our evangelism of NoOps, this post will list out the considerations and features that we believe are essential for the goal of keeping deployments simple. For the sake of simplicity, we'll imagine a fairly common scenario where we've built a containerized application that's set up with a pipeline to take it from AWS' CodeDeploy to Amazon Elastic Container Service (ECS).
Going from a branch to a server
Requiring your application's master branch to always be in a production-ready, deployable state is the bare minimum. Yet actually getting your application up and running from that branch can be a fairly challenging endeavour. Amazon's own documentation obfuscates this complexity, which breaks down as follows:
Configuring the VPC
By default, your entire application is accessible to the world. That means that every AWS service you use is publicly exposed and can be susceptible to attack. In order to secure your application's resources by restricting network access, you'll need to create a VPC with subnets to define the IP addresses that your services can communicate over. In order to maintain a high level of availability, your application should also be available in multiple geographic regions, which also means that separate VPCs need to be configured for every region. If one region fails, another one can continue to serve your application. Deciding how many regions you need, and properly setting up the public and private subnets, the NAT gateway, and the routing table, are all tasks left entirely up to you.
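To give a sense of the subnet planning involved, here is a minimal sketch using only Python's standard ipaddress module. The 10.0.0.0/16 range, the /20 subnet size, and the zone names are all assumptions chosen for illustration; the sketch only computes an address plan and does not create anything in AWS.

```python
import ipaddress

# Hypothetical address range for one region's VPC (an assumption).
vpc = ipaddress.ip_network("10.0.0.0/16")

# Illustrative availability zone names; substitute your own.
zones = ["us-east-1a", "us-east-1b"]

# Carve the VPC into /20 blocks and hand them out in order:
# one public and one private subnet per zone.
blocks = vpc.subnets(new_prefix=20)

plan = {}
for zone in zones:
    plan[zone] = {"public": next(blocks), "private": next(blocks)}

for zone, subnets in plan.items():
    print(zone, "public:", subnets["public"], "private:", subnets["private"])
```

A plan like this would still need to be repeated, with non-overlapping ranges, for every region you deploy to.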
Establishing a load balancer
Load balancers sit on the public subnet and are used to control the flow of traffic to the servers hosting your application. If one server is oversaturated with requests, it's the load balancer's duty to divert incoming traffic to another machine with more availability. Amazon's Elastic Load Balancer offers three types—Application Load Balancers, Network Load Balancers, and Classic Load Balancers—and you'll need to identify which one is right for you. Setting up a load balancer first is a bit counterintuitive, since you don't even know how popular your application will be, yet it's a necessary step that ensures your application is accessible to the outside world.
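The idea of diverting traffic to the machine with the most headroom can be sketched in a few lines. This is a toy "fewest active connections" strategy written for illustration, not a description of how Elastic Load Balancing is implemented; the server names and the route helper are hypothetical.

```python
def pick_server(active_connections):
    """Return the server with the fewest active connections.

    Ties go to whichever server was listed first.
    """
    return min(active_connections, key=active_connections.get)


def route(requests, servers):
    """Assign each request to the least-loaded server at that moment."""
    load = {server: 0 for server in servers}
    assignments = []
    for request in requests:
        target = pick_server(load)
        load[target] += 1  # the request stays open for this toy example
        assignments.append((request, target))
    return assignments, load


assignments, load = route(["r0", "r1", "r2", "r3", "r4"], ["app-1", "app-2"])
print(assignments)
print(load)
```

Even in this simplified form, the balancer needs to know which servers exist and how busy each one is, which is part of why it has to be in place before the application is reachable.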
Setting up IAM roles and permissions
Before configuring anything else, you must now sit down and identify precisely which actors will have access to what resources. As J Cole Morrison puts it, there's "an overwhelming number of possible actions, principals, resources and conditions" to consider. Setting up the Identity and Access Management (IAM) definitions for your application tells Amazon which components can communicate with EC2, CodeDeploy, the load balancer, and any other services your application might need. Roles can be created to define permissions for access, rather than granting individual permissions for each user. For example, you can designate a role that grants write access to S3, and apply it to the users who need that capability.
The default service roles which Amazon offers are often too permissive, which means your Ops team will need to audit precisely what roles your application does need, so that you don't accidentally expose anything sensitive. We recommend documenting the permissions necessary for every service—both permitted and denied—and then designing roles from those groupings. After that, make sure you've got buy-in from your Security and/or Ops team for each of these roles.
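One way to go from documented permissions to role definitions is to keep the inventory as data and generate the policy documents from it. The sketch below assumes a hypothetical inventory with a single "uploader" service and an example bucket name; the output follows the standard IAM JSON policy layout (Version, Statement, Effect, Action, Resource), but the contents are illustrative only.

```python
import json

# Hypothetical permission inventory: each service, the actions it
# needs, and the resource those actions apply to (all assumptions).
documented = {
    "uploader": {
        "actions": ["s3:PutObject"],
        "resource": "arn:aws:s3:::example-bucket/*",
    },
}


def policy_for(entry):
    """Build a least-privilege IAM policy document for one inventory entry."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(entry["actions"]),
                "Resource": entry["resource"],
            }
        ],
    }


policies = {name: policy_for(entry) for name, entry in documented.items()}
print(json.dumps(policies["uploader"], indent=2))
```

Keeping the inventory separate from the generated documents gives your Security or Ops team a single, reviewable list to sign off on.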
Choosing a machine size
The flexibility of an IaaS (infrastructure as a service) grants you the ability to choose how much CPU and memory is available to you, and in doing so, you're able to control your monthly spending by only paying for what you need. Again, though, this assumes that you are cognizant of the infrastructure requirements of your application before it's put into production.
You'll likely need to stand up a separate monitoring service to ensure that your uptime meets some satisfactory threshold and that you're not overutilizing the resources you thought would be adequate when your application first launched. Amazon does provide a service called CloudWatch, but its proprietary nature will continue to tie you up with more vendor lock-in. It also requires quite a bit of domain knowledge, as you will need to specifically configure alarms and alerts, log grouping, and the like.
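As a rough illustration of what configuring an alarm means, the following sketch flags sustained overutilization: it fires only when a metric stays above a threshold for several consecutive samples. The function name, the 80% threshold, and the sample values are assumptions for the example, and this is not CloudWatch's API.

```python
def breaches(samples, threshold, periods):
    """Return True if `periods` consecutive samples all exceed `threshold`."""
    streak = 0
    for value in samples:
        # Extend the streak on a breach, reset it otherwise.
        streak = streak + 1 if value > threshold else 0
        if streak >= periods:
            return True
    return False


# Hypothetical CPU utilization percentages, one sample per minute.
cpu = [40, 85, 90, 95, 50]
print(breaches(cpu, threshold=80, periods=3))
```

Choosing the threshold and the number of periods is exactly the kind of domain knowledge the paragraph above refers to: too sensitive and you get paged for brief spikes, too lenient and you miss a machine that is genuinely undersized.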
Automating your infrastructure
With a tool like Terraform, you can try to automate all of the above. This, too, is somewhat of an obstacle, as it requires both learning a new configuration language and understanding how to script something unique for your IaaS. Modules can abstract the physical items necessary for your infrastructure, but they still require dedicated vigilance to maintain.
Testing your automation
Once you've got a functional build-and-deploy pipeline working, congratulations! Next, you'll need to test your Terraform steps and plans, particularly since the APIs for the external IaaS configurations you rely on could change at any moment. Whatever automation you've come up with will need its own test suite, to ensure that it remains functional and correct.
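One small check such a test suite might include is a guard against plans that would destroy resources. The sketch below assumes the plan has already been exported to Terraform's JSON representation and loaded into a dictionary with a resource_changes list; the sample plan and resource addresses are hand-written and hypothetical.

```python
def destructive_changes(plan):
    """Return the addresses of resources the plan would delete or replace."""
    return [
        change["address"]
        for change in plan.get("resource_changes", [])
        if "delete" in change["change"]["actions"]
    ]


# Hand-written sample in the shape of a JSON plan (illustrative only).
sample_plan = {
    "resource_changes": [
        {"address": "aws_vpc.main", "change": {"actions": ["no-op"]}},
        {"address": "aws_lb.public", "change": {"actions": ["create"]}},
        {"address": "aws_db_instance.primary", "change": {"actions": ["delete", "create"]}},
    ]
}

flagged = destructive_changes(sample_plan)
print(flagged)
```

A pipeline could fail the build whenever this list is non-empty and require a human to approve the change explicitly.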
While all of the above makes it easier to deploy your code instantaneously, it is often beneficial to define ECS tasks to automate additional work. You will definitely need a task which serves the application, but you can also include operations like automatically running a database migration or configuring your logging.
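To make this concrete, here is a sketch that assembles a fragment of an ECS task definition as a Python dictionary: a migration container that runs first, and a web container that starts only after the migration succeeds, both logging through the awslogs driver. The image name, migration command, port, memory sizes, and log group are all hypothetical, and a real task definition will likely need more fields than are shown here.

```python
import json


def task_definition(image, log_group, region):
    """Build an illustrative ECS task definition fragment."""
    log_config = {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": log_group,
            "awslogs-region": region,
            "awslogs-stream-prefix": "app",
        },
    }
    migrate = {
        "name": "migrate",
        "image": image,
        "essential": False,  # allowed to exit once the migration is done
        "memory": 256,
        "command": ["./bin/migrate"],  # hypothetical migration command
        "logConfiguration": log_config,
    }
    web = {
        "name": "web",
        "image": image,
        "essential": True,
        "memory": 512,
        "portMappings": [{"containerPort": 8080}],
        # Start serving only after the migration container exits cleanly.
        "dependsOn": [{"containerName": "migrate", "condition": "SUCCESS"}],
        "logConfiguration": log_config,
    }
    return {"family": "example-app", "containerDefinitions": [migrate, web]}


definition = task_definition("example/app:latest", "/ecs/example-app", "us-east-1")
print(json.dumps(definition, indent=2))
```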
Setting up multiple environments
Some time after you've been pushing your application straight to a production environment, you may find that you need to establish a staging environment as well. You'll need to redefine many of the elements above, taking care to change the values where they make sense—for example, with regards to a smaller machine image or different IP addresses. You'll also be maintaining two separate architectures, which comes with its own complexities in ensuring that changes to the two environments are synchronized.
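One common way to keep two environments from drifting apart is to define a shared base and let each environment override only what differs. The sketch below assumes a hypothetical three-setting configuration; the instance types, counts, and address ranges are illustrative values, not recommendations.

```python
# Shared settings, using production values as the baseline (assumptions).
BASE = {"instance_type": "t3.large", "desired_count": 4, "cidr": "10.0.0.0/16"}

# Per-environment overrides; staging gets a smaller machine and its own range.
OVERRIDES = {
    "production": {},
    "staging": {"instance_type": "t3.small", "desired_count": 1, "cidr": "10.1.0.0/16"},
}


def render(environment):
    """Merge the base settings with one environment's overrides."""
    overrides = OVERRIDES[environment]
    unknown = set(overrides) - set(BASE)
    if unknown:
        # An override for a setting the base doesn't define is a sign
        # that the two environments have drifted apart.
        raise KeyError(f"unknown settings for {environment}: {sorted(unknown)}")
    return {**BASE, **overrides}


for environment in OVERRIDES:
    print(environment, render(environment))
```

Rejecting unknown override keys is a deliberate choice here: a new setting has to be added to the base first, so every environment picks it up.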
A sufficiently complex application is rarely just a single code base. It may also be communicating with other databases, key-value stores, or storage systems. If one day you discover that Postgres has a feature that your MySQL database doesn't handle well, you'll need to configure the IP addresses and the IAM roles for this new service all over again, on top of the complexity of migrating everything over.
You've set everything up, and for many years your process works with only minimal interference to your feature release cadence. Now, you want to move off of AWS, and you find that you'll need to reconsider your operational strategy all over again.
For the majority of applications and service deployments, we continue to believe that your money is better spent paying experts on a PaaS to think about these details for you. If you're in an early phase of building your application, your time is better spent building impactful features than acquiring all the domain knowledge that a safe and scalable AWS deployment requires.
If we've persuaded you of the simplicity of NoOps, a subsequent post will go over some information to help you relay these principles to any leadership that also needs convincing!