Building VPC with Terraform in Amazon AWS


Terraform is a tool for automating infrastructure management. It can be used for a simple task like managing a single application instance or for more complex ones like managing an entire datacenter or virtual cloud. The infrastructure Terraform can manage includes low-level components such as compute instances, storage, and networking, as well as high-level components such as DNS entries, SaaS features and others. It is a great tool to have in a DevOps environment, and I find it very powerful but simple to use when it comes to managing infrastructure as code (IaC). The best thing about it is the support for various platforms and providers like AWS, DigitalOcean, OpenStack, Microsoft Azure, Google Cloud etc., meaning you get to use the same tool to manage your infrastructure on any of these cloud providers. See the Providers page for the full list.

Terraform uses its own domain-specific language (DSL) called the HashiCorp Configuration Language (HCL): a fully JSON-compatible language for describing infrastructure as code. Configuration files written in HCL describe the components of the infrastructure we want to build and manage. Terraform generates an execution plan describing what it will do to reach the desired state, then executes it and records the result in a state file, terraform.tfstate by default. This state file is extremely important; it maps various resource metadata to actual resource IDs so that Terraform knows what it is managing. The file must be saved and distributed to anyone who might run Terraform against the VPC infrastructure we created, so storing it in a GitHub repository is the best way to go in order to share a project.

To install Terraform, follow the simple steps on the Getting Started install page.

Notes before we start

First let me mention that the code below has been adjusted to work with the latest Terraform and has been successfully tested with version 0.7.4. It has been running for a long time now and has been used many times to create production VPCs in AWS. It is based on CloudFormation templates I wrote for the same purpose at some point in 2014, during the quest to convert our infrastructure into code.

What this is going to do:

  • Create a multi-tier VPC (Virtual Private Cloud) in a specific AWS region
  • Create 2 Private and 1 Public Subnet in the specific VPC CIDR
  • Create Routing Tables and attach them to the appropriate subnets
  • Create a NAT instance with ASG (Auto Scaling Group) to serve as default gateway for the private subnets
  • Create Security Groups to use with EC2 instances we create
  • Create SNS notifications for Auto Scaling events

This is a step-by-step walkthrough; the source code will be made available at some point.

Before we start we will need an SSH key and an SSL certificate (for the ELB) uploaded to our AWS account, and the awscli tool installed on the machine where we run Terraform.

Building Infrastructure

After setting up the binaries we create an empty directory that will hold the new project. First thing we do is tell terraform which provider we are going to use. Since we are building Amazon AWS infrastructure we create a .tfvars file with our AWS IAM API credentials. For example, provider-credentials.tfvars with the following content:
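A minimal sketch of such a file, assuming three hypothetical variable names (aws_access_key, aws_secret_key and aws_region) that are chosen here for illustration only:

```hcl
# provider-credentials.tfvars (sketch; the variable names are assumptions)
# Replace the <> placeholders with real values and keep this file out of
# version control.
aws_access_key = "<AWS_ACCESS_KEY>"
aws_secret_key = "<AWS_SECRET_KEY>"
aws_region     = "<AWS_REGION>"
```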

We make sure the API credentials are for a user that has full permissions to create, read and destroy infrastructure in our AWS account. Check the IAM user and its roles to confirm this is the case.

Then we create a .tf file where we declare our first block, the provider. Let's name the file provider-config.tf and put the following content:

for our AWS provider type.
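As a rough sketch of what provider-config.tf could contain, assuming the credentials arrive through three hypothetical variables (aws_access_key, aws_secret_key, aws_region) set in the credentials .tfvars file:

```hcl
# provider-config.tf (sketch; the variable names are assumptions)
# Declared without defaults, so the values must come from a .tfvars file.
variable "aws_access_key" {}
variable "aws_secret_key" {}
variable "aws_region" {}

# The AWS provider reads its credentials and region from the variables above.
provider "aws" {
  access_key = "${var.aws_access_key}"
  secret_key = "${var.aws_secret_key}"
  region     = "${var.aws_region}"
}
```

Keeping the declarations here and the values in a separate file is what lets us share the .tf files without sharing the secrets.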

Then we create a .tf file, vpc_environment.tf, where we put all the essential variables needed to build the VPC, like the VPC CIDR, AWS region and zones, default EC2 instance type, the SSH key and other AWS-related parameters:
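A sketch of how such a variables file might look. Every name and default below is illustrative, not taken from the real project:

```hcl
# vpc_environment.tf (sketch; all names and defaults are assumptions)
variable "vpc_name" {
  description = "Name tag applied to the VPC"
  default     = "unknown"
}

variable "vpc_cidr" {
  description = "CIDR block of the whole VPC"
  default     = "10.0.0.0/16"
}

variable "zones" {
  description = "Availability Zones to spread the subnets over"
  default     = []
}

variable "instance_type" {
  description = "Default EC2 instance type"
  default     = "t2.micro"
}

variable "key_name" {
  description = "Name of the SSH key pair uploaded to AWS"
  default     = "unknown"
}
```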

I have kept most of the variables generic and pass their values in via a separate .tfvars file, vpc_environment.tfvars:
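For illustration, a values file along these lines would do, assuming variables named vpc_name, vpc_cidr, zones, instance_type and key_name have been declared in vpc_environment.tf (these names are hypothetical):

```hcl
# vpc_environment.tfvars (sketch; names and values are assumptions)
vpc_name      = "<VPC_NAME>"
vpc_cidr      = "10.0.0.0/16"
zones         = ["<AZ_1>", "<AZ_2>", "<AZ_3>"]
instance_type = "t2.micro"
key_name      = "<SSH_KEY_NAME>"
```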

Terraform does not (yet) support interpolation by referencing another variable in a variable name (see Terraform issue #2727), nor the use of an array as an element of a map. These are a couple of shortcomings, but if you have used AWS’s CloudFormation you will have faced similar “issues”. After all, these tools are not really programming languages, so we have to accept them as they are and try to make the best of it.

We can see I have separated the provider configuration and credentials from the rest of the project, including the resources, so I can easily share my project without exposing sensitive data. For example, I can create a GitHub repository out of my project directory and put the provider-credentials.tfvars file in .gitignore so it never gets accidentally uploaded.

Now it is time for the first test. After substituting all values in <> with real ones, we run terraform plan (passing both .tfvars files with -var-file) inside the directory and check the output. If this goes without any errors we can proceed to the next step; otherwise we have to go back and fix the errors Terraform has printed out. To apply the planned changes we would then run terraform apply with the same arguments, but it is too early for that at this stage since we have nothing to apply yet.

We can start creating resources now, starting with a VPC, subnets and IGW (Internet Gateway). We want our VPC to be created in a region with 3 AZs (Availability Zones) so we can spread our future instances nicely for HA. We create a new .tf file vpc.tf:
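A condensed sketch of such a vpc.tf, written in the 0.7-era interpolation syntax. It assumes variables vpc_name, vpc_cidr and zones, plus two hypothetical lists of subnet CIDRs (public_cidrs and private_cidrs), are declared elsewhere; only one of the two private subnet sets is shown, since the second would follow the same pattern:

```hcl
# vpc.tf (sketch; resource and variable names are assumptions)
resource "aws_vpc" "environment" {
  cidr_block           = "${var.vpc_cidr}"
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = {
    Name = "${var.vpc_name}"
  }
}

# Internet Gateway used as the default gateway of the public subnets
resource "aws_internet_gateway" "environment" {
  vpc_id = "${aws_vpc.environment.id}"
}

# One public subnet per Availability Zone
resource "aws_subnet" "public" {
  count                   = "${length(var.zones)}"
  vpc_id                  = "${aws_vpc.environment.id}"
  cidr_block              = "${element(var.public_cidrs, count.index)}"
  availability_zone       = "${element(var.zones, count.index)}"
  map_public_ip_on_launch = true
}

# One private subnet per Availability Zone (first private set)
resource "aws_subnet" "private" {
  count                   = "${length(var.zones)}"
  vpc_id                  = "${aws_vpc.environment.id}"
  cidr_block              = "${element(var.private_cidrs, count.index)}"
  availability_zone       = "${element(var.zones, count.index)}"
  map_public_ip_on_launch = false
}
```

Using count with element() keeps each subnet set to a single resource block regardless of how many zones the region has.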

This will create a VPC for us with 3 sets of subnets, 2 private and 1 public (meaning it will have the IGW as its default gateway). For the private subnets we need to create a NAT instance to be used as their internet gateway. We can create a new .tf file, say vpc_nat_instance.tf, where we create the resource:
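A sketch of how the NAT instance could be wrapped in an Auto Scaling group of size one. It assumes hypothetical variables nat_ami, nat_instance_type and key_name, and that the resources aws_security_group.nat, aws_iam_instance_profile.nat and aws_subnet.public are defined in other files:

```hcl
# vpc_nat_instance.tf (sketch; names and sizes are assumptions)
resource "aws_launch_configuration" "nat" {
  name_prefix                 = "nat-"
  image_id                    = "${var.nat_ami}"
  instance_type               = "${var.nat_instance_type}"
  key_name                    = "${var.key_name}"
  iam_instance_profile        = "${aws_iam_instance_profile.nat.id}"
  security_groups             = ["${aws_security_group.nat.id}"]
  associate_public_ip_address = true

  # Boot script that sets up NAT and routing on the instance
  user_data = "${file("userdata_nat_asg.sh")}"

  lifecycle {
    create_before_destroy = true
  }
}

# Exactly one NAT instance; the ASG replaces it if it fails
resource "aws_autoscaling_group" "nat" {
  name                 = "nat-asg"
  launch_configuration = "${aws_launch_configuration.nat.name}"
  min_size             = 1
  max_size             = 1
  desired_capacity     = 1

  # 0.7-era splat syntax; newer Terraform releases take the splat
  # expression without the surrounding brackets
  vpc_zone_identifier = ["${aws_subnet.public.*.id}"]

  tag {
    key                 = "Name"
    value               = "nat"
    propagate_at_launch = true
  }
}
```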

We create the NAT instance in an Auto Scaling group because, being a vital part of the infrastructure, we want it to be highly available. This means that in case of a failure, the ASG will launch a new one. The instance then needs to configure itself as a gateway for the private subnets, for which we create and attach to it an IAM role with specific permissions. Lastly, the instance will use the userdata_nat_asg.sh file (see the variables file), given to it via user-data, to set up the routing for the private subnets. The script is given below:

It configures the firewall and the NAT rules and executes the ha-nat-terraform.sh script fetched from an S3 bucket.
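The IAM role mentioned earlier could be sketched roughly as follows. The list of EC2 actions is an assumption about what a self-configuring NAT instance typically needs (taking over routes and disabling its source/destination check), and the resource names are hypothetical:

```hcl
# IAM role for the NAT instance (sketch; names and actions are assumptions)
resource "aws_iam_role" "nat" {
  name = "nat-role"

  # Allow EC2 instances to assume this role
  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

resource "aws_iam_role_policy" "nat" {
  name = "nat-policy"
  role = "${aws_iam_role.nat.id}"

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:DescribeRouteTables",
        "ec2:CreateRoute",
        "ec2:ReplaceRoute",
        "ec2:ModifyInstanceAttribute"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

# Instance profile referenced by the NAT launch configuration.
# "roles" is the 0.7-era argument; later AWS provider releases use "role".
resource "aws_iam_instance_profile" "nat" {
  name  = "nat-profile"
  roles = ["${aws_iam_role.nat.name}"]
}
```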

At the same time we create Security Groups, or instance firewalls in AWS terms, to attach to the NAT instance and to the instances we are going to launch in the subnets:
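As an illustration, a Security Group for the NAT instance might look like the sketch below. The rule set (accept everything from inside the VPC, allow all outbound traffic) is an assumption, as are the names aws_vpc.environment and vpc_cidr:

```hcl
# vpc_security.tf (sketch; names and rules are assumptions)
resource "aws_security_group" "nat" {
  name        = "nat"
  description = "NAT instance: forwards traffic from the private subnets"
  vpc_id      = "${aws_vpc.environment.id}"

  # Accept any traffic originating inside the VPC
  ingress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["${var.vpc_cidr}"]
  }

  # Allow all outbound traffic to the internet
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```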

Next we need to sort out the VPC routing, create routing tables and associate them with the subnets. Create a new file vpc_routing_tables.tf:
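A sketch of the routing setup, assuming the VPC, Internet Gateway and subnet resources are named aws_vpc.environment, aws_internet_gateway.environment, aws_subnet.public and aws_subnet.private, and that a zones list variable exists. The private table deliberately has no default route here, on the assumption that the NAT instance adds it at boot:

```hcl
# vpc_routing_tables.tf (sketch; resource names are assumptions)

# Public subnets: default route through the Internet Gateway
resource "aws_route_table" "public" {
  vpc_id = "${aws_vpc.environment.id}"

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = "${aws_internet_gateway.environment.id}"
  }
}

resource "aws_route_table_association" "public" {
  count          = "${length(var.zones)}"
  subnet_id      = "${element(aws_subnet.public.*.id, count.index)}"
  route_table_id = "${aws_route_table.public.id}"
}

# Private subnets: the default route is left for the NAT instance to set
resource "aws_route_table" "private" {
  vpc_id = "${aws_vpc.environment.id}"
}

resource "aws_route_table_association" "private" {
  count          = "${length(var.zones)}"
  subnet_id      = "${element(aws_subnet.private.*.id, count.index)}"
  route_table_id = "${aws_route_table.private.id}"
}
```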

To wrap it up, I would like to receive some notifications in case of Auto Scaling events, so we create the vpc_notifications.tf file:
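A minimal sketch of such a file, assuming the NAT Auto Scaling group is defined as aws_autoscaling_group.nat and using a hypothetical topic name:

```hcl
# vpc_notifications.tf (sketch; names are assumptions)
resource "aws_sns_topic" "autoscaling" {
  name = "autoscaling-events"
}

# Publish launch and terminate events of the listed groups to the topic
resource "aws_autoscaling_notification" "environment" {
  group_names = ["${aws_autoscaling_group.nat.name}"]

  notifications = [
    "autoscaling:EC2_INSTANCE_LAUNCH",
    "autoscaling:EC2_INSTANCE_TERMINATE",
    "autoscaling:EC2_INSTANCE_LAUNCH_ERROR",
    "autoscaling:EC2_INSTANCE_TERMINATE_ERROR"
  ]

  topic_arn = "${aws_sns_topic.autoscaling.arn}"
}
```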

As we further build our infrastructure we will use more Auto Scaling configurations and we can add those to the above resource under group_names.

Finally, we define some outputs we can use if needed:
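For example, a few outputs could be sketched like this, assuming resources named aws_vpc.environment, aws_subnet.public and aws_subnet.private exist in the project:

```hcl
# outputs.tf (sketch; resource names are assumptions)
output "vpc_id" {
  value = "${aws_vpc.environment.id}"
}

# Subnet IDs joined into a single comma-separated string
output "public_subnet_ids" {
  value = "${join(",", aws_subnet.public.*.id)}"
}

output "private_subnet_ids" {
  value = "${join(",", aws_subnet.private.*.id)}"
}
```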

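At the end we run terraform plan (passing our two .tfvars files with -var-file) to test and create the plan, and then terraform apply to create our VPC. When finished, we can destroy the infrastructure with terraform destroy.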

Adding ELB to the mix

Since we are building a highly available infrastructure, we are going to need a public ELB to put our application servers behind. Under the assumption that the app is listening on port 8080 we can add the following:

to our vpc_environment.tf file and set the values by putting:

in our vpc_environment.tfvars file. Now we can create the ELB by creating the vpc_elb.tf file with the following content:
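A sketch of what vpc_elb.tf could look like. It assumes two hypothetical variables, app_port (8080 here) and elb_ssl_cert_arn (the ARN of the uploaded SSL certificate), along with resources named aws_subnet.public and aws_security_group.elb; the health check path is also a guess:

```hcl
# vpc_elb.tf (sketch; names, ports and health check are assumptions)
resource "aws_elb" "app" {
  name                      = "app-elb"
  security_groups           = ["${aws_security_group.elb.id}"]
  cross_zone_load_balancing = true

  # 0.7-era splat syntax; newer Terraform releases take the splat
  # expression without the surrounding brackets
  subnets = ["${aws_subnet.public.*.id}"]

  # HTTPS is terminated at the ELB and forwarded to the app port
  listener {
    lb_port            = 443
    lb_protocol        = "https"
    instance_port      = "${var.app_port}"
    instance_protocol  = "http"
    ssl_certificate_id = "${var.elb_ssl_cert_arn}"
  }

  # Plain HTTP is passed through so the app can redirect it
  listener {
    lb_port           = 80
    lb_protocol       = "http"
    instance_port     = "${var.app_port}"
    instance_protocol = "http"
  }

  health_check {
    healthy_threshold   = 2
    unhealthy_threshold = 2
    timeout             = 5
    interval            = 30
    target              = "HTTP:${var.app_port}/"
  }
}
```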

The ELB does not support redirections so the app needs to deal with redirecting users from port 80/8080 to 443 for fully secure SSL operation.

Finally, the Security Group for the ELB in the vpc_security.tf file:

and we are done.
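The ELB Security Group could be sketched as follows, assuming the ELB accepts HTTP and HTTPS from anywhere and only talks to instances inside the VPC on a hypothetical app_port variable:

```hcl
# ELB Security Group (sketch; names and rules are assumptions)
resource "aws_security_group" "elb" {
  name        = "elb"
  description = "Public ELB: HTTP and HTTPS from the internet"
  vpc_id      = "${aws_vpc.environment.id}"

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # The ELB only needs to reach the app port inside the VPC
  egress {
    from_port   = "${var.app_port}"
    to_port     = "${var.app_port}"
    protocol    = "tcp"
    cidr_blocks = ["${var.vpc_cidr}"]
  }
}
```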

To get some outputs we are interested in from the ELB resource we can add this:

to the outputs.tf file.
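For instance, assuming the ELB resource is named aws_elb.app, the outputs could be sketched as:

```hcl
# outputs.tf additions (sketch; the resource name is an assumption)
output "elb_dns_name" {
  value = "${aws_elb.app.dns_name}"
}

# Hosted zone ID of the ELB, useful for Route 53 alias records
output "elb_zone_id" {
  value = "${aws_elb.app.zone_id}"
}
```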

Conclusion

As we add more infrastructure to the VPC we can make some improvements to the above code by creating modules for common tasks like Auto Scaling Groups and Launch Configurations, ELBs, IAM Profiles etc.; see Creating Modules for details.

Not everything can be done this way though. For example the repetitive code like:

is a great candidate for a module, except that Terraform does not (yet) support the count parameter inside modules; see Support issue #953.

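Its graphing feature might come in handy for obtaining a logical diagram of the infrastructure we are creating. To generate it, run terraform graph and pipe its DOT output through Graphviz, for example terraform graph | dot -Tpng > graph.png.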

And of course there is Atlas from HashiCorp, a paid DevOps Infrastructure Suite that provides collaboration, validation and automation features and professional support for those who need it.

Apart from the couple of shortcomings mentioned, Terraform is a really powerful tool for creating and managing infrastructure. With its Templates and Provisioners it lays the foundation for other CM and automation tools like Ansible, which is our CM (Configuration Manager) of choice, to deploy systems in an infrastructure environment.
