Troubleshoot Terraform

The primary method for interacting with Terraform is the HashiCorp Configuration Language (HCL). When Terraform encounters an error in your configuration, it will report an error including line numbers and the type of issue found in the configuration.

In this tutorial, you will clone a repository with a broken Terraform configuration to deploy an EC2 instance and underlying networking. This configuration contains intentional errors to introduce you to troubleshooting the configuration language.

Prerequisites

For this tutorial, you will need:

Terraform 0.15.2+ installed locally
an AWS account with credentials configured for Terraform

Review the Terraform troubleshooting model

The Terraform application layers

There are four potential types of issues that you could experience with Terraform: language, state, core, and provider errors. Starting from the type of error closest to the user:

Language errors: The primary interface for Terraform is the HashiCorp Configuration Language (HCL), a declarative configuration language. The Terraform core application interprets the configuration language. When Terraform encounters a syntax error in your configuration, it prints out the line numbers and an explanation of the error.
State errors: The Terraform state file stores information on provisioned resources. It maps resources to your configuration and tracks all associated metadata. If state is out of sync, Terraform may destroy or change your existing resources. After you rule out configuration errors, review your state. Bring your state back in sync with your infrastructure by refreshing, importing, or replacing resources.
Core errors: The Terraform core application contains all the logic for operations. It interprets your configuration, manages your state file, constructs the resource dependency graph, and communicates with provider plugins. Errors produced at this level may indicate a bug in Terraform itself. Later in this tutorial, you will learn best practices for opening a GitHub issue for the core development team.
Provider errors: The provider plugins handle authentication, API calls, and mapping resources to services. Later in this tutorial, you will learn best practices for opening a GitHub issue for the provider development team.

Clone the GitHub example repository

Clone the example GitHub repository.

$ git clone https://github.com/hashicorp-education/learn-terraform-troubleshooting

Change into the repository directory.

$ cd learn-terraform-troubleshooting

Open the main.tf file in your file editor.

This configuration includes intentional problems and you may find some issues immediately, but do not edit your configuration yet. In the next section, you will run the format command to identify errors.

Format the configuration

The format command scans the current directory for configuration files and rewrites your Terraform configuration files to the recommended format.
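For illustration only, the hypothetical snippet below shows the kind of layout changes terraform fmt makes. The first block uses four-space indentation and unaligned arguments. The second block shows the same arguments in the canonical style, with two-space indentation and aligned equals signs. The resource names are made up for this sketch and the blocks are intentionally incomplete, so do not add them to main.tf.

```hcl
# Before formatting (hypothetical resource, for illustration only):
# four-space indentation, equals signs not aligned.
resource "aws_security_group_rule" "before_fmt" {
    type = "ingress"
    from_port = 8080
    to_port = 8080
    protocol = "tcp"
}

# After `terraform fmt`: two-space indentation, equals signs aligned
# across consecutive argument lines.
resource "aws_security_group_rule" "after_fmt" {
  type      = "ingress"
  from_port = 8080
  to_port   = 8080
  protocol  = "tcp"
}
```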

In your terminal, run the terraform fmt command. It reports two errors, an invalid character and an invalid expression, both on line 52 of main.tf.

$ terraform fmt
terraform.tfvars
╷
│ Error: Invalid character
│
│   on main.tf line 52, in resource "aws_instance" "web_app":
│   52:     Name = $var.name-learn
│
│ This character is not used within the language.
╵

╷
│ Error: Invalid expression
│
│   on main.tf line 52, in resource "aws_instance" "web_app":
│   52:     Name = $var.name-learn
│
│ Expected the start of an expression, but found an invalid expression token.
╵

Because Terraform could not parse this expression, it reported these errors instead of reformatting main.tf. You must fix the configuration manually.

Correct a variable interpolation error

In main.tf, find the tags attribute of the aws_instance.web_app resource on line 52. The Name tag is incorrectly formatted: it tries to append a string to the name input variable without the necessary interpolation syntax. Update the Name attribute with the correct syntax.

##...
   tags = {
-     Name = $var.name-learn
+     Name = "${var.name}-learn"
   }
 }
##...

Run terraform fmt again to confirm that the configuration now parses and is formatted correctly.

$ terraform fmt
main.tf

The invalid character and invalid expression errors no longer appear, and fmt lists main.tf because it reformatted that file. Inside the quoted string template, Terraform evaluates the ${var.name} interpolation sequence and appends the literal -learn suffix to form the tag value.
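The sketch below compares the string template with the built-in format() function. Both expressions are intended to produce the same value, which can help when you read configurations that use either style. This is standalone: the variable declaration and its default of "terraform" are assumptions for illustration, and the example repository declares its own name variable. Do not paste it into the tutorial directory.

```hcl
# Assumed declaration for this sketch; the example repository declares its own.
variable "name" {
  type    = string
  default = "terraform"
}

locals {
  # String template: evaluates var.name, then appends the literal "-learn".
  template_name = "${var.name}-learn"

  # Same result using the built-in format() function.
  format_name = format("%s-learn", var.name)
}

# With the assumed default, both locals evaluate to "terraform-learn",
# so this output is true.
output "names_match" {
  value = local.template_name == local.format_name
}
```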

Validate your configuration

terraform fmt only checks that your HCL parses and rewrites its layout. It does not check references, types, or provider schemas. After formatting, run terraform validate to check your configuration against the language rules and the schemas of the providers it uses.

Initialize your Terraform directory to download the providers that your configuration requires.

$ terraform init
Initializing the backend...

Initializing provider plugins...
- Finding hashicorp/aws versions matching ">= 3.24.1"...
- Finding latest version of hashicorp/http...
- Installing hashicorp/aws v5.56.1...
- Installed hashicorp/aws v5.56.1 (signed by HashiCorp)
- Installing hashicorp/http v3.4.3...
- Installed hashicorp/http v3.4.3 (signed by HashiCorp)

Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

Run terraform validate in your terminal. The output contains a cycle error that highlights a mutual dependency between two security group resources.

$ terraform validate
╷
│ Error: Cycle: aws_security_group.sg_ping, aws_security_group.sg_8080
│
│
╵

Cycle errors are instances of circular logic in the Terraform dependency graph. Terraform analyzes the references between resources in your configuration to determine the order in which to perform operations. When two resources depend on each other, no valid order exists, so Terraform stops with a cycle error.

In the next section, you will correct this dependency graph error.

Correct a cycle error

Your aws_security_group resources reference one another in their security_groups attributes. Terraform cannot create either group first, because each group's configuration requires the ID of the other group, which does not exist yet.

Remove the mutually dependent security group rules in your configuration, leaving the two group resources without ingress attributes.

 resource "aws_security_group" "sg_ping" {
   name = "Allow Ping"

-  ingress {
-    from_port       = -1
-    to_port         = -1
-    protocol        = "icmp"
-    security_groups = [aws_security_group.sg_8080.id]
-  }
 }

 resource "aws_security_group" "sg_8080" {
   name = "Allow 8080"

-  ingress {
-    from_port       = 8080
-    to_port         = 8080
-    protocol        = "tcp"
-    security_groups = [aws_security_group.sg_ping.id]
-  }
   // connectivity to ubuntu mirrors is required to run `apt-get update` and `apt-get install apache2`
   egress {
     from_port   = 0
     to_port     = 0
     protocol    = "-1"
     cidr_blocks = ["0.0.0.0/0"]
   }
 }

Instead of defining the rules inline in the aws_security_group resources, define them as separate aws_security_group_rule resources that reference the security group IDs. This breaks the cycle. Each rule depends on both groups, but neither group depends on a rule. Terraform therefore creates both security groups first, without interdependent rules, and then creates the rules, which attach to the existing groups.

Add the new, independent rule resource configurations to main.tf.

resource "aws_security_group_rule" "allow_ping" {
  type                     = "ingress"
  from_port                = -1
  to_port                  = -1
  protocol                 = "icmp"
  security_group_id        = aws_security_group.sg_ping.id
  source_security_group_id = aws_security_group.sg_8080.id
}

resource "aws_security_group_rule" "allow_8080" {
  type                     = "ingress"
  from_port                = 8080
  to_port                  = 8080
  protocol                 = "tcp"
  security_group_id        = aws_security_group.sg_8080.id
  source_security_group_id = aws_security_group.sg_ping.id
}
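Newer AWS provider versions, including the v5 release that terraform init installed above, also offer the aws_vpc_security_group_ingress_rule resource, which manages one ingress rule per resource. The sketch below is an untested alternative. It assumes the same two security groups and allows ICMP and TCP port 8080 between them. Use either these resources or aws_security_group_rule for the same rules, not both, because managing one rule from two resources is likely to conflict.

```hcl
# Alternative sketch (AWS provider v5+); not part of the tutorial's configuration.
resource "aws_vpc_security_group_ingress_rule" "allow_ping" {
  security_group_id            = aws_security_group.sg_ping.id
  referenced_security_group_id = aws_security_group.sg_8080.id
  ip_protocol                  = "icmp"
  # For ICMP, from_port is the ICMP type and to_port the ICMP code; -1 means all.
  from_port = -1
  to_port   = -1
}

resource "aws_vpc_security_group_ingress_rule" "allow_8080" {
  security_group_id            = aws_security_group.sg_8080.id
  referenced_security_group_id = aws_security_group.sg_ping.id
  ip_protocol                  = "tcp"
  from_port                    = 8080
  to_port                      = 8080
}
```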

Because Terraform stopped at the cycle error, it did not report any later errors. Run terraform validate again to surface the next errors. The first is an invalid reference in the for_each attribute, where a splat expression (*) appears in place of a resource name. The second is an invalid each attribute.

$ terraform validate
╷
│ Error: Invalid reference
│
│   on main.tf line 39, in resource "aws_instance" "web_app":
│   39:   for_each               = aws_security_group.*.id
│
│ A reference to a resource type must be followed by at least one attribute
│ access, specifying the resource name.
╵
╷
│ Error: Invalid "each" attribute
│
│   on main.tf line 42, in resource "aws_instance" "web_app":
│   42:   vpc_security_group_ids = [each.id]
│
│ The "each" object does not have an attribute named "id". The supported
│ attributes are each.key and each.value, the current key and value pair of the
│ "for_each" attribute set.
╵

The second error comes from the same resource block. each.id is not a valid reference, because the each object only exposes each.key and each.value. Once for_each iterates over a map of security group IDs, each.value holds the current ID.

Review the construction of the instance resource.

resource "aws_instance" "web_app" {
  for_each               = aws_security_group.*.id
  ami                    = data.aws_ami.ubuntu.id
  instance_type          = "t2.micro"
  vpc_security_group_ids = [each.id]
  user_data              = <<-EOF
            #!/bin/bash
            apt-get update
            apt-get install -y apache2
            sed -i -e 's/80/8080/' /etc/apache2/ports.conf
            echo "Hello World" > /var/www/html/index.html
            systemctl restart apache2
            EOF
  tags = {
    Name = "${var.name}-learn"
  }
}

Terraform does not automatically convert a list into the map or set of strings that for_each requires. Converting one takes a function such as toset().

In the next section, you will correct the for_each expression and the invalid each reference.

Correct a `for_each` error

Terraform's for_each meta-argument creates one instance of a resource for each element of a map or set that you provide.

In this example, you need one instance per security group. Terraform cannot parse aws_security_group.*.id here for two reasons. A reference to a resource type must name a specific resource, and even a valid splat expression (*) would return a list, not a map or set. A local value can supply a map instead. Use static strings as the map keys, because Terraform must know every for_each key when it plans, while the security group IDs are unknown until AWS creates the groups.

In main.tf, replace the value of the for_each attribute with a local value. Next, replace the vpc_security_group_ids value with each.value, the security group ID for the current for_each element. Finally, update the tags attribute to give each instance a unique name.

 resource "aws_instance" "web_app" {
-  for_each               = aws_security_group.*.id
+  for_each               = local.security_groups
   ami                    = data.aws_ami.ubuntu.id
   instance_type          = "t2.micro"
-  vpc_security_group_ids = [each.id]
+  vpc_security_group_ids = [each.value]
   user_data              = <<-EOF
               #!/bin/bash
               apt-get update
               apt-get install -y apache2
               sed -i -e 's/80/8080/' /etc/apache2/ports.conf
               echo "Hello World" > /var/www/html/index.html
               systemctl restart apache2
               EOF
   tags = {
-    Name = "${var.name}-learn"
+    Name = "${var.name}-learn-${each.key}"
   }
 }

Define the local value in your main.tf file. It maps a static key for each security group to that group's ID.

locals {
  security_groups = {
    sg_ping   = aws_security_group.sg_ping.id,
    sg_8080   = aws_security_group.sg_8080.id,
  }
}
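To see why the local value uses static keys, compare the two for_each values in the commented excerpt below. It is not configuration to add. The commented-out form is expected to fail when you plan a fresh deployment, because its set elements are security group IDs that AWS assigns only when it creates the groups, so Terraform cannot determine the instance keys in advance.

```hcl
# Excerpt for comparison only; the other aws_instance arguments are omitted.
resource "aws_instance" "web_app" {
  # Expected to fail on the first plan: the set elements, and therefore the
  # instance keys, are IDs that are unknown until the groups exist.
  # for_each = toset([aws_security_group.sg_ping.id, aws_security_group.sg_8080.id])

  # Works: the keys "sg_ping" and "sg_8080" are known at plan time. each.key is
  # one of those strings and each.value is the matching ID, resolved at apply.
  for_each = local.security_groups
}
```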

After editing your configuration files, they may not be formatted correctly. Format your configuration.

$ terraform fmt
main.tf

Your next terraform validate run will report errors in outputs.tf. Because aws_instance.web_app now uses for_each, it represents multiple instances, but your outputs still reference it as a single resource.

Validate your configuration to return the output errors.

$ terraform validate
╷
│ Error: Missing resource instance key
│
│   on outputs.tf line 6, in output "instance_id":
│    6:   value       = aws_instance.web_app.id
│
│ Because aws_instance.web_app has "for_each" set, its attributes must be
│ accessed on specific instances.
│
│ For example, to correlate with indices of a referring resource, use:
│     aws_instance.web_app[each.key]
╵
╷
│ Error: Missing resource instance key
│
│   on outputs.tf line 11, in output "instance_public_ip":
│   11:   value       = aws_instance.web_app.public_ip
│
│ Because aws_instance.web_app has "for_each" set, its attributes must be
│ accessed on specific instances.
│
│ For example, to correlate with indices of a referring resource, use:
│     aws_instance.web_app[each.key]
╵
╷
│ Error: Missing resource instance key
│
│   on outputs.tf line 16, in output "instance_name":
│   16:   value       = aws_instance.web_app.tags
│
│ Because aws_instance.web_app has "for_each" set, its attributes must be
│ accessed on specific instances.
│
│ For example, to correlate with indices of a referring resource, use:
│     aws_instance.web_app[each.key]
╵
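The error message suggests indexing the resource by key. When you need one specific instance rather than all of them, you can index aws_instance.web_app with one of the static keys from local.security_groups. The output below is a hypothetical example and is not part of the tutorial's outputs.tf.

```hcl
# Hypothetical output: the instance that for_each created for the "sg_ping" key.
output "ping_instance_id" {
  description = "ID of the EC2 instance attached to the ping security group"
  value       = aws_instance.web_app["sg_ping"].id
}
```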

In the next section, you will correct these errors by implementing a for expression to define outputs with lists of your instance IDs, IP addresses, and names.

Correct your outputs to return all values

To correct your outputs, use a for expression that collects values from every instance of the resource.

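The for expression iterates over aws_instance.web_app, which is now a map of instances keyed by the for_each keys, and assigns each instance in turn to the temporary symbol instance. Terraform evaluates the expression after the colon once per instance, and the square brackets collect the results into a list. In this example, instance.id, instance.public_ip, and instance.tags.Name return the ID, public IP address, and Name tag of each instance you created. Terraform iterates over a map in lexical order of its keys, so the sg_8080 instance appears before the sg_ping instance in each list.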

Open outputs.tf and update the output values with the for expression.

 output "instance_id" {
   description = "ID of the EC2 instance"
-  value       = aws_instance.web_app.id
+  value       = [for instance in aws_instance.web_app: instance.id]
 }

 output "instance_public_ip" {
   description = "Public IP address of the EC2 instance"
-  value       = aws_instance.web_app.public_ip
+  value       = [for instance in aws_instance.web_app: instance.public_ip]
 }

 output "instance_name" {
   description = "Tags of the EC2 instance"
-  value       = aws_instance.web_app.tags
+  value       = [for instance in aws_instance.web_app: instance.tags.Name]
 }
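If you would rather keep the association between each instance and its for_each key, a for expression can also build a map. The first hypothetical output below, a sketch rather than part of the tutorial, maps each key to its instance's public IP address. The second shows an equivalent list form that uses the values() function with a splat expression.

```hcl
# Hypothetical output: map of for_each key => public IP address.
output "instance_public_ip_by_key" {
  description = "Public IP address of each EC2 instance, keyed by security group"
  value       = { for key, instance in aws_instance.web_app : key => instance.public_ip }
}

# Hypothetical output: same list as instance_public_ip. values() returns the
# instances in lexical key order, and [*] reads public_ip from each one.
output "instance_public_ip_splat" {
  value = values(aws_instance.web_app)[*].public_ip
}
```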

Format your configuration.

$ terraform fmt
outputs.tf

Validate your configuration.

$ terraform validate
Success! The configuration is valid.

Apply your changes

Now that you have corrected the Terraform configuration, run terraform apply to create your resources. Enter yes when prompted to confirm your changes.

$ terraform apply
data.http.myip: Reading...
data.http.myip: Read complete after 0s [id=http://ipv4.icanhazip.com]
data.aws_ami.ubuntu: Reading...
data.aws_ami.ubuntu: Read complete after 0s [id=ami-01943ba6b3c809b0e]

Terraform used the selected providers to generate the following execution plan.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_instance.web_app["sg_8080"] will be created
  + resource "aws_instance" "web_app" {

## ...

Plan: 8 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + instance_id        = [
      + (known after apply),
      + (known after apply),
    ]
  + instance_name      = [
      + "terraform-learn-sg_8080",
      + "terraform-learn-sg_ping",
    ]
  + instance_public_ip = [
      + (known after apply),
      + (known after apply),
    ]

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

## ...

Apply complete! Resources: 8 added, 0 changed, 0 destroyed.

Outputs:

instance_id = [
  "i-08cac1508a6baab3b",
  "i-056c2e5bc7dba098f",
]
instance_name = [
  "terraform-learn-sg_8080",
  "terraform-learn-sg_ping",
]
instance_public_ip = [
  "52.14.220.20",
  "18.224.3.102",
]

Bug reporting best practices

You may experience errors due to provider or application issues. Once you eliminate the possibility of language misconfiguration, version mismatching, or state discrepancies, consider bringing your issue to the core Terraform team or Terraform provider community as a bug report.

To give the development team or the community working on your issue full context, follow these best practices when opening a GitHub issue.

Confirm versioning

Confirm the versions of the providers you are using and the version of Terraform you have in your environment. To confirm your provider and Terraform versions, run the version command.

$ terraform version
Terraform v1.8.3
on darwin_arm64
+ provider registry.terraform.io/hashicorp/aws v5.56.1
+ provider registry.terraform.io/hashicorp/http v3.4.3

Your version of Terraform is out of date! The latest version
is 1.9.0. You can update by downloading from https://www.terraform.io/downloads.html

Before reporting a bug, also confirm that you are using the most recent provider versions that your configuration allows. If your lock file specifies an older version, upgrade your providers with terraform init -upgrade and run the operation again.
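Version constraints in your configuration determine which versions terraform init may select, and the lock file records the versions it selected. The terraform block below is a sketch. It assumes constraints that match what this tutorial shows: Terraform 0.15.2 or later, and the ">= 3.24.1" AWS provider constraint that terraform init reported. The example repository's own terraform block may differ, so use this as a point of comparison rather than something to copy in.

```hcl
terraform {
  # Minimum Terraform version listed in this tutorial's prerequisites.
  required_version = ">= 0.15.2"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Matches the constraint that `terraform init` reported above.
      version = ">= 3.24.1"
    }
    http = {
      # No version constraint: init selects the latest available version.
      source = "hashicorp/http"
    }
  }
}
```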

Enable Terraform logging

Terraform 0.15 and later allow you to generate logs from provider plugins and from the core application separately. The Terraform development team needs the core logs for your attempted operation to troubleshoot core-related errors. To enable core logging, set the TF_LOG_CORE environment variable to the appropriate log level. For bug reports, use the TRACE level.

$ export TF_LOG_CORE=TRACE

TRACE provides the most detailed logging and contains all the information the development teams need. The other log levels (DEBUG, INFO, WARN, and ERROR) are typically reserved for developers looking for specific information.

You can also generate provider logs by setting the TF_LOG_PROVIDER environment variable. If you include these logs in your bug reports, the provider development team can reproduce and debug provider-specific errors.

$ export TF_LOG_PROVIDER=TRACE

Once you have configured your logging, set the path for your log file as an environment variable. When TF_LOG_CORE or TF_LOG_PROVIDER is set, Terraform creates the file named in TF_LOG_PATH if needed and appends its logs to it.

$ export TF_LOG_PATH=logs.txt

To generate an example of the core and provider logs, run a terraform refresh operation.

$ terraform refresh
data.http.myip: Reading...
data.http.myip: Read complete after 0s [id=http://ipv4.icanhazip.com]
data.aws_ami.ubuntu: Reading...

## ...

Outputs:

instance_id = [
  "i-08cac1508a6baab3b",
  "i-056c2e5bc7dba098f",
]
instance_name = [
  "terraform-learn-sg_8080",
  "terraform-learn-sg_ping",
]
instance_public_ip = [
  "52.14.220.20",
  "18.224.3.102",
]

Open and review the logs.txt file. This log file contains both provider.terraform-provider-aws and Terraform core logging.

$ cat logs.txt
##...
2024-07-02T12:40:42.125-0500 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/aws/5.56.1/darwin_arm64/terraform-provider-aws_v5.56.1_x5 pid=12485
2024-07-02T12:40:42.125-0500 [DEBUG] provider: plugin exited
2024-07-02T12:40:42.125-0500 [TRACE] vertex "provider[\"registry.terraform.io/hashicorp/aws\"] (close)": visit complete
2024-07-02T12:40:42.125-0500 [TRACE] vertex "root": starting visit (*terraform.nodeCloseModule)
2024-07-02T12:40:42.125-0500 [TRACE] vertex "root": does not belong to any module instance
2024-07-02T12:40:42.125-0500 [TRACE] vertex "root": visit complete
2024-07-02T12:40:42.125-0500 [TRACE] LoadSchemas: retrieving schema for provider type "registry.terraform.io/hashicorp/aws"
2024-07-02T12:40:42.125-0500 [TRACE] terraform.contextPlugins: Schema for provider "registry.terraform.io/hashicorp/aws" is in the global cache
2024-07-02T12:40:42.125-0500 [TRACE] LoadSchemas: retrieving schema for provider type "registry.terraform.io/hashicorp/http"
2024-07-02T12:40:42.125-0500 [TRACE] terraform.contextPlugins: Schema for provider "registry.terraform.io/hashicorp/http" is in the global cache
2024-07-02T12:40:42.128-0500 [TRACE] Plan is complete
2024-07-02T12:40:42.128-0500 [TRACE] Plan is not applyable
2024-07-02T12:40:42.128-0500 [TRACE] terraform.contextPlugins: Schema for provider "registry.terraform.io/hashicorp/aws" is in the global cache
2024-07-02T12:40:42.128-0500 [TRACE] LoadSchemas: retrieving schema for provider type "registry.terraform.io/hashicorp/aws"
2024-07-02T12:40:42.128-0500 [TRACE] terraform.contextPlugins: Schema for provider "registry.terraform.io/hashicorp/aws" is in the global cache
2024-07-02T12:40:42.128-0500 [TRACE] LoadSchemas: retrieving schema for provider type "registry.terraform.io/hashicorp/http"
2024-07-02T12:40:42.128-0500 [TRACE] terraform.contextPlugins: Schema for provider "registry.terraform.io/hashicorp/http" is in the global cache
2024-07-02T12:40:42.128-0500 [DEBUG] no planned changes, skipping apply graph check
2024-07-02T12:40:42.128-0500 [INFO]  backend/local: refresh calling Refresh
2024-07-02T12:40:42.129-0500 [TRACE] statemgr.Filesystem: creating backup snapshot at terraform.tfstate.backup
2024-07-02T12:40:42.130-0500 [TRACE] statemgr.Filesystem: state has changed since last snapshot, so incrementing serial to 10
2024-07-02T12:40:42.130-0500 [TRACE] statemgr.Filesystem: writing snapshot at terraform.tfstate
2024-07-02T12:40:42.137-0500 [TRACE] statemgr.Filesystem: removing lock metadata file .terraform.tfstate.lock.info
2024-07-02T12:40:42.140-0500 [TRACE] statemgr.Filesystem: unlocking terraform.tfstate using fcntl flock

To stop a log stream, unset the environment variable that controls it. For example, unset Terraform core logging. When you re-run Terraform, it will only log provider-specific operations. Environment variables you set this way are cleared when you close your terminal session.

$ unset TF_LOG_CORE

Open a ticket

If you would like input from the community before submitting your issue to the repository, consider submitting your issue as a forum topic in the HashiCorp Discuss forum.

To create a bug report, you must determine which log stream contains your error. In your logs.txt file, find the final error message and trace it back to its source. If it is a provider issue, the log lines will reference terraform-provider-<PROVIDER-NAME>, like the terraform-provider-aws entries in the log above.

When you determine where your error originated, navigate to the Terraform core GitHub repository or search the provider registry for your provider's GitHub repository.

Some providers may have different suggestions for opening issues, but the Terraform core repository has a ticket template you should follow to provide the team with the information they need.

First, navigate to the Terraform GitHub repository and choose "Issues" from the top tabs.

Choose "New Issue".

Select "Get started" with a bug report.

Familiarize yourself with the code of conduct.

Using the Terraform core template, fill in the information you collected and note the expected behavior. When you finish filling out the template, select "Submit New Issue," and the team will review your issue.

Clean up resources

Destroy the resources you created. Respond yes to the prompt to confirm.

$ terraform destroy

Next steps

In this tutorial, you learned how to troubleshoot Terraform by correcting broken configuration for an EC2 instance and security groups. You corrected a cycle error, a variable interpolation error, and a looping error by formatting and validating your configuration. You also learned how to enable logging, and the best practices for reporting issues to the Terraform and provider teams on GitHub.

For more information on state and recommended practices, review the following tutorials:

Provider versions

Terraform versions
