Packer Retry on Error: Handle Transient Build Failures (2026)

Packer is silently eating your build failures, masking the real issues.

The problem is that Packer’s retry mechanisms, whether the max_retries option on a provisioner or the retry choice offered by packer build -on-error=ask, do not distinguish transient errors from permanent ones. A retry fires on any failure, including fundamental configuration mistakes or network misconfigurations that will never resolve on their own. This can waste hours of build time and obscure the root cause of the failure. You end up chasing phantom issues because Packer keeps retrying a step that was doomed from the start.

Let’s look at a typical scenario: a shell provisioner fails because a command doesn’t exist on the target image, or a file provisioner fails because the source file is missing. No number of retries fixes either one; every attempt fails in exactly the same way.

Here’s how to effectively manage on_error and diagnose failures:

Common Causes and Fixes

Command Not Found in Shell Provisioner:
- Diagnosis: Examine the Packer build output carefully. You’ll see the shell command that failed and the error message from the target machine (e.g., sudo: apt-get: command not found or bash: yum: command not found).
- Fix:
  - If the command is genuinely missing: Install it using the appropriate package manager for the base image before your failing provisioner. For example, if you’re on a minimal Debian/Ubuntu and need wget:
```
{
  "type": "shell",
  "inline": [
    "sudo apt-get update",
    "sudo apt-get install -y wget"
  ]
}
```
    This works because you’re explicitly installing the missing dependency, allowing the subsequent provisioner to find and execute it.
  - If the command or package name is misspelled or incorrect: Correct it in your inline or script block. For example, on Debian/Ubuntu the AWS CLI package is awscli, not aws-cli:
```
{
  "type": "shell",
  "inline": [
    "sudo apt-get update",
    "sudo apt-get install -y awscli"
  ]
}
```
    This resolves the issue by ensuring the provisioner calls a command, or installs a package, that actually exists.
- Why it works: The shell provisioner executes commands within the context of the guest OS. If a command isn’t in the PATH or isn’t installed, the OS itself will report an error, which Packer then forwards. By ensuring the command exists, you satisfy the OS’s requirement.
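
A guard at the top of a provisioning script can turn this failure into a single, unambiguous message instead of an error buried in the build log. The sketch below is one way to do it in POSIX shell; require_cmd is a hypothetical helper name, not something Packer provides.

```shell
#!/bin/sh
# Hypothetical helper: fail fast, with a clear message, when a command
# the script depends on is missing from the guest image.
require_cmd() {
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      return 1
    fi
  done
}

# Usage (first line of a provisioning script):
#   require_cmd wget tar || exit 1
```

Because the check runs on the guest, it reports what the image actually contains rather than what the build machine has installed.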

Source File Not Found for file Provisioner:
- Diagnosis: Packer will report an error such as "Error uploading file: The system cannot find the file specified." or another file-not-found message. The error points directly to the missing source file on your build machine.
- Fix: Verify that the source path in your file provisioner is correct. A relative path is resolved against the working directory in which packer is run, not against the location of the template file, so either run packer from the expected directory or use an absolute path that exists.
```
{
  "type": "file",
  "source": "configs/app.conf",
  "destination": "/etc/app.conf"
}
```
  This works because Packer needs to read the file from your local filesystem before it can upload it to the target instance. If the file isn’t there, the upload fails immediately.

Incorrect Permissions on Source File for file Provisioner:
- Diagnosis: You might see a generic Error uploading file or a permission denied error when Packer tries to read the source file on your build machine.
- Fix: Ensure the user running packer build has read permissions on the source file.
```
chmod +r configs/app.conf
```
  This grants read access, allowing Packer to open and read the file for uploading.
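
Both file provisioner failures above can be caught before packer build starts. A minimal pre-flight sketch, assuming you list the source paths yourself (check_sources is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical pre-flight check: every path passed in must exist and be
# readable by the user who will run the build.
check_sources() {
  status=0
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      echo "missing source file: $path" >&2
      status=1
    elif [ ! -r "$path" ]; then
      echo "source file not readable: $path" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage, run from the directory where packer will be invoked:
#   check_sources configs/app.conf || exit 1
```

Running the check from the same working directory as the build matters, because that is where relative source paths are resolved.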

Network Issues During Instance Boot/Provisioning (e.g., SSH Timeout):
- Diagnosis: Packer output will show Error: Timed out waiting for SSH to become available. or similar messages indicating it couldn’t connect to the instance.
- Fix:
  - Security Group/Firewall: Verify that the instance’s security group (AWS), network security group (Azure), or firewall rules (GCP), along with any other network firewalls in the path, allow inbound SSH traffic (TCP port 22) from the IP address Packer is using.
    - AWS Example: In your AWS console, navigate to EC2 -> Security Groups, find the group associated with your instance, and add an inbound rule for SSH (port 22) allowing access from your build machine’s IP or a trusted range.
    - This works because the instance’s network layer is blocking the SSH connection attempt. Opening the port allows the connection.
  - Instance Reachability: Ensure the instance has a public IP address (if needed) or that your build environment can reach the instance’s private IP (e.g., via VPN or within the same VPC).
    - AWS Example: Check the instance’s subnet settings and ensure it’s in a public subnet if you expect to connect directly over the internet, or that routing is correctly configured for private IP access.
    - This ensures that network packets can actually reach the instance’s SSH server.
- Why it works: SSH requires a network path to be open. If firewalls or routing misconfigurations block traffic on port 22, Packer cannot establish the SSH connection needed to upload files or run commands.
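
To tell a blocked port apart from a slow boot, it can help to probe port 22 directly from the build machine while the instance is still running. The sketch below assumes bash and the coreutils timeout command are available locally; ssh_port_open is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical probe: succeeds only if a TCP connection to HOST:PORT
# can be opened within five seconds.
ssh_port_open() {
  host=$1
  port=${2:-22}
  # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT;
  # timeout bounds the attempt so silently dropped packets cannot hang it.
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage (203.0.113.10 is a documentation-only example address):
#   ssh_port_open 203.0.113.10 22 || echo "port 22 is not reachable"
```

A refused or timed-out probe points at firewall rules or routing; a successful probe with a failing build suggests looking at SSH credentials or the SSH username instead.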

Invalid Cloud Provider Credentials or Permissions:
- Diagnosis: Errors will be specific to your cloud provider, often mentioning Access Denied, InvalidClientTokenId, Authentication Failed, or AuthorizationError.
- Fix:
  - AWS: Ensure your ~/.aws/credentials file is correctly populated with a valid aws_access_key_id and aws_secret_access_key, or that your environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) are set. Also, verify the IAM user/role has permissions to create EC2 instances, security groups, etc.
```
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
# Ensure IAM role/user has "AmazonEC2FullAccess" or equivalent permissions
```
    This works by providing Packer with the necessary cryptographic proof of identity and authorization to interact with the cloud API.
  - Azure: Ensure your ~/.azure/credentials or the environment variables your template reads (for example ARM_CLIENT_ID, ARM_CLIENT_SECRET, ARM_TENANT_ID, ARM_SUBSCRIPTION_ID) are correct, and that the service principal has the appropriate RBAC roles (e.g., "Virtual Machine Contributor").
```
export ARM_CLIENT_ID="YOUR_CLIENT_ID"
export ARM_CLIENT_SECRET="YOUR_CLIENT_SECRET"
export ARM_TENANT_ID="YOUR_TENANT_ID"
export ARM_SUBSCRIPTION_ID="YOUR_SUBSCRIPTION_ID"
```
    This validates your identity and permissions to perform actions within your Azure subscription.
- Why it works: Cloud providers use credentials and permissions to control who can do what. Incorrect credentials mean the API calls fail authentication; insufficient permissions mean the API calls fail authorization.
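
A missing or empty credential variable is cheap to detect before any API call is made. The following sketch checks a list of variable names; require_env is a hypothetical helper, and it only confirms that the variables are set, not that the credentials are valid.

```shell
#!/bin/sh
# Hypothetical helper: report every named environment variable that is
# unset or empty, and fail if any are missing.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name; names are expected
    # to be literal identifiers supplied by the script author.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "environment variable not set: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY || exit 1
```

For AWS, running aws sts get-caller-identity afterwards is one way to confirm that the credentials are actually accepted.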

Syntax Errors in Packer Template:
- Diagnosis: Packer will fail before attempting to build, with an error message like Error: Invalid character encountered at ... or Error: Missing required field "type".
- Fix: Run packer validate <your-template.json> to catch these errors early. Correct the JSON syntax, missing fields, or incorrect key-value pairs.
```
packer validate my-aws-template.json
```
  This works because Packer performs a static analysis of your template file to ensure it conforms to the expected structure and syntax before it even starts provisioning.
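
Validation is easy to forget, so one option is to make it a precondition of every build. A minimal sketch, with validate_then_build as a hypothetical wrapper name:

```shell
#!/bin/sh
# Hypothetical wrapper: never start a build from a template that does
# not pass packer validate.
validate_then_build() {
  template=$1
  if ! packer validate "$template"; then
    echo "validation failed; not starting build" >&2
    return 1
  fi
  packer build "$template"
}

# Usage:
#   validate_then_build my-aws-template.json
```

This keeps template mistakes out of the build log entirely, so anything that fails during the build is a provisioning or infrastructure problem.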

The `on_error` Directive: Use with Caution

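Instead of retrying, stop on the first error. Failure handling is selected with the -on-error flag of packer build rather than in the template, and the two modes most useful here are cleanup (the default) and abort.

cleanup: Stops the build on a provisioning error and cleans up any resources created by the failed build (e.g., terminates the instance).
abort: Stops the build and exits without cleaning up, leaving the instance in place so you can inspect it. This is usually what you want for debugging, but you have to remove the leftover resources yourself.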
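
On the command line, these modes are selected with the -on-error flag of packer build. A small wrapper, sketched here with a hypothetical function name, makes the debugging mode easy to reach:

```shell
#!/bin/sh
# Hypothetical wrapper: on failure, exit without cleanup so the failed
# machine can be inspected. Leftover resources must be deleted by hand.
build_keep_on_failure() {
  packer build -on-error=abort "$@"
}

# Usage:
#   build_keep_on_failure my-aws-template.json
```

Keeping this separate from the normal build command avoids accidentally leaving instances running in unattended pipelines.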

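If you must retry, bound it. Interactively, -on-error=ask pauses at the failure and lets you choose between cleaning up, aborting, or retrying the failed step, so nothing is retried before you have seen the error. For unattended builds, the max_retries option on an individual provisioner caps how many times that provisioner is re-run after a failure:

"max_retries": 3

This retries the failing provisioner up to 3 times rather than restarting the whole build. Note that on_error is a command-line flag (-on-error), not a template key, and there is no documented companion setting for a wait between attempts. Retrying is still risky if the underlying issue isn’t transient.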
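
One way to keep retries from masking permanent failures is to do the retrying outside Packer, in a wrapper that retries only when the output looks transient. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the pattern list is a guess that needs tuning against your own logs (an SSH timeout caused by a closed port would match it too).

```shell
#!/bin/sh
# Assumed patterns that suggest a transient failure; adjust for your logs.
TRANSIENT_PATTERN='timed out|timeout|connection reset|connection refused|temporarily unavailable'

# Succeeds if the log text on stdin contains a transient-looking line.
is_transient() {
  grep -Eiq "$TRANSIENT_PATTERN"
}

# retry_if_transient MAX_ATTEMPTS WAIT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND only while it fails with transient-looking output.
retry_if_transient() {
  max_attempts=$1
  wait_seconds=$2
  shift 2
  attempt=1
  while :; do
    log=$(mktemp)
    if "$@" >"$log" 2>&1; then
      cat "$log"
      rm -f "$log"
      return 0
    fi
    cat "$log" >&2
    if ! is_transient <"$log"; then
      rm -f "$log"
      echo "non-transient failure; not retrying" >&2
      return 1
    fi
    rm -f "$log"
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$wait_seconds"
  done
}

# Usage: at most 3 attempts, 5 minutes apart.
#   retry_if_transient 3 300 packer build my-aws-template.json
```

A missing command or a missing source file never matches the pattern, so those builds fail on the first attempt with the original error still visible.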

The next error you’ll hit after fixing fundamental issues is often a subtle misconfiguration in a provisioner that does exist but behaves unexpectedly, or a dependency on a resource that wasn’t created correctly in a previous step.
