Resolving AWS CDK’s UPDATE_ROLLBACK_FAILED: A Real Use Case Solution

Tomoaki Imai
4 min read · Aug 17, 2023
No one likes this state :(

A CDK deployment can fail partway through an update and leave the CloudFormation stack stuck in UPDATE_ROLLBACK_FAILED. Several factors can cause this. One common trigger is changing the Lambda layer attached to a function: the function can't revert to its prior state if the previous layer is already gone by the time the rollback reaches it.

If you search for solutions, you'll find this AWS Knowledge Center article: https://repost.aws/knowledge-center/cloudformation-update-rollback-failed

From AWS knowledge center

According to the article:

the only actions that you can perform on the stack are the ContinueUpdateRollback or DeleteStack operations.

Yep, with ContinueUpdateRollback we can skip the resource that failed to roll back! The article also mentions:

After the rollback is complete, the state of the skipped resources is inconsistent with the state of the resources in the stack template. Before performing another stack update, update the stack or resources to be consistent with each other.

But what does ‘consistent resources’ imply? What’s the update strategy? To clarify, I’ll outline the recovery steps.

Let’s assume you have a Lambda function in a CloudFormation stack that failed to roll back. Here’s my example.

hmm…

(Sidenote: this was caused by an issue similar to the one reported at https://github.com/agutoli/serverless-layers/issues/51)

Because the function failed to roll back, the subsequent rollback steps also failed. We need to complete the rollback first before we can apply another change to the stack.

Step 1. Skip rollback

I won’t go into detail on this step, since it is covered in https://repost.aws/knowledge-center/cloudformation-update-rollback-failed.

  • Identify the failing resource. In my case, it was Web3genImagefunction.
Subsequent resources are also failing to roll back
  • Select the stack that’s stuck in UPDATE_ROLLBACK_FAILED status from the Stacks column.
  • Choose Stack actions, and then choose Continue update rollback.
  • Expand “Advanced troubleshooting”.
  • In the “Resources to skip - optional” section, select the resources that you want to skip. This skips the failing resource.
skipping…

Confirm the skipped resources and finalize the rollback. Note: the stack status should switch to UPDATE_ROLLBACK_COMPLETE.
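
If you prefer scripting over the console, the same step could be sketched with boto3. This is a minimal sketch, not the exact procedure I followed: it assumes `cfn` is a `boto3.client("cloudformation")`, that the relevant events fit in the first page returned by `describe_stack_events` (which lists newest events first), and that every resource reporting UPDATE_FAILED since the latest rollback started is one you are willing to skip. The helper names are made up for illustration.

```python
def find_failed_rollback_resources(events):
    """Return logical IDs that hit UPDATE_FAILED during the latest rollback.

    `events` is the "StackEvents" list from describe_stack_events,
    ordered newest first.
    """
    failed = []
    for event in events:
        is_stack_event = event["ResourceType"] == "AWS::CloudFormation::Stack"
        if is_stack_event and event["ResourceStatus"] == "UPDATE_ROLLBACK_IN_PROGRESS":
            break  # reached the start of the latest rollback; ignore older events
        if event["ResourceStatus"] == "UPDATE_FAILED":
            if event["LogicalResourceId"] not in failed:
                failed.append(event["LogicalResourceId"])
    return failed


def skip_and_continue_rollback(cfn, stack_name):
    """Continue the rollback while skipping the resources that failed to roll back."""
    events = cfn.describe_stack_events(StackName=stack_name)["StackEvents"]
    to_skip = find_failed_rollback_resources(events)
    if to_skip:
        cfn.continue_update_rollback(StackName=stack_name, ResourcesToSkip=to_skip)
    return to_skip


# Usage (assumption: AWS credentials are configured, "MyStack" is a placeholder):
#   import boto3
#   cfn = boto3.client("cloudformation")
#   print(skip_and_continue_rollback(cfn, "MyStack"))
```

Review the returned list before relying on it; skipping a resource means CloudFormation simply marks it as rolled back without touching it.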

Step 2. Assess the current Stack state

Navigate to the stack’s Template section and locate the CloudFormation template.

Within the template, find the skipped resource. Here’s mine

Now, in the AWS web console, manually change the Lambda function so that its setup matches the template. In my case, I corrected the uploaded function code, the layers, and the runtime.

  • The function zip file can be uploaded from S3 using the key in the template’s Code section
  • The runtime can be changed in the function’s Runtime settings
  • The layer ARN can be found on the Layers page
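
To double-check the manual fix, a small comparison between the template and the live function can help. The sketch below is only an illustration under a few assumptions: `template_body` comes from `get_template` on a CloudFormation client (a JSON string or an already parsed dict), `live_config` comes from `get_function_configuration` on a Lambda client, and only Runtime and Layers are compared. Layers written as intrinsic functions such as `{"Ref": ...}` are not resolved here, so they are left for manual inspection. The helper names are hypothetical.

```python
import json


def template_properties(template_body, logical_id):
    """Return the Properties of one resource from a stack template."""
    if isinstance(template_body, str):
        template = json.loads(template_body)
    else:
        template = template_body
    return template["Resources"][logical_id].get("Properties", {})


def config_mismatches(expected_props, live_config):
    """Map each differing setting to an (expected, live) pair."""
    mismatches = {}

    expected_runtime = expected_props.get("Runtime")
    if expected_runtime is not None and expected_runtime != live_config.get("Runtime"):
        mismatches["Runtime"] = (expected_runtime, live_config.get("Runtime"))

    expected_layers = expected_props.get("Layers", [])
    live_layers = [layer["Arn"] for layer in live_config.get("Layers", [])]
    # Only literal ARNs can be compared; Ref / Fn::* entries need a manual look.
    all_literal = all(isinstance(layer, str) for layer in expected_layers)
    if all_literal and expected_layers != live_layers:
        mismatches["Layers"] = (expected_layers, live_layers)

    return mismatches


# Usage (assumption: boto3 clients with valid credentials, placeholder names):
#   body = cfn.get_template(StackName="MyStack")["TemplateBody"]
#   props = template_properties(body, "Web3genImagefunction")
#   live = lambda_client.get_function_configuration(FunctionName="<physical name>")
#   print(config_mismatches(props, live))
```

An empty result only says that these two settings agree; the function code itself still has to be checked against the S3 key in the Code section.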

Step 3. Redeploy CDK without the update

With the settings in place, deploy your CDK app using the previous, working implementation, not the failed one! This makes sure that the stack and its resources are consistent with each other.
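
Before deploying, it may be worth confirming that the stack really left the failed state. A minimal sketch, assuming `cfn` is a `boto3.client("cloudformation")` and using a hypothetical helper name:

```python
def rollback_completed(cfn, stack_name):
    """Return True if the stack reports UPDATE_ROLLBACK_COMPLETE."""
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    return stack["StackStatus"] == "UPDATE_ROLLBACK_COMPLETE"


# Usage (assumption: AWS credentials are configured, "MyStack" is a placeholder):
#   import boto3
#   cfn = boto3.client("cloudformation")
#   if rollback_completed(cfn, "MyStack"):
#       print("Safe to redeploy the previous implementation")
```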

Once everything is consistent, deploy your change again.

That’s it! I hope this article helps you resolve the rollback issue without deleting stacks :)

Tomoaki Imai

CTO at Noxx https://www.noxx.net/ AI hiring tool. FullStack developer and leader. Love to share ideas about software development. https://github.com/tomoima525