Resolving AWS CDK’s UPDATE_ROLLBACK_FAILED:
A Real Use Case Solution
CDK deployments sometimes fail in a way that leaves the CloudFormation stack stuck in UPDATE_ROLLBACK_FAILED. Several factors can contribute to this. One common trigger is changing the Lambda layer attached to a function: if the previous layer no longer exists when the rollback reaches the function, the function can't revert to its prior state.
If you search for solutions, an AWS re:Post Knowledge Center article offers guidance: https://repost.aws/knowledge-center/cloudformation-update-rollback-failed
According to the article:
the only actions that you can perform on the stack are the ContinueUpdateRollback or DeleteStack operations.
Yep, we can skip the resource that failed to roll back! The article also mentions:
After the rollback is complete, the state of the skipped resources is inconsistent with the state of the resources in the stack template. Before performing another stack update, update the stack or resources to be consistent with each other.
But what does ‘consistent resources’ imply? What’s the update strategy? To clarify, I’ll outline the recovery steps.
Let’s assume you have a Lambda function in a CloudFormation stack that failed to roll back. Here’s my example.
(Sidenote: this was caused by an issue similar to the one reported at https://github.com/agutoli/serverless-layers/issues/51)
Because the function failed to roll back, the subsequent steps also failed. We need to complete the rollback first before we can apply another change to the stack.
Step 1. Skip rollback
I won’t go into detail on this step, since it is covered in https://repost.aws/knowledge-center/cloudformation-update-rollback-failed.
- Identify the failing resource. In my case, it was the Web3genImage function.
- Select the stack that’s stuck in UPDATE_ROLLBACK_FAILED status from the Stacks column.
- Choose Stack Actions, and then choose Continue update rollback.
- Expand “Advanced troubleshooting”.
- In the “Resources to skip - optional” section, select the resources that you want to skip. This skips the failing resource.
- Confirm the skipped resources and finish the rollback. Note: the stack status should switch to UPDATE_ROLLBACK_COMPLETE.
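The same console steps can also be scripted. Below is a minimal sketch, assuming boto3 is installed and that cfn is a boto3 CloudFormation client; the function names are my own and not part of any AWS tooling. It collects the resources currently in UPDATE_FAILED (the only status CloudFormation accepts in the skip list) and resumes the rollback while skipping them.

```python
from __future__ import annotations

from typing import Any


def rollback_failed_ids(stack_resources: list[dict[str, Any]]) -> list[str]:
    """Return the logical IDs of resources currently in UPDATE_FAILED."""
    return [
        r["LogicalResourceId"]
        for r in stack_resources
        if r["ResourceStatus"] == "UPDATE_FAILED"
    ]


def skip_and_continue_rollback(cfn: Any, stack_name: str) -> list[str]:
    """Skip every UPDATE_FAILED resource and resume the rollback.

    `cfn` is assumed to be a boto3 CloudFormation client, for example
    boto3.client("cloudformation").
    """
    # Note: for large stacks, describe_stack_resources may not return every
    # resource; list_stack_resources (paginated) is the alternative.
    response = cfn.describe_stack_resources(StackName=stack_name)
    to_skip = rollback_failed_ids(response["StackResources"])
    if not to_skip:
        raise RuntimeError(f"No UPDATE_FAILED resources found in {stack_name}")
    cfn.continue_update_rollback(StackName=stack_name, ResourcesToSkip=to_skip)
    return to_skip
```

With credentials configured, calling skip_and_continue_rollback(boto3.client("cloudformation"), "MyStack") would return the skipped logical IDs, where "MyStack" is a placeholder stack name. Review the list before skipping anything you do not intend to fix by hand afterwards.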
Step 2. Assess the current Stack state
Navigate to the stack’s Template section and locate the CloudFormation template.
Within the template, find the skipped resource. Here’s mine
Now, in the AWS web console, manually change the Lambda function so that it matches what the template describes. In my case, I corrected the uploaded function code, the layers and the runtime.
- The function zip file can be uploaded from S3 using the key in the Code section
- Runtime is configurable
- We can find the layer ARN on the Layers page
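For anyone who prefers scripting over clicking, here is a hedged sketch of the same idea in Python. It assumes the template is available as a dict (boto3's get_template returns JSON templates already parsed under "TemplateBody") and that lam is a boto3 Lambda client. All function names are hypothetical. CDK-generated templates often use intrinsic functions such as Ref or Fn::Sub for layers and asset buckets, so the sketch flags those values for manual lookup instead of trying to resolve them.

```python
from __future__ import annotations

from typing import Any


def template_settings(template: dict[str, Any], logical_id: str) -> dict[str, Any]:
    """Pull the settings corrected in this step out of the stack template."""
    resource = template["Resources"][logical_id]
    if resource["Type"] != "AWS::Lambda::Function":
        raise ValueError(f"{logical_id} is {resource['Type']}, not a Lambda function")
    props = resource["Properties"]
    code = props.get("Code", {})
    return {
        "Runtime": props.get("Runtime"),
        "Layers": props.get("Layers", []),
        "S3Bucket": code.get("S3Bucket"),
        "S3Key": code.get("S3Key"),
    }


def needs_manual_lookup(settings: dict[str, Any]) -> list[str]:
    """Names of settings that are intrinsic functions rather than literals."""

    def is_literal(value: Any) -> bool:
        if isinstance(value, list):
            return all(isinstance(item, str) for item in value)
        return value is None or isinstance(value, str)

    return [name for name, value in settings.items() if not is_literal(value)]


def restore_function(
    lam: Any,
    function_name: str,
    runtime: str,
    layer_arns: list[str],
    s3_bucket: str,
    s3_key: str,
) -> None:
    """Point the function back at the code, runtime and layers from the template.

    `lam` is assumed to be a boto3 Lambda client; every value passed in must
    already be a literal (resolve Ref / Fn::Sub values by hand first).
    """
    lam.update_function_code(
        FunctionName=function_name, S3Bucket=s3_bucket, S3Key=s3_key
    )
    # Wait for the code update to finish before changing the configuration,
    # otherwise Lambda can reject the second call as a conflicting update.
    lam.get_waiter("function_updated_v2").wait(FunctionName=function_name)
    lam.update_function_configuration(
        FunctionName=function_name, Runtime=runtime, Layers=layer_arns
    )
```

The split between reading and applying is deliberate: template_settings and needs_manual_lookup touch nothing in AWS, so you can inspect what the template expects before restore_function changes the live function.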
Step 3. Redeploy CDK without the update
With the settings in place, deploy your CDK app using the previous, working implementation, not the failed one! This makes sure that the stack and its resources are consistent with each other.
Once everything is consistent, apply your change again.
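Before each of these deploys, it can help to confirm that the stack is in a status that accepts updates. The sketch below is one way to check, assuming cfn is a boto3 CloudFormation client; the helper name and the status set are my own illustration, not something the AWS article prescribes.

```python
from __future__ import annotations

from typing import Any

# Stable statuses from which CloudFormation accepts a new stack update.
UPDATABLE_STATUSES = {
    "CREATE_COMPLETE",
    "UPDATE_COMPLETE",
    "UPDATE_ROLLBACK_COMPLETE",
    "IMPORT_COMPLETE",
    "IMPORT_ROLLBACK_COMPLETE",
}


def ready_for_deploy(cfn: Any, stack_name: str) -> bool:
    """Return True if the stack is in a status that accepts a new update.

    `cfn` is assumed to be a boto3 CloudFormation client.
    """
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    return stack["StackStatus"] in UPDATABLE_STATUSES
```

If this returns False while the stack is still in UPDATE_ROLLBACK_FAILED, go back to Step 1 before running the deploy.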
That’s it! I hope this article helps you resolve the rollback issue without deleting stacks :)