Recover access to your lost EC2 instance

After going through this experience for the 3rd time in the past 18 months, and finding only bits and pieces when I went looking for info, I decided to write this short post.

The problem I am trying to solve: one of my colleagues changed the file permissions of /etc/sudoers, and even though Ansible is the one configuring the machine, no user can now sudo, which means my instance is pretty much useless :(

I can tell you that the following procedure also saved me once when a private key got lost and we couldn’t get access to the machine.

Worth noting: this will only work for Amazon EBS-backed instances. More on EBS vs. Instance Store at the end of this post.

Reminder - Problem I needed to solve:

No logged-in user can sudo, and we do not allow root login, so that’s not an option (no password/key or backdoor). I need to change the /etc/sudoers file back to mode 0440.

Steps to solution:

  1. Get instance id [ aws cli / aws dashboard ]
  2. Get root volume device [ e.g. /dev/xvda ]
  3. Get EBS volume id [ the id of the EBS volume which is used as your root partition ]
  4. Stop the instance, named server1 herein
  5. Detach the root volume
  6. Attach it to server2 (which I have access to of course)
  7. Mount the volume
  8. Perform the required change ( chmod 0440 /etc/sudoers )
  9. Detach the volume
  10. Re-attach it to the instance whilst also specifying the root device
  11. Start server1 [ try to sudo now … ]

In more detail:

Prerequisite: assuming you have the AWS command line tools installed, you will need your instance id (server1 herein), its root volume device and EBS volume id, plus a second server to mount the EBS volume on (server2 herein).

1. Get instance id

Go for it either via AWS web console or aws ec2 describe-instances command
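If you go the CLI route, a lookup by tag saves scrolling through the full describe-instances output. This is only a sketch: it assumes the instance carries a Name tag (server1 here), and the function name instance_id_by_name is my own, not something the AWS CLI provides.

```shell
# Hypothetical helper: print the id(s) of the instance(s) whose Name tag matches $1.
# Assumes the instance was tagged with Name=<something>, which is not guaranteed.
instance_id_by_name() {
  aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=$1" \
    --query 'Reservations[*].Instances[*].InstanceId' \
    --output text
}
```

Usage would be something like: instance_id_by_name server1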

2. Getting root device:

In most cases the root device would be /dev/xvda, but just in case, you can run the following command (using --output text is better in this case):

 aws ec2 describe-instances --instance-ids <server1-id> --query 'Reservations[*].Instances[*].RootDeviceName' --output text

Which would yield something like: /dev/xvda

3. Get EBS volume id

aws ec2 describe-instances --instance-ids <server1-id> --query "Reservations[*].Instances[*].BlockDeviceMappings[?DeviceName=='/dev/xvda'].[Ebs.VolumeId]" --output text

Which would yield something like: vol-e12345e
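Steps 2 and 3 can be chained so the device name is not hard-coded. A minimal sketch, assuming a single instance id is passed in; the function name root_volume_id is made up for this post.

```shell
# Hypothetical helper: print the EBS volume id backing the root device of instance $1.
root_volume_id() {
  local instance_id="$1" root_device
  # Step 2: ask EC2 which device name is the root device (e.g. /dev/xvda).
  root_device=$(aws ec2 describe-instances --instance-ids "$instance_id" \
    --query 'Reservations[0].Instances[0].RootDeviceName' --output text)
  # Step 3: find the block device mapping with that name and print its volume id.
  aws ec2 describe-instances --instance-ids "$instance_id" \
    --query "Reservations[0].Instances[0].BlockDeviceMappings[?DeviceName=='${root_device}'].Ebs.VolumeId" \
    --output text
}
```

Usage would be something like: volume_id=$(root_volume_id <server1-id>)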

4. Stop server1

aws ec2 stop-instances --instance-ids <server1-id>
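stop-instances returns right away, while the volume can only be detached cleanly once the instance has actually stopped. The AWS CLI ships a waiter for that; here is a small sketch wrapping both calls (the function name stop_and_wait is my own).

```shell
# Hypothetical helper: stop instance $1 and block until EC2 reports it as stopped.
stop_and_wait() {
  aws ec2 stop-instances --instance-ids "$1" --output text &&
    aws ec2 wait instance-stopped --instance-ids "$1"
}
```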

5. Detach volume from server1

aws ec2 detach-volume --volume-id <volume id> --output text

6. Attach to server2

aws ec2 attach-volume --volume-id <volume id> --instance-id <server2-id> --device <device e.g. /dev/sdf>
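Detach and attach are both asynchronous, so when scripting steps 5 and 6 it helps to wait in between. A sketch under my own naming (move_volume is hypothetical); the device name passed in, e.g. /dev/sdf, is an assumption and should be one that is free on the target instance.

```shell
# Hypothetical helper: detach volume $1 from wherever it is attached,
# then attach it to instance $2 under device name $3.
move_volume() {
  local volume_id="$1" target_instance="$2" device="$3"
  aws ec2 detach-volume --volume-id "$volume_id" --output text &&
    # Block until the volume state is "available", i.e. fully detached.
    aws ec2 wait volume-available --volume-ids "$volume_id" &&
    aws ec2 attach-volume --volume-id "$volume_id" \
      --instance-id "$target_instance" --device "$device" --output text &&
    # Block until the volume state is "in-use" again.
    aws ec2 wait volume-in-use --volume-ids "$volume_id"
}
```

Usage would be something like: move_volume <volume id> <server2-id> /dev/sdf. The same call with <server1-id> and the root device name should cover the trip back.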

7. Mount volume on server2 and fix the permissions (steps 7-8)

Log in to server2

sudo mkdir /mnt/Server1
sudo mount <new device, e.g. /dev/xvdf1, check with lsblk> /mnt/Server1
sudo chmod 0440 /mnt/Server1/etc/sudoers
sudo umount /mnt/Server1
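Before unmounting, it is worth confirming the mode really changed. A minimal sketch, assuming GNU stat (as found on most EC2 Linux images) and a hypothetical function name fix_sudoers_mode; on a real mount it would need to run as root.

```shell
# Hypothetical helper: set etc/sudoers under mount point $1 to mode 0440
# and print the resulting mode and owner for a quick visual check.
fix_sudoers_mode() {
  local sudoers="$1/etc/sudoers"
  # Bail out early if the volume is not mounted where we think it is.
  [ -f "$sudoers" ] || { echo "no sudoers file under $1" >&2; return 1; }
  chmod 0440 "$sudoers" &&
    stat -c '%a %U:%G' "$sudoers"
}
```

Usage would be something like: fix_sudoers_mode /mnt/Server1, which should print 440 followed by the file's owner and group.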

9. Detach the volume

aws ec2 detach-volume --volume-id <volume id> --output text

10. Attach fixed volume + start server1 (steps 10-11)

aws ec2 attach-volume --volume-id <volume id> --instance-id <server1-id> --device <root device e.g. /dev/xvda>
aws ec2 start-instances --instance-ids <server1-id>
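As with stopping, start-instances returns before the machine is reachable. A small sketch that waits until the instance is running and has passed its status checks before you try to log in (the function name start_and_wait is my own).

```shell
# Hypothetical helper: start instance $1, then block until it is running
# and its EC2 status checks report ok.
start_and_wait() {
  aws ec2 start-instances --instance-ids "$1" --output text &&
    aws ec2 wait instance-running --instance-ids "$1" &&
    aws ec2 wait instance-status-ok --instance-ids "$1"
}
```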

If you performed all these steps you are most definitely blessed with a recovered server ;)

One of my colleagues told me he only automates something once it has happened 3 times: on the 4th, he automates it. So I guess if this happens to me another time the outcome will be a playbook … (whose logic is quite simple, I suppose …)

Another thing worth noting: you can specify the root device from the AWS console too, but there are so many steps that the API is much more convenient (IMHO …)

Hope you enjoyed this, and I hope I do not need to do this again in the future. And god, if you’re listening … please make all instances immutable, so next time I just launch a new one and delete the problematic one.

Keep on hacking, HP

Note on EBS vs. Instance Store:

EBS-backed instances have their root device on an EBS volume, as opposed to instance store-backed instances, whose root device is an instance store volume created from a template stored in S3. EBS-backed instances are better, considering they launch faster and use persistent storage, which also means they support the stop method while instance store-backed ones do not ( + EBS provides better performance overall …)
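Since the whole procedure depends on the instance being EBS-backed, a quick check up front can save a wasted attempt. A sketch with a hypothetical function name (is_ebs_backed); it relies on the RootDeviceType field of describe-instances, which reports either ebs or instance-store.

```shell
# Hypothetical helper: succeed (exit 0) only if instance $1 has an EBS root device.
is_ebs_backed() {
  local root_type
  root_type=$(aws ec2 describe-instances --instance-ids "$1" \
    --query 'Reservations[0].Instances[0].RootDeviceType' --output text)
  [ "$root_type" = "ebs" ]
}
```

Usage would be something like: is_ebs_backed <server1-id> || echo "instance store-backed, this procedure will not work"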