# OS Volume Ran Out of Disk Space

# What Happened

The search EC2 instance in a Magento cluster deployment went down. Restarting the node did not help, and neither SSH nor SSM connections were possible.

# Root Cause

The OS volume on the search node was full of syslog entries, all of which were deprecation warnings.
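
A finding like this could be confirmed along the following lines once the volume is readable. This is a minimal sketch, not the commands used during the incident: the function names are hypothetical, and the `deprecat` search pattern and default `/var/log` path are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the paths and the search pattern are assumptions.

# Print the ten largest entries under a log directory (largest last).
largest_logs() {
    local dir="${1:-/var/log}"
    du -ax "$dir" 2>/dev/null | sort -n | tail -n 10
}

# Count lines in a log file that mention a deprecation (case-insensitive).
count_deprecations() {
    local file="$1"
    # grep exits non-zero when nothing matches; the count is still printed.
    grep -ci 'deprecat' "$file" || true
}

# Print the integer percentage of lines that are deprecation messages.
deprecation_share() {
    local file="$1" total hits
    total=$(wc -l < "$file")
    hits=$(count_deprecations "$file")
    if [ "$total" -eq 0 ]; then
        echo 0
    else
        echo $(( hits * 100 / total ))
    fi
}
```

For example, `deprecation_share /var/log/syslog` prints the share of syslog lines that match the pattern.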

# Resolution

Restoring from a backup was not possible because the search EC2 instance was not being backed up. Both the volume and its filesystem needed to be grown, which was challenging because the node could not be reached via SSH.

  • First, a backup AMI was taken of the affected EC2 instance.
  • Next, the volume was manually grown to double its original size.
  • The instance was stopped via the SSM console.
  • The root volume was force detached via the SSM console.
  • The root volume was attached to the jump instance via the SSM console.
  • On the jump instance, the root partition of the attached volume (/dev/nvme2n1) was grown, a filesystem check was run, and the filesystem was grown:
      growpart /dev/nvme2n1 1
      e2fsck -f /dev/nvme2n1p1
      resize2fs /dev/nvme2n1p1
  • The volume was detached from the jump instance and reattached to the affected instance as its root volume.
  • The affected instance was booted up.
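
The steps above were carried out in the console, but they have AWS CLI equivalents. The sketch below only prints those commands so they can be reviewed and run one at a time. It assumes a configured AWS CLI. The function name `rescue_plan`, the image name `pre-resize-backup`, the temporary device `/dev/sdf`, and the default root device `/dev/xvda` are all assumptions, not values from the incident.

```shell
#!/usr/bin/env bash
# Prints, but does not run, AWS CLI equivalents of the console steps.
# Every ID, size, and device name passed in is a placeholder (assumption).
rescue_plan() {
    local search_id="$1" jump_id="$2" vol_id="$3" new_size_gib="$4"
    local root_dev="${5:-/dev/xvda}"   # assumed root device name; check the instance
    cat <<EOF
aws ec2 create-image --instance-id $search_id --name pre-resize-backup --no-reboot
aws ec2 modify-volume --volume-id $vol_id --size $new_size_gib
aws ec2 stop-instances --instance-ids $search_id
aws ec2 wait instance-stopped --instance-ids $search_id
aws ec2 detach-volume --volume-id $vol_id --force
aws ec2 wait volume-available --volume-ids $vol_id
aws ec2 attach-volume --volume-id $vol_id --instance-id $jump_id --device /dev/sdf
# ...on the jump instance: growpart, e2fsck -f, resize2fs (as listed above)...
aws ec2 detach-volume --volume-id $vol_id
aws ec2 wait volume-available --volume-ids $vol_id
aws ec2 attach-volume --volume-id $vol_id --instance-id $search_id --device $root_dev
aws ec2 start-instances --instance-ids $search_id
EOF
}
```

A call such as `rescue_plan i-0123456789abcdef0 i-0fedcba9876543210 vol-0123456789abcdef0 40` (placeholder IDs) prints the plan for a 40 GiB target size. Printing rather than executing keeps each destructive step, in particular the forced detach, under manual control.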

# Impact

The website stayed up, but critical functionality such as search was unusable.

# Existing Guardrails

Every EC2 instance is configured with an alarm for OS disk utilization. It will go into alarm when utilization is >= 90%.
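
The alarm condition can be mirrored by a local check, which may be useful for spot checks on a node. This is a minimal sketch assuming POSIX `df -P` output. The function names are hypothetical, and this is not how the existing alarm is implemented.

```shell
#!/usr/bin/env bash
# Hypothetical local check that mirrors the alarm condition (>= 90% used).

# Print the used percentage (integer, no % sign) of the filesystem holding a path.
disk_used_pct() {
    local path="${1:-/}"
    df -P "$path" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed (exit 0) when the used percentage is at or above the threshold.
in_alarm() {
    local used="$1" threshold="${2:-90}"
    [ "$used" -ge "$threshold" ]
}

used=$(disk_used_pct /)
if in_alarm "$used"; then
    echo "ALARM: / is ${used}% full"
else
    echo "OK: / is ${used}% full"
fi
```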

# Planned Actions

  • Make sure we are subscribed to that alarm (for managed customers)
  • Resolve the deprecation warnings (preferred fix)
  • Change the log level for the search service
  • Rotate syslog more often