Proxyclick by Eptura - Users Unable to Authenticate to Web Application/Dashboard
Incident Report for Proxyclick
Resolved
Proxyclick by Eptura Detailed Root Cause Analysis (RCA) – S1 Event 2023-02-15
On February 15, 2023, at 15:33 UTC, Proxyclick started to receive reports that users were unable to access the application. Engineering and DevOps teams isolated an issue with the Web Application servers and restarted them, restoring service.

Type of Event:
Service Disruption

Services Impacted:
Proxyclick Web Application

Remediation:
DevOps and Engineering restarted the impacted service hosts, restoring normal operation.

Timeline of Events:
15:33 UTC - First reports received by Support
15:35 UTC - DevOps and Engineering begin investigation
15:42 UTC - DevOps restarts impacted service hosts
15:44 UTC - Service host restart completes and normal operations resume

Total Duration: 11 Minutes

Groups Involved in the Event:
Support
DevOps
Engineering

Root Cause Analysis:
A primary service host for the Proxyclick Web Application crashed with an Out-of-Memory (OOM) exception and was not automatically restarted. The OOM exception was traced to a memory leak in the application. The leak had previously gone unnoticed because the service hosts are restarted frequently during regular product update deployments, which clears the leaked memory. After the service migration event on January 15th, 2023, the gap between releases was longer than normal, which gave the leak enough time to consume all available memory on the service host.
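
The relationship between release cadence and the leak can be illustrated with a small calculation. This is only a sketch: the baseline memory, memory limit, and leak rate below are hypothetical values chosen for illustration, not measurements from the Proxyclick hosts, and the function names are invented for this example.

```python
def hours_until_oom(baseline_mb: float, limit_mb: float, leak_mb_per_hour: float) -> float:
    """Hours a freshly restarted host can run before a steady leak exhausts memory."""
    if leak_mb_per_hour <= 0:
        raise ValueError("leak_mb_per_hour must be positive")
    headroom_mb = max(limit_mb - baseline_mb, 0)
    return headroom_mb / leak_mb_per_hour


def leak_surfaces(release_gap_hours: float, baseline_mb: float,
                  limit_mb: float, leak_mb_per_hour: float) -> bool:
    """True if the host runs out of memory before the next deployment restarts it."""
    return release_gap_hours >= hours_until_oom(baseline_mb, limit_mb, leak_mb_per_hour)


# Hypothetical figures: 500 MB after restart, 4096 MB limit, 5 MB leaked per hour.
BASELINE_MB, LIMIT_MB, LEAK_MB_PER_HOUR = 500, 4096, 5

weekly_release = leak_surfaces(7 * 24, BASELINE_MB, LIMIT_MB, LEAK_MB_PER_HOUR)
monthly_gap = leak_surfaces(31 * 24, BASELINE_MB, LIMIT_MB, LEAK_MB_PER_HOUR)
```

Under these assumed numbers, a weekly deployment restarts the host long before memory runs out, while a 31-day gap does not, which matches the pattern described above.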

Preventative Action and Analysis:
DevOps has added health monitoring to the Proxyclick Load Balancer infrastructure to detect service hosts that fail due to memory limits and remove them from the pool. Additionally, a self-healing trigger has been added to the health check response to bring the failed host back into service automatically, maintaining high availability (HA) and load capacity.
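
The health check and self-healing behavior described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the actual Proxyclick implementation: the `Host` class, the 90% memory threshold, the 200 MB post-restart baseline, and the in-memory `pool` set are all hypothetical stand-ins for real load balancer and host management APIs.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical stand-in for a service host behind the load balancer."""
    name: str
    memory_used_mb: int
    memory_limit_mb: int
    restarts: int = 0

    def restart(self) -> None:
        # Assumption: a restart releases leaked memory back to a small baseline.
        self.memory_used_mb = 200
        self.restarts += 1


def is_healthy(host: Host, threshold: float = 0.9) -> bool:
    """A host is healthy while memory use stays below the given fraction of its limit."""
    return host.memory_used_mb < threshold * host.memory_limit_mb


def run_health_check(pool: set[str], hosts: list[Host], threshold: float = 0.9) -> list[str]:
    """Remove unhealthy hosts from the pool, restart them, and re-add them once healthy.

    Returns the names of hosts that were restarted and returned to service.
    """
    recovered = []
    for host in hosts:
        if is_healthy(host, threshold):
            pool.add(host.name)
            continue
        pool.discard(host.name)   # stop routing traffic to the failing host
        host.restart()            # self-healing trigger
        if is_healthy(host, threshold):
            pool.add(host.name)   # restore HA and load capacity
            recovered.append(host.name)
    return recovered


hosts = [Host("web-1", 3900, 4096), Host("web-2", 800, 4096)]
pool = {"web-1", "web-2"}
recovered = run_health_check(pool, hosts)
```

In this sketch the unhealthy host is taken out of rotation before it is restarted, so traffic is never routed to a host that is in the middle of recovering.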

Engineering will investigate the memory leak to produce a patch that resolves the root issue permanently in a future release.
Posted Feb 15, 2023 - 16:33 CET