S2 - Elevated response times in the admin dashboard

Incident Report for Eptura Visitor

Postmortem

We are grateful for your continued support and loyalty. We value your feedback and appreciate your patience as we worked to resolve this incident.

Type of Event:
S2 – Visitor (PXC) API and Dashboard Performance Issues

Services/Modules Impacted:
Visitor Dashboard, web application, and API

Issue Summary/Background:
An increase in API response time and a significant spike in HTTP 500 errors were observed. Upon investigation, the logs revealed a rise in MySQL deadlocks and application server errors.

Specifically, the following error was noted in the application logs:

[HPM] Error occurred while trying to proxy request /full/app/CO-DEDP117/vm/kiosk/instances from app.example.com to https://api.example.com (ECONNRESET)

These issues affected the overall application performance and user experience.
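
To gauge how often proxy errors like the one above occur, the application logs can be scanned for matching lines. The following is a minimal sketch, not the tooling Eptura used: the pattern is inferred from the single [HPM] line quoted in this report, and the function name `count_proxy_errors` and the second sample line are hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the one [HPM] line quoted in this report.
# Real log lines may carry extra fields, so treat this as an assumption.
HPM_ERROR = re.compile(
    r"\[HPM\] Error occurred while trying to proxy request (?P<path>\S+) "
    r"from (?P<source>\S+) to (?P<target>\S+) \((?P<code>[A-Z_]+)\)"
)


def count_proxy_errors(lines):
    """Count proxy errors per error code (for example ECONNRESET)."""
    counts = Counter()
    for line in lines:
        match = HPM_ERROR.search(line)
        if match:
            counts[match.group("code")] += 1
    return counts


sample = [
    "[HPM] Error occurred while trying to proxy request "
    "/full/app/CO-DEDP117/vm/kiosk/instances from app.example.com "
    "to https://api.example.com (ECONNRESET)",
    "GET /healthz 200",  # hypothetical non-matching line
]
print(count_proxy_errors(sample))
```

Grouping by error code rather than by request path keeps the summary short; the named groups `path`, `source`, and `target` are available if a per-endpoint breakdown is needed.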

Root Cause:

High latency and increased MySQL deadlocks.

Remediation:

While reviewing MySQL and application server logs, the team identified an increase in MySQL deadlocks and in application connection resets. To mitigate the issue, the Eptura CloudOps team performed a database restart. The restart returned application functionality to normal: API response times improved significantly and HTTP 500 errors decreased substantially.

Timeline:

All times listed in CEST

9:22 a.m.: A large spike in 5xx HTTP response errors and elevated response times from API endpoints were observed.

2:56 p.m.: An incident was reported, prompting Eptura to initiate the investigation.

3:51 p.m.: Eptura updated the Visitor status page to reflect the incident and investigation.

4:57 p.m.: The Eptura Infra team identified the root cause of the issue and reported that response times had stabilized.

5:53 p.m.: The Eptura team updated cases to ask customers for initial feedback on the issue.

6:22 p.m.: The Eptura team updated the status page to confirm that the issue was resolved.
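
The timeline entries can be cross-checked against the UTC timestamps on the status page posts for this incident. The sketch below makes two assumptions: that "CEST" means a fixed UTC+2 offset, and that the events took place on Nov 18, 2024, the date of the status posts. The helper name `to_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "CEST" is a fixed UTC+2 offset, and the timeline refers to
# Nov 18, 2024 (the date shown on the status page posts).
CEST = timezone(timedelta(hours=2), "CEST")


def to_utc(hour, minute):
    """Convert a timeline time (24-hour clock, CEST) to UTC."""
    local = datetime(2024, 11, 18, hour, minute, tzinfo=CEST)
    return local.astimezone(timezone.utc)


status_page_updated = to_utc(15, 51)  # 3:51 p.m. in the timeline
resolved = to_utc(18, 22)             # 6:22 p.m. in the timeline

print(status_page_updated.strftime("%H:%M UTC"))  # 13:51 UTC
print(resolved.strftime("%H:%M UTC"))             # 16:22 UTC
```

Under these assumptions, the two converted times line up with the "Investigating" and "Resolved" posts on the status page.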

Total Duration of Event:

1 hour 12 minutes

Preventive Actions:
To mitigate future occurrences, we have scheduled a proactive MySQL service restart every Monday before working hours to maintain database stability.

Posted May 19, 2025 - 08:04 UTC

Resolved

We are pleased to share that this incident is resolved as we have confirmed full restoration of admin response time performance.
We will publish our root cause analysis findings on this incident within 10 business days.

Thank you for your continued patience.

Posted Nov 18, 2024 - 16:22 UTC

Update

We are continuing to investigate this issue.

Posted Nov 18, 2024 - 14:45 UTC

Investigating

Our Infra team is currently investigating reports of increased loading times and slow response in the Proxyclick admin dashboard.

Posted Nov 18, 2024 - 13:51 UTC

This incident affected: Dashboard.