S2 - All Access Control Systems are down, ACS Extender showing as offline

Incident Report for Eptura Visitor

Postmortem

We are grateful for your continued support and loyalty. We value your feedback and appreciate your patience as we worked to resolve this incident.

Type of Event:

S2 - All Access Control Systems are down, ACS Extender showing as offline

Services/Modules Impacted:
Access Control System

Issue Summary/Background:
All clients using the Access Control System (ACS) reported that they were unable to scan QR codes.

Root Cause:

Eptura CloudOps investigated the message processing mechanisms and confirmed that all pending tasks had been addressed. The team found that incorrect timing parameters in the scheduling system, specifically the Last Fire and Next Fire event dates, were causing the ACS service to stop unexpectedly.

We took immediate action by updating the Last Fire and Next Fire event date parameters in the database, which restored the ACS service.

Remediation:

The Last Fire and Next Fire event date entries for the scheduling system were updated with accurate values in the database, which restored normal ACS service functionality.

Timeline (UTC):

03 May 2025

20:00:00 UTC - Customers first reported that QR codes could not be scanned.

05 May 2025

14:04:00 UTC - The issue was identified and reported to the Eptura CloudOps/Infra team.

15:35:00 UTC - The Eptura CloudOps team fixed the issue in the database and asked for the fix to be verified with customers.

17:15:00 UTC - Customers confirmed that the issue was resolved.

Total Duration of Event:

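1 day, 21 hours, and 15 minutes (from the first customer report on 03 May at 20:00 UTC to customer confirmation on 05 May at 17:15 UTC)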
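
The duration can be checked against the timeline, assuming the event is measured from the first customer report to the customer-confirmed resolution. A minimal Python sketch of that arithmetic (the variable names are illustrative only):

```python
from datetime import datetime, timezone

# Timestamps taken from the timeline above (UTC).
first_report = datetime(2025, 5, 3, 20, 0, tzinfo=timezone.utc)
customer_confirmation = datetime(2025, 5, 5, 17, 15, tzinfo=timezone.utc)

# Subtracting two aware datetimes yields a timedelta.
duration = customer_confirmation - first_report
print(duration)  # 1 day, 21:15:00
```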

Preventive Actions:

Previously, our monitoring strategy relied on examining logs and the user interface (UI). During this incident, however, the UI incorrectly showed the ACS service as online, which delayed our response. We are enhancing our monitoring approach to prevent similar issues in the future.

We will now monitor the table that manages scheduler operations and ensure that all changes to the relevant fields are logged. Using this data, we will set up alerts that notify our team when the Next Fire time exceeds a predetermined threshold. This proactive monitoring will enable us to respond more quickly to incidents.
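
As an illustration only, the kind of threshold check described above could look roughly like the following Python sketch. The row layout, the names `job_name` and `next_fire`, the function name, and both thresholds are assumptions made for the example, and the sample rows are made up; none of them are taken from Eptura's actual scheduler table or alerting setup. Because this report does not state whether the faulty Next Fire values lay in the past or in the future, the sketch flags both cases.

```python
from datetime import datetime, timedelta, timezone


def find_suspect_jobs(rows, now, max_overdue, max_ahead):
    """Return (job_name, reason) pairs for rows whose next_fire looks wrong.

    Each row is a dict with the hypothetical keys "job_name" and "next_fire"
    (a timezone-aware datetime). The real table layout is not known here.
    """
    suspects = []
    for row in rows:
        next_fire = row["next_fire"]
        if now - next_fire > max_overdue:
            # The job should already have fired and was never rescheduled.
            suspects.append((row["job_name"], "overdue"))
        elif next_fire - now > max_ahead:
            # The job is scheduled implausibly far in the future.
            suspects.append((row["job_name"], "too far ahead"))
    return suspects


# Made-up sample rows, for illustration only.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    {"job_name": "stalled_job", "next_fire": now - timedelta(hours=6)},
    {"job_name": "healthy_job", "next_fire": now + timedelta(minutes=1)},
    {"job_name": "misdated_job", "next_fire": now + timedelta(days=365)},
]

alerts = find_suspect_jobs(
    rows,
    now,
    max_overdue=timedelta(minutes=15),  # assumed threshold
    max_ahead=timedelta(days=1),        # assumed threshold
)
for job_name, reason in alerts:
    print(f"ALERT {job_name}: next fire time is {reason}")
```

A check of this kind reads the scheduler data directly, so it does not depend on the status shown in the UI, which is what reported the service as online during this incident.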

Thank you for your patience and support as we continue to enhance our services.

Posted May 19, 2025 - 08:02 UTC

Resolved

We are pleased to inform you that the issue with the Access Control System for Visitor has been resolved. Our internal team has completed the necessary actions and verified that the service is functioning normally after a brief monitoring period.

A Root Cause Analysis (RCA) will be conducted to understand the incident in detail. It will be available on our Status Page within 10 days.

Thank you for your patience and cooperation throughout this process. If you have any further questions or concerns, please feel free to reach out.
Posted May 06, 2025 - 12:45 UTC

Monitoring

As per the communication from our infrastructure team, the ACS platform is now online.
We will continue to monitor the situation for the next 24 hours.
Posted May 05, 2025 - 15:38 UTC

Update

All Access Control Systems are down. Our teams are actively investigating the issue and we will update you on the progress.

Next Update - 16:30 UTC
Posted May 05, 2025 - 14:21 UTC

Investigating

We are currently investigating an issue with Proxyclick. We will update you when we have more information.
Posted May 05, 2025 - 14:14 UTC
This incident affected: Access Control.