S2 - Issues in the download and printing of badges

Incident Report for Eptura Visitor

Postmortem

We are grateful for your continued support and loyalty. We value your feedback and appreciate your patience as we worked to resolve this incident.

Type of Event:
S2 - All Access Control Systems are down, ACS Extender showing as offline.

 

Services/Modules Impacted:
Access Control System

 

Issue Summary/Background:
All clients using ACS reported that they were unable to scan QR codes.

 

Root Cause:
Eptura CloudOps investigated the message processing mechanisms, ensuring all pending tasks were addressed. Our team discovered that incorrect timing parameters were causing the ACS service to stop unexpectedly.

We took immediate action by updating the Last Fire and Next Fire event date parameters in the database, effectively restoring the ACS services.

Remediation:
The Last Fire and Next Fire event date entries in the scheduling system's database were updated with accurate values, which restored normal ACS service functionality.

Timeline (UTC):

03 May 2025

20:00 UTC - Customers first reported the issue with the QR codes.

05 May 2025

14:04 UTC - Issue was identified and reported to Eptura CloudOps/Infra team.

15:35 UTC - Eptura CloudOps team fixed the issue in the database and requested confirmation from the customers.

17:15 UTC - Customers confirmed the resolution to the issue.

Total Duration of Event:
1 day and 21 hours
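
The duration is measured from the first customer report (03 May 2025, 20:00 UTC) to customer confirmation of the fix (05 May 2025, 17:15 UTC). As a quick check, the short Python sketch below recomputes it from those two timeline entries only. The variable names are illustrative.

```python
from datetime import datetime, timezone

# Timestamps taken from the timeline above (UTC).
first_report = datetime(2025, 5, 3, 20, 0, tzinfo=timezone.utc)
confirmed = datetime(2025, 5, 5, 17, 15, tzinfo=timezone.utc)

elapsed = confirmed - first_report

# timedelta stores whole days separately from the leftover seconds.
hours, remainder = divmod(elapsed.seconds, 3600)
minutes = remainder // 60

print(f"{elapsed.days} day(s), {hours} hour(s), {minutes} minute(s)")
# prints "1 day(s), 21 hour(s), 15 minute(s)"
```

The result, 1 day, 21 hours and 15 minutes, matches the stated total when rounded down to the hour.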

Preventive Actions:
Initially, our monitoring strategy relied on examining logs and the user interface (UI). However, during this incident, the UI incorrectly showed that the ACS service was online, which delayed our response. To prevent similar issues in the future, we are enhancing our monitoring approach.

We will now focus on the table that manages scheduler operations, ensuring that all changes to relevant fields are logged. Using this data, we will set up alerts to notify our team when the Next Fire time exceeds a predetermined threshold. This proactive monitoring will enable us to respond more quickly to incidents and improve our overall operational efficiency.
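
As a rough illustration of the alerting idea, the Python sketch below flags a scheduler entry whose Next Fire time has drifted too far from the current time. It is not the production implementation. The `SchedulerRow` shape, the `next_fire_alert` function, and the 15-minute default threshold are all assumptions made for this example. Reading "exceeds a predetermined threshold" as drift in either direction, past or future, is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class SchedulerRow:
    """Hypothetical shape of one row in the scheduler table."""
    job_name: str
    last_fire: datetime
    next_fire: datetime


def next_fire_alert(
    row: SchedulerRow,
    now: datetime,
    threshold: timedelta = timedelta(minutes=15),  # assumed value
) -> Optional[str]:
    """Return an alert message if Next Fire is too far from `now`, else None."""
    drift = row.next_fire - now
    if drift < -threshold:
        # Next Fire lies well in the past: the job has likely stopped firing.
        return f"{row.job_name}: Next Fire is overdue by {-drift}"
    if drift > threshold:
        # Next Fire lies unexpectedly far ahead: the job will not run soon.
        return f"{row.job_name}: Next Fire is {drift} in the future"
    return None


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    stalled = SchedulerRow("acs", now - timedelta(days=2), now - timedelta(days=1))
    print(next_fire_alert(stalled, now))
```

A check like this could run on a short interval against the scheduler table. Because it does not rely on the UI status, it would still raise an alert when the UI reports the service as online.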

Thank you for your patience and support as we continue to enhance our services.

Posted May 30, 2025 - 10:48 UTC

Resolved

We are pleased to inform you that the issue concerning the missing downloads for Visitor has been successfully resolved following the restoration of backups. Our internal team has completed all necessary remediation steps and confirmed that the service is operating normally after a period of monitoring.

A detailed Root Cause Analysis (RCA) is currently underway to thoroughly investigate the incident. The report will be published on our Status Page within the next 10 days.

We sincerely appreciate your patience and cooperation throughout this process.
Posted May 16, 2025 - 07:11 UTC

Update

As the issue was widespread and also affected the Americas region, we shall continue to monitor the situation during Americas working hours today and mark the incident as resolved at or after 07:00 UTC on 16 May 2025.
Posted May 15, 2025 - 13:02 UTC

Monitoring

All the backups have been restored successfully. As the issue has now been resolved, we shall monitor the situation for the next 4 hours and then mark the incident as resolved.
Posted May 15, 2025 - 09:21 UTC

Update

The last backup is in the final stages of being restored. We shall post an update once the restoration is completed.
Posted May 15, 2025 - 09:05 UTC

Update

Two of the three backups have been successfully restored. We are currently awaiting the completion of the remaining restoration processes. Our Cloud Operations team is diligently working to restore all data. We appreciate your patience as we navigate this process. The next update will be provided at or before 9 AM UTC.
Posted May 15, 2025 - 05:59 UTC

Update

Our Cloud Operations team continues to work on restoring data. We appreciate your patience as we work through this. Our next update will be provided at 6 AM UTC.
Posted May 15, 2025 - 02:00 UTC

Update

Our Cloud Operations team continues to work on restoring data. We appreciate your patience as we work through this. Our next update will be provided at 2 AM UTC.
Posted May 14, 2025 - 23:33 UTC

Update

One of the three backups has been successfully restored. We are waiting for the rest to complete the restoration.
Posted May 14, 2025 - 20:39 UTC

Identified

Our infrastructure team has identified that the issue is with the image storage and is working on a fix. In the meantime, as a workaround, users should be able to replace the missing images (e.g. in badge designs) by uploading new ones.
Posted May 14, 2025 - 17:08 UTC

Update

We have identified that customer logos are not visible on their sites, and badges are no longer downloading or printing because the images cannot be found at the linked path. Our infrastructure team is currently investigating the issue. The next update will be provided at 7:01 PM UTC.
Posted May 14, 2025 - 17:02 UTC

Update

We are continuing to investigate this issue.
Posted May 14, 2025 - 16:54 UTC

Investigating

S2 - Issues in the download and printing of badges
Posted May 14, 2025 - 16:41 UTC
This incident affected: Dashboard.