Type of Event:
S1: Visitor app down for users - Error: "Something went wrong while trying to validate your credentials"
Services/Modules Impacted:
Visitor app login and related applications
Root Cause:
Our Infra team found that the service principal used to authenticate access to critical services had expired. As a result, the certificate in the gateway was not updated, and users could not validate their credentials in the Visitor app.
Remediation:
The service principal was updated manually, which restored service.
Timeline:
All times listed in CEST
16:33 - Received alert indicating a service issue.
16:35 - Infra Team started investigating the alert.
16:50 - Root cause was identified.
16:51 - First client reported issue.
17:30 - Fire alarm triggered.
17:41 - The service principal was updated, and the issue was resolved at the Infra level.
18:05 - Status page updated to monitoring as the issue was no longer observed.
19:12 - Status page updated to resolved.
Total Duration of Event:
2 hours 21 minutes (16:51 to 19:12)
Preventive Action:
To prevent a recurrence, our Infra team is implementing automated alerts for service principal expirations. The alerts will give advance notice of upcoming expirations so that renewals happen before service is affected.
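As an illustration only, the kind of expiry check described above could be sketched as follows. This is a minimal sketch, not the Infra team's actual implementation: the `Credential` record, the `expiring_soon` function, the 30-day warning window, and the sample inventory are all hypothetical, and a real system would read expiry dates from the identity provider and send the results to an alerting channel.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Credential:
    """Hypothetical inventory record for one service principal credential."""
    name: str
    expires_at: datetime  # timezone-aware expiry timestamp


def expiring_soon(credentials, now, warn_days=30):
    """Return (credential, days_left) pairs expiring within warn_days, soonest first.

    days_left is the whole-day part of the remaining time (floored), so a
    credential that has already expired gets a negative value.
    """
    threshold = now + timedelta(days=warn_days)
    flagged = [
        (cred, (cred.expires_at - now).days)
        for cred in credentials
        if cred.expires_at <= threshold
    ]
    return sorted(flagged, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Sample data for illustration only; names and dates are made up.
    now = datetime.now(timezone.utc)
    inventory = [
        Credential("gateway-sp", now + timedelta(days=10)),
        Credential("reporting-sp", now + timedelta(days=200)),
    ]
    for cred, days_left in expiring_soon(inventory, now):
        print(f"WARNING: {cred.name} expires in {days_left} day(s)")
```

Running a check like this on a schedule, with a warning window longer than the time a renewal takes, is what provides the advance notice.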