Major Service Disruption
Resolved
Apr 10 at 08:42am CEST
The issue has been mitigated: the database lock was cleared, error rates have returned to normal, and the Kubernetes cluster has stabilized. All systems are currently operating as expected.
Affected services
Created
Apr 10 at 08:35am CEST
On Apr 10 at 09:35am EEST, a critical database lock occurred and triggered a cascade of application errors across multiple services. Because of the elevated error rate and failing health checks, Kubernetes initiated a full node recycling process in an attempt to recover. This aggressive recovery mechanism caused widespread disruption, including temporary downtime and service instability. Root cause analysis points to the locked database as the initial trigger, compounded by insufficient error handling in the affected services and overly aggressive liveness probe configurations.
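To illustrate the probe issue, a liveness probe with a low failure threshold will restart pods as soon as a downstream dependency (here, the locked database) makes health checks fail, amplifying the outage. The snippet below is a hedged sketch of a more forgiving configuration; the endpoint, port, and thresholds are illustrative assumptions, not our actual deployment values.

```yaml
# Illustrative pod spec fragment (values are assumptions, not production config).
# A higher failureThreshold and longer period give the app time to ride out
# transient database errors instead of being restarted by the kubelet.
livenessProbe:
  httpGet:
    path: /healthz   # hypothetical health endpoint
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 6   # ~60s of consecutive failures before a restart
```

Separating liveness from readiness also helps: a readiness probe can take a pod out of rotation during a database outage without triggering the restart cascade that node recycling then amplified.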