What Went Wrong On January 29th at approximately 5:50 PM EST, customers experienced intermittent outages and delays across multiple API services. The disruptions were caused by an infrastructure upgrade that unexpectedly impacted critical systems.
Detection The issue was detected through internal monitoring alerts and engineering reports. Services would go down briefly and then recover, leading to inconsistent availability.
Mitigation Once the upgrade was identified as the cause, it was halted, and a rollback was initiated. Services were gradually restored, and system stability was confirmed.
Root Cause The outage occurred because a scheduled infrastructure upgrade required rebooting key systems, which inadvertently disrupted API services.
Next Steps Improve planning around infrastructure upgrades. Enhance monitoring and alerting to detect issues faster. Implement processes to minimize service disruptions during future updates.
We apologize for the inconvenience and are taking steps to prevent similar issues in the future. Thank you for your patience and for being a valued customer.