API downtime
Resolved
Sep 12 at 03:59pm CEST
The API was unavailable during 2 time chunks:
chunk 1: from 12:34:33pm to 12:35:08pm (35 seconds)
chunk 2: from 12:35:46pm to 12:36:42pm (56 seconds)
The first chunk was caused by an unexpectedly large analytics query from one of our customers, combined with improper thread management at the server level: the large query blocked other queries.
The second chunk was a consequence of the requests that accumulated during the first chunk: they overloaded the replicas once the replicas started processing requests again. This triggered a restart and scale-up event. The whole process took 56 seconds, after which the situation went back to normal.
At 1:51pm, a fix was deployed to prevent large queries from blocking the server.
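For illustration only: this report does not describe how the fix works, so the following is a generic sketch of one common way to keep a large query from blocking smaller ones, namely running heavy and regular queries on separate thread pools. Every name in it (FAST_POOL, HEAVY_POOL, submit, the two query functions) and the pool sizes are hypothetical and do not reflect our actual server code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pools; the names and sizes are illustrative assumptions.
FAST_POOL = ThreadPoolExecutor(max_workers=4, thread_name_prefix="fast")
HEAVY_POOL = ThreadPoolExecutor(max_workers=1, thread_name_prefix="heavy")

def submit(query_fn, *, heavy=False):
    """Route a query to the pool that matches its expected cost."""
    pool = HEAVY_POOL if heavy else FAST_POOL
    return pool.submit(query_fn)

# Event used to keep the simulated heavy query running until we release it.
release = threading.Event()

def large_analytics_query():
    release.wait(timeout=5)  # stands in for a long-running analytics query
    return "analytics done"

def small_query():
    return "ok"

heavy_future = submit(large_analytics_query, heavy=True)
fast_future = submit(small_query)

# The small query is answered while the heavy one is still in progress.
fast_result = fast_future.result(timeout=2)
heavy_still_running = not heavy_future.done()

# Let the heavy query finish and collect its result.
release.set()
heavy_result = heavy_future.result(timeout=5)

FAST_POOL.shutdown()
HEAVY_POOL.shutdown()
```

In this sketch the heavy pool can be saturated without affecting the threads that serve regular queries, which is the property that was missing during the first chunk.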
Affected services