Network Outage
Incident Report for TrackJS
Postmortem

Our monitoring alerted us to problems with our Redis cluster. The servers appeared to be online and functioning normally, but network connectivity between certain servers was unreliable.

We reached out to our infrastructure provider, OVH. OVH experienced a network disruption on the private network that connects our servers (the VRACK). The disruption was intermittent, but it caused our Redis cluster to repeatedly lose connectivity with our other servers.

We quickly swapped the Redis master to a different server that appeared to be more stable on the network while OVH resolved the issue. However, during the outage, some customers saw an error page while interacting with the UI, and there were some delays and drops in processing new errors.

Posted Jan 11, 2023 - 11:10 CST

Resolved
After monitoring last evening, the issue appears to be entirely resolved.
Posted Jan 11, 2023 - 07:10 CST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jan 10, 2023 - 19:57 CST
Identified
Our hosting provider has identified the network issue and is working on remediation now. We are seeing most services return to normal, but our hosting provider has not yet given the all-clear.
Posted Jan 10, 2023 - 17:17 CST
Investigating
We are experiencing a widespread network outage with our hosting provider. We are working to get things resolved as quickly as possible.
Posted Jan 10, 2023 - 16:08 CST
This incident affected: Error Ingestion and Management UI.