Our monitoring alerted us to problems with our Redis cluster. The servers appeared to be online and functioning normally, but network connectivity between certain servers was degraded.
We reached out to our infrastructure provider, OVH, which was experiencing network disruption on the private network that connects our servers (the VRACK). The disruption was intermittent, but it caused our Redis cluster to repeatedly lose connectivity with our other servers.
We quickly switched the Redis master to a different server that appeared to be more stable on the network while OVH resolved the issue. However, during the outage, some customers saw an error page while interacting with the UI, and there were some delays and drops in processing new errors.