On the evening of Wednesday, October 8, starting at about 17:00 CDT, our servers began having problems handing off notifications to Google’s Android messaging servers. Google’s servers responded with long delays, timeouts, and an increased rate of server errors, all of which slowed the rate at which we could hand off notifications. A related problem on our end also briefly delayed processing of notifications bound for Apple’s notification servers; that problem was quickly resolved.
The problem with Google’s notification servers cleared up on its own and then returned a few more times that evening. Each occurrence caused a backlog of a few thousand notifications on our servers, which cleared out quickly once Google’s servers started responding promptly again. We reached out to Google’s support resource for its notification servers and tried additional workarounds, such as delivering notifications from different servers in case the cause was a network issue on our end.
By about 23:45 CDT, we had sent out all of our backlog of notifications while also keeping up with new notifications received by our API.
The problem with Google’s servers returned on the morning of Thursday, October 9, and we rewrote our message sending system to try different strategies for coping with the slow receiving side at Google. Eventually we found a level of concurrency and a timeout length that allowed us to clear our backlog while keeping up with new messages arriving during our peak time.
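For readers curious what tuning concurrency and timeout length can look like, here is a minimal Python sketch of the general technique. It is not our production code: the function `send_all`, the `send_one` callback, and the default values for `concurrency` and `timeout_s` are all hypothetical and chosen only for illustration.

```python
import asyncio


async def send_all(notifications, send_one, concurrency=8, timeout_s=5.0):
    """Send notifications with bounded concurrency and a per-request timeout.

    `send_one` is a hypothetical coroutine function that hands a single
    notification to the upstream service. Returns two lists: notifications
    that were delivered, and notifications that should be retried later.
    """
    semaphore = asyncio.Semaphore(concurrency)  # caps in-flight requests
    delivered, retry = [], []

    async def worker(note):
        async with semaphore:
            try:
                # Give up on a slow upstream call instead of blocking a slot.
                await asyncio.wait_for(send_one(note), timeout=timeout_s)
                delivered.append(note)
            except (asyncio.TimeoutError, ConnectionError):
                # Keep the notification so it can be re-queued, not dropped.
                retry.append(note)

    await asyncio.gather(*(worker(n) for n in notifications))
    return delivered, retry
```

In a sketch like this, raising `concurrency` lets more requests wait on a slow receiver at once, while lowering `timeout_s` frees up slots sooner at the cost of more retries. The right balance depends on how the receiving side is behaving at the time.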
This new solution worked throughout the day until about 00:10 CDT on Friday, October 10, when our message sending system hit an internal bug that caused another backlog. This one was entirely our fault. We believe the problem has been properly resolved by adding error condition checks and additional server-side monitoring of this system. As of 03:00 CDT, we are back to near-realtime delivery of Android notifications.
We realize that our customers rely on timely delivery of notifications, and we are doing everything we can to meet that expectation, as we have for the past two and a half years. We apologize for the inconvenience these delays have caused you, and we welcome any and all feedback.