In preparation for fiber vendor maintenance, Teraswitch drained backbone traffic from FRA2’s edge network at approximately 09:35 UTC on November 27, 2024.
At approximately 09:45 UTC, connectivity in FRA2 was impacted, and the fiber vendor maintenance began shortly afterward. For approximately 12 minutes, the FRA2 core network was unable to reach the Internet by any path. A fix was implemented to force traffic to the next exit point toward the Internet regardless of the status of the connection, and site connectivity was restored at around 09:57 UTC.
Teraswitch engineering determined the root cause to be a combination of two factors: the unfortunate timing of a provider issue, and a logic error in the BGP tooling used to divert traffic away from links about to undergo maintenance. Together, these caused routes to be inadvertently withdrawn from the FRA2 core data center network. The issue is now well understood, and our tooling has been adjusted to prevent a recurrence.
Our analysis of the actual events, together with lab simulations, verified this sequence of events:
To prevent this from happening in the future, we have made two changes. First, the no-export flag is no longer applied to default routes at all; removing it has no traffic-steering effect, because Internet destinations will almost always have a more-specific route within the Teraswitch network. Second, we have added a locally generated default route that activates if the data center network believes it is isolated from all edge routers.
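For readers less familiar with the mechanics, the two behaviors described above can be illustrated with a small Python model. This is a simplified sketch, not Teraswitch's tooling or router configuration. It assumes that the "no-export flag" refers to the well-known BGP NO_EXPORT community (RFC 1997) and that the affected default routes crossed an external BGP session. The function names, the router names, and the isolation check are all hypothetical.

```python
from dataclasses import dataclass

# Well-known NO_EXPORT community value defined in RFC 1997.
NO_EXPORT = 0xFFFFFF01
DEFAULT_PREFIX = "0.0.0.0/0"


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    communities: frozenset = frozenset()


def advertise_to_ebgp(routes):
    """Return the routes an external BGP session would carry.

    Routes tagged NO_EXPORT are not advertised to external peers.
    """
    return [r for r in routes if NO_EXPORT not in r.communities]


def effective_default(defaults_from_edges, fallback_next_hop):
    """Choose the default route the data center network would use.

    Hypothetical policy: if no edge router supplies a default route,
    treat the network as isolated and use a locally generated default.
    """
    if defaults_from_edges:
        return defaults_from_edges[0]
    return Route(DEFAULT_PREFIX, fallback_next_hop)


# A default route tagged NO_EXPORT is filtered on the external session,
# so the receiving side ends up with no default route at all.
tagged = Route(DEFAULT_PREFIX, "edge-1", frozenset({NO_EXPORT}))
received = advertise_to_ebgp([tagged])

# With the fallback in place, the network still has a default to follow.
chosen = effective_default(received, "local-exit")
print(len(received), chosen.next_hop)
```

In this model, the first change corresponds to no longer adding `NO_EXPORT` to the default route, so it survives the filter. The second change corresponds to `effective_default` always returning a usable route even when every learned default has disappeared.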
We apologize for any inconvenience caused by this incident. If you have any questions or concerns, please reach out to Teraswitch Support at support@teraswitch.com.