Anycast Resolution Latency and Our Commitment to Transparency

Early today at 11:40 a.m. UTC, we detected degraded performance across the DNS2 anycast network. Our team immediately escalated the issue to our hosting provider and took action to implement a fix by 1:00 p.m. UTC. Performance was fully restored by 1:44 p.m. UTC, and our team continued to monitor the situation. You can review the updates on our status page.

In the interest of transparency, I wanted to write this article to detail exactly what we experienced and give our customers additional information about this somewhat unique incident.

The complete incident details

At 11:49 a.m. UTC we detected degraded performance on part of our DNS2 anycast network. One of our hosting providers had stopped announcing our secondary prefixes, pushing the majority of DNS2 traffic onto our DNS1 anycast network, which was the initial cause of the degradation.
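
For readers who want to see what a withdrawal like that looks like from the outside, here is a minimal sketch of an external check against RIPEstat's public routing-status endpoint, which reports how widely a prefix is currently seen in the global routing table. The prefix shown is a documentation placeholder, not one of our production anycast prefixes, and the exact response fields may vary.

```python
# A sketch of an external visibility check, using RIPEstat's public
# routing-status endpoint. The prefix below is a documentation placeholder
# (TEST-NET-3), not one of our production anycast prefixes, and the exact
# fields in the response may differ from what this prints.
import json
import urllib.request

PREFIX = "203.0.113.0/24"  # placeholder; substitute the prefix you want to watch
URL = f"https://stat.ripe.net/data/routing-status/data.json?resource={PREFIX}"

with urllib.request.urlopen(URL, timeout=10) as resp:
    payload = json.load(resp)

# How many route collectors currently see the prefix; a drop toward zero means
# the announcement has effectively been withdrawn and traffic will drain elsewhere.
visibility = payload.get("data", {}).get("visibility", {})
print(f"Routing visibility for {PREFIX}:")
print(json.dumps(visibility, indent=2))
```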

During the shift from DNS2 to DNS1, much of that traffic landed on nodes in Copenhagen, Prague, Marseille, and Stockholm. Those nodes could not absorb the entire surge from DNS2, so traffic was rerouted again to Sydney and Miami. While this failover mechanism kept DNS resolution working for our customers, it also added latency, primarily for users in the central and eastern US. DNS resolution times peaked at roughly 300ms (3/10 of a second), though the average response time in that window was 11ms.
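
To make those numbers concrete, here is a minimal sketch of the kind of probe that surfaces elevated resolution latency: it times repeated lookups of a test name against a set of resolver addresses. It uses the dnspython library, and the resolver IPs and query name are placeholders rather than our production anycast addresses.

```python
# A sketch of the kind of latency probe that surfaces resolution times like the
# ones above (~300 ms at the peak vs. an ~11 ms average). Requires dnspython
# (pip install dnspython); the resolver IPs and query name are placeholders,
# not our production anycast addresses.
import statistics
import time

import dns.exception
import dns.resolver  # third-party: dnspython

RESOLVERS = ["203.0.113.53", "198.51.100.53"]  # placeholder anycast addresses
QNAME = "example.com"
SAMPLES = 20

for ip in RESOLVERS:
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = [ip]
    timings_ms = []
    for _ in range(SAMPLES):
        start = time.perf_counter()
        try:
            resolver.resolve(QNAME, "A", lifetime=2.0)
        except dns.exception.DNSException:
            continue  # count only successful lookups in this simple sketch
        timings_ms.append((time.perf_counter() - start) * 1000)
    if timings_ms:
        print(f"{ip}: avg={statistics.mean(timings_ms):.1f} ms, "
              f"max={max(timings_ms):.1f} ms")
    else:
        print(f"{ip}: no successful responses")
```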

Since we use our own service internally, we also experienced this incident firsthand. While you might not have noticed the impact if you were browsing a news site at the time, sites that load many dynamic resources may have felt slow because of the knock-on effects of slower resolution.

Because we saw this incident occur in real time, we immediately escalated the issue to our provider and collaborated to resolve the problem. Our hosting provider is also conducting a further root cause analysis (RCA) to understand what led to the routing interruption of our secondary prefixes.

Our fully redundant architecture allowed DNS resolution to continue, despite the increased resolution latency.

Changes we’re making

As mentioned above, we are still investigating this incident with our hosting provider. One area we want to improve is decreasing the MTTR (mean time to recovery) for these types of situations, so that we resolve issues like this significantly faster even when the impact is low.

We are also reviewing internal processes and how we’ve structured our architecture to determine what changes we can make to reduce the impact surface area if an anycast node goes down.

When we built our anycast network, we purposefully created two parallel BGP networks so that if one network had any failures or latencies, the other network would pick up the slack. In one way, this incident was a testament to the success of that strategy; in another, it will allow us to build further improvements to account for the infinite landscape of problems that come with running a complex global anycast network.
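
As a simplified, client-side illustration of that parallel-network idea, the sketch below prefers one resolver address and falls back to a second on timeout. The real redundancy described above lives at the BGP/anycast layer rather than in client code, and the addresses here are placeholders, but it shows the same principle: if one path fails, the other picks up the query.

```python
# A simplified, client-side illustration of the two-network idea: prefer one
# resolver address and fall back to the other on timeout. The real redundancy
# described above lives at the BGP/anycast layer, not in client code, and the
# IPs here are placeholders. Requires dnspython.
import dns.exception
import dns.resolver  # third-party: dnspython

PRIMARY = "203.0.113.53"     # stand-in for a DNS1 anycast address
SECONDARY = "198.51.100.53"  # stand-in for a DNS2 anycast address

def resolve_with_fallback(qname: str, rdtype: str = "A"):
    """Try the primary resolver network first, then the secondary."""
    for ip in (PRIMARY, SECONDARY):
        resolver = dns.resolver.Resolver(configure=False)
        resolver.nameservers = [ip]
        try:
            return resolver.resolve(qname, rdtype, lifetime=2.0)
        except dns.exception.Timeout:
            continue  # this network is slow or unreachable; try the other one
    raise RuntimeError(f"Both resolver networks timed out for {qname}")

if __name__ == "__main__":
    answer = resolve_with_fallback("example.com")
    print([rr.to_text() for rr in answer])
```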

I keep saying transparency

I often compare the service we provide to oxygen. If we’re controlling the oxygen flow for other companies like ours, we need all of the gauges to report accurately and every tank has to be filled.

Providing our customers with a reliable, high-performance service remains a core value of ours. We know that we are an integral part of your technology stack—one that you need to simply work. That’s why we take incidents like this very seriously.

But I also recognize the need to share information when things like this occur. I’m a software user, too. I get impacted by incidents, too. As a technical user, I want answers to why these things occur. That is what we strive to do here: Be honest and responsive when incidents of this type do occur.

We are committed to our customers beyond the product itself. Each of you has chosen to partner with DNSFilter as your DNS resolution and filtering provider, deploying security to your organization via DNS through us. Thank you for choosing us, and we will continue to work hard to ensure that oxygen levels are at full capacity. And if the readings are ever off, we will always let you know.

 

Visit DNSFilter’s status page for details on this incident.
