DNSFilter Data Analytics Platform: From Fragmented Data Mess to a Living DataMesh

When I stepped into the role of Senior Director of Data Platform Engineering at DNSFilter nearly 3 years ago, one sentiment stood out: We had plenty of data, but very little “truth.”

Each team had built its own data warehouse, and each one looked different, spoke a different dialect, and followed its own unwritten rules. None were complete, none had published schemas, and none connected to each other. 

If you wanted a full picture, you might have to chase half a dozen pipelines, reconcile mismatched definitions, and still end up with unanswered questions.

In short: We had data silos, not a data platform.

This is a relatively common problem that comes with a scrappy startup environment—it “works” until it doesn’t.

As DNSFilter grew, as threats became more sophisticated, and as customer needs expanded, the old approach began to buckle. We needed to make a quick shift to something that could scale with us—not just in storage, but in vision.

The Shift from Data Mess to DataMesh

In 2023, we chose to take a bold step: Reimagine how data at DNSFilter worked. Instead of fixing old warehouses or forcing everything into a single monolithic store, we drew inspiration from the DataMesh principles described by Zhamak Dehghani and Martin Fowler.

The vision was compelling:

  • Data belongs to the teams who know it best.
  • A central platform empowers, not dictates.
  • Sharing happens through clear contracts, not handshakes or tribal knowledge.

This wasn’t just about modern technology. It was about culture. It was about treating data not as a byproduct of systems, but as a first-class product with ownership, accountability, and discoverability.

Building the Foundation

We started with a modern, open, and scalable foundation. 

Some of the choices were technical, but behind each was a story of enabling teams, not just choosing tools.

  • Amazon S3 + Apache Iceberg gave us effectively unlimited, low-cost storage with the reliability of table semantics.

  • Parquet + Snappy compression meant fast, efficient files that are large enough (512–1024 MB) to serve analytics at scale, yet small enough to stay nimble.

  • Partitioning by date and hour made sense. Most of our threat and customer data lives in time, and queries follow time.

  • AWS Glue, Athena, and TrinoDB created a shared catalog and query fabric, allowing any team to query any dataset without hidden barriers.

  • Apache Kafka became our real-time backbone, streaming billions of events per day. It powers everything from immediate telemetry ingestion to near-instant alerting, ensuring that detection and analytics don’t just happen at scale, but also in real time.

  • Airflow + dbt Cloud turned pipelines into transparent, reproducible, versioned models rather than black boxes.

  • Holistics provided a visualization layer where raw data transformed into insights and stories.
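To make the date-and-hour partitioning concrete, here is a minimal sketch of how an event timestamp could map to a partition, and how a time-bounded query could be narrowed to only the partitions it needs. The table name `dns_events`, the `event_date`/`event_hour` key names, and both helper functions are hypothetical and not taken from our platform. Iceberg tracks partitions in table metadata rather than in directory names, so the path layout here is purely illustrative.

```python
from datetime import datetime, timedelta, timezone


def partition_path(table: str, ts: datetime) -> str:
    """Return an illustrative date/hour partition path for a timezone-aware timestamp."""
    ts = ts.astimezone(timezone.utc)  # normalize so partitions do not depend on local time
    return f"{table}/event_date={ts:%Y-%m-%d}/event_hour={ts:%H}"


def partitions_for_range(table: str, start: datetime, end: datetime) -> list[str]:
    """List every hourly partition that overlaps the half-open range [start, end)."""
    # Round the start down to the top of its hour, then step one hour at a time.
    cursor = start.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    end = end.astimezone(timezone.utc)
    paths = []
    while cursor < end:
        paths.append(partition_path(table, cursor))
        cursor += timedelta(hours=1)
    return paths


# Hypothetical query window: 23:30 on 1 January to 01:15 on 2 January (UTC).
start = datetime(2024, 1, 1, 23, 30, tzinfo=timezone.utc)
end = datetime(2024, 1, 2, 1, 15, tzinfo=timezone.utc)
paths = partitions_for_range("dns_events", start, end)
for path in paths:
    print(path)
```

A query over that window only has to touch three hourly partitions, which is why time-based partitioning pays off when most queries filter on time.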

But technology was just the scaffolding. The real breakthrough came from how we organized ownership.

Data Zones: Freedom Through Ownership

Instead of one warehouse for everyone, we introduced data zones. These are dedicated spaces tied to business domains such as Security Intelligence, Support, and Product.

A data zone is owned by the team closest to that domain. Within the data zone, teams have freedom to:

  • Integrate their own sources
  • Define their own transformations
  • Publish their own models

When a team wants to share data outside its zone, it does so through contracts. The shared models are versioned, so producers can evolve without breaking consumers, and consumers can upgrade at their own pace.
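One way to picture a versioned contract is as a small, published description of a model plus a check that flags changes consumers would not survive. The sketch below is an assumption about what such a check might look like, not a description of our tooling. The `DataContract` class, the `breaking_changes` function, and the `threat_events` model are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """A published, versioned description of a shared model (hypothetical shape)."""
    name: str
    version: int
    columns: dict[str, str]  # column name -> type name


def breaking_changes(old: DataContract, new: DataContract) -> list[str]:
    """Return the reasons `new` would break consumers of `old` (empty if none)."""
    problems = []
    for column, old_type in old.columns.items():
        if column not in new.columns:
            problems.append(f"removed column: {column}")
        elif new.columns[column] != old_type:
            problems.append(f"changed type of {column}: {old_type} -> {new.columns[column]}")
    return problems  # added columns are treated as non-breaking


v1 = DataContract("threat_events", 1, {"domain": "string", "seen_at": "timestamp"})
v2 = DataContract("threat_events", 2, {"domain": "string", "seen_at": "timestamp", "category": "string"})
v3 = DataContract("threat_events", 3, {"domain": "string", "seen_at": "date", "category": "string"})

print(breaking_changes(v1, v2))  # adding a column: no breaking changes
print(breaking_changes(v2, v3))  # changing a column type: one breaking change
```

Under this rule, an additive change such as v2 can replace v1 in place, while a change such as v3 would be published alongside v2 so that consumers can migrate when they are ready.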

The result is much like a mesh: Loosely coupled but strongly aligned. Each team is empowered, but nobody is isolated. Ownership becomes freedom.

What DataMesh Enabled

This shift changed more than our pipelines. It completely changed how DNSFilter works with data. Today we have:

Scale without chaos

At the time of writing, we manage almost 3 petabytes of data (growing daily), with 170 billion new events ingested each day.
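To put the daily figure in perspective, a quick back-of-the-envelope conversion gives the average sustained rate. This assumes events arrive evenly across the day, which real traffic does not do, so peak rates will be higher than these averages.

```python
EVENTS_PER_DAY = 170_000_000_000  # daily ingest figure quoted above
SECONDS_PER_DAY = 24 * 60 * 60

events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY
events_per_hour = EVENTS_PER_DAY / 24

print(f"{events_per_second:,.0f} events/second on average")  # 1,967,593
print(f"{events_per_hour:,.0f} events/hour on average")      # 7,083,333,333
```

That works out to roughly 2 million events per second, around the clock.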

Searchable archives

Threat researchers can query across the entire history of data, enabling faster detection of malicious actors and better long-term insights.

Real-time analytics

With Kafka as our streaming fabric, we don’t just analyze what happened yesterday—we see what’s happening right now. This shortens the feedback loop between detection and defense.
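As a rough illustration of what analyzing a stream "right now" can mean, the sketch below counts events per domain in fixed (tumbling) time windows and raises an alert as soon as a domain crosses a threshold. It runs over a plain Python iterable rather than a Kafka consumer. The `alerts` function, the 60-second window, and the threshold of 3 are assumptions chosen for the example, not a description of our detection logic.

```python
from collections import Counter
from typing import Iterable, Iterator


def alerts(
    events: Iterable[tuple[int, str]], window_seconds: int, threshold: int
) -> Iterator[tuple[int, str, int]]:
    """Yield (window_start, domain, count) when a domain reaches `threshold` in one window.

    `events` are (unix_timestamp, domain) pairs, assumed to arrive in timestamp order.
    """
    current_window = None
    counts: Counter = Counter()
    for ts, domain in events:
        window = ts - ts % window_seconds  # start of the tumbling window containing ts
        if window != current_window:
            current_window, counts = window, Counter()  # new window: reset the counts
        counts[domain] += 1
        if counts[domain] == threshold:  # fire once per domain per window
            yield window, domain, counts[domain]


# Hypothetical stream of (timestamp, domain) events.
stream = [
    (0, "a.example"), (10, "a.example"), (20, "b.example"),
    (30, "a.example"), (65, "a.example"), (70, "a.example"),
]
fired = list(alerts(stream, window_seconds=60, threshold=3))
print(fired)
```

Because the alert is emitted while the window is still open, the consumer reacts on the third matching event instead of waiting for a batch job to finish.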

Self-service access

Teams no longer wait on a central team to provision systems, join databases, or build pipelines. They build and consume directly within, and across, zones.

A single source of truth

One source for both operational and business data.

And perhaps most importantly, the data remains anonymized and responsibly handled, powering threat analysis and detection without risk of exposing customer identities.

Beyond Engineering: The Impact Across DNSFilter

The ripple effects of the platform go well beyond the engineering team.

Data science

With petabytes of structured, reliable data, our scientists train smarter models, detect threats faster, and prototype detection methods that weren’t possible before. DataMesh gives them access to both fresh streams and deep archives, making experimentation seamless.

Support

Unified dashboards now give our Support teams a single pane of glass for every customer interaction. Instead of juggling systems and partial views, they have a complete picture, making troubleshooting faster and experiences smoother.

The overall business

By anchoring data in business domains, the platform speaks the same language as leadership, product, and operations. Insights are clearer, decisions are faster, and strategy is better informed.

Why This Makes DNSFilter World-Class in Threat Detection

At the end of the day, these cumulative shifts in our data platform are really about DNSFilter’s ability to stop threats faster and more effectively than anyone else.

Threat detection lives and dies on three things:

  1. Coverage – seeing as much of the Internet’s activity as possible.
  2. Context – understanding not just what happened, but why and how it relates to everything else.
  3. Speed – detecting and responding before an attacker can cause damage.

Our DataMesh gives us all of the above at a scale few can match:

  • With multiple petabytes of structured data and 170 billion new rows added daily, we have one of the most comprehensive datasets in DNS security.

  • Because everything is organized into domain-owned data zones, we capture context-rich intelligence, from threat telemetry to customer support signals, in a way that’s both accurate and actionable.

  • With Kafka streaming billions of events in real time, we pair historical depth with immediate visibility. Analysts and models don’t just see the past; they see the present as it happens.

  • Our optimized formats, compression, and query engines mean searches across the entire archive happen in seconds, not hours.

This is what transforms raw data into real-time defense. It means we can identify new attack vectors as they emerge, trace malicious actors across time and space, and continuously improve our detection models with historical depth and global reach.

To sum it up, the DataMesh is not just infrastructure; it’s the foundation of world-class threat detection at DNSFilter.

Toto, We're Not in Silos Anymore

The journey from fragmented warehouses to a living DataMesh was a cultural transformation within DNSFilter. We moved from “data is IT’s job” to “data is everyone’s product.”

The vision is clear: A data platform that grows with DNSFilter, adapts to new challenges, and fuels innovation across every part of the company.

For us, data is no longer just something we collect. It’s something we responsibly own, share, and use together. And in doing so, we’ve built not only a world-class data platform, but a world-class defense against the evolving threats of the Internet.

Ready to defend your organization from said evolving threats? Try DNSFilter risk-free for 14 days.
