You google a site you need for work and come across DNSFilter's block page. Ever wonder what happens when you hit the report button in our software or notify the IT department that you definitely need this website "scientifically ranking cute puppies" to be accessible?
First off, I agree with you: it is a travesty that it is blocked in the first place. Allow me to take you on a journey behind the curtain for a second to get a peek at the wide and wild world of domain categorization and report processing.
First up is a rather common report, involving the good ol' Parked & Under Construction category. These are road-less-traveled sites, often victims of not having enough significant content built by the time they get queued up for analysis.
I can hear your thought process now, dear reader: "You mean there aren't millions of visitors to Uncle Jim's Clown Emporium with one picture of him crying while making balloon animals?!?"
Correct. When there isn't enough content, or if the site misses the timing of its initial scan, the AI will look at it, say, "Hey, looks like this one is still in the oven," and stick it into Parked & Under Construction.
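For the curious, here is a toy sketch of what that kind of "still in the oven" check could look like. To be clear, this is not our actual classifier: the word-count threshold, the placeholder phrases, and the function name are all made up for illustration.

```python
from typing import Optional

# Hypothetical cutoff for "enough significant content" (an assumption).
MIN_WORDS = 150

# Hypothetical phrases that suggest a placeholder page (an assumption).
PLACEHOLDER_PHRASES = ("under construction", "coming soon", "this domain is parked")


def first_pass_category(page_text: str) -> Optional[str]:
    """Return the fallback category for thin pages, or None to keep analyzing."""
    text = page_text.lower()
    # An explicit placeholder message is the clearest signal.
    if any(phrase in text for phrase in PLACEHOLDER_PHRASES):
        return "Parked & Under Construction"
    # Otherwise, too few words means there is not enough content to judge.
    if len(text.split()) < MIN_WORDS:
        return "Parked & Under Construction"
    return None
```

A page with one crying-clown photo and a caption would land in the fallback category here, which is exactly the kind of call a human reviewer later corrects.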
This is where the Domain Intelligence team comes in: "Oh, this is definitely Art & Entertainment." Though we may be stretching the definition of Art in Uncle Jim's case. (Does this even qualify as a business? Oh well, yes, technically: he has a link for tips in the very bottom left corner of an infinite scroll.)
Joking aside, there are many factors we take into account when categorizing a domain. Overall, a firm rule of "empirical evidence is king" is the way to go when fitting things into categories. This involves an amount of research that would surprise you and a meticulous evaluation of every site that comes across the desk.
"But how hard is it to come to Arts/Business and send it on its way?"
Well, that varies, to be honest; some sites are easier than others. There are other research factors as well, especially when evaluating threats such as malware and phishing: the domain's history, its overall health, security flaws and more. Luckily, we have the luxury of being able to assign multiple categories to a site, and a large repository of data to draw on when evaluating sites.
For a prime and tangible day-to-day example, take YouTube: it hosts content all over the spectrum, from Sports to Games to Music to Education and Self Help, to Business talks, to Tech, etc. etc. etc. The category list could get staggering very quickly. In this case we can distill YouTube down into a broader Entertainment category, since its primary goal is to entertain. Something a bit more specific, like Twitch.tv, which caters more to the gamer niche, would be Games & Entertainment.
This is a way-zoomed-out look at the rabbit hole you can go down when dissecting content and dealing with even basic categorizations. It gets even zanier the more extensive the content and the more complex the website. So as you hit the report button or forward the Clown Emporium on to IT to get it unblocked, consider what category you would put the site in.
It's always an interesting time processing these reports! Between all of the variations and combinations, it's well worth it to make sure the content our customers experience is well managed and well labeled, and to make the internet an overall safer place.
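If you like seeing that idea as code, here is a minimal sketch of the "distill down" reasoning. It is only an illustration under invented assumptions: the topic counts, the 60% dominance threshold, and the function name are hypothetical, not how our system actually works.

```python
from typing import Dict, List


def site_categories(topic_counts: Dict[str, int],
                    primary_purpose: str,
                    dominance: float = 0.6) -> List[str]:
    """Roll a site's content topics up into a short category list.

    If one topic makes up at least `dominance` of the content, keep it
    alongside the site's primary purpose; otherwise use the purpose alone.
    """
    total = sum(topic_counts.values())
    if total == 0:
        return [primary_purpose]
    top_topic, top_count = max(topic_counts.items(), key=lambda kv: kv[1])
    if top_count / total >= dominance and top_topic != primary_purpose:
        return [top_topic, primary_purpose]
    return [primary_purpose]


# Made-up sample numbers, for illustration only.
youtube_like = {"Sports": 20, "Games": 20, "Music": 25, "Education": 20, "Business": 15}
twitch_like = {"Games": 85, "Music": 10, "Sports": 5}

broad = site_categories(youtube_like, "Entertainment")
niche = site_categories(twitch_like, "Entertainment")
```

With these sample numbers, no single topic dominates the first site, so it collapses to the broad category, while the second keeps its gamer niche alongside it.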
That's all from the Domain Intelligence Desk today! Have questions you want answered? Tweet us @dnsfilter!