DNSFilter Chief Data Scientist: Where we're going, we don't need (negative) labels

Have you ever tried to build a machine learning classifier where you only had labels for one of the classes?

In computer security, researchers usually have easy access only to labels for malicious samples (malware, phishing domains, etc.). Labels for benign samples (productivity software, e-commerce domains, etc.) are either missing entirely or tedious and expensive to collect at scale. Researchers typically respond by treating the “known bad” samples as malicious and presuming that everything else is benign.

In recent research by DNSFilter's Chief Data Scientist, David Elkind, we show that this approach yields a biased model, because some of the samples presumed benign are in fact malicious but unlabeled. An alternative procedure that removes these malicious-but-unlabeled samples from the training set significantly improves model quality on two different computer security datasets.
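To make the contrast above concrete, here is a minimal sketch on synthetic data. It is not the procedure from the paper: the two-cluster dataset, the helper names (`fit_logistic`, `score`), the 50% labeling rate, and the 5% quantile cutoff are all assumptions chosen for illustration. The filtering rule is also an assumption. It drops any unlabeled sample that the naive model scores at least as high as most of the known-bad samples, which is one simple heuristic among many.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent; returns (w, b)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        err = p - y
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b

def score(X, w, b):
    """Predicted probability of the 'malicious' class."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

# Synthetic data (assumption): two Gaussian clusters, 30% truly malicious.
rng = np.random.default_rng(0)
n = 2000
is_malicious = rng.random(n) < 0.3
centers = np.where(is_malicious, 2.0, -2.0)[:, None]
X = rng.normal(loc=centers, scale=1.0, size=(n, 2))

# Only about half of the malicious samples carry a label; nothing else does.
labeled = is_malicious & (rng.random(n) < 0.5)
y_observed = labeled.astype(float)

# Naive approach: every unlabeled sample is presumed benign.
w_naive, b_naive = fit_logistic(X, y_observed)

# Filtering heuristic (assumption): drop unlabeled samples whose naive score
# reaches the 5th percentile of the known-bad scores, then retrain.
naive_scores = score(X, w_naive, b_naive)
cutoff = np.quantile(naive_scores[labeled], 0.05)
keep = labeled | (naive_scores < cutoff)
w_filt, b_filt = fit_logistic(X[keep], y_observed[keep])

def recall(w, b):
    """Share of truly malicious samples scored at 0.5 or above."""
    return float((score(X[is_malicious], w, b) >= 0.5).mean())

recall_naive = recall(w_naive, b_naive)
recall_filtered = recall(w_filt, b_filt)
print(f"recall, unlabeled presumed benign: {recall_naive:.2f}")
print(f"recall, suspected positives removed: {recall_filtered:.2f}")
```

The ground-truth array `is_malicious` exists only because the data is synthetic. With real data it is unavailable, which is exactly why the choice of training procedure matters.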

Click the button below to read the full research paper. For additional materials, including the code and CAMLIS 2024 poster David presented on October 24, visit GitHub here.
