by DNSFilter Team on Oct 24, 2024 4:00:00 PM
Have you ever tried to build a machine learning classifier where you only had labels for one of the classes?
In computer security, researchers usually have easy access only to labels for malicious samples (malware, phishing domains, etc.), while labels for benign samples (productivity software, e-commerce domains, etc.) are either missing entirely or tedious and expensive to collect at scale. As a result, researchers typically treat the “known bad” samples as malicious and presume that everything else is benign.
In recent research published by DNSFilter's Chief Data Scientist, David Elkind, we show that this approach yields a biased model, because some of the samples presumed benign are in fact malicious. We compare it against an alternative procedure that removes the malicious-but-unlabeled samples from the training set, and we show significant improvements in model quality on two different computer security datasets.
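To make the bias concrete, here is a toy sketch in Python. It is not the procedure from the paper: the synthetic one-dimensional data, the midpoint-of-means classifier, the removal rule (drop any unlabeled sample that the naive model flags as malicious), and every function name are assumptions chosen only for illustration.

```python
import random
from statistics import mean


def make_data(seed=0, n_benign=500, n_malicious=500, labeled_fraction=0.3):
    """Synthetic samples as (feature, truly_malicious, has_malicious_label)."""
    rng = random.Random(seed)
    data = [(rng.gauss(0.0, 1.0), False, False) for _ in range(n_benign)]
    for _ in range(n_malicious):
        # Only some malicious samples ever receive a "known bad" label.
        is_labeled = rng.random() < labeled_fraction
        data.append((rng.gauss(4.0, 1.0), True, is_labeled))
    return data


def fit_threshold(positives, presumed_benign):
    """Toy classifier: the midpoint between the two class means."""
    return (mean(positives) + mean(presumed_benign)) / 2.0


def accuracy(threshold, data):
    """Accuracy against the true labels, which real data would not provide."""
    correct = sum((x > threshold) == is_malicious for x, is_malicious, _ in data)
    return correct / len(data)


data = make_data()
positives = [x for x, _, labeled in data if labeled]
unlabeled = [(x, is_mal) for x, is_mal, labeled in data if not labeled]

# Naive approach: presume every unlabeled sample is benign.
naive_threshold = fit_threshold(positives, [x for x, _ in unlabeled])

# Assumed cleaning rule: drop unlabeled samples that the naive model
# already flags as malicious, then refit on what remains.
kept = [(x, is_mal) for x, is_mal in unlabeled if x <= naive_threshold]
cleaned_threshold = fit_threshold(positives, [x for x, _ in kept])

hidden_before = sum(is_mal for _, is_mal in unlabeled)
hidden_after = sum(is_mal for _, is_mal in kept)

print("hidden malicious in training negatives:", hidden_before, "->", hidden_after)
print("naive accuracy:  ", accuracy(naive_threshold, data))
print("cleaned accuracy:", accuracy(cleaned_threshold, data))
```

In this toy setup, the hidden malicious samples pull the presumed-benign mean toward the malicious class, which shifts the naive decision threshold and causes real threats to be missed. Removing the suspicious unlabeled samples before refitting moves the threshold back toward the true boundary.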
Click the button below to read the full research paper. For additional materials, including the code and CAMLIS 2024 poster David presented on October 24, visit GitHub here.