by DNSFilter Team on Oct 24, 2024 4:00:00 PM
Have you ever tried to build a machine learning classifier where you only had labels for one of the classes?
In computer security, researchers usually have easy access only to labels for malicious samples (malware, phishing domains, etc.). Labels for benign samples (productivity software, e-commerce domains, etc.) are either missing entirely or tedious and expensive to collect at scale. As a result, researchers typically treat the “known bad” samples as malicious and presume that everything else is benign.
In recent research published by DNSFilter's Chief Data Scientist, David Elkind, we show that this approach yields a biased model compared with an alternative procedure that removes the malicious-but-unlabeled samples from the training set. The alternative procedure significantly improves model quality on two different computer security datasets.
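To make the bias concrete, here is a toy sketch in Python. It is not the procedure from the paper. It is an illustration built on assumptions: one-dimensional synthetic data, a nearest-centroid classifier, and a simple cleaning step that drops unlabeled samples flagged by a first-pass model. Every name and number in it (`make_data`, `fit_threshold`, `compare`, the class means, and the 30% labeled fraction) is hypothetical.

```python
import random
from statistics import mean


def make_data(n_per_class=2000, labeled_fraction=0.3, seed=0):
    """Synthetic 1-D data (assumed): malicious ~ N(2, 1), benign ~ N(-2, 1)."""
    rng = random.Random(seed)
    malicious = [rng.gauss(2.0, 1.0) for _ in range(n_per_class)]
    benign = [rng.gauss(-2.0, 1.0) for _ in range(n_per_class)]
    n_labeled = int(labeled_fraction * n_per_class)
    labeled_bad = malicious[:n_labeled]           # the only labels we have
    unlabeled = malicious[n_labeled:] + benign    # hidden malicious + benign
    return labeled_bad, unlabeled, malicious, benign


def fit_threshold(positives, negatives):
    """Nearest-centroid classifier in 1-D: the midpoint between class means."""
    return (mean(positives) + mean(negatives)) / 2.0


def accuracy(threshold, malicious, benign):
    """Score against ground truth, which exists only because data is synthetic."""
    hits = sum(x > threshold for x in malicious)
    hits += sum(x <= threshold for x in benign)
    return hits / (len(malicious) + len(benign))


def compare(seed=0):
    labeled_bad, unlabeled, malicious, benign = make_data(seed=seed)

    # Naive approach: presume every unlabeled sample is benign.
    naive = fit_threshold(labeled_bad, unlabeled)

    # Cleaning step (an assumption for this sketch): drop unlabeled samples
    # that the naive model already flags as malicious, then refit.
    kept = [x for x in unlabeled if x <= naive]
    cleaned = fit_threshold(labeled_bad, kept)

    return {
        "naive_threshold": naive,
        "cleaned_threshold": cleaned,
        "naive_accuracy": accuracy(naive, malicious, benign),
        "cleaned_accuracy": accuracy(cleaned, malicious, benign),
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value:.3f}")
```

In this setup the ideal decision boundary sits at 0, midway between the two class means. The hidden malicious samples pull the presumed-benign centroid toward the malicious class, which pushes the naive threshold upward and causes more malicious samples to be missed. Removing the flagged samples before refitting moves the threshold back toward 0.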
Read the full research paper for the details. Additional materials, including the code and the poster David presented at CAMLIS 2024 on October 24, are available on GitHub.