How does ASPC use Generative Adversarial Networks (GANs) to protect ML models from adversarial attacks?

Adaptive Security & Policy Control (ASPC) utilizes Generative Adversarial Networks (GANs) to protect Machine Learning (ML) models from adversarial attacks through a process known as adversarial training. This technique involves generating sophisticated "adversarial examples" (AEs)—malicious inputs designed to fool the detection system—and using them to retrain and harden the ML models against such threats.

Adversarial Training Process

The core strategy relies on the fact that Deep Neural Networks (DNNs) are often vulnerable to small, imperceptible perturbations in input data that can cause them to make incorrect predictions (e.g., classifying malicious traffic as benign). ASPC uses GANs to automate the generation of these attacks to strengthen the system:
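This fragility shows up even in a toy linear detector: a small, bounded per-feature perturbation (in the spirit of the fast gradient sign method) can flip a "malicious" sample to "benign". The weights, features, and step size below are entirely hypothetical and purely illustrative:

```python
import numpy as np

# Toy linear "detector" for traffic feature vectors (hypothetical weights;
# a real detector would be a trained DNN).
w = np.array([1.0, -2.0, 0.5, 1.5])
b = -0.25

def predict(x):
    """Return 1 for malicious, 0 for benign."""
    return int(w @ x + b > 0)

x = np.array([0.6, 0.1, 0.4, 0.2])   # a sample correctly flagged as malicious

# FGSM-style perturbation: step each feature against the sign of the weight,
# i.e. in the direction that lowers the maliciousness score.
eps = 0.3
x_adv = x - eps * np.sign(w)

print(predict(x))      # 1: original sample is detected
print(predict(x_adv))  # 0: small perturbation evades the detector
```

Each feature moved by at most `eps`, yet the prediction flips, which is exactly the weakness adversarial training targets.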

  1. Generation of High-Quality Adversarial Examples: A GAN architecture is employed to generate synthetic attack data that mimics real malicious traffic but is specifically engineered to evade detection by the target ML model.
  2. Retraining the Detector: These generated AEs are incorporated into the training dataset of the Centralized Attack Detector (CAD). The ML model is then retrained with this augmented dataset, forcing it to learn the characteristics of these sophisticated attacks.
  3. Hardening the Model: By exposing the model to these adversarial "optical illusions" during training, it adjusts its decision boundaries to classify them correctly. This significantly increases its robustness, making it much harder for attackers to bypass the detector using similar techniques.
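The three steps above can be sketched end to end. The snippet below is a minimal, hypothetical stand-in: a logistic-regression "detector" replaces the CAD's DNN, and a fixed gradient-based shift replaces the learned GAN generator, but the generate-augment-retrain loop is the same idea:

```python
import numpy as np

rng = np.random.default_rng(0)

def train(X, y, epochs=200, lr=0.5):
    """Logistic-regression stand-in for the detector (illustrative only)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def predict(w, b, X):
    return (X @ w + b > 0).astype(int)  # 1 = malicious, 0 = benign

# Baseline detector on synthetic benign (0) vs. malicious (1) samples.
X_ben = rng.normal(-1, 0.5, (100, 2))
X_mal = rng.normal(+1, 0.5, (100, 2))
X = np.vstack([X_ben, X_mal])
y = np.r_[np.zeros(100), np.ones(100)]
w, b = train(X, y)

# Step 1 (stand-in "generator"): push malicious samples across the decision
# boundary; a GAN would learn such evasive samples automatically.
X_ae = X_mal - 2.0 * w / np.linalg.norm(w)
evasion_before = np.mean(predict(w, b, X_ae) == 0)

# Steps 2-3: add the AEs to the training set, labelled malicious, and retrain.
X_aug = np.vstack([X, X_ae])
y_aug = np.r_[y, np.ones(100)]
w2, b2 = train(X_aug, y_aug)
evasion_after = np.mean(predict(w2, b2, X_ae) == 0)

print(evasion_before, evasion_after)  # evasion ratio drops after retraining
```

The retrained model's boundary shifts to cover the region the adversarial examples occupied, which is the hardening effect described above.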

Specific Implementation: Enhanced MalGAN

ASPC implements a specific GAN architecture based on MalGAN to perform this task in a "black-box" setting, meaning the attacker does not need access to the target model's internal parameters.

  • Architecture: The system consists of three components:
    • Generator: Creates synthetic adversarial examples (AEs) from noise and real malicious samples.
    • Black-Box Model: The target ML detector that the system wants to fool.
    • Discriminator (Substitute Model): Tries to distinguish between real and generated samples, learning the behavior of the black-box model to guide the generator.
  • Novel Enhancement (Smirnov Transform): Standard GANs often produce AEs that, while successful at fooling the detector, are statistically different from real attacks and easily identifiable by simple filters. To solve this, ASPC integrates a custom activation function based on the Smirnov Transform into the generator. This forces the generated AEs to possess the same statistical distribution as real malicious traffic, making them virtually indistinguishable from real attacks.
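The effect of the Smirnov-based activation can be illustrated with inverse-transform sampling: mapping the generator's (0, 1) outputs through the empirical inverse CDF of a real malicious feature makes the generated values follow that feature's distribution. The function below is a simplified sketch of the idea, not ASPC's exact layer, and the "real" feature data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)

def smirnov_activation(u, real_samples):
    """Map values in (0, 1) through the empirical inverse CDF of a real
    feature (Smirnov / inverse-transform sampling). Illustrative sketch."""
    q = np.sort(real_samples)
    idx = np.clip((u * len(q)).astype(int), 0, len(q) - 1)
    return q[idx]

# Stand-in for a real malicious-traffic feature (e.g. packet sizes).
real_feature = rng.lognormal(mean=6.0, sigma=0.4, size=5000)

# Stand-in for the generator's raw sigmoid outputs in (0, 1).
gen_raw = rng.uniform(size=5000)

# After the activation, generated values share the real feature's
# marginal distribution, so simple statistical filters cannot flag them.
gen_feature = smirnov_activation(gen_raw, real_feature)

print(np.quantile(real_feature, [0.25, 0.5, 0.75]))
print(np.quantile(gen_feature, [0.25, 0.5, 0.75]))  # quantiles closely match
```

Without such a constraint, a raw GAN output might fool the detector while still having an obviously wrong feature distribution; the transform closes that gap.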

Results

Experimental evaluations demonstrated that retraining the ML model with these high-quality, GAN-generated adversarial examples significantly improved its resilience. In one test scenario involving crypto mining detection, the accuracy of the model in detecting new adversarial attacks increased to 99%, reducing the evasion ratio (successful attacks) from 48% to 1%.
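For clarity on the metric, the evasion ratio is simply the fraction of adversarial samples the detector labels benign. The prediction lists below are made up to mirror the reported 48% → 1% drop:

```python
# Hypothetical per-AE detector predictions (0 = benign, i.e. successful evasion).
preds_before = [0] * 48 + [1] * 52   # before adversarial retraining
preds_after  = [0] * 1  + [1] * 99   # after adversarial retraining

def evasion_ratio(preds):
    """Share of adversarial examples that slipped past the detector."""
    return preds.count(0) / len(preds)

print(evasion_ratio(preds_before))  # 0.48
print(evasion_ratio(preds_after))   # 0.01
```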