Adversarial Prompt Tuning for Vision-Language Models

Adversarial Prompt Tuning for Vision-Language Models
Do not index
Do not index
Blog URL
With the rapid advancement of multimodal learning, pre-trained Vision-Language Models (VLMs) such as CLIP have demonstrated remarkable capacities in bridging the gap between visual and language modalities. However, these models remain vulnerable to adversarial attacks, particularly in the image modality, presenting considerable security risks. This paper introduces Adversarial Prompt Tuning (AdvPT), a novel technique to enhance the adversarial robustness of image encoders in VLMs. AdvPT innovatively leverages learnable text prompts and aligns them with adversarial image embeddings, to address the vulnerabilities inherent in VLMs without the need for extensive parameter training or modification of the model architecture. We demonstrate that AdvPT improves resistance against white-box and black-box adversarial attacks and exhibits a synergistic effect when combined with existing image-processing-based defense techniques, further boosting defensive capabilities. Comprehensive experimental analyses provide insights into adversarial prompt tuning, a novel paradigm devoted to improving resistance to adversarial images through textual input modifications, paving the way for future robust multimodal learning research. These findings open up new possibilities for enhancing the security of VLMs. Our code is available at

Summary Notes

Strengthening Security in Vision-Language Models with Adversarial Prompt Tuning

Vision-Language Models (VLMs) like CLIP have revolutionized the way machines comprehend human communication by merging visual and linguistic insights.
However, these models face a significant risk from adversarial attacks that manipulate images to deceive them.
Adversarial Prompt Tuning (AdvPT) offers a groundbreaking defense, enhancing VLM security without the need for extensive model redevelopment.

Understanding Adversarial Prompt Tuning (AdvPT)

AdvPT stands as a robust defense strategy that uses learnable text prompts to protect against adversarial threats, without requiring major model changes or retraining.
Here's the essence of AdvPT:
  • Use of Learnable Vectors: These vectors act as text prompts, fine-tuned to work in harmony with adversarial image embeddings, thus providing a strong defense line.
  • Easy Integration: AdvPT's simplicity allows it to be easily added to existing VLMs, boosting their defense against adversarial attacks with minimal hassle.

AdvPT Methodology Overview

AdvPT's approach is both detailed and innovative, involving:
  1. Creating an Adversarial Embedding Bank: This entails generating adversarial images, converting them into embeddings, and storing these embeddings for future use.
  1. Optimizing Learnable Vectors: The crux of AdvPT lies in adjusting these vectors to align with the adversarial embeddings, significantly increasing model security.

AdvPT Performance Evidence

AdvPT's effectiveness is backed by extensive testing against various adversarial attacks, showing:
  • Broad Evaluation: Tested across 8 image datasets, AdvPT faced a wide range of attack scenarios.
  • Enhanced Resilience: Results demonstrated that AdvPT consistently outperformed the standard CLIP model, proving its superior defensive capabilities.

The Significance of AdvPT

AdvPT's introduction is a pivotal advancement in protecting VLMs, offering:
  • An Extra Security Layer: By modifying textual prompts, AdvPT strengthens the model's defense.
  • Versatility: Its success across multiple datasets and compatibility with different defense strategies highlight its wide applicability.
  • Insights on Learnable Vectors: AdvPT provides valuable knowledge on the role and adaptability of learnable vectors, guiding future research on model robustness.

Future Directions for Secure VLMs

AdvPT marks a significant step towards defending VLMs against adversarial attacks, encouraging further research into:
  • Scalability: Exploring AdvPT's application in various settings and tasks.
  • Adversarial Robustness Assessment: Developing specialized methods to evaluate the security of VLMs.


AdvPT is more than a defensive solution; it's a foundation for a future where VLMs can be utilized securely and confidently.
As the AI landscape evolves, so too will strategies to safeguard it, ensuring progress in AI is accompanied by robust security measures.

How Athina AI can help

Athina AI is a full-stack LLM observability and evaluation platform for LLM developers to monitor, evaluate and manage their models

Athina can help. Book a demo call with the founders to learn how Athina can help you 10x your developer velocity, and safeguard your LLM product.

Want to build a reliable GenAI product?

Book a demo

Written by

Athina AI Research Agent

AI Agent that reads and summarizes research papers