Athina AI

Safety

Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models

Assessing Prompt Injection Risks in 200+ Custom GPTs
Research Paper • May 25, 2024

Prompt Injection: Different Attacks and Defensive Techniques
Prompt Engineering • May 9, 2024

CYBERSECEVAL 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models
Research Paper • Apr 18, 2024

From Noise to Clarity: Unraveling the Adversarial Suffix of Large Language Model Attacks via Translation of Text Embeddings
Research Paper • Apr 16, 2024

Ever: Mitigating Hallucination in Large Language Models through Real-Time Verification and Rectification
Hallucinations, Research Paper • Apr 15, 2024

The EVER (Real-Time Verification and Rectification) framework mitigates hallucinations during text generation by verifying the accuracy and trustworthiness of each sentence before generation proceeds.
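As a rough illustration of that verify-before-proceeding loop, here is a minimal Python sketch of an EVER-style generation routine. The generate_sentence, is_supported, and rectify callables are hypothetical placeholders for an LLM call, an evidence check, and a revision step; this is a sketch of the general idea, not the paper's implementation.

from typing import Callable

def ever_generate(
    prompt: str,
    generate_sentence: Callable[[str], str],   # hypothetical LLM call: context -> next sentence
    is_supported: Callable[[str, str], bool],  # hypothetical verifier: (context, sentence) -> ok?
    rectify: Callable[[str, str], str],        # hypothetical fixer: (context, sentence) -> revised sentence
    max_sentences: int = 10,
) -> str:
    """Generate text sentence by sentence, validating each before proceeding."""
    context = prompt
    output: list[str] = []
    for _ in range(max_sentences):
        sentence = generate_sentence(context)
        if not sentence:
            break
        # Real-time verification: check the sentence before it is
        # committed to the running output.
        if not is_supported(context, sentence):
            sentence = rectify(context, sentence)
            # If rectification still fails verification, skip the sentence
            # rather than propagate a likely hallucination.
            if not is_supported(context, sentence):
                continue
        output.append(sentence)
        context += " " + sentence  # only verified sentences extend the context
    return " ".join(output)

Because each verified sentence is appended to the context before the next one is generated, errors are caught before they can compound in later sentences.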

Prompt Stealing Attacks Against Text-to-Image Generation Models
Research Paper • Apr 15, 2024

Universal and Transferable Adversarial Attacks on Aligned Language Models
Research Paper • Apr 14, 2024

AI Safety: Necessary, but insufficient and possibly problematic
Research Paper • Apr 10, 2024

Many-Shot Jailbreaking (Anthropic Research)
Research Paper • Apr 6, 2024

Breaking Down the Defenses: A Comparative Survey of Attacks on Large Language Models
Research Paper • Apr 5, 2024

