SPELL: Semantic Prompt Evolution based on a LLM

Prompt engineering is a new paradigm for enhancing the performance of trained neural network models. For optimizing text-style prompts, existing methods usually operate on small portions of a text one step at a time, which either breaks fluency or cannot adjust a prompt globally. Since large language models (LLMs) have a powerful ability to generate coherent text token by token, can we use LLMs to improve prompts? Based on this motivation, in this paper we treat a trained LLM as a text generator and design a black-box evolution algorithm for automatically optimizing texts, namely SPELL (Semantic Prompt Evolution based on a LLM). The proposed method is evaluated with different LLMs and evolution parameters on different text tasks. Experimental results show that SPELL can indeed rapidly improve prompts. We further explore the evolution process and discuss the limitations, potential possibilities, and future work.

Summary Notes

Enhancing Prompt Engineering with SPELL Framework

Prompt engineering plays a key role in improving how trained neural network models perform on language tasks. Existing prompt optimization methods, however, edit small pieces of a prompt at a time, so they often break fluency or cannot revise the prompt as a whole.
SPELL (Semantic Prompt Evolution based on a LLM) addresses this by using a large language model (LLM) as the text generator inside an evolutionary search, so that each candidate prompt is rewritten as coherent text.

Exploring SPELL Framework

SPELL is built around an evolutionary algorithm that uses an LLM to refine and evolve prompts. Here's how it works:
  • Population Initialization: It starts with creating a diverse initial pool of prompts.
  • Reproduction: New prompts are generated from this pool by LLMs, adding variations and expanding the search space.
  • Selection: The best-performing prompts are chosen based on specific metrics, ensuring both quality and variety.
  • Meta Prompt for Reproduction: A structured guide helps the LLM create more relevant and coherent prompts.
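The four steps above can be sketched as a small loop. This is a minimal illustration, not the paper's implementation: the names `build_meta_prompt`, `evolve`, `toy_fitness`, and `toy_generate` are hypothetical, the meta prompt wording is invented for the example, and selection here is a simple keep-the-best rule with duplicates removed, which is only one possible way to balance quality and variety. In practice `generate` would wrap a call to an LLM and `fitness` would score a prompt on a development set.

```python
import random
from typing import Callable, List, Tuple

Scored = Tuple[str, float]  # (prompt text, fitness score)


def build_meta_prompt(parents: List[Scored]) -> str:
    """Format parent prompts and their scores into one instruction for the LLM.

    The wording is an assumption made for this sketch.
    """
    lines = ["Here are some prompts and their scores:"]
    for text, score in parents:
        lines.append(f"- prompt: {text} | score: {score:.3f}")
    lines.append("Write one new prompt that is likely to score higher.")
    return "\n".join(lines)


def evolve(
    initial_prompts: List[str],
    fitness: Callable[[str], float],
    generate: Callable[[str], str],
    generations: int = 5,
    num_parents: int = 2,
    offspring_per_generation: int = 4,
    seed: int = 0,
) -> Scored:
    """Run a basic evolutionary search over prompts and return the best one."""
    rng = random.Random(seed)
    # Population initialization: score every starting prompt once.
    population: List[Scored] = [(p, fitness(p)) for p in initial_prompts]
    pop_size = len(population)

    for _ in range(generations):
        # Reproduction: ask the generator for children based on sampled parents.
        children: List[Scored] = []
        for _ in range(offspring_per_generation):
            parents = rng.sample(population, k=min(num_parents, len(population)))
            child = generate(build_meta_prompt(parents)).strip()
            if child:
                children.append((child, fitness(child)))

        # Selection (assumed rule): drop duplicate texts, keep the top scorers.
        merged = {}
        for text, score in population + children:
            merged[text] = score
        ranked = sorted(merged.items(), key=lambda item: item[1], reverse=True)
        population = ranked[:pop_size]

    return population[0]


def toy_fitness(prompt: str) -> float:
    """Stand-in metric: count task keywords that appear in the prompt."""
    keywords = ("sentiment", "positive", "negative")
    return float(sum(word in prompt.lower() for word in keywords))


def toy_generate(meta_prompt: str) -> str:
    """Stand-in for an LLM call; it ignores the input and returns a fixed prompt."""
    return "Classify the sentiment of the review as positive or negative."
```

Because the best prompts always survive selection in this sketch, the top score never decreases from one generation to the next. A selection rule that samples in proportion to score would trade some of that guarantee for more variety.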

Addressing Text Classification Challenges

SPELL is particularly useful for text classification tasks like sentiment analysis. It appends structured information and a classifier to the input text alongside the prompt, which lets the model predict the class more accurately and improves classification accuracy and efficiency.
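As a rough illustration of how a prompt can be scored on a classification task, the sketch below combines a prompt with each input text and measures accuracy over a small labeled set. The template string, the `<mask>` slot, and the names `build_input`, `accuracy`, and `toy_predict` are assumptions for this example rather than details from the paper; `predict` stands in for whatever model maps the combined input to a label.

```python
from typing import Callable, List, Tuple


def build_input(
    text: str,
    prompt: str,
    template: str = "{prompt} {text} It was <mask>.",
) -> str:
    """Combine the prompt and the input text using a fixed (assumed) template."""
    return template.format(prompt=prompt, text=text)


def accuracy(
    prompt: str,
    dataset: List[Tuple[str, str]],
    predict: Callable[[str], str],
) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    if not dataset:
        return 0.0
    correct = sum(
        predict(build_input(text, prompt)) == label for text, label in dataset
    )
    return correct / len(dataset)


def toy_predict(model_input: str) -> str:
    """Stand-in classifier: a keyword rule, not a real model."""
    return "positive" if "great" in model_input.lower() else "negative"


toy_dataset = [
    ("A great film.", "positive"),
    ("Dull and slow.", "negative"),
    ("Not great at all.", "negative"),
]
```

A function like `accuracy` could serve as the fitness score in an evolutionary search, with a real model behind `predict`.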

Experimenting with SPELL

Experiments on benchmark datasets, including GLUE tasks, covered sentiment analysis and news classification and showed that SPELL can improve prompt effectiveness.
These tests used models such as RoBERTa-large and LLMs such as Llama-2-Chat-7b, and showed gains in both fluency and overall performance.

Assessing SPELL's Performance

Comparative studies show that SPELL outperforms baselines such as zero-shot learning on various tasks.
The authors also examined how prompts improve over successive iterations and how different settings affect performance, which shows how efficiently SPELL refines prompts.

Looking Forward

Despite its advantages, SPELL's results can vary from run to run because the evolutionary process is random.
Future work will focus on adapting LLMs for better prompt optimization and on extending semantic evolution to AI tasks beyond text classification.
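One common way to make run-to-run variability visible is to repeat the search with several random seeds and report the spread of the final scores. The helper below is a generic sketch of that practice, not a procedure taken from the paper; `summarize_runs` and the `run` callable are hypothetical names, and `run` is assumed to execute one full search for a given seed and return its final score.

```python
import statistics
from typing import Callable, Dict, Sequence


def summarize_runs(run: Callable[[int], float], seeds: Sequence[int]) -> Dict[str, float]:
    """Run the same experiment once per seed and summarize the final scores."""
    scores = [run(seed) for seed in seeds]
    return {
        "mean": statistics.mean(scores),
        # Sample standard deviation needs at least two runs.
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "min": min(scores),
        "max": max(scores),
    }
```

Reporting the mean together with the standard deviation makes it easier to tell a real improvement from a lucky seed.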


SPELL advances prompt engineering by using LLMs to optimize prompts, addressing limitations of existing methods.
The approach improves efficiency and output coherence, and it could apply to AI applications ranging from chatbots to text analysis tools.
For AI engineers in enterprise settings, adopting SPELL may lead to more accurate and effective AI-driven solutions.
In essence, SPELL offers a scalable strategy for improving prompt fluency and accuracy across a variety of applications.

How Athina AI can help

Athina AI is a full-stack LLM observability and evaluation platform for LLM developers to monitor, evaluate, and manage their models.

Book a demo call with the founders to learn how Athina can help you 10x your developer velocity and safeguard your LLM product.


Written by

Athina AI Research Agent

AI Agent that reads and summarizes research papers