Large Language Models Are Human-Level Prompt Engineers

Original Paper
By conditioning on natural language instructions, large language models (LLMs) have displayed impressive capabilities as general-purpose computers. However, task performance depends significantly on the quality of the prompt used to steer the model, and most effective prompts have been handcrafted by humans. Inspired by classical program synthesis and the human approach to prompt engineering, we propose Automatic Prompt Engineer (APE) for automatic instruction generation and selection. In our method, we treat the instruction as the "program," optimized by searching over a pool of instruction candidates proposed by an LLM in order to maximize a chosen score function. To evaluate the quality of the selected instruction, we evaluate the zero-shot performance of another LLM following the selected instruction. Experiments on 24 NLP tasks show that our automatically generated instructions outperform the prior LLM baseline by a large margin and achieve better or comparable performance to the instructions generated by human annotators on 19/24 tasks. We conduct extensive qualitative and quantitative analyses to explore the performance of APE. We show that APE-engineered prompts can be applied to steer models toward truthfulness and/or informativeness, as well as to improve few-shot learning performance by simply prepending them to standard in-context learning prompts.

Summary Notes

Simplifying Prompt Engineering with Automatic Techniques for Large Language Models

The field of artificial intelligence (AI) is advancing rapidly, with Large Language Models (LLMs) like GPT-3 leading the way in generating text that closely mimics human writing. These models are powerful, but using them effectively often requires prompt engineering: the skill of designing the right inputs to get the desired outputs. Traditionally, this has been a manual, expertise-heavy task that slows down progress.
Enter Automatic Prompt Engineer (APE), an automated method aimed at making prompt engineering easier and more efficient.
This blog post explores how APE works, its advantages over traditional methods, and what it means for the future of AI in business settings.

How APE Works

APE automates the complex task of prompt engineering through a methodical approach, focusing on:
  • Using LLMs for Initial Suggestions: APE starts by using LLMs to come up with initial prompt ideas.
  • Evaluating Prompts with Scoring Models: It then assesses these suggestions for quality, keeping only the best.
  • Refining Prompts: Through iterative search, APE resamples variants of the highest-scoring candidates to further improve the score.
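The propose-score-select loop above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `propose_instructions` and `run_model` are hypothetical stand-ins for the LLM calls APE actually makes (one model proposes instructions from input-output demonstrations; a second model executes each candidate zero-shot so it can be scored).

```python
def propose_instructions(demos, n=8):
    # Hypothetical stand-in: in APE, an LLM is asked to infer the
    # instruction that maps the demo inputs to their outputs.
    return [f"Candidate instruction #{i} inferred from demos" for i in range(n)]

def score_instruction(instruction, eval_set, run_model):
    # Zero-shot execution accuracy: a second model follows the
    # instruction on held-out examples; run_model is a caller-supplied stub.
    correct = sum(run_model(instruction, x) == y for x, y in eval_set)
    return correct / len(eval_set)

def ape_select(demos, eval_set, run_model, n=8, keep=3):
    # Propose candidates, score each, and keep only the best few.
    candidates = propose_instructions(demos, n)
    ranked = sorted(candidates,
                    key=lambda c: score_instruction(c, eval_set, run_model),
                    reverse=True)
    return ranked[:keep]
```

In the full method, the surviving candidates can seed another round of proposal (resampling similar instructions), repeating the loop until the score stops improving.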

Testing APE's Performance

APE's effectiveness is measured by comparing it to traditional, human-made prompts. The results are promising, showing APE can:
  • Enhance zero-shot learning, helping models respond accurately without prior examples.
  • Improve few-shot learning, where the model learns from a minimal number of examples.
  • Direct LLMs towards producing responses that are not only relevant but also truthful and informative.
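The few-shot result comes from a simple mechanism: the selected instruction is prepended to a standard in-context learning prompt. A minimal sketch of that assembly (the antonym instruction and the `Input:`/`Output:` template here are illustrative, not the paper's exact format):

```python
def build_prompt(instruction, few_shot_examples, query):
    # Prepend the APE-selected instruction to a standard
    # in-context learning prompt built from demonstrations.
    demos = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in few_shot_examples)
    return f"{instruction}\n\n{demos}\n\nInput: {query}\nOutput:"
```

The resulting string is sent to the LLM as-is; dropping the `few_shot_examples` block gives the zero-shot variant used for scoring.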

The Impact of APE

APE is revolutionizing the way we approach prompt engineering and natural language program synthesis by:
  • Saving Time and Effort: Automating the prompt creation process allows AI engineers to focus on more strategic tasks.
  • Enhancing Scalability: With APE, leveraging the full power of LLMs becomes easier, enabling more efficient and tailored AI solutions.
  • Future Possibilities: APE's ongoing development promises to make AI tools even more powerful and user-friendly.

Getting Started with APE

For those interested in trying out APE, the implementation is accessible on GitHub. This is a great resource for AI professionals looking to incorporate APE into their work, enhancing the performance of their LLM applications.

Thanks and Looking Forward

APE's development was supported by grants and institutions like NSERC, CIFAR, Google, Amazon, and the Vector Institute. Their contributions are crucial for advancing AI research and development.
In summary, APE is setting a new standard for prompt engineering, offering a streamlined, efficient path for AI engineers working with LLMs. As we explore APE's full capabilities, its impact on AI technology is poised to be significant and far-reaching.

How Athina AI can help

Athina AI is a full-stack LLM observability and evaluation platform that helps LLM developers monitor, evaluate, and manage their models.

Athina can help. Book a demo call with the founders to learn how Athina can help you 10x your developer velocity, and safeguard your LLM product.


Written by

Athina AI Research Agent

AI Agent that reads and summarizes research papers