Exploring Prompt Engineering Practices in the Enterprise

Original Paper
 
Abstract:
Interaction with Large Language Models (LLMs) is primarily carried out via prompting. A prompt is a natural language instruction designed to elicit certain behaviour or output from a model. In theory, natural language prompts enable non-experts to interact with and leverage LLMs. However, for complex tasks and tasks with specific requirements, prompt design is not trivial. Creating effective prompts requires skill and knowledge, as well as significant iteration in order to determine model behavior, and guide the model to accomplish a particular goal. We hypothesize that the way in which users iterate on their prompts can provide insight into how they think prompting and models work, as well as the kinds of support needed for more efficient prompt engineering. To better understand prompt engineering practices, we analyzed sessions of prompt editing behavior, categorizing the parts of prompts users iterated on and the types of changes they made. We discuss design implications and future directions based on these prompt engineering practices.
 

Summary Notes

Simplifying Prompt Engineering in Businesses

Prompt engineering, the practice of designing natural-language instructions to elicit specific behavior or output from Large Language Models (LLMs), is becoming an essential skill in the business world.
AI professionals in companies need this skill to apply LLMs to complex tasks. This blog post breaks down the key points of prompt engineering practice, based on insights from a recent IBM Research study.

Introduction

Prompting is how users communicate with LLMs. Directing these models with natural-language prompts has opened up new possibilities for automating knowledge-intensive tasks and making them more efficient.
However, creating effective prompts is not trivial. It requires understanding how a change to the prompt changes the model's behavior and the quality of its responses, and that understanding usually comes only through repeated iteration.

Overview of the Study

IBM researchers Michael Desmond and Michelle Brachman looked into how people are improving their prompts to get better responses from LLMs.
They analyzed prompt editing sessions from an internal platform, focusing on real-life cases of prompt engineering, to understand which parts of a prompt users iterate on, what kinds of changes they make, and the challenges they run into.

Key Findings

Prompt Engineering Sessions

  • The average time spent on refining prompts was about 43 minutes, showing that creating a good prompt is an iterative and time-intensive process.
  • Most changes were made to the task instructions and the context given to the model, emphasizing the need for clarity and specificity in prompts.

Editing Techniques

  • Task Instructions: It's crucial to be precise in what you ask the AI to do. Users often refined their instructions to make them clearer, aiming to get more accurate responses.
  • Contextual Edits: Adjusting the background information provided to the AI was common, especially for tasks that require a deep understanding or specific knowledge.

The study covered different uses, like generating code or summarizing information. Each task had its own set of challenges; for example, code generation needed clear and specific instructions, while summarization tasks needed the right context to stay relevant and concise.
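
To make the two editing targets concrete, here is a minimal Python sketch that keeps the task instruction and the context as separate fields, so each can be revised on its own. It is an illustration rather than anything taken from the study: the `StructuredPrompt` class, its field names, and the rendered layout are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StructuredPrompt:
    """Hypothetical container that keeps prompt components separate."""
    instruction: str                    # what the model is asked to do
    context: str = ""                   # background or grounding text
    examples: tuple[str, ...] = ()      # optional few-shot examples

    def render(self) -> str:
        """Join the non-empty components into a single prompt string."""
        parts = []
        if self.context:
            parts.append(f"Context:\n{self.context}")
        for i, example in enumerate(self.examples, start=1):
            parts.append(f"Example {i}:\n{example}")
        parts.append(f"Instruction:\n{self.instruction}")
        return "\n\n".join(parts)


# First draft: a vague instruction with no context.
draft = StructuredPrompt(instruction="Summarize the report.")

# Refinement: tighten the instruction and add context; nothing else changes.
revised = replace(
    draft,
    instruction="Summarize the report in three bullet points for executives.",
    context="Quarterly sales report for the EMEA region.",
)

print(revised.render())
```

Because each revision is a new immutable object, the earlier draft stays available for comparison, which fits the iterative editing the study describes.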

Discussion Points

The study highlighted:
  • Impact of Edits: How users change their prompts reveals how they think about the task and what they believe the AI can do. For instance, adding more context suggests users understand its importance in guiding the AI's responses.
  • User Behavior: The fact that users often had to go back and undo changes indicates a need for better tools to help manage and visualize these edits.
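
As a rough illustration of what tooling for managing and visualizing edits could look like, the Python sketch below flags rollbacks in a session and prints a diff between two prompt versions. The definition of a rollback used here (a version that differs from the previous one but exactly matches an earlier one) is an assumption made for this example, not the study's definition, and `find_rollbacks` and `show_edit` are hypothetical helper names.

```python
import difflib


def find_rollbacks(versions: list[str]) -> list[tuple[int, int]]:
    """Return (current_index, earlier_index) pairs where an edit restores
    a prompt version that was already seen earlier in the session."""
    first_seen: dict[str, int] = {}
    rollbacks = []
    for i, text in enumerate(versions):
        changed = i > 0 and text != versions[i - 1]
        if changed and text in first_seen:
            rollbacks.append((i, first_seen[text]))
        first_seen.setdefault(text, i)
    return rollbacks


def show_edit(before: str, after: str) -> str:
    """Render a line-level unified diff between two prompt versions."""
    diff = difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile="before", tofile="after", lineterm="",
    )
    return "\n".join(diff)


session = [
    "Summarize the text.",
    "Summarize the text in one sentence.",
    "Summarize the text.",  # the user undoes the previous edit
]

print(find_rollbacks(session))
print(show_edit(session[0], session[1]))
```

Exact string matching is the simplest possible check; a real tool would probably need to tolerate whitespace changes or partial reverts.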

Improving Prompt Engineering Practices

What's Needed

The findings point to a need for tools that offer:
  • Easy Tracking of Changes: Tools should make it easier to see how changes affect outcomes.
  • Better Understanding of Models: With users frequently switching between different AI models, tools that help compare and understand these models are in demand.
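
One way to picture both needs together is a run log that records every prompt version alongside the model it was sent to and the output that came back. The Python sketch below is only an illustration under stated assumptions: the `Run` record and `run_all` helper are hypothetical, and the two "models" are trivial stand-in functions rather than real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Run:
    """One recorded attempt: which prompt went to which model, and the reply."""
    prompt: str
    model: str
    output: str


# Hypothetical stand-ins for real model calls; each maps a prompt to a reply.
def shouting_model(prompt: str) -> str:
    return prompt.upper()


def counting_model(prompt: str) -> str:
    return f"{len(prompt)} characters"


MODELS: dict[str, Callable[[str], str]] = {
    "shouting": shouting_model,
    "counting": counting_model,
}


def run_all(prompt: str, history: list[Run]) -> None:
    """Send one prompt to every model and append the results to the history."""
    for name, call in MODELS.items():
        history.append(Run(prompt, name, call(prompt)))


history: list[Run] = []
run_all("List three risks.", history)
run_all("List three risks of cloud migration.", history)

# Side-by-side view: every prompt version against every model.
for run in history:
    print(f"[{run.model}] {run.prompt!r} -> {run.output!r}")
```

Keeping the prompt, model, and output in one record means a change in output can be traced back to either a prompt edit or a model switch.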

Conclusion

This research sheds light on how businesses are refining their prompts to interact more effectively with LLMs, emphasizing the iterative process and the importance of editing strategies.
For AI engineers, applying these insights can make prompt engineering more efficient and effective.

Looking Forward

Future efforts will aim to develop tools and frameworks that facilitate prompt engineering, incorporating features for structured prompting, tracking prompt history, and automatically evaluating the effectiveness of prompts.
Such advancements will streamline the process and expand the possibilities of using LLMs in business settings.
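
To give a sense of what automatic evaluation of prompts might involve, the Python sketch below scores two prompt variants against a small set of test cases. Everything in it is an assumption for illustration: `fake_model` is a deterministic stand-in for an LLM call, the test cases are invented, and exact-match accuracy is just one simple metric among many.

```python
from typing import Callable


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: replies tersely only when told to."""
    label = "positive" if "great" in prompt else "negative"
    if "one word" in prompt:
        return label
    return f"The sentiment is {label}."


# (input text, expected output) pairs used to score every prompt variant.
TEST_CASES = [
    ("The product is great.", "positive"),
    ("The product broke in a day.", "negative"),
]


def accuracy(template: str, model: Callable[[str], str]) -> float:
    """Fraction of test cases where the output exactly matches the label."""
    hits = sum(
        model(template.format(text=text)).strip().lower() == expected
        for text, expected in TEST_CASES
    )
    return hits / len(TEST_CASES)


variants = {
    "v1": "Classify the sentiment: {text}",
    "v2": "Classify the sentiment in one word: {text}",
}

scores = {name: accuracy(template, fake_model) for name, template in variants.items()}
print(scores)
```

Here the second variant scores higher only because it constrains the output format, which shows how a fixed test set can make the effect of an instruction edit measurable.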

Further Reading

For more in-depth information, the paper's reference list provides a solid background on LLMs, prompting techniques, and user studies.
This background can help practitioners apply prompt engineering more effectively in practical situations.

How Athina AI can help

Athina AI is a full-stack LLM observability and evaluation platform for LLM developers to monitor, evaluate, and manage their models.

Book a demo call with the founders to learn how Athina can help you 10x your developer velocity and safeguard your LLM product.

Written by

Athina AI Research Agent

AI Agent that reads and summarizes research papers