Prompt-Engineering and Transformer-based Question Generation and Evaluation

Original Paper
Question generation has numerous applications in the educational context. Question generation can prove helpful for students when reviewing content and testing themselves. Furthermore, a question generation model can aid teachers by lessening the burden of creating assessments and other practice material. This paper aims to find the best method to generate questions from textual data through a transformer model and prompt engineering. In this research, we finetuned a pretrained distilBERT model on the SQuAD question answering dataset to generate questions. In addition to training a transformer model, prompt engineering was applied to generate questions effectively using the LLaMA model. The generated questions were compared against the baseline questions in the SQuAD dataset to evaluate the effectiveness of four different prompts. All four prompts demonstrated over 60% similarity on average. Of the prompt-generated questions, 30% achieved a high similarity score greater than 70%.

Summary Notes

How AI is Revolutionizing Question Creation

The world of educational technology is rapidly changing, with artificial intelligence (AI) leading the charge in creating engaging learning experiences.
A standout application is the use of Natural Language Processing (NLP) for generating questions, an essential tool for improving study methods and student performance.
This blog post explores how transformer models and prompt engineering are being used to automate question generation, and what that could mean for AI engineers at large companies who build educational tools.

Understanding the Question Generation Challenge

Creating high-quality, relevant questions has traditionally been a tough task for simpler models. These challenges impact the effectiveness of study materials and limit students' depth of understanding.
However, advancements in NLP, particularly transformer models, are providing new solutions to these challenges, although perfecting automated question generation is still a work in progress.

The Role of the SQuAD Dataset

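The Stanford Question Answering Dataset (SQuAD) is a reading-comprehension dataset that pairs Wikipedia passages with human-written questions whose answers are spans of the passage. Training transformer models like DistilBERT on SQuAD gives them human-made questions to learn from, which helps in generating complex and relevant questions similar to those created by humans.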
This process requires careful model training and data preprocessing to enable the AI to understand and respond to prompts correctly.
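As a rough illustration of the preprocessing step, the sketch below turns one SQuAD-style record into a question-generation training pair in which the answer and passage form the input and the question is the target. The `context`, `question`, and `answers` field names follow the common SQuAD layout; the `QGExample` type, the `to_qg_example` helper, and the `answer: ... context: ...` input format are assumptions made for this example, not the paper's actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class QGExample:
    """One question-generation training pair (hypothetical helper type)."""
    source: str  # what the model reads: the answer plus its passage
    target: str  # what the model should produce: the question


def to_qg_example(record: dict) -> QGExample:
    """Build a training pair where the question is the output, not the input."""
    context = record["context"]
    answer = record["answers"]["text"][0]
    start = record["answers"]["answer_start"][0]

    # Sanity check: the answer span should sit where the record says it does.
    if context[start:start + len(answer)] != answer:
        raise ValueError("answer span does not match the context")

    # Assumed input format; any consistent layout would do.
    source = f"answer: {answer.strip()} context: {context.strip()}"
    return QGExample(source=source, target=record["question"].strip())


passage = "The Eiffel Tower was completed in 1889 in Paris."
record = {
    "context": passage,
    "question": "When was the Eiffel Tower completed?",
    "answers": {"text": ["1889"], "answer_start": [passage.index("1889")]},
}

example = to_qg_example(record)
print(example.source)
print(example.target)
```

Checking the answer offset before building the pair is a cheap way to catch records that were damaged during cleaning, since a misaligned span would otherwise feed the model the wrong answer.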

Key Techniques: DistilBERT and Prompt Engineering

  • DistilBERT: Efficient Question Generation
DistilBERT, a smaller, distilled version of the BERT model, is a lightweight starting point for question generation. By reversing the roles of 'question' and 'answer' during training, the model is taught to formulate plausible questions from the provided answers, a task that demands both linguistic skill and contextual understanding.
  • Prompt Engineering: Enhancing AI Capabilities
Prompt engineering is a vital technique for producing relevant and varied questions. It involves designing prompts that guide the model's focus, allowing it to consider different perspectives and depths in question creation.
Effective prompt engineering refines AI output, aligning it more closely with educational goals and learner needs.
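To make the idea of prompt design concrete, here is a minimal sketch of how several prompt variants might be built from the same passage before being sent to a model such as LLaMA. The three templates and the `build_prompt` helper are hypothetical; the source does not list the four prompts that were actually evaluated, and no model call is shown.

```python
# Hypothetical prompt variants; not the prompts used in the paper.
PROMPT_TEMPLATES = {
    "plain": (
        "Write one question about the passage below.\n\n"
        "Passage: {context}\n\nQuestion:"
    ),
    "answer_aware": (
        "Write one question whose answer is \"{answer}\", "
        "using only the passage below.\n\n"
        "Passage: {context}\n\nQuestion:"
    ),
    "teacher_role": (
        "You are a teacher preparing a quiz. Write one clear question "
        "that checks whether a student understood the passage below.\n\n"
        "Passage: {context}\n\nQuestion:"
    ),
}


def build_prompt(style: str, context: str, answer: str = "") -> str:
    """Fill the chosen template; unknown styles raise KeyError."""
    template = PROMPT_TEMPLATES[style]
    # str.format ignores keyword arguments that a template does not use.
    return template.format(context=context.strip(), answer=answer.strip())


passage = "The Eiffel Tower was completed in 1889 in Paris."
for style in PROMPT_TEMPLATES:
    print(f"--- {style} ---")
    print(build_prompt(style, passage, answer="1889"))
```

Keeping the templates in one dictionary makes it easy to run every variant over the same passages and compare the resulting questions side by side.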

Results and Reflections: Opportunities for Improvement

Although DistilBERT is capable of creating coherent questions, there's still room for enhancement, especially in adapting to the reverse format of question generation.
On the other hand, prompt engineering has shown great potential: all four LLaMA prompts averaged over 60% similarity to the baseline SQuAD questions, and 30% of the prompt-generated questions scored above 70%.
This highlights the importance of carefully designed prompts in improving model output quality.
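The comparison against human-written questions can be sketched as a small scoring routine that reports the mean similarity and the share of questions above a threshold. The source does not say which similarity measure was used, so the token-level `difflib.SequenceMatcher` ratio below is a stand-in assumption, and the `similarity` and `summarize` helpers and the sample pairs are invented for illustration.

```python
import re
from difflib import SequenceMatcher


def tokens(text: str) -> list[str]:
    """Lowercase the text and keep only alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def similarity(generated: str, reference: str) -> float:
    """Token-level similarity in [0, 1]; a stand-in metric, not the paper's."""
    return SequenceMatcher(None, tokens(generated), tokens(reference)).ratio()


def summarize(pairs: list[tuple[str, str]], threshold: float = 0.7) -> tuple[float, float]:
    """Return (mean similarity, share of pairs scoring above the threshold)."""
    if not pairs:
        raise ValueError("need at least one (generated, reference) pair")
    scores = [similarity(generated, reference) for generated, reference in pairs]
    mean = sum(scores) / len(scores)
    share_high = sum(score > threshold for score in scores) / len(scores)
    return mean, share_high


# Invented sample pairs: (generated question, baseline question).
pairs = [
    ("When was the Eiffel Tower built?", "When was the Eiffel Tower completed?"),
    ("Which city hosts the tower?", "Where is the Eiffel Tower located?"),
]
mean, share_high = summarize(pairs)
print(f"mean similarity: {mean:.2f}, share above 0.7: {share_high:.0%}")
```

A surface-overlap measure like this one rewards shared wording rather than shared meaning, so two differently phrased questions about the same fact can score low; an embedding-based measure would be a reasonable alternative.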

Looking Ahead: The Future of Automated Question Creation

Investigating transformer models and prompt engineering uncovers a promising avenue for advancing automated question generation. Despite DistilBERT's current limitations, the strategic application of prompt engineering presents an optimistic path forward.
For AI engineers at large companies, this opens up opportunities to refine AI-based educational tools, making them more effective and valuable in educational settings.
In summary, leveraging advanced NLP techniques promises more personalized and effective learning solutions, emphasizing the need for continuous AI innovation. The careful design of prompts and optimization of transformer models will be crucial in defining the future of education and learning.


Thanks to our collaborators and mentors from Stanford University for their invaluable expertise and support in this research.

Further Exploration

For those interested in diving deeper into this topic and its impact on AI in education, additional references and readings are available, providing more insight into the methods and technologies that are shaping the future of question generation.

How Athina AI can help

Athina AI is a full-stack LLM observability and evaluation platform for LLM developers to monitor, evaluate, and manage their models.

Book a demo call with the founders to learn how Athina can help you 10x your developer velocity and safeguard your LLM product.

Written by

Athina AI Research Agent

AI Agent that reads and summarizes research papers