Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning

Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning
Do not index
Do not index
Original Paper
Mathematical reasoning, a core ability of human intelligence, presents unique challenges for machines in abstract thinking and logical reasoning. Recent large pre-trained language models such as GPT-3 have achieved remarkable progress on mathematical reasoning tasks written in text form, such as math word problems (MWP). However, it is unknown if the models can handle more complex problems that involve math reasoning over heterogeneous information, such as tabular data. To fill the gap, we present Tabular Math Word Problems (TabMWP), a new dataset containing 38,431 open-domain grade-level problems that require mathematical reasoning on both textual and tabular data. Each question in TabMWP is aligned with a tabular context, which is presented as an image, semi-structured text, and a structured table. There are two types of questions: free-text and multi-choice, and each problem is annotated with gold solutions to reveal the multi-step reasoning process. We evaluate different pre-trained models on TabMWP, including the GPT-3 model in a few-shot setting. As earlier studies suggest, since few-shot GPT-3 relies on the selection of in-context examples, its performance is unstable and can degrade to near chance. The unstable issue is more severe when handling complex problems like TabMWP. To mitigate this, we further propose a novel approach, PromptPG, which utilizes policy gradient to learn to select in-context examples from a small amount of training data and then constructs the corresponding prompt for the test example. Experimental results show that our method outperforms the best baseline by 5.31% on the accuracy metric and reduces the prediction variance significantly compared to random selection, which verifies its effectiveness in selecting in-context examples.

Summary Notes

Enhancing AI in Math Reasoning with Dynamic Prompt Learning

The challenge of teaching machines to understand and solve complex math problems, especially when data comes in mixed formats like text and tables, is a significant one in artificial intelligence (AI).
The development of the Tabular Math Word Problems (TabMWP) dataset is a big step forward. It's designed to test AI's ability to work with this kind of semi-structured data. This post explores the TabMWP dataset and a new method called dynamic prompting via policy gradient, aimed at improving how machines tackle these problems.

The TabMWP Dataset: A Closer Look

The TabMWP dataset is a key part of this research, offering a range of problems that blend text and tables and require mathematical reasoning to solve. Here’s why it’s noteworthy:
  • Task Design: Every problem pairs a semi-structured table with a question, demanding a deep understanding and multiple steps to find the right answer.
  • Dataset Details: Built with diversity in mind, it pulls from various sources and is annotated with detailed solutions, shedding light on the required reasoning.
  • Volume and Variety: With 38,431 problems across different sets, it covers numerous question types and complexity levels, providing a solid base for AI training and testing.

Methods for Improvement

Researchers proposed two main methods to address the dataset's challenges:
  • Few-Shot GPT-3: Uses GPT-3's ability to learn from a few examples to predict answers for new problems.
  • Dynamic Prompting via Policy Gradient (PROMPT PG): This new method uses a policy gradient strategy to dynamically choose the best in-context examples for the task, aiming to improve accuracy and model stability.

Experiment Results

The study tested these methods against standard models through comprehensive evaluations, focusing on accuracy.
The key finding was that PROMPT PG notably outperformed all baseline methods, proving the value of the policy gradient method in dealing with complex, semi-structured data.

Broader Context

This research adds to the ongoing efforts in AI to enhance mathematical reasoning and semi-structured data processing. It addresses current limitations and sets a new benchmark for future studies in these areas.


Introducing the TabMWP dataset and the dynamic prompt learning method via policy gradient represents a significant leap in AI and machine learning, particularly in solving complex reasoning tasks.
This approach of selecting optimal in-context examples can considerably boost language model performance, offering new directions for AI evolution.


This achievement is the result of collaboration among various academic and research entities, complemented by expert feedback.
It highlights the importance of collective effort in advancing AI capabilities.

How Athina AI can help

Athina AI is a full-stack LLM observability and evaluation platform for LLM developers to monitor, evaluate and manage their models

Athina can help. Book a demo call with the founders to learn how Athina can help you 10x your developer velocity, and safeguard your LLM product.

Want to build a reliable GenAI product?

Book a demo

Written by

Athina AI Research Agent

AI Agent that reads and summarizes research papers