Chain-of-Symbol Prompting Elicits Planning in Large Language Models

Abstract:
In this paper, we investigate the performance of LLMs on complex planning tasks that require them to understand a virtual spatial environment simulated in natural language and to act correspondingly in text. We propose a benchmark named Natural Language Planning and Action (Natala), composed of a set of novel tasks: Brick World, NLVR-based Manipulations, and Natural Language Navigation. We found that current popular LLMs such as ChatGPT still lack the ability to handle complex planning. This raises a question: do LLMs have a good understanding of environments described in natural language, or would alternatives such as symbolic representations be more concise and hence easier for LLMs to understand? To this end, we propose a novel method called CoS (Chain-of-Symbol Prompting) that represents complex environments with condensed symbolic spatial representations in the chained intermediate thinking steps. CoS is easy to use and requires no additional training of LLMs. Extensive experiments indicate that CoS clearly surpasses Chain-of-Thought (CoT) Prompting on all three planning tasks while using even fewer input tokens, on both ChatGPT and InstructGPT. The performance gain is strong: up to 60.8% accuracy (from 31.8% to 92.6%) on Brick World for ChatGPT. CoS also markedly reduces prompt length, cutting up to 65.8% of the tokens (from 407 to 139) in the intermediate steps of the Brick World demonstrations. Code and data are available at:
 

Summary Notes

Boosting AI's Spatial Understanding with Chain-of-Symbol Prompting

In the field of artificial intelligence (AI), Large Language Models (LLMs) such as ChatGPT have significantly advanced our ability to generate human-like text.
However, these models often struggle with spatial reasoning and planning, which are key to understanding and interacting with the physical world. This is particularly evident in tasks that require knowledge of space and object manipulation.
Chain-of-Symbol Prompting (CoS) is an approach aimed at closing this gap. By converting verbose natural language into simple symbolic forms, CoS enhances LLMs' ability to reason about space. This post examines how CoS works, its implications for AI, and its potential impact on spatially dependent industries.

Understanding Spatial Tasks

Spatial reasoning covers tasks like navigating environments and manipulating objects according to rules. The paper tests LLMs on three such tasks:
  • Brick World: Involves manipulating bricks based on specific instructions.
  • NLVR-based Manipulation: Uses the Natural Language Visual Reasoning (NLVR) dataset to rearrange objects in boxes according to given rules.
  • Natural Language Navigation: Entails finding the shortest route in a virtual space using landmarks.
These tasks highlight the challenges of spatial reasoning and the limits of LLMs when they rely on verbose natural language alone; a toy Brick World instance is sketched below.
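
To make Brick World concrete, here is a minimal Python sketch that contrasts a verbose natural-language scene description with a condensed CoS-style one. The "/" notation (top brick first) and the helper names are illustrative assumptions for this post, not the paper's exact format.

```python
# Toy Brick World scene: two stacks of bricks, listed bottom-to-top.
stacks = [["A", "B", "C"], ["D", "E"]]

def natural_language(stacks):
    """Verbose description, as a plain CoT prompt would carry it."""
    lines = []
    for stack in stacks:
        lines.append(f"Brick {stack[0]} is on the ground.")
        for lower, upper in zip(stack, stack[1:]):
            lines.append(f"Brick {upper} is on top of brick {lower}.")
    return " ".join(lines)

def chain_of_symbol(stacks):
    """Condensed CoS-style description in an assumed top/.../bottom form."""
    return ", ".join("/".join(reversed(stack)) for stack in stacks)

print(natural_language(stacks))  # Brick A is on the ground. Brick B is on top of brick A. ...
print(chain_of_symbol(stacks))   # C/B/A, E/D
```

The condensed form carries the same stacking relations in a fraction of the characters, which is exactly the property CoS exploits.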

The Chain-of-Symbol Method

Chain-of-Symbol Prompting is a breakthrough for overcoming these challenges. Its key features, illustrated in the sketch after this list, include:
  • Enhanced LLM Performance: CoS uses symbolic representations to boost accuracy and reduce computational needs, leading to cost savings.
  • Versatility: Shows strong performance across different tasks and languages, highlighting its adaptability.
  • Efficient Representation: CoS's symbols effectively capture spatial relationships, making it easier for LLMs to process and understand.
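
To show how this might look in practice, below is a hypothetical sketch of assembling a few-shot CoS prompt for a chat model. The demonstration text, the "/" stacking symbols, and build_prompt are illustrative assumptions; the paper's actual demonstrations differ in detail.

```python
# Hypothetical few-shot CoS demonstration: the intermediate step is a
# compact symbol chain ("C/B/A") instead of a verbose sentence.
COS_DEMONSTRATION = (
    "Q: Brick B is on top of brick A. Brick C is on top of brick B. "
    "How do I get brick A?\n"
    "A: C/B/A. Remove C, then remove B, then grab A. The answer is C, B, A.\n"
)

def build_prompt(demonstration: str, question: str) -> str:
    """Prepend symbolic demonstrations to a new question, few-shot style."""
    return f"{demonstration}\nQ: {question}\nA:"

question = (
    "Brick E is on top of brick D. Brick F is on top of brick E. "
    "How do I get brick D?"
)
print(build_prompt(COS_DEMONSTRATION, question))
```

The resulting string can be sent to any chat or completion API; the model is expected to continue with its own symbol chain before answering.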

Testing and Findings

Experiments with models like ChatGPT across various spatial tasks compared different prompting methods: zero-shot Chain-of-Thought (CoT), few-shot CoT, and few-shot CoS. Results showed:
  • Brick World: CoS performed better than CoT, particularly with complex instructions.
  • NLVR-based Manipulation & Natural Language Navigation: CoS was more accurate than CoT while using fewer prompt tokens (see the token-count sketch after this list).
  • Spatial QA: On the SPARTUN dataset, CoS surpassed conventional prompting, indicating its effectiveness in more realistic scenarios.
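
The efficiency claim is easy to sanity-check locally. The sketch below counts tokens for a verbose CoT-style intermediate step versus a CoS-style one using OpenAI's tiktoken tokenizer. The toy strings are invented for illustration, so the counts only show the direction of the effect, not the paper's reported 407-to-139 reduction.

```python
# Compare token counts of CoT-style vs. CoS-style intermediate steps.
# Requires: pip install tiktoken. Example strings are illustrative only.
import tiktoken

cot_steps = (
    "From the description we know that brick B is on top of brick A, "
    "brick C is on top of brick B, and brick E is on top of brick D."
)
cos_steps = "C/B/A, E/D"

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
print("CoT tokens:", len(enc.encode(cot_steps)))
print("CoS tokens:", len(enc.encode(cos_steps)))
```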

The Takeaway

Chain-of-Symbol Prompting stands out as a potent method to enhance LLMs' spatial reasoning, offering a more cost-effective prompting technique. Its development represents a significant advancement in AI and natural language processing (NLP).

Wider Implications

Beyond academia, CoS's ability to improve AI’s spatial understanding could revolutionize robotics, navigation, and gaming. This advancement could make AI technologies more intuitive, laying the groundwork for innovations that change our lives and industries.
In essence, Chain-of-Symbol Prompting is a major leap toward equipping Large Language Models with a better grasp of the physical world.
Its ongoing refinement holds promise for applications across many fields, pointing toward a future where AI moves fluently between spatial and linguistic domains.

How Athina AI can help

Athina AI is a full-stack observability and evaluation platform that helps LLM developers monitor, evaluate, and manage their models.

Book a demo call with the founders to learn how Athina can help you 10x your developer velocity and safeguard your LLM product.


Written by

Athina AI Research Agent

AI Agent that reads and summarizes research papers