Post-Semantic-Thinking: A Robust Strategy to Distill Reasoning Capacity from Large Language Models

Post-Semantic-Thinking: A Robust Strategy to Distill Reasoning Capacity from Large Language Models
Do not index
Do not index
Original Paper


Chain of thought finetuning aims to endow small student models with reasoning capacity to improve their performance towards a specific task by allowing them to imitate the reasoning procedure of large language models (LLMs) beyond simply predicting the answer to the question. However, the existing methods 1) generate rationale before the answer, making their answer correctness sensitive to the hallucination in the rationale; 2) force the student model to repeat the exact LLMs rationale expression word-after-word, which could have the model biased towards learning the expression in rationale but count against the model from understanding the core logic behind it. Therefore, we propose a robust Post-Semantic-Thinking (PST) strategy to generate answers before rationale. Thanks to this answer-first setting, 1) the answering procedure can escape from the adverse effects caused by hallucinations in the rationale; 2) the complex reasoning procedure is tightly bound with the relatively concise answer, making the reasoning for questions easier with the prior information in the answer; 3) the efficiency of the method can also benefit from the setting since users can stop the generation right after answers are outputted when inference is conducted. Furthermore, the PST strategy loose the constraint against the generated rationale to be close to the LLMs gold standard in the hidden semantic space instead of the vocabulary space, thus making the small student model better comprehend the semantic reasoning logic in rationale. Extensive experiments conducted across 12 reasoning tasks demonstrate the effectiveness of PST.


In the field of artificial intelligence (AI), the drive to develop efficient and powerful models is relentless. For AI engineers in large corporations, the goal is to create compact models capable of sophisticated reasoning without consuming excessive resources. Enter Post-Semantic-Thinking (PST), a novel approach aimed at improving smaller models' reasoning abilities by drawing insights from large language models (LLMs).
PST departs from conventional methods by prioritizing answers over rationales, a change that enhances answer accuracy and simplifies the logic understanding process, marking a leap forward in training AI models.

Key Features of Post-Semantic-Thinking (PST)

PST focuses on enhancing the reasoning of smaller AI models by distilling knowledge from their larger counterparts.
It addresses common limitations of traditional techniques, such as fixed rationale expressions and the generation of rationales before answers, which can cloud the reasoning process. The PST approach includes:
  • Generating answers before rationales to reduce errors.
  • Allowing more flexibility in rationale generation, prioritizing semantic similarity over exact wording.
This method not only boosts efficiency by stopping generation after producing an answer but also helps models understand the core of semantic reasoning logic.

The Role of Chain of Thought (CoT) Finetuning

While CoT finetuning has improved large models' performance on complex reasoning tasks, its effectiveness on smaller models has been limited.
PST proposes a solution by:
  • Reversing the order of rationale and answer generation.
  • Learning rationales in a way that focuses on the logic of semantic reasoning.

PST Methodology

The PST process involves extracting rationales from LLMs and fine-tuning smaller models with these rationales and answers. It emphasizes:
  • A unique token to align semantics.
  • Applying semantic similarity loss to maintain a focus on reasoning logic.

PST's Experimental Success

PST has been tested across 12 reasoning tasks, showcasing its effectiveness by surpassing traditional methods like the prefix mechanism and Pre-Thinking in most tasks.
Its resilience against varied rationale expressions and errors highlights its potential to transform AI model training.

PST's Efficiency and Challenges

PST offers significant benefits, especially in faster inference speeds due to its answer-first approach.
However, it's important to consider its challenges:
  • Possible loss of information when simplifying rationales.
  • Increased training time due to additional computation steps.
  • Slightly higher costs for creating quality rationales.

Ethical Considerations

When applying PST, it's essential to address the potential for inheriting biases from LLMs.
Strategies such as using guided prompts and researching ways to minimize LLM biases are crucial for ethical AI development.


Post-Semantic-Thinking presents a compelling strategy for improving reasoning in smaller AI models efficiently and effectively.
By focusing on answers before rationales and emphasizing semantic logic, PST overcomes many limitations of existing methods.
As AI evolves, approaches like PST will be vital in advancing the capabilities of compact models, ensuring they play an integral role in the AI landscape of large enterprises. Ongoing research into its challenges and ethical implications will be crucial for maximizing PST's impact and ensuring its responsible use.

Athina can help. Book a demo call with the founders to learn how Athina can help you 10x your developer velocity, and safeguard your LLM product.

Want to build a reliable GenAI product?

Book a demo

Written by

Athina AI Research Agent

AI Agent that reads and summarizes research papers