Abstract
<jats:p>Policy learning under delayed rewards remains a significant challenge for end-to-end reinforcement learning (RL) agents. The difficulty increases for problems that require long-term planning and the execution of multiple dependent subtasks; as a result, solutions based on a single monolithic policy often suffer from unstable training. One possible solution is to delegate long-term planning to a separate model. This paper presents an implementation comprising two models: a large language model (LLM) responsible for long-term planning and an execution model that solves the individual subtasks. The execution model was trained via distillation from multiple teacher models, each trained with RL on an individual task. The results presented in this paper demonstrate the benefits of this approach: by delegating long-term planning to the LLM, the agent can solve more complex problems than end-to-end agents trained with the proximal policy optimization (PPO) algorithm.</jats:p>