Abstract
<jats:p>The paper proposes a method for selecting a communication channel in cognitive radio, based on information about the current state of all available communication channels, using the mathematical apparatus of reinforcement learning. The method consists of formalizing the channel-selection problem in "environment-agent" terms and training agents with the REINFORCE, SARSA and A2C algorithms. The memory cost of solving the channel-selection problem with classical methods is calculated: for the tabular Q-learning algorithm, the estimate is 4×2<jats:sup>2n</jats:sup> bytes for a random state of the channels (busy/free) and 4×n<jats:sup>2</jats:sup> bytes when exactly one channel is free at each step. Two formalizations of the agent's reward are presented for the problem solved with reinforcement learning: one for the trivial case (binary availability/unavailability of the frequency channel) and one for a more complex case that takes into account the power (in dB) in the selected communication channel. The first formalization is restricted to the case where exactly one of the available communication channels is free at each iteration; the second formalization imposes no such restriction and is more universal. Computational experiments are presented for the corresponding reward formalizations, with agents trained using the SARSA and A2C algorithms. On average, error-free solutions are achieved after 8,000 training episodes in a model problem for the various agent implementations. The REINFORCE algorithm does not reach error-free solutions, although the proposed reward formulation improves its training efficiency.
Theoretical estimates of the computational complexity of the considered methods are provided, which are consistent with the computational experiments.</jats:p>