
Abstract

<jats:p>We address the problem of optimizing the deployment of modern computer vision models on compact embedded systems equipped with specialized neural processing units (NPUs). The target platform is the Orange Pi 5 single-board computer based on the Rockchip RK3588S system-on-chip, which integrates a 6-TOPS NPU. The study covers the complete pipeline for adapting the YOLOv11 architecture to embedded execution, from operator compatibility analysis and structural model modifications required by hardware constraints to the implementation of a real-time, high-throughput video processing pipeline. We present a detailed methodology for converting models from PyTorch to the vendor-specific RKNN format using post-training quantization to INT8 precision, which delivers substantial inference acceleration and memory footprint reduction with minimal accuracy loss. To overcome the inherently blocking nature of NPU inference, we propose a multiprocess video processing architecture that employs parallel worker processes. Through extensive experimentation, we identify the optimal number of concurrent processes for each YOLOv11 variant (n, s, m). Our implementation achieves 54 FPS for YOLOv11-n, 48 FPS for YOLOv11-s, and 27 FPS for YOLOv11-m at 640 × 640 input resolution. Crucially, we show that exceeding the optimal process count saturates memory bandwidth, raises SoC temperature, and reduces energy efficiency without improving throughput. These findings validate the feasibility of building cost-effective, energy-efficient, high-performance computer vision systems on widely available single-board computers. The results apply directly to real-time use cases such as autonomous drones, robotics, smart surveillance, and other edge AI applications where low latency and hardware accessibility are critical.</jats:p>
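The parallel-worker idea described in the abstract can be illustrated with a minimal sketch. This is a hypothetical simplification, not the authors' code: the `infer` stand-in (a short sleep) replaces the real blocking RKNN inference call, and `run_pipeline` simply fans frames out across `n_workers` processes so that one blocked inference does not stall the whole stream.

```python
# Sketch of a multiprocess video pipeline with parallel worker processes.
# Hypothetical names (`infer`, `run_pipeline`); a real deployment would
# call the blocking NPU inference (e.g. via rknn-toolkit-lite2) in `infer`.
import multiprocessing as mp
import time


def infer(frame):
    """Stand-in for a blocking per-frame NPU inference call."""
    time.sleep(0.005)   # pretend the NPU is busy for a few milliseconds
    return frame        # real code would return detections for the frame


def run_pipeline(frames, n_workers):
    """Fan frames out across worker processes.

    `imap` preserves input order, so results come back as a coherent
    video stream even though frames are processed concurrently.
    """
    ctx = mp.get_context("fork")  # fork keeps the sketch self-contained on Linux
    with ctx.Pool(processes=n_workers) as pool:
        return list(pool.imap(infer, frames))


if __name__ == "__main__":
    frames = list(range(30))
    start = time.time()
    results = run_pipeline(frames, n_workers=3)
    print(f"{len(results)} frames in {time.time() - start:.2f}s")
```

Consistent with the abstract's finding, raising `n_workers` beyond the hardware's sweet spot would stop improving throughput once the shared memory bandwidth (or, here, the CPU) saturates.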


Keywords

computer vision models; embedded
