
Abstract

<jats:p>We address the problem of optimizing the deployment of modern computer vision models on compact embedded systems equipped with specialized neural processing units (NPUs). The target platform is the Orange Pi 5 single-board computer based on the Rockchip RK3588S system-on-chip, which integrates a 6-TOPS NPU. The study covers the complete pipeline for adapting the YOLOv11 architecture to embedded execution, from operator compatibility analysis and structural model modifications required by hardware constraints to the implementation of a real-time, high-throughput video processing pipeline. We present a detailed methodology for converting models from PyTorch to the vendor-specific RKNN format using post-training quantization to INT8 precision, which delivers substantial inference acceleration and memory footprint reduction with minimal accuracy loss. To overcome the inherently blocking nature of NPU inference, we propose a multiprocess video processing architecture that employs parallel worker processes. Through extensive experimentation, we identify the optimal number of concurrent processes for each YOLOv11 variant (n, s, m). Our implementation achieves 54 FPS for YOLOv11-n, 48 FPS for YOLOv11-s, and 27 FPS for YOLOv11-m at 640 × 640 input resolution. Crucially, we show that exceeding the optimal process count saturates memory bandwidth, raises SoC temperature, and reduces energy efficiency without improving throughput. These findings validate the feasibility of building cost-effective, energy-efficient, high-performance computer vision systems on widely available single-board computers. The results apply directly to real-time use cases such as autonomous drones, robotics, smart surveillance, and other edge AI applications where low latency and hardware accessibility are critical.</jats:p>
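The parallel-worker idea described in the abstract can be illustrated with a minimal sketch. This is a hypothetical simplification, not the authors' code: the `infer` stand-in (a short sleep) replaces the real blocking RKNN inference call, and `run_pipeline` simply fans frames out across `n_workers` processes so that one blocked inference does not stall the whole stream.

```python
# Sketch of a multiprocess video pipeline with parallel worker processes.
# Hypothetical names (`infer`, `run_pipeline`); a real deployment would
# call the blocking NPU inference (e.g. via rknn-toolkit-lite2) in `infer`.
import multiprocessing as mp
import time


def infer(frame):
    """Stand-in for a blocking per-frame NPU inference call."""
    time.sleep(0.005)   # pretend the NPU is busy for a few milliseconds
    return frame        # real code would return detections for the frame


def run_pipeline(frames, n_workers):
    """Fan frames out across worker processes.

    `imap` preserves input order, so results come back as a coherent
    video stream even though frames are processed concurrently.
    """
    ctx = mp.get_context("fork")  # fork keeps the sketch self-contained on Linux
    with ctx.Pool(processes=n_workers) as pool:
        return list(pool.imap(infer, frames))


if __name__ == "__main__":
    frames = list(range(30))
    start = time.time()
    results = run_pipeline(frames, n_workers=3)
    print(f"{len(results)} frames in {time.time() - start:.2f}s")
```

Consistent with the abstract's finding, raising `n_workers` beyond the hardware's sweet spot would stop improving throughput once the shared memory bandwidth (or, here, the CPU) saturates.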


Keywords

computer vision models; embedded
