YOLO introduction
YOLO is an acronym for "You Only Look Once". It describes an open-source algorithm that uses a neural network for real-time object detection. The algorithm works by taking an image and outputting the object's classification, its confidence, and bounding box. All of the operations are performed in a single iteration, unlike other detection algorithms, which need to iterate multiple times over the same image, for example, algorithms that use Region Proposal Networks. In this article, we compare the detection quality of two types of YOLOv7, which is currently the latest, fastest, and most accurate iteration of the YOLO line. More information on the inner workings of YOLOv7 here.
For the comparison, we chose YOLOv7 and YOLOv7-tiny because YOLOv7 is capable of being run on lightweight hardware (e.g. Jetson) but not at a high frame rate. Our tests help to choose the worthiness of sacrificing detection quality over speed. Different environments and distances were used to find out the shortcomings of YOLOv7-tiny.

Test setup
Testing was done with a Luxonis Depthai OAK-D Lite camera [2], detection with Python YOLOv7_package (version 0.0.11) with pretrained models from PyPI [3], and ROS2 Humble [4] was used for data communication.
Depthai OAK-D lite camera specifications | |
---|---|
DFOV / HFOV / VFOV | 86° / 73° / 58° |
Resolution | 480P (640x480) |
Effective Focal Length | 1/7.5 inch |
Pixel size | 1.3mm |
Lens size | 3µm x 3µm |
Detection quality comparison
Light setting
Detecting a lying person in a well-lit environment was accurate for YOLOv7, scoring above 80% confidence in most of the tests, with some deviations to around 75% in some settings. YOLOv7-tiny scores from 50% to 89%, depending on the number of obstructions used.

With obstruction




From a higher position



Good lighting, hallway, measuring distances
YOLOv7-tiny was worse at detecting parts of a person on the edge of the camera's field of view and at a distance of about 20 meters.




Light setting (shadow), outside, measuring distances
YOLOv7-tiny accuracy starts falling at distances further than about 20 meters, and it was unable to reliably detect further than 30 meters. At 15 meters, YOLOv7 and YOLOv7-tiny detection accuracy has negligible differences.





Good lighting, backlight hallway


Dark setting
Detection accuracy falls drastically in the dark. YOLOv7 was noticeably better at detecting objects in darkness than YOLOv7-tiny. The lying person was rarely detected, while the sitting person was detected either at low accuracy or every few frames, depending on conditions like camera position, lighting, and obstructions. Some positions are easier to detect than others.




From a higher position








Poor lighting, backlit hallway
If the camera has poor dynamic range, backlit settings can wash out the detection object and lose accuracy, which was fixed by having more even lighting. We can see that YOLOv7 performs better in these conditions than YOLOv7-tiny.



Movement
Movement increases motion blur, which decreases detection confidence. Having more light (or a better camera) reduces motion blur.


Speed and hardware
Lenovo Legion 5 | Jetson Xavier NX | |
Power mode | High-performance | Custom 20W power mode for maximum performance |
GPU | GeForce RTX 3050 Mobile
|
NVIDIA Volta
|
CPU | AMD Ryzen 5 5600H
|
NVIDIA Carmel Arm v8.2
|
RAM | DDR4
|
LPDDR4x
|
YOLOv7 | 33 FPS | 12 FPS (GPU bottleneck) |
YOLOv7-tiny | 67 FPS | 26 FPS (CPU bottleneck) |
Summary
- YOLOv7-tiny detected faster and was less resource-intensive at the cost of accuracy when compared to YOLOv7. While using the laptop, both videos were running at 30 FPS (limited by the camera), but the detection itself for YOLOv7-tiny took 0.015 ms or 67 FPS, and for YOLOv7, the detection took 0.030 ms or 33 FPS. Tests were also performed on a Jetson Xavier NX: YOLOv7 was limited by its GPU performance (GPU utilization was at 100%) running at 12 FPS, YOLOv7-tiny was limited by its CPU performance (GPU utilization was around 50% while a CPU core was at 100%) running at 26 FPS (having a faster reaction time).
- Dim environments decrease the camera's shutter speed and increase ISO, which accordingly increase motion blur and image noise, making both algorithms unsatisfactory for detecting a lying or fallen person (even if something was detectable, YOLOv7-tiny had worse confidence). Good lighting conditions (or better cameras) are necessary for better accuracy, especially while camera movement is involved.
- In a well-lit environment, detection confidence was lower for someone lying down when compared to being upright.
- In an ideal environment (good lighting, distance between 2-20 meters, upright person), both detection algorithms have a similar detection accuracy.
- In a harsh environment (low light, too close or further distances, lying person), YOLOv7 outperforms YOLOv7-tiny.
- Given good lighting conditions and distances of less than 20 meters, YOLOv7-tiny could be used for faster detection if speed is needed and hardware is limited. YOLOv7 should be used in harsher conditions or when the hardware is capable of running it.
Literature
[1] Wang, Chien-Yao and Bochkovskiy, Alexey and Liao, Hong-Yuan Mark, "Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors", 2022
[2] B. Gilles, S. McLaughlin, V. Vyskcil, B. Dillon, T. Rassavong, "Luxonis", DepthAI Hardware
[3] Python Software Foundation, PyPI, M. Volkovskiy, "yolov7-package"
[4] S. Macenski, T. Foote, B. Gerkey, C. Lalancette, W. Woodall, “Robot Operating System 2: Design, architecture, and uses in the wild,” Science Robotics vol. 7, May 2022.