URL: https://www.progressiverobot.com/yolov8/
Object detection has become one of the most popular and practical uses of AI. A breakthrough came in 2015 with the release of YOLO (You Only Look Once) by Joseph Redmon and his team, which introduced real-time object detection in a single pass. Since then, the YOLO models have continued to improve and inspire further research in deep learning-based detection.
In this article, we’ll go back to the basics, look at what’s new in YOLOv8 from Ultralytics, and show you how to fine-tune a custom YOLOv8 model using Roboflow, a GPU cloud server, and the updated Ultralytics API. By the end, you’ll be able to train YOLOv8 on your own labeled image dataset in no time.
Prerequisites
- Python: Basic understanding of Python programming.
- Deep Learning: Familiarity with neural networks, particularly CNNs and object detection.
- PyTorch: Familiarity with PyTorch, the framework Ultralytics YOLOv8 is built on.
- OpenCV: Understanding of image processing techniques.
- CUDA: Experience with GPU acceleration and CUDA for faster training.
- Roboflow account: Familiarity with Roboflow, the platform we use to host and download the object detection dataset.
- Basic Git: For managing code and version control.
How does YOLO work?
To start, let's discuss the basics of how YOLO works. Here is a short passage from the original YOLO paper that summarizes the model's functionality:
“A single convolutional network simultaneously predicts multiple bounding boxes and class probabilities for those boxes. YOLO trains on full images and directly optimizes detection performance. This unified model has several benefits over traditional methods of object detection.”
As stated above, the model can predict the location and identity of multiple objects in an image, provided it has been trained to recognize them. It does this in a single stage by dividing the input image into an S × S grid of cells. All cells are processed simultaneously: each cell predicts B bounding boxes, each with its coordinates and a confidence score, along with class probabilities for the object it contains.
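To make the grid description concrete, here is a minimal sketch of the original (v1-style) YOLO output layout. The paper's PASCAL VOC configuration used S = 7, B = 2, and C = 20 classes. The helper names below are hypothetical, and later versions, including YOLOv8, organize their outputs differently, so treat this only as an illustration of the grid idea.

```python
from typing import Tuple


def yolo_v1_output_shape(S: int, B: int, C: int) -> Tuple[int, int, int]:
    """Shape of a v1-style YOLO output tensor.

    Each of the S x S cells predicts B boxes, each with (x, y, w, h, confidence),
    plus one set of C class probabilities shared by the cell.
    """
    return (S, S, B * 5 + C)


def responsible_cell(cx: float, cy: float, img_w: int, img_h: int, S: int) -> Tuple[int, int]:
    """Return the (row, col) of the grid cell that contains an object's center.

    In the original YOLO, this cell is responsible for detecting the object.
    """
    col = min(int(cx / img_w * S), S - 1)  # clamp so a center on the right edge stays in the grid
    row = min(int(cy / img_h * S), S - 1)  # clamp so a center on the bottom edge stays in the grid
    return row, col


print(yolo_v1_output_shape(S=7, B=2, C=20))       # (7, 7, 30)
print(responsible_cell(320, 100, 640, 480, S=7))  # (1, 3)
```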
Combining these capabilities, the YOLO family has grown into a powerful technology that can perform object classification, object detection, and image segmentation. Since the core single-stage approach remains consistent across versions, this holds true for YOLOv8 as well. For a more detailed explanation of how YOLO functions, you can refer to our previous article covering YOLOv7 and the original YOLO research paper.
What's new in YOLOv8?
YOLOv8 introduces several significant improvements over its predecessors. Developed by Ultralytics, it is built on a redesigned architecture that offers better accuracy and speed across several computer vision tasks: object detection, instance segmentation, pose estimation, and image classification. It features a modular, scalable design (the same architecture is offered in several model sizes), improved training workflows, and support for dynamic input shapes. YOLOv8 also includes native export to popular deployment formats such as ONNX, TensorRT, and CoreML, which simplifies deployment across diverse platforms. With its focus on ease of use, performance, and compatibility with modern ML pipelines, YOLOv8 sets a new standard for real-time vision models.
Architecture
Credit to the creator: RangeKing
According to the official release, YOLOv8 features a new backbone network, an anchor-free detection head, and a new loss function. GitHub user RangeKing has shared an outline of the YOLOv8 model architecture, showing the updated backbone and head structures. By comparing this diagram with a similar breakdown of YOLOv5, RangeKing identified the following changes in their post:
The C2f module, credit to Roboflow
- They replaced the `C3` module with the `C2f` module. In `C2f`, all the outputs from the `Bottleneck` (the two 3×3 `convs` with residual connections) are concatenated, but in `C3` only the output of the last `Bottleneck` was used.
The first Conv of each version. Credit to RangeKing
- They replaced the first `6x6 Conv` with a `3x3 Conv` block in the `Backbone`.
- They deleted two of the `Convs` (No.10 and No.14 in the YOLOv5 config).
Comparison of the two model backbones. Credit to RangeKing
- They replaced the first `1x1 Conv` with a `3x3 Conv` in the `Bottleneck`.
- They switched to using a decoupled head, and deleted the `objectness` branch.
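The difference between `C3` and `C2f` described in the list above can be illustrated with a symbolic trace. This is a simplified sketch, not Ultralytics' code: strings stand in for feature maps, the helper names are hypothetical, and every branch is assumed to carry the same hidden channel count `c_hidden`.

```python
from typing import List


def c3_trace(n: int) -> List[str]:
    """Symbolic list of the tensors concatenated inside a C3 block with n bottlenecks."""
    x = "cv1(x)"
    for i in range(n):
        x = f"b{i}({x})"  # bottlenecks are chained; intermediate outputs are discarded
    return [x, "cv2(x)"]  # only the last output is joined with a parallel 1x1 branch


def c2f_trace(n: int) -> List[str]:
    """Symbolic list of the tensors concatenated inside a C2f block with n bottlenecks."""
    y = ["cv1(x)[half 1]", "cv1(x)[half 2]"]  # cv1 output split into two halves
    for i in range(n):
        y.append(f"b{i}({y[-1]})")  # every bottleneck output is kept for the concat
    return y


def concat_channels(trace: List[str], c_hidden: int) -> int:
    """Channels entering the final 1x1 conv, assuming each branch has c_hidden channels."""
    return len(trace) * c_hidden


print(c3_trace(2))   # ['b1(b0(cv1(x)))', 'cv2(x)']
print(c2f_trace(2))  # ['cv1(x)[half 1]', 'cv1(x)[half 2]', 'b0(cv1(x)[half 2])', 'b1(b0(cv1(x)[half 2]))']
print(concat_channels(c3_trace(2), 64), concat_channels(c2f_trace(2), 64))  # 128 256
```

The trace makes the design choice visible: `C2f` hands the final convolution every intermediate bottleneck output, giving it a richer mix of features at the cost of a wider concatenation.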
Accessibility
In addition to the old method of cloning the GitHub repo and setting up the environment manually, users can now install the `ultralytics` package (`pip install ultralytics`) and use its Python API for training and inference. Check out the Training your model section below for details on using the API.
Anchor free bounding boxes
YOLOv8 uses anchor-free detection. Earlier versions of YOLO, such as YOLOv2 through YOLOv5, relied on anchor boxes: predefined bounding boxes of set width and height that capture the typical scale and aspect ratio of the object classes in the dataset, and which had to be chosen or tuned for each dataset. The model then predicted offsets from these anchor boxes to locate each object.
With YOLOv8, there are no predefined anchor boxes. Instead, each location on the output feature maps predicts an object's box directly, as distances from that point to the box's four sides. This reduces the number of box predictions and removes the need to tune anchor sizes for a new dataset.
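As a rough illustration of the distance-to-sides idea, the sketch below decodes one anchor-free prediction into a pixel box. It assumes, as a simplification, that the reference point is the center of a grid cell and that the four distances are expressed in grid units and scaled by the feature map's stride. The function name is hypothetical, and the real YOLOv8 head adds details this sketch leaves out.

```python
from typing import Tuple


def decode_ltrb(col: int, row: int, dists: Tuple[float, float, float, float],
                stride: int) -> Tuple[float, float, float, float]:
    """Turn (left, top, right, bottom) distances from a cell center into an xyxy box in pixels."""
    ax, ay = col + 0.5, row + 0.5  # reference point at the cell center, in grid units
    left, top, right, bottom = dists
    x1, y1 = (ax - left) * stride, (ay - top) * stride
    x2, y2 = (ax + right) * stride, (ay + bottom) * stride
    return x1, y1, x2, y2


# A cell at column 10, row 5 on a stride-8 feature map
print(decode_ltrb(10, 5, (2.0, 1.0, 3.0, 4.0), stride=8))  # (68.0, 36.0, 108.0, 76.0)
```

No anchor width or height appears anywhere in the decoding, which is why there are no anchor sizes to tune for a new dataset.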
Stopping the Mosaic Augmentation before the end of training
At each epoch during training, YOLOv8 sees a slightly different version of the images it has been provided. These changes are called augmentations. One of them, mosaic augmentation, combines four images into one, forcing the model to learn to identify objects in new locations, when they partially occlude each other, and against more varied surrounding pixels. Using mosaic throughout the entire training run can hurt final prediction accuracy, so YOLOv8 switches it off for the final epochs of training (the last 10 by default, controlled by the `close_mosaic` training argument). This keeps the benefits of the augmentation for most of the run without applying it to the entire run.
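The sketch below shows both ideas in simplified form: a fixed 2x2 mosaic that tiles four equally sized images and shifts their box labels, and a schedule that switches mosaic off for the final epochs. Real mosaic augmentation uses a random center point and crops the result, so treat this as an illustration only. The helper names are hypothetical, and the 10-epoch default mirrors the `close_mosaic` setting mentioned above.

```python
from typing import List, Tuple

Image = List[List[int]]          # a grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def simple_mosaic(images: List[Image], boxes: List[List[Box]]) -> Tuple[Image, List[Box]]:
    """Tile four equally sized images into a 2x2 canvas and shift their boxes to match."""
    assert len(images) == 4 and len(boxes) == 4
    h, w = len(images[0]), len(images[0][0])
    canvas = [[0] * (2 * w) for _ in range(2 * h)]
    offsets = [(0, 0), (w, 0), (0, h), (w, h)]  # top-left, top-right, bottom-left, bottom-right
    out_boxes: List[Box] = []
    for img, img_boxes, (dx, dy) in zip(images, boxes, offsets):
        for y in range(h):
            for x in range(w):
                canvas[y + dy][x + dx] = img[y][x]
        out_boxes.extend((x1 + dx, y1 + dy, x2 + dx, y2 + dy) for x1, y1, x2, y2 in img_boxes)
    return canvas, out_boxes


def mosaic_enabled(epoch: int, epochs: int, close_mosaic: int = 10) -> bool:
    """Mosaic stays on except for the final `close_mosaic` epochs (epochs counted from 0)."""
    return epoch < epochs - close_mosaic


imgs = [[[v] * 2 for _ in range(2)] for v in (1, 2, 3, 4)]  # four 2x2 images filled with 1..4
canvas, shifted = simple_mosaic(imgs, [[], [], [], [(0, 0, 1, 1)]])
print(canvas)   # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
print(shifted)  # [(2, 2, 3, 3)]
print(mosaic_enabled(89, 100), mosaic_enabled(90, 100))  # True False
```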
Efficiency and accuracy
The main reason we are all here is the big boost in accuracy and efficiency. The authors at Ultralytics have published benchmark data that we can use to compare the new release with other versions of YOLO. Their comparison shows YOLOv8 outperforming YOLOv7, YOLOv6-2.0, and YOLOv5-7.0 in mean Average Precision relative to both model size (parameter count) and inference latency.
Ultralytics publishes statistical comparison tables for the different-sized models on its GitHub pages. Within each model family, mAP rises along with parameter count, latency, and FLOPs. The YOLOv5 table is reproduced below: its largest model, YOLOv5x, reaches a maximum mAP 50-95 of 50.7 on the COCO validation set, and the 2.2-point increase in mAP that YOLOv8 achieves over this figure represents a significant improvement in capabilities. According to Ultralytics' published figures, this advantage holds across all model sizes, with the YOLOv8 models consistently outperforming their YOLOv5 counterparts.
| Model | size (pixels) | mAPval 50-95 | mAPval 50 | Speed CPU b1 (ms) | Speed V100 b1 (ms) | Speed V100 b32 (ms) | params (M) | FLOPs @640 (B) |
|---|---|---|---|---|---|---|---|---|
| YOLOv5n | 640 | 28.0 | 45.7 | 45 | 6.3 | 0.6 | 1.9 | 4.5 |
| YOLOv5s | 640 | 37.4 | 56.8 | 98 | 6.4 | 0.9 | 7.2 | 16.5 |
| YOLOv5m | 640 | 45.4 | 64.1 | 224 | 8.2 | 1.7 | 21.2 | 49.0 |
| YOLOv5l | 640 | 49.0 | 67.3 | 430 | 10.1 | 2.7 | 46.5 | 109.1 |
| YOLOv5x | 640 | 50.7 | 68.9 | 766 | 12.1 | 4.8 | 86.7 | 205.7 |
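To quantify the trend in the table (accuracy rising with model size, but with diminishing returns), this short sketch computes the marginal mAP 50-95 gained per extra billion FLOPs between consecutive YOLOv5 sizes, using only the numbers in the table above. The function name is hypothetical; the same calculation could be applied to Ultralytics' published YOLOv8 table.

```python
# (model, mAP 50-95, FLOPs @640 in billions), copied from the YOLOv5 table above
yolov5 = [
    ("YOLOv5n", 28.0, 4.5),
    ("YOLOv5s", 37.4, 16.5),
    ("YOLOv5m", 45.4, 49.0),
    ("YOLOv5l", 49.0, 109.1),
    ("YOLOv5x", 50.7, 205.7),
]


def marginal_gains(rows):
    """mAP points gained, and mAP gained per extra billion FLOPs, when stepping up one size."""
    gains = []
    for (name_a, map_a, flops_a), (name_b, map_b, flops_b) in zip(rows, rows[1:]):
        d_map, d_flops = map_b - map_a, flops_b - flops_a
        gains.append((f"{name_a} -> {name_b}", round(d_map, 1), round(d_map / d_flops, 3)))
    return gains


for step in marginal_gains(yolov5):
    print(step)
```

On these figures, the step from YOLOv5n to YOLOv5s buys far more mAP per GFLOP than the step from YOLOv5l to YOLOv5x, which is worth weighing when choosing a model size for deployment.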
Overall, we can see that YOLOv8 represents a significant step up from YOLOv5 and other competing frameworks.
Fine-tuning YOLOv8
The process for fine-tuning a YOLOv8 model can be broken down into three steps: creating and labeling the dataset, training the model, and deploying it. In this tutorial, we will cover the first two steps in detail, then show how to run the new model on a sample image and export it for deployment.
Following this demo
To follow along, you need a GPU-powered machine. We recommend using one in the cloud, such as a cloud GPU server. Once the GPU machine is accessible, we assume you are working in a Jupyter Notebook, which makes it far easier to execute the code in this tutorial sequentially.
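Before training, it can help to confirm that the machine actually exposes a GPU. A minimal, standard-library-only check, assuming an NVIDIA GPU with the `nvidia-smi` tool on the PATH, might look like this (the helper name is hypothetical):

```python
import shutil
import subprocess
from typing import List


def list_nvidia_gpus() -> List[str]:
    """Return GPU names reported by nvidia-smi, or an empty list if none are visible."""
    if shutil.which("nvidia-smi") is None:  # NVIDIA driver tools not on PATH
        return []
    proc = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:  # driver present but query failed
        return []
    return [line.strip() for line in proc.stdout.splitlines() if line.strip()]


gpus = list_nvidia_gpus()
print(gpus if gpus else "No NVIDIA GPU detected")
```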
To follow this demo, clone the following repo using the code snippet below and launch the Jupyter environment.
```bash
git clone https://github.com/gradient-ai/YOLOv8-Ballhandler
cd YOLOv8-Ballhandler
jupyter lab
```
Setting up your dataset
We are going to recreate the experiment from our YOLOv7 article to compare the two models, so we will be using the Ballhandler basketball dataset on Roboflow. Since we are using an existing dataset, we just need to pull the data in for now. Below is the code used to pull the data into a Notebook environment. Use the same process for your own labeled dataset, replacing the workspace and project values with your own.
Be sure to change the API key to your own if you want to use the script below to follow the demo in the Notebook.
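```python
# Install the Roboflow client inside the notebook
!pip install roboflow

from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")  # replace with your own Roboflow API key
project = rf.workspace("james-skelton").project("ballhandler-basketball")
dataset = project.version(11).download("yolov8")  # download version 11 in YOLOv8 format

# Move the downloaded folder into a datasets/ directory
!mkdir -p datasets
!mv ballhandler-basketball-11/ datasets/
```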
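A dataset exported in YOLO format stores its annotations as text label files with one line per object: a class index followed by the box center, width, and height, all normalized to the image size. As a quick sanity check on a downloaded dataset, a hypothetical helper like the one below converts such a line to pixel corner coordinates; it assumes the labels follow that standard convention.

```python
from typing import Tuple


def yolo_label_to_xyxy(line: str, img_w: int, img_h: int) -> Tuple[int, Tuple[float, float, float, float]]:
    """Convert one 'class x_center y_center width height' line (normalized 0-1) to pixel corners."""
    cls, xc, yc, w, h = line.split()
    x_c, box_w = float(xc) * img_w, float(w) * img_w  # horizontal values scale with image width
    y_c, box_h = float(yc) * img_h, float(h) * img_h  # vertical values scale with image height
    return int(cls), (x_c - box_w / 2, y_c - box_h / 2, x_c + box_w / 2, y_c + box_h / 2)


print(yolo_label_to_xyxy("0 0.5 0.5 0.25 0.5", img_w=640, img_h=480))
# (0, (240.0, 120.0, 400.0, 360.0))
```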
Training your model
With the new Python API, we can use the ultralytics library to do all of the work within a Jupyter Notebook environment. The library offers two starting points: building a YOLOv8n model from scratch from its config file (yolov8n.yaml), or loading pretrained weights (yolov8n.pt), which is recommended for training and is the model we fine-tune here. We fine-tune it on the dataset we just downloaded using the model.train() method.
```python
from ultralytics import YOLO

# Option 1: build a new, untrained model from the YOLOv8n config
# model = YOLO("yolov8n.yaml")

# Option 2 (recommended for training): load pretrained YOLOv8n weights
model = YOLO("yolov8n.pt")

# Fine-tune on the Roboflow dataset for 10 epochs
results = model.train(data="datasets/ballhandler-basketball-11/data.yaml", epochs=10)
```
Testing the model
```python
metrics = model.val()  # evaluate model performance on the validation set
print(metrics.box.map50, metrics.box.map)  # mAP 50 and mAP 50-95
```
The model.val() method evaluates the model on the validation set and prints a table in the output window showing how it performed, including precision, recall, mAP 50, and mAP 50-95. Since we only trained for ten epochs, a relatively low mAP 50-95 is to be expected.
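To see why mAP 50-95 is the stricter of the two metrics, recall that a prediction only counts as a match at a given IoU (intersection over union) threshold if its overlap with the ground-truth box reaches that threshold, and mAP 50-95 averages over ten thresholds from 0.5 to 0.95. The sketch below uses hypothetical helpers and omits the rest of a full mAP computation (ranking by confidence and integrating precision over recall); it only shows how one slightly loose box passes at IoU 0.5 but fails the upper thresholds.

```python
from typing import Tuple

BoxT = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: BoxT, b: BoxT) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# mAP 50-95 averages over these ten IoU thresholds; mAP 50 uses only the first one
thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]

pred, truth = (0, 0, 10, 10), (0, 0, 10, 12)
score = iou(pred, truth)
matched = [t for t in thresholds if score >= t]
print(round(score, 3), len(matched))  # 0.833 7
```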
From there, it's simple to run predictions on any photo. The predict method returns the predicted bounding boxes and, with save=True, draws them onto the image and saves the result to the 'runs/detect/predict' folder.
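```python
from PIL import Image
from IPython.display import display  # available by default in Jupyter

# Load a sample image with PIL
im1 = Image.open("assets/samp.jpeg")

# Run inference with the fine-tuned model; save=True writes the annotated image to disk
results = model.predict(source=im1, save=True)
print(results)

# Show the saved, annotated image (on reruns the folder may be predict2, predict3, ...)
display(Image.open("runs/detect/predict/image0.jpg"))
```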
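We are left with the predicted bounding boxes and their labels, printed like this. Each row of the tensor is one detection: the x1, y1, x2, y2 pixel coordinates of the box, followed by a confidence score and a class index.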
```text
[Ultralytics YOLO <class 'ultralytics.yolo.engine.results.Boxes'> masks
type: <class 'torch.Tensor'>
shape: torch.Size([6, 6])
dtype: torch.float32
 + tensor([[3.42000e+02, 2.00000e+01, 6.17000e+02, 8.38000e+02, 5.46525e-01, 1.00000e+00],
        [1.18900e+03, 5.44000e+02, 1.32000e+03, 8.72000e+02, 5.41202e-01, 1.00000e+00],
        [6.84000e+02, 2.70000e+01, 1.04400e+03, 8.55000e+02, 5.14879e-01, 0.00000e+00],
        [3.59000e+02, 2.20000e+01, 6.16000e+02, 8.35000e+02, 4.31905e-01, 0.00000e+00],
        [7.16000e+02, 2.90000e+01, 1.04400e+03, 8.58000e+02, 2.85891e-01, 1.00000e+00],
        [3.88000e+02, 1.90000e+01, 6.06000e+02, 6.58000e+02, 2.53705e-01, 0.00000e+00]], device='cuda:0')]
```
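Each row of that tensor is one detection in the form [x1, y1, x2, y2, confidence, class index], with coordinates in pixels. The sketch below copies those rows into plain Python and keeps only detections at or above an assumed 0.5 confidence threshold. The class names are placeholders, since the dataset's actual labels are not shown here; with the ultralytics library itself, the same columns are exposed per image as `results[0].boxes.xyxy`, `results[0].boxes.conf`, and `results[0].boxes.cls`.

```python
# Rows copied from the printed tensor above: [x1, y1, x2, y2, confidence, class]
rows = [
    [342.0, 20.0, 617.0, 838.0, 0.546525, 1.0],
    [1189.0, 544.0, 1320.0, 872.0, 0.541202, 1.0],
    [684.0, 27.0, 1044.0, 855.0, 0.514879, 0.0],
    [359.0, 22.0, 616.0, 835.0, 0.431905, 0.0],
    [716.0, 29.0, 1044.0, 858.0, 0.285891, 1.0],
    [388.0, 19.0, 606.0, 658.0, 0.253705, 0.0],
]

# Hypothetical class names; the real ones are listed under `names` in the dataset's data.yaml
class_names = {0: "class_0", 1: "class_1"}


def confident_detections(rows, min_conf=0.5):
    """Keep rows at or above min_conf and label them with class names."""
    return [
        (class_names[int(cls)], round(conf, 3), (x1, y1, x2, y2))
        for x1, y1, x2, y2, conf, cls in rows
        if conf >= min_conf
    ]


for det in confident_detections(rows):
    print(det)
```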
These boxes are then drawn onto the image. As we can see, our lightly trained model can already distinguish the players on the court from the players and spectators on the side of the court, with one exception in the corner. More training is almost certainly required, but it's easy to see that the model very quickly gained an understanding of the task.
If we are satisfied with our model training, we can then export the model in the desired format. In this case, we will export an ONNX version.
```python
success = model.export(format="onnx")  # export the model to ONNX format
```
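Once exported, the ONNX model runs outside of Ultralytics, so preprocessing and mapping boxes back to the original image become the caller's job. Assuming the export used the default 640-pixel image size and that inputs are letterboxed (resized to preserve aspect ratio, then padded to a square), a sketch of the coordinate bookkeeping looks like this. The helper names are hypothetical, and the exact preprocessing your runtime needs should be checked against the Ultralytics export documentation.

```python
from typing import Tuple


def letterbox_params(img_w: int, img_h: int, size: int = 640) -> Tuple[float, float, float]:
    """Scale factor and padding that fit an image inside a size x size square, keeping aspect ratio."""
    scale = min(size / img_w, size / img_h)
    new_w, new_h = round(img_w * scale), round(img_h * scale)
    pad_x, pad_y = (size - new_w) / 2, (size - new_h) / 2  # padding split evenly on both sides
    return scale, pad_x, pad_y


def box_to_original(box, scale: float, pad_x: float, pad_y: float):
    """Map an (x1, y1, x2, y2) box from model-input coordinates back to the original image."""
    x1, y1, x2, y2 = box
    return ((x1 - pad_x) / scale, (y1 - pad_y) / scale,
            (x2 - pad_x) / scale, (y2 - pad_y) / scale)


scale, pad_x, pad_y = letterbox_params(1280, 720)
print(scale, pad_x, pad_y)                                         # 0.5 0.0 140.0
print(box_to_original((100, 240, 200, 340), scale, pad_x, pad_y))  # (200.0, 200.0, 400.0, 400.0)
```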
Closing thoughts
In this tutorial, we explored the key updates introduced in Ultralytics’ YOLOv8 model. We examined how its architecture has evolved from YOLOv5 and tested its easy-to-use Python API with our Ballhandler dataset. The results showed that YOLOv8 greatly simplifies the process of fine-tuning object detection models. We also applied it to a real-world task: identifying which player holds the ball in an NBA game from a single in-game photo. To run these experiments smoothly, we recommend using GPU cloud servers, which offer the computing power needed to train and deploy YOLOv8 models efficiently.