The MLflow Tracking component is an API and UI for logging parameters, code versions, metrics, and output files when running your machine learning code and for later visualizing the results. MLflow Tracking lets you log and query experiments using the Python, REST, R, and Java APIs.
In most cases, if the model is built with a well-known library, integrating MLflow is as simple as calling its autologging method, mlflow.autolog(), before training starts.
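A minimal sketch of what such an autologging integration can look like is given here. The wrapper name run_with_autolog and its train_fn argument are hypothetical and only serve the illustration; mlflow.autolog() and mlflow.start_run() are the actual MLflow calls.

```python
from typing import Callable


def run_with_autolog(train_fn: Callable[[], None]) -> None:
    """Enable MLflow autologging, then run a training function inside one run.

    `train_fn` is a hypothetical placeholder for your own training code,
    e.g. a function that calls model.fit(...) from a supported library.
    """
    # Imported here so this sketch can be loaded even where MLflow is missing.
    import mlflow

    # Switch on automatic logging for every supported library that is installed.
    mlflow.autolog()

    # The context manager starts a run and ends it when the block exits.
    with mlflow.start_run():
        train_fn()
```

Whether anything is captured depends on the library used inside train_fn; frameworks without autologging support need manual logging calls.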
Integration with YOLOv5
YOLOv5 is an open-source library that is available at: https://github.com/ultralytics/yolov5
As YOLOv5 is already integrated with commercial tracking tools such as W&B (wandb), we wanted to try integrating it with the freely available MLflow and compare the two.
Unfortunately, in the case of YOLOv5, we were not able to use autologging as the automatic logging for PyTorch requires the model to be trained using PyTorch Lightning. This was not the case for YOLOv5, which meant we had to get deeper into the code and manually call MLflow APIs.
YOLOv5 is written pretty much from scratch and only relies on PyTorch functionality in certain areas. This helped us here, as it was easy to pinpoint the places in the training code where we had to call the MLflow APIs.
- Step 1: Start and Stop MLflow tracking
- Step 2: Once the MLflow run is started, log the necessary params.
- Step 3: While the run is active, log the necessary metrics at each step. The metrics are later plotted in a diagram with the step on the x-axis. In this case, each epoch is considered one step.
- Step 4: Before the run is stopped, log the model, if necessary.
- Note: scikit-learn, PyTorch, and Keras models can be logged with the log_model function of their MLflow flavor, but this was not possible with YOLOv5. Instead, we log the model weights as MLflow artifacts.
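Steps 1 to 4 can be sketched roughly as follows. This is not the actual YOLOv5 patch: the function names, the hyp dictionary, the epoch_metrics iterable, the weights_path argument, and the "weights" artifact folder are all assumptions made for illustration. Only the mlflow.* calls are real MLflow APIs.

```python
from pathlib import Path
from typing import Dict, Iterable


def scalar_params(hyp: dict) -> dict:
    """Keep only simple scalar values, which are the ones worth logging as params."""
    return {k: v for k, v in hyp.items() if isinstance(v, (int, float, str))}


def train_with_mlflow(
    hyp: dict,
    epoch_metrics: Iterable[Dict[str, float]],
    weights_path: str,
    run_name: str = "yolov5-train",  # hypothetical run name
) -> None:
    """Log params, per-epoch metrics, and a weights file to MLflow."""
    # Imported here so the helper above stays usable without MLflow installed.
    import mlflow

    # Step 1: start tracking; the run is stopped when the block exits.
    with mlflow.start_run(run_name=run_name):
        # Step 2: log the hyperparameters once, at the start of the run.
        mlflow.log_params(scalar_params(hyp))

        # Step 3: log metrics with the epoch index as the step.
        # In real training code this call would sit at the end of each epoch.
        for epoch, metrics in enumerate(epoch_metrics):
            mlflow.log_metrics(metrics, step=epoch)

        # Step 4: store the weights file as a plain artifact, not via log_model.
        if Path(weights_path).is_file():
            mlflow.log_artifact(str(weights_path), artifact_path="weights")
```

A call could look like train_with_mlflow({"lr0": 0.01}, [{"mAP": 0.41}, {"mAP": 0.47}], "runs/train/exp/weights/best.pt"), where the values and the path are made up for the example.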
MLflow Tracking Server
By default, MLflow stores the tracked information in a local folder named “mlruns” inside the working directory, and a local tracking server can be started on top of this data by running the mlflow ui command from that same directory.
However, this caused issues in our case, as multiple developers were working in the same repository and several trainings were running simultaneously. Each training modified the MLflow files within the project without taking the other experiments into account, which ultimately corrupted the mlruns directory structure.
Instead, we could use a unified tracking server (reachable via a domain or IP address) and point our training runs at that server. An independent MLflow tracking server can be started with the following command:
mlflow server --backend-store-uri sqlite:///mlflow.db --serve-artifacts --host 0.0.0.0 --port 8888
- Step 5: Before the MLflow run is started, set the URI of the tracking server.
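A hedged sketch of Step 5 follows. The helper names, the host name, and the experiment name are hypothetical; the default port 8888 mirrors the server command above. mlflow.set_tracking_uri() and mlflow.set_experiment() are the real MLflow calls.

```python
def tracking_uri(host: str, port: int = 8888, scheme: str = "http") -> str:
    """Build the URI of the remote tracking server (port 8888 as in the command above)."""
    return f"{scheme}://{host}:{port}"


def configure_tracking(host: str, experiment_name: str, port: int = 8888) -> None:
    """Point MLflow at the remote server; call this before mlflow.start_run()."""
    # Imported here so tracking_uri() stays usable without MLflow installed.
    import mlflow

    # Step 5: send all subsequent logging calls to the shared server
    # instead of the local mlruns folder.
    mlflow.set_tracking_uri(tracking_uri(host, port))

    # Select the experiment by name; MLflow creates it if it does not exist yet.
    mlflow.set_experiment(experiment_name)
```

For example, configure_tracking("mlflow.example.com", "yolov5-detection") would use a placeholder host and experiment name. The same server address can also be supplied through the MLFLOW_TRACKING_URI environment variable, which avoids hard-coding it in the training script.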
MLflow is a very handy tool when it comes to monitoring and comparing the results of our training experiments. The remote tracking server is easy to set up, does not require considerable resources, and is very powerful.
In addition, by agreeing on common experiment names, several teams can collaborate on the same remote tracking server, which is very useful for our purposes here.