YoloV5 hangs in a container


I'm stuck on a problem combining Docker, Python (3.11) and YOLOv5.

I have reduced the problem to the following script, which works as expected when debugged under Windows 10 but hangs (output copied below) whenever it is run in Docker. When run in the debugger it prints "Finished" and waits for Ctrl+C, as you would expect. If I omit the infinite loop at the end, it runs to completion as normal: it carries on and prints "sleepy time", then "Finished". It only hangs, partway through model loading, if the loop comes afterwards. This is a problem for me because the script is meant to be part of a larger system that waits for new files to appear and runs the model on them as they arrive (using the watchdog.observers library).

To complicate matters, Python isn't really my first language, and the model wasn't created by me.

from time import sleep

import torch
import os
import pathlib


def start_watching():
    print("test no watchdog, yes loop")    

    try:        
        if os.name == 'nt':
            pathlib.PosixPath = pathlib.WindowsPath
        #hardcoded paths for testing only
        model = torch.hub.load('./yolov5-master', 'custom', source='local', path='./best.pt', force_reload=True) 
        print("got model")
        results = model('./240206-154354_wheelset_149_22241687_Image_02.jpg')        
        labels, cord = results.xyxyn[0][:, -1], results.xyxyn[0][:, :-1]
        print(labels,cord)
    except Exception as ex:
        print(ex)
    
    print("sleepy time") 
    sleep(100)
    print("Finished")
    
    waitForFiles = True
    try:
        while waitForFiles:
            sleep(10)
    except KeyboardInterrupt:
        waitForFiles = False
        print("going down")


start_watching()

When I run this in a container however, I get the following output:

test no watchdog, yes loop
YOLOv5  2024-3-5 Python-3.11.6 torch-2.2.1+cpu CPU

Fusing layers...
Model summary: 157 layers, 7012822 parameters, 0 gradients, 15.8 GFLOPs
Adding AutoShape...

It never gets beyond "Adding AutoShape...", no matter how long I leave it. That line is emitted partway through loading the model.
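One thing that may be worth ruling out here is output buffering. When stdout is not a terminal, as is typical for a container started without -t, Python block-buffers print output, while logging output goes to stderr by default and appears immediately. Under that assumption, "got model" could be sitting in a buffer while the loop sleeps, and would only be flushed when the process exits. This is a guess about this case, not a confirmed diagnosis. A minimal sketch that flushes every message and dumps the stack if loading really does stall (the helper names log and run_with_hang_dump are made up for illustration):

```python
import faulthandler
import sys


def log(message, stream=None):
    """Print a progress message and flush it immediately."""
    stream = sys.stdout if stream is None else stream
    print(message, file=stream, flush=True)


def run_with_hang_dump(func, timeout_s=60, file=None):
    """Run func(); if it is still running after timeout_s seconds,
    write the stack of every thread to `file` (stderr by default)."""
    file = sys.stderr if file is None else file
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=file)
    try:
        return func()
    finally:
        # Cancel the pending dump once func() has returned or raised.
        faulthandler.cancel_dump_traceback_later()
```

In the script above, the torch.hub.load call could be wrapped in a lambda and passed to run_with_hang_dump, with the print calls replaced by log. Running the interpreter with the -u flag, or setting the PYTHONUNBUFFERED environment variable, is another way to disable the buffering without touching the code.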

My dockerfile is:

FROM yolobase
#Copy the code in
COPY . .
RUN echo "min not working"

ENTRYPOINT python3 test.py

The yolobase image referred to above is built from the Dockerfile included with the version of YOLOv5 used to build the model (I was told this was important).

# YOLOv5  by Ultralytics, AGPL-3.0 license
# Builds ultralytics/yolov5:latest-cpu image on DockerHub https://hub.docker.com/r/ultralytics/yolov5
# Image is CPU-optimized for ONNX, OpenVINO and PyTorch YOLOv5 deployments

# Start FROM Ubuntu image https://hub.docker.com/_/ubuntu
FROM ubuntu:23.10

# Downloads to user config dir
ADD https://ultralytics.com/assets/Arial.ttf https://ultralytics.com/assets/Arial.Unicode.ttf /root/.config/Ultralytics/

# Install linux packages
# g++ required to build 'tflite_support' and 'lap' packages, libusb-1.0-0 required for 'tflite_support' package
RUN apt update \
    && apt install --no-install-recommends -y python3-pip git zip curl htop libgl1 libglib2.0-0 libpython3-dev gnupg g++ libusb-1.0-0
# RUN alias python=python3

# Remove python3.11/EXTERNALLY-MANAGED or use 'pip install --break-system-packages' avoid 'externally-managed-environment' Ubuntu nightly error
RUN rm -rf /usr/lib/python3.11/EXTERNALLY-MANAGED

# Install pip packages
COPY requirements.txt .
RUN python3 -m pip install --upgrade pip wheel
RUN pip install --no-cache -r requirements.txt albumentations gsutil notebook \
    coremltools onnx onnx-simplifier onnxruntime 'openvino-dev>=2023.0' \
    # tensorflow tensorflowjs \
    --extra-index-url https://download.pytorch.org/whl/cpu

# Create working directory
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app

# Copy contents
COPY . /usr/src/app


# Usage Examples -------------------------------------------------------------------------------------------------------

# Build and Push
# t=ultralytics/yolov5:latest-cpu && sudo docker build -f utils/docker/Dockerfile-cpu -t $t . && sudo docker push $t

# Pull and Run
# t=ultralytics/yolov5:latest-cpu && sudo docker pull $t && sudo docker run -it --ipc=host -v "$(pwd)"/datasets:/usr/src/datasets $t

I do, however, get the same problem if I use FROM ultralytics/yolov5:latest-cpu.

The requirements used by the second dockerfile (i.e. yolobase) are:

# YOLOv5 requirements

# Usage: pip install -r requirements.txt

# Base ------------------------------------------------------------------------
gitpython>=3.1.30
matplotlib>=3.3
numpy>=1.23.5
opencv-python>=4.1.1
Pillow>=9.4.0
psutil  # system resources
PyYAML>=5.3.1
requests>=2.23.0
scipy>=1.4.1
thop>=0.1.1  # FLOPs computation
torch>=1.8.0  # see https://pytorch.org/get-started/locally (recommended)
torchvision>=0.9.0
tqdm>=4.64.0
ultralytics>=8.0.232
# protobuf<=3.20.1  # https://github.com/ultralytics/yolov5/issues/8012

# Logging ---------------------------------------------------------------------
# tensorboard>=2.4.1
# clearml>=1.2.0
# comet

# Plotting --------------------------------------------------------------------
pandas>=1.1.4
seaborn>=0.11.0

# Export ----------------------------------------------------------------------
# coremltools>=6.0  # CoreML export
# onnx>=1.10.0  # ONNX export
# onnx-simplifier>=0.4.1  # ONNX simplifier
# nvidia-pyindex  # TensorRT export
# nvidia-tensorrt  # TensorRT export
# scikit-learn<=1.1.2  # CoreML quantization
# tensorflow>=2.4.0,<=2.13.1  # TF exports (-cpu, -aarch64, -macos)
# tensorflowjs>=3.9.0  # TF.js export
# openvino-dev>=2023.0  # OpenVINO export

# Deploy ----------------------------------------------------------------------
setuptools>=65.5.1 # Snyk vulnerability fix
# tritonclient[all]~=2.24.0

# Extras ----------------------------------------------------------------------
# ipython  # interactive notebook
# mss  # screenshots
# albumentations>=1.0.3
# pycocotools>=2.0.6  # COCO mAP

Note that the comments are as downloaded from GitHub; I haven't commented anything out or added anything.


There is 1 answer

HOBE

This script, adapted from a solution I found on Stack Overflow, uses the watchdog library to monitor a folder for changes. When a new image is detected, it runs a command that processes the image with the YOLOv5 model. The process is meant to be straightforward: on detecting a new image, the script runs the detection model and then waits for the next image.

#!/usr/bin/python
import time
import os
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class MyHandler(FileSystemEventHandler):
    def run_yolo(self, path):
        if path.endswith('jpg'):
            cmd = f'python3 detect.py --source {path} --weights yolov5s.pt'
            print("Running command: ", cmd)
            os.system(cmd)

    def on_created(self, event):
        print(f'event type: {event.event_type}  path : {event.src_path}')
        self.run_yolo(event.src_path)

    def on_modified(self, event):
        pass
    
    def on_moved(self, event):
        pass


if __name__ == "__main__":
    event_handler = MyHandler()
    observer = Observer()
    observer.schedule(event_handler, path='imgs/', recursive=False)
    observer.start()

    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()

For testing purposes, I downloaded an image using wget (wget https://predictivehacks.com/wp-content/uploads/2019/10/cycling001-1024x683.jpg), which successfully triggered the script. However, I ran into unexpected behavior: the script ran the detection three times for the same image. The output shows the detection command being triggered repeatedly, each time detecting the same objects in the image and saving the results to a new directory.

event type: modified  path : imgs
event type: modified  path : imgs/cycling001-1024x683.jpg
Running command:  python3 detect.py --source imgs/cycling001-1024x683.jpg --weights yolov5s.pt
detect: weights=['yolov5s.pt'], source=imgs/cycling001-1024x683.jpg, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_csv=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5  2024-3-26 Python-3.11.6 torch-2.2.1+cpu CPU

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
image 1/1 /usr/src/app/imgs/cycling001-1024x683.jpg: 448x640 1 person, 2 bicycles, 1 car, 42.0ms
Speed: 0.3ms pre-process, 42.0ms inference, 0.6ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/detect/exp2
event type: modified  path : imgs/cycling001-1024x683.jpg
Running command:  python3 detect.py --source imgs/cycling001-1024x683.jpg --weights yolov5s.pt
detect: weights=['yolov5s.pt'], source=imgs/cycling001-1024x683.jpg, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_csv=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5  2024-3-26 Python-3.11.6 torch-2.2.1+cpu CPU

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
image 1/1 /usr/src/app/imgs/cycling001-1024x683.jpg: 448x640 1 person, 2 bicycles, 1 car, 36.0ms
Speed: 0.3ms pre-process, 36.0ms inference, 0.6ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/detect/exp3
event type: modified  path : imgs
event type: modified  path : imgs/cycling001-1024x683.jpg
Running command:  python3 detect.py --source imgs/cycling001-1024x683.jpg --weights yolov5s.pt
detect: weights=['yolov5s.pt'], source=imgs/cycling001-1024x683.jpg, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_csv=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5  2024-3-26 Python-3.11.6 torch-2.2.1+cpu CPU

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
image 1/1 /usr/src/app/imgs/cycling001-1024x683.jpg: 448x640 1 person, 2 bicycles, 1 car, 36.9ms
Speed: 0.3ms pre-process, 36.9ms inference, 0.6ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs/detect/exp4

This repetition was not intended, and I'm puzzled about why the script reacted multiple times to a single image modification. The issue seems to be related to how the watchdog event is being triggered or handled, rather than the YOLOv5 model itself. I'm considering adjusting the script to ensure it only processes each image once, possibly by implementing a check to ignore subsequent detections of the same image unless it has been modified again.
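A minimal sketch of such a check, assuming the file's modification time is a good enough signal that it has changed. The class name ProcessedFiles and its method are hypothetical and not part of watchdog:

```python
import os


class ProcessedFiles:
    """Remember the modification time at which each path was last processed."""

    def __init__(self):
        self._mtimes = {}

    def should_process(self, path):
        """Return True the first time a path is seen, and again only
        after its modification time has changed."""
        try:
            mtime = os.stat(path).st_mtime_ns
        except FileNotFoundError:
            # The file vanished between the event and this check.
            return False
        if self._mtimes.get(path) == mtime:
            return False
        self._mtimes[path] = mtime
        return True
```

In the handler, run_yolo would then be called only when should_process(event.src_path) returns True. If a download is still in progress when the first event is handled, later events will see a newer modification time and trigger another run, so this narrows the problem rather than eliminating it.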

Edit: The unintended repetition seems to be tied to how wget writes the image during download, which likely creates temporary files and so may have triggered multiple events. When I instead moved the file into the folder using mv, the script behaved as expected, processing each image only once.
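If the producer of the images is under your control, much the same effect as mv can be had from Python: write the file somewhere else on the same filesystem, then rename it into the watched folder, so the watcher never sees a half-written file. A sketch, where deliver_atomically and the staging directory are assumptions made for illustration:

```python
import os
import tempfile


def deliver_atomically(data, watched_dir, name, staging_dir):
    """Write data to a temporary file in staging_dir, then rename it
    into watched_dir. staging_dir must be on the same filesystem as
    watched_dir and must not itself be watched."""
    os.makedirs(staging_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        final_path = os.path.join(watched_dir, name)
        # os.replace is a single rename when both paths share a filesystem.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the partial file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```

Which event type the watcher reports for a file renamed in from outside the watched folder is worth checking with the print statement in the handler before relying on it.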

Further edit: Switching the code to the on_created event handler makes it safe to use wget to download images into the monitored directory.

Would employing a similar approach to the one I've tested help in addressing the main issue I'm experiencing with YOLOv5 hanging in the Docker container, or might there be a more effective strategy to ensure smooth and single-instance processing of new images?
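On the second half of that question, one option that avoids starting a new Python process and reloading the weights for every image is to load the model once and feed paths to a single worker thread through a queue. A rough sketch using only the standard library follows. ImageQueue and the process callable are hypothetical names, and whether this has any bearing on the hang in Docker is untested:

```python
import queue
import threading


class ImageQueue:
    """Hand new image paths to one worker thread, one at a time."""

    def __init__(self, process):
        self._process = process      # callable taking a path, e.g. a wrapper around model(path)
        self._pending = queue.Queue()
        self._queued = set()         # paths waiting to be processed
        self._lock = threading.Lock()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, path):
        """Queue path unless it is already waiting; return True if queued."""
        with self._lock:
            if path in self._queued:
                return False
            self._queued.add(path)
        self._pending.put(path)
        return True

    def _run(self):
        while True:
            path = self._pending.get()
            with self._lock:
                self._queued.discard(path)
            try:
                self._process(path)
            except Exception as exc:
                # Keep the worker alive if one image fails.
                print(f"failed on {path}: {exc}", flush=True)
            finally:
                self._pending.task_done()

    def wait(self):
        """Block until everything submitted so far has been processed."""
        self._pending.join()
```

The watchdog handler's on_created would then only call submit(event.src_path), which returns immediately, and all inference would happen on the worker thread in arrival order.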