Bounding box challenges while applying the YOLO project to videos (from Coursera)


I tried to process a video file instead of an image at the end of the car detection program from the Coursera course on CNNs. Unfortunately, the bounding boxes are not in sync with the actual car locations and are offset by a few points on both the X and Y axes. It seems to me that I am messing up somewhere with the frame width and height when I freeze a 'currentFrame' and feed it to the pre-processing, if at all. Any thoughts on what could be wrong here? I didn't want to paste the entire project code, so I am pasting just the part where I replace the predict function with code that iterates over the frames of a video.

import cv2
import numpy as np
from tqdm import tqdm
from keras import backend as K
from matplotlib.pyplot import imshow

video_out = "nb_images/out1.mp4"
video_reader = cv2.VideoCapture("nb_images/road_video_trim2.mp4")
nb_frames = int(video_reader.get(cv2.CAP_PROP_FRAME_COUNT))
frame_h = int(video_reader.get(cv2.CAP_PROP_FRAME_HEIGHT))
frame_w = int(video_reader.get(cv2.CAP_PROP_FRAME_WIDTH))
video_writer = cv2.VideoWriter(video_out,
                       cv2.VideoWriter_fourcc(*'MPEG'),
                       50.0,  # hard-coded output frame rate
                       (frame_w, frame_h))

batch_size  = 1
images      = []
start_point = 0
show_window = False
for i in tqdm(range(nb_frames)):
    _, image = video_reader.read()              # grab the next frame (BGR)
    cv2.imwrite("currentFrame.jpg", image)      # freeze the frame to disk
    # preprocess_image returns the display image plus the 608x608 model input
    image, image_data = preprocess_image("currentFrame.jpg", model_image_size=(608, 608))
    out_scores, out_boxes, out_classes = sess.run(
        [scores, boxes, classes],
        feed_dict={yolo_model.input: image_data, K.learning_phase(): 0})
    colors = generate_colors(class_names)
    draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors)
    imshow(image)
    video_writer.write(np.uint8(image))

if show_window: cv2.destroyAllWindows()
video_reader.release()
video_writer.release()
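For reference, a quick sanity check (hypothetical, not in the original code) is to compare the capture size against the image_shape that yolo_eval was built with; the boxes drift exactly when these two disagree:

# Hypothetical diagnostic: the boxes are offset whenever these two shapes disagree.
print("capture frames (h, w):", (frame_h, frame_w))   # e.g. (720, 1280)
print("yolo_eval image_shape:", image_shape)          # shape the boxes are scaled to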

So, I figured out what was going wrong here: this code snippet that does the initialization was set up with a different image shape. First I changed that.

class_names = read_classes("model_data/coco_classes.txt")
anchors = read_anchors("model_data/yolo_anchors.txt")
#image_shape = (720., 1280.)   # original setting: boxes were scaled to 720x1280
image_shape = (608., 608.)     # now matches the 608x608 frames fed to the model

Then the YOLO calls ...

yolo_outputs = yolo_head(yolo_model.output, anchors, len(class_names))
scores, boxes, classes = yolo_eval(yolo_outputs, image_shape)
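For context, image_shape matters because yolo_eval rescales the network's normalized box coordinates up to that shape before they are drawn. A minimal sketch of that rescaling step, assuming a scale_boxes-style helper like the one in the course code (the function name here is mine):

import numpy as np

def scale_boxes_sketch(boxes, image_shape):
    # boxes are normalized (y1, x1, y2, x2) corners in [0, 1]; multiplying by
    # the target height/width turns them into pixel coordinates at image_shape.
    height, width = image_shape
    image_dims = np.array([height, width, height, width], dtype=np.float32)
    return boxes * image_dims

So if image_shape says (720., 1280.) but the frame the boxes are drawn on is 608x608 (or vice versa), every box lands offset and stretched, which is exactly the symptom above.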

And now, within my code, I made these small changes ...

for i in tqdm(range(nb_frames)):
    _, image = video_reader.read()              # grab the next frame (BGR)
    image = cv2.resize(image, (608, 608))       # match the shape yolo_eval scales boxes to
    cv2.imwrite("currentFrame.jpg", image)
    image, image_data = preprocess_image("currentFrame.jpg", model_image_size=(608, 608))
    out_scores, out_boxes, out_classes = sess.run(
        [scores, boxes, classes],
        feed_dict={yolo_model.input: image_data, K.learning_phase(): 0})
    colors = generate_colors(class_names)
    draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors)
    image = cv2.resize(np.array(image), (frame_w, frame_h))  # back to the writer's frame size
    video_writer.write(np.uint8(image))
    imshow(image)

I think initializing the shape to (608., 608.) and the resize above are what made it work. The final frame came out with the boxes lined up correctly.
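As a side cleanup (a sketch, assuming preprocess_image only does a resize, a divide by 255, and a batch dimension, like the course helper): the disk round-trip through currentFrame.jpg can be skipped by preparing the input tensor in memory. Note that OpenCV frames are BGR while the model expects RGB, a conversion the JPEG round-trip was silently handling:

frame_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)   # OpenCV reads BGR; the model expects RGB
resized = cv2.resize(frame_rgb, (608, 608))          # match the model input size
image_data = resized.astype(np.float32) / 255.0      # normalize to [0, 1]
image_data = np.expand_dims(image_data, 0)           # add the batch dimension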

There is 1 answer:

atom (accepted answer)

Just to logically close the loop here.

The answer is the second half of my edited question above, starting from the point where I say "I figured out what was going wrong". To be precise, I had to change the image shape:

class_names = read_classes("model_data/coco_classes.txt")
anchors = read_anchors("model_data/yolo_anchors.txt")
#image_shape = (720., 1280.)
image_shape = (608., 608.)

And then I had to resize twice: once before sending the frame for processing, and again before writing it back as part of the updated video. The other code snippets I saw didn't actually have this, so I don't really know why I need this "patch" fix, but it works :)

for i in tqdm(range(nb_frames)):
    _, image = video_reader.read()              # grab the next frame (BGR)
    image = cv2.resize(image, (608, 608))       # match the shape yolo_eval scales boxes to
    cv2.imwrite("currentFrame.jpg", image)
    image, image_data = preprocess_image("currentFrame.jpg", model_image_size=(608, 608))
    out_scores, out_boxes, out_classes = sess.run(
        [scores, boxes, classes],
        feed_dict={yolo_model.input: image_data, K.learning_phase(): 0})
    colors = generate_colors(class_names)
    draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors)
    image = cv2.resize(np.array(image), (frame_w, frame_h))  # back to the writer's frame size
    video_writer.write(np.uint8(image))
    imshow(image)
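As for why the "patch" is needed: yolo_eval scales boxes to image_shape, so the boxes only line up with a frame of exactly that size. An alternative (a sketch, assuming the video really is frame_w x frame_h throughout) is to build yolo_eval with the real frame size instead:

# Sketch of an alternative: scale boxes to the real frame size up front,
# so frames can stay at native resolution with no extra resizing.
image_shape = (float(frame_h), float(frame_w))   # e.g. (720., 1280.)
scores, boxes, classes = yolo_eval(yolo_outputs, image_shape)

With that, both cv2.resize calls should become unnecessary: preprocess_image already produces the 608x608 model input, while draw_boxes draws on the full-size frame, whose boxes are now scaled to match it.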