I am working on a virtual dressing platform and want to estimate a person's body measurements from an image. I have implemented OpenPose and can extract the skeleton of a person, but I have no clue how to get the measurements of individual body parts.
Here's the code to get the skeleton using OpenPose and OpenCV:
get_skeleton_op.py
import cv2
import time
import numpy as np
protoFile = "pose/coco/pose_deploy_linevec.prototxt"
weightsFile = "pose/coco/pose_iter_440000.caffemodel"
nPoints = 18
POSE_PAIRS = [[1, 0], [1, 2], [1, 5], [2, 3], [3, 4], [5, 6], [6, 7],
              [1, 8], [8, 9], [9, 10], [1, 11], [11, 12], [12, 13],
              [0, 14], [0, 15], [14, 16], [15, 17]]
frame = cv2.imread("./fatguy.jpg")
frameCopy = np.copy(frame)
frameWidth = frame.shape[1]
frameHeight = frame.shape[0]
threshold = 0.1
net = cv2.dnn.readNetFromCaffe(protoFile, weightsFile)
t = time.time()
# input image dimensions for the network
inWidth = 368
inHeight = 368
inpBlob = cv2.dnn.blobFromImage(frame, 1.0 / 255, (inWidth, inHeight),
                                (0, 0, 0), swapRB=False, crop=False)
net.setInput(inpBlob)
output = net.forward()
print(output)
print("time taken by network : {:.3f}".format(time.time() - t))
H = output.shape[2]
W = output.shape[3]
# Empty list to store the detected keypoints
points = []
for i in range(nPoints):
    # Confidence map of the corresponding body part
    probMap = output[0, i, :, :]
    # Find the global maximum of the probMap
    minVal, prob, minLoc, point = cv2.minMaxLoc(probMap)
    # Scale the point to fit on the original image
    x = (frameWidth * point[0]) / W
    y = (frameHeight * point[1]) / H
    # Add the point to the list if the probability
    # is greater than the threshold
    if prob > threshold:
        cv2.circle(frameCopy, (int(x), int(y)), 8, (0, 255, 255),
                   thickness=-1, lineType=cv2.FILLED)
        cv2.putText(frameCopy, "{}".format(i), (int(x), int(y)),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2,
                    lineType=cv2.LINE_AA)
        points.append((int(x), int(y)))
    else:
        points.append(None)
# Draw the skeleton
for pair in POSE_PAIRS:
    partA = pair[0]
    partB = pair[1]
    if points[partA] and points[partB]:
        cv2.line(frame, points[partA], points[partB], (0, 255, 255), 2)
        cv2.circle(frame, points[partA], 8, (0, 0, 255),
                   thickness=-1, lineType=cv2.FILLED)
# cv2.imshow('Output-Keypoints', frameCopy)
cv2.imshow('Output-Skeleton', frame)
cv2.imwrite('Output-Keypoints.jpg', frameCopy)
cv2.imwrite('Output-Skeleton.jpg', frame)
print("Total time taken : {:.3f}".format(time.time() - t))
cv2.waitKey(0)
Can anyone tell me how to move forward?
Actually, your question is not trivial.
In general, you have several options; I will describe them in abstract steps so you can see how you could achieve this. Some of the methods require more effort, some are less precise. I have successfully used Variant A so far.
Variant A)
Setup:
You use one camera, and your person stands directly in front of a flat 2D surface. The camera should always have the same fixed distance and angle to that surface (the background). We have to assume that the person is flat and use the pinhole camera model. You could then do the following processing steps:
Processing:
Step A1) Do a camera calibration with a printed 2D pattern (a chessboard or similar). It is important that the pattern lies as flat as possible on your background. Take multiple images at different positions on your background and try to cover the complete visible area. Use the OpenCV camera_calibration example for your pose estimation (it estimates position and distance relative to your camera) and for your lens correction. You should edit the config XML file beforehand, defining which pattern you are using and the square size in mm or cm.
Step A2) Take a picture of your person and apply the lens correction.
Step A3) Calculate the "body points" via the OpenPose framework.
Step A4) Use an inverse homography to project your points from "pixel space" into "real-world" space, using your camera calibration data from step A1. Then calculate the Euclidean distance in mm or cm (as defined in the calibration XML file). This step assumes that we project points onto a perfectly flat 2D surface, because our z-dimension is set to zero here; otherwise the calculation is far more complex, but possible to do too. I added a small code example to my GitHub account as an example.
Variant B:
Place an easily detectable "object" of known geometry inside your picture, which acts as a size reference. You will also have to know some camera parameters, such as the focal length. I found a good step-by-step tutorial here, which also includes some math background.
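The core of this "pixels per metric" idea fits in a few lines: the reference object's known width divided by its measured pixel width gives a scale factor that converts keypoint distances to real units, valid in the plane of the reference object. The numbers below are hypothetical:

```python
import numpy as np

# Reference object of known real-world width, e.g. an A4 sheet (210 mm),
# measured as 300 px wide in the image (hypothetical numbers)
ref_width_mm = 210.0
ref_width_px = 300.0
px_per_mm = ref_width_px / ref_width_mm  # scale, valid only in that plane

def pixel_dist_to_mm(pt_a, pt_b):
    """Convert a pixel-space distance between two keypoints into mm."""
    d_px = np.linalg.norm(np.subtract(pt_a, pt_b))
    return d_px / px_per_mm

# e.g. neck to mid-hip keypoints 450 px apart -> torso length estimate
print(round(pixel_dist_to_mm((640, 200), (640, 650)), 1))  # -> 315.0
```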
Variant C:
Setup:
Use two or more cameras and 3D reconstruction. This might lead to higher accuracy, and the person can then stand anywhere in the cameras' field of view.
Steps:
Step C1) See a good calibration walkthrough here.
Step C2) Use 3D reconstruction for the distance calculation. Here is the detailed idea and some code.
Variant D:
Use a 3D scanner or a Kinect system (there is a paper which shows the Kinect way).