I am working on an algorithm to estimate the height of detected people in a video, and I'm stuck.
The part I have working is person detection with a HOG (Histogram of Oriented Gradients) detector, so I have a bounding box for every person in the frame. I have also calibrated the camera, so I know its intrinsic and extrinsic parameters.
The problem is that I am left with a perspective projection equation with two unknowns: the height of the person and the distance from the person to the camera. I am using a single monocular webcam, so I have no direct measurement of that distance. The height is exactly what I'm trying to estimate, so I don't have that either.
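To make the ambiguity concrete, here is a minimal sketch, assuming an ideal pinhole camera, no lens distortion, and a person standing upright and parallel to the image plane. The function name `projected_height_px` and all of the numbers are made up for illustration; the focal length in pixels would normally come from the intrinsic matrix.

```python
def projected_height_px(focal_px: float, height_m: float, distance_m: float) -> float:
    """Image height in pixels of an upright object under an ideal pinhole model.

    Assumes the object is parallel to the image plane and ignores lens distortion.
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return focal_px * height_m / distance_m


# Hypothetical focal length in pixels (illustrative value, not a real calibration).
FOCAL_PX = 800.0

# Two different (height, distance) pairs ...
short_and_near = projected_height_px(FOCAL_PX, 1.5, 3.0)
tall_and_far = projected_height_px(FOCAL_PX, 2.0, 4.0)

# ... give the same bounding-box height, so the box alone cannot tell them apart.
print(short_and_near, tall_and_far)
```

Under these assumptions, only the ratio of height to distance is observable from the bounding box, which is why some extra constraint seems to be needed.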
I know this problem is solvable with a Kinect or a stereo camera, since either would give me the distance, but I'm limited to a single monocular webcam.
Does anyone have an idea of how to approach this problem? I have read about using reference objects, but I can't figure out how to apply them to my problem.