I would like to map the 3D skeleton in CameraSpace back to the 2D color image of the Kinect without using the coordinate mapper. Do I need to transform from the depth camera to the color camera?
As I understand it, the skeleton joints are derived from the depth image. So, to get the joint positions in the color image, I should go through these three steps:
1) The 3D joints are in the depth camera's 3D space, so I need to transform them into the color camera's 3D space (rotation/translation). I don't know how to get this transformation matrix!
2) Find the color camera's intrinsic parameters (using the MATLAB calibration toolbox) to project the 3D points back to 2D.
3) Apply the lens distortion coefficients. (A minimal sketch of all three steps follows below.)
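For clarity, here is a rough NumPy sketch of the pipeline I have in mind. All calibration values below are placeholders, not real Kinect parameters; R, t, K, and the distortion coefficients would come from a stereo calibration of the two cameras:

```python
import numpy as np

# --- Placeholder calibration values (would come from stereo calibration) ---
R = np.eye(3)                       # rotation, depth cam -> color cam (unknown to me)
t = np.array([0.052, 0.0, 0.0])     # translation in meters (placeholder baseline)
K = np.array([[1050.0,    0.0, 960.0],   # color camera intrinsics (placeholder)
              [   0.0, 1050.0, 540.0],
              [   0.0,    0.0,   1.0]])
dist = (0.1, -0.2, 0.0, 0.0, 0.0)   # k1, k2, p1, p2, k3 (placeholder)

def depth3d_to_color2d(X_depth):
    """Map a 3D point in depth-camera space to a 2D color pixel."""
    # Step 1: rigid transform into the color camera's 3D space
    X = R @ X_depth + t
    # Step 2: perspective division to normalized image coordinates
    x, y = X[0] / X[2], X[1] / X[2]
    # Step 3: apply lens distortion (Brown-Conrady model), then intrinsics
    k1, k2, p1, p2, k3 = dist
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    u = K[0, 0] * x_d + K[0, 2]
    v = K[1, 1] * y_d + K[1, 2]
    return u, v

print(depth3d_to_color2d(np.array([0.1, -0.2, 1.5])))
```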
I found a similar question here: How to convert points in depth space to color space in Kinect without using Kinect SDK functions? However, the question of how to find the transformation matrix that maps the depth camera to the color camera is not answered there.
Edit: After implementing this, I found that the 2D color and 2D depth images effectively share the same 3D camera space (even though depth and color are two different cameras, so strictly speaking they should have different 3D spaces). I was therefore able to map 3D points to 2D color pixels without the coordinate mapping function, using only the projection matrix obtained from the MATLAB toolbox: 3D camera space -> project back to a 2D color pixel.
At the beginning, I assumed the 3D points were in a 3D depth space distinct from the 3D color space (input: 3D points in depth space; output: 2D color pixel), so the pipeline would be 3D depth camera -> 3D color camera -> project back to a 2D color pixel. In the end, the 3D depth camera -> 3D color camera step did not need to be implemented.
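In case it is useful to others: once you have a 3x4 projection matrix, the mapping reduces to a matrix multiply and a perspective division. A minimal sketch (the matrix P below is a placeholder, not the one the toolbox actually returned):

```python
import numpy as np

# Placeholder 3x4 projection matrix P = K[R|t]; the real one came from
# the MATLAB calibration toolbox
P = np.array([[1050.0,    0.0, 960.0, 0.0],
              [   0.0, 1050.0, 540.0, 0.0],
              [   0.0,    0.0,   1.0, 0.0]])

def project(P, X):
    """Project a 3D camera-space point to a 2D color pixel."""
    u, v, w = P @ np.append(X, 1.0)  # homogeneous coordinates
    return u / w, v / w              # perspective division

print(project(P, np.array([0.1, -0.2, 1.5])))
```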
Yes, you absolutely need to transform from depth camera space to color camera space. The Kinect has two cameras, IR and RGB, and three "spaces": camera, depth, and color. The depth and IR images share the same depth space. Skeleton joints come in 3D camera space, so to get a corresponding point in depth or color space you need to convert appropriately (naturally, you will lose the Z coordinate in the process).
There is a very good reason why the process of coordinate mapping is abstracted away behind an API call: it depends on the FOV (field of view) of each camera and the physical distance between them, and these can vary between sensor models. It is therefore wise to check these values for your particular sensor. They are constant, so if you can guarantee that your code will only ever run on that particular Kinect model, you will be fine.
After that, see the following post on the calculations you will need to do: Manual coordinate mapping
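To illustrate the kind of calculation involved, here is a rough Python sketch using approximate Kinect v2 color camera values (the resolution and FOV numbers are ballpark figures; verify them against your own sensor, and note that this ignores lens distortion):

```python
import math

# Approximate Kinect v2 color camera values (ballpark; verify for your sensor)
WIDTH, HEIGHT = 1920, 1080
FOV_H, FOV_V = 84.1, 53.8          # field of view in degrees

# Pinhole model: focal length in pixels follows from FOV and image size
fx = (WIDTH / 2) / math.tan(math.radians(FOV_H / 2))
fy = (HEIGHT / 2) / math.tan(math.radians(FOV_V / 2))
cx, cy = WIDTH / 2, HEIGHT / 2     # principal point assumed at image center

def camera_to_color(x, y, z):
    """Project a camera-space point (meters) to a color pixel, ignoring
    lens distortion. The Y term is flipped because camera-space Y points
    up while pixel rows grow downward; check the sign against your data."""
    return fx * x / z + cx, cy - fy * y / z

print(camera_to_color(0.5, 0.3, 2.0))  # a joint 0.5 m right, 0.3 m up, 2 m away
```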