I am writing a C++ calculator, "Hand_Landmark_Write_To_File_Calculator", to write to a file the normalized positions (x, y, z) of the hand landmarks together with the handedness.
The inputs to my calculator are:
input_stream: "HANDEDNESS:handedness" — handedness of the detected hand (i.e. is the hand left or right). (std::vector<ClassificationList>)
input_stream: "LANDMARKS:landmarks" — collection of detected/predicted hands, each represented as a list of landmarks. (std::vector<NormalizedLandmarkList>)
My calculator has been added to the "hand_tracking_desktop_live.pbtxt" graph provided in the MediaPipe examples, so that both inputs come from the outputs of the "HandLandmarkTrackingCpu" node.
Graph
# CPU image. (ImageFrame)
input_stream: "input_video"
# CPU image. (ImageFrame)
output_stream: "output_video"
# Generates side packet containing max number of hands to detect/track.
node {
  calculator: "ConstantSidePacketCalculator"
  output_side_packet: "PACKET:num_hands"
  node_options: {
    [type.googleapis.com/mediapipe.ConstantSidePacketCalculatorOptions]: {
      packet { int_value: 2 }
    }
  }
}
# Detects/tracks hand landmarks.
node {
  calculator: "HandLandmarkTrackingCpu"
  input_stream: "IMAGE:input_video"
  input_side_packet: "NUM_HANDS:num_hands"
  output_stream: "LANDMARKS:landmarks"
  output_stream: "HANDEDNESS:handedness"
  output_stream: "PALM_DETECTIONS:multi_palm_detections"
  output_stream: "HAND_ROIS_FROM_LANDMARKS:multi_hand_rects"
  output_stream: "HAND_ROIS_FROM_PALM_DETECTIONS:multi_palm_rects"
}
node {
  calculator: "HandLandmarkWriteToFileCalculator"
  input_stream: "LANDMARKS:landmarks"
  input_stream: "HANDEDNESS:handedness"
}
# Subgraph that renders annotations and overlays them on top of the input
# images (see hand_renderer_cpu.pbtxt).
node {
  calculator: "HandRendererSubgraph"
  input_stream: "IMAGE:input_video"
  input_stream: "DETECTIONS:multi_palm_detections"
  input_stream: "LANDMARKS:landmarks"
  input_stream: "HANDEDNESS:handedness"
  input_stream: "NORM_RECTS:0:multi_palm_rects"
  input_stream: "NORM_RECTS:1:multi_hand_rects"
  output_stream: "IMAGE:output_video"
}
At the moment I am only trying to print the positions of the hands to standard output according to the handedness label.
Calculator code:
#include <iostream>
#include <string>
#include <vector>

#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/classification.pb.h"
#include "mediapipe/framework/formats/landmark.pb.h"

namespace mediapipe {

class HandLandmarkWriteToFileCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc) {
    cc->Inputs().Tag("LANDMARKS").Set<std::vector<NormalizedLandmarkList>>();
    cc->Inputs().Tag("HANDEDNESS").Set<std::vector<ClassificationList>>();
    return absl::OkStatus();
  }

  absl::Status Open(CalculatorContext* cc) final { return absl::OkStatus(); }

  absl::Status Process(CalculatorContext* cc) final {
    const auto& input_landmarks =
        cc->Inputs().Tag("LANDMARKS").Get<std::vector<NormalizedLandmarkList>>();
    const auto& classifications =
        cc->Inputs().Tag("HANDEDNESS").Get<std::vector<ClassificationList>>();
    for (int i = 0; i < classifications.size(); ++i) {
      const std::string& label = classifications[i].classification(0).label();
      if (label == "Right") {
        std::cout << "Right : " << input_landmarks[0].landmark(0).x() << std::endl;
      } else {
        std::cout << "Left : " << input_landmarks[0].landmark(0).x() << std::endl;
      }
    }
    return absl::OkStatus();
  }
};

REGISTER_CALCULATOR(HandLandmarkWriteToFileCalculator);

}  // namespace mediapipe
If only one hand appears on the webcam, this code correctly displays the x-coordinate with the correct hand label.
However, if both hands appear on the webcam, index 0 of the input_landmarks vector corresponds to whichever hand was detected first. So if my right hand is detected before my left hand, the standard output shows the same x-coordinate (that of my right hand) for both "Right : " and "Left : ". Conversely, if my left hand is detected first, index 0 corresponds to my left hand and the standard output only ever displays the x-coordinate of my left hand.
How can I match the handedness to the coordinates of the corresponding hand when two hands are detected?
I have solved my problem.
The issue was with the graph hand_tracking_desktop_live.pbtxt, where the output stream LANDMARKS:landmarks is of type std::vector<NormalizedLandmarkList>. This vector contains the landmarks of each hand, but the handedness associated with each index changes depending on which hand is detected first.
To solve this, I moved the Hand_Landmark_Write_To_File_Calculator node into the HandLandmarkTrackingCpu calculator graph. In that graph, each hand is first processed separately, and only afterwards are the handedness and landmarks of each hand collected and grouped into vectors.
So I added the Hand_Landmark_Write_To_File_Calculator node after the detection of the landmarks and handedness of each isolated hand, and before this data is collected into vectors, in the graph hand_landmark_tracking_cpu.pbtxt.
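The node placement could look roughly like this (a sketch only: the per-hand stream names below are illustrative, not the exact names in hand_landmark_tracking_cpu.pbtxt, so check the actual graph):

```
# Hypothetical placement: after the per-hand landmark/handedness detection
# and before the calculators that collect the per-hand results into vectors.
node {
  calculator: "HandLandmarkWriteToFileCalculator"
  input_stream: "LANDMARKS:single_hand_landmarks"
  input_stream: "HANDEDNESS:single_handedness"
}
```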
I have adapted my C++ code like this:
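Since the adapted code is not shown above, here is a minimal sketch of what the per-hand version could look like, assuming that inside the subgraph each packet carries a single NormalizedLandmarkList and ClassificationList (one hand per packet) rather than vectors of them:

```cpp
#include <iostream>
#include <string>

#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/classification.pb.h"
#include "mediapipe/framework/formats/landmark.pb.h"

namespace mediapipe {

// Inside hand_landmark_tracking_cpu.pbtxt each hand is processed on its own,
// so the inputs are single messages, not std::vector<...> of them.
class HandLandmarkWriteToFileCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc) {
    cc->Inputs().Tag("LANDMARKS").Set<NormalizedLandmarkList>();
    cc->Inputs().Tag("HANDEDNESS").Set<ClassificationList>();
    return absl::OkStatus();
  }

  absl::Status Open(CalculatorContext* cc) final { return absl::OkStatus(); }

  absl::Status Process(CalculatorContext* cc) final {
    // Skip timestamps where one of the streams has no packet.
    if (cc->Inputs().Tag("LANDMARKS").IsEmpty() ||
        cc->Inputs().Tag("HANDEDNESS").IsEmpty()) {
      return absl::OkStatus();
    }
    const auto& landmarks =
        cc->Inputs().Tag("LANDMARKS").Get<NormalizedLandmarkList>();
    const auto& handedness =
        cc->Inputs().Tag("HANDEDNESS").Get<ClassificationList>();
    // Handedness and landmarks now belong to the same hand by construction.
    std::cout << handedness.classification(0).label() << " : "
              << landmarks.landmark(0).x() << std::endl;
    return absl::OkStatus();
  }
};

REGISTER_CALCULATOR(HandLandmarkWriteToFileCalculator);

}  // namespace mediapipe
```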
I hope this answer will help others.