libtorch: Why does my Tensor change value when returned from a method into another method?


I'm debugging this error:

Unhandled exception at 0x00007FFA0B7D3E49 in AudioPluginHost.exe: Microsoft C++ exception: c10::Error at memory location 0x00000044B4DABDB0.

I am trying to train a neural network, mostly based off of this example.

Here's what I'm doing:

torch::Tensor TrainingSample::getRatingTensor()
{
    c10::DeviceType deviceType;
    if (torch::cuda::is_available()) {
        deviceType = torch::kCUDA;
    }
    else {
        deviceType = torch::kCPU;
    }
    float ratingArray[1][3] = { {0} };
    ratingArray[0][(int)waveform.rating] = 1;

    ostringstream os0;
    for (int i = 0;i<(sizeof(ratingArray[0])/sizeof(ratingArray[0][0]));i++) {
        os0 << ratingArray[0][i];
        os0 << ",";
    }
    DBG("ratingArray: \n" + os0.str());

    auto options = torch::TensorOptions().dtype(torch::kFloat32).device(deviceType);
    torch::Tensor ratingTensor = torch::from_blob(ratingArray, { 1, 3 }, options);

    ostringstream os1;
    os1 << ratingTensor[0];
    DBG("ratingTensor: \n" + os1.str());

    return ratingTensor;
}

The result from that is:

ratingArray: 
1,0,0,
ratingTensor: 
 1
 0
 0
[ CPUFloatType{3} ]

So, everything's good so far. I call this method from another method in the same class. That method has this code:

...
        // Execute the model on the input data.
        auto prediction = net->forward(trainingSample.sampleTensor);
        auto target = trainingSample.getRatingTensor();

        std::ostringstream os_tensor0;
        os_tensor0 << target[0];
        DBG("target_val: \n" + os_tensor0.str());

        std::ostringstream os_tensor1;
        os_tensor1 << prediction[0];
        DBG("prediction_val: \n" + os_tensor1.str());

        // Compute a loss value to judge the prediction of our model.
        torch::Tensor loss = torch::nll_loss(prediction, target);
...

I get the error on the last line there (torch::Tensor loss = torch::nll_loss(prediction, target);).

The output in the console from that code is:

target_val: 
-4.0784e-07
 9.5288e-44
-3.3012e-34
[ CPUFloatType{3} ]
prediction_val: 
-4.2455e+17
-4.6908e+17
 0.0000e+00
[ CPUFloatType{3} ]
Exception thrown at 0x00007FFA0B7D3E49 in AudioPluginHost.exe: Microsoft C++ exception: c10::Error at memory location 0x00000044B4DABDB0.
Unhandled exception at 0x00007FFA0B7D3E49 in AudioPluginHost.exe: Microsoft C++ exception: c10::Error at memory location 0x00000044B4DABDB0.

So, in addition to the error, I also see that the values of the target tensor change between the point inside getRatingTensor() where they are printed and the point after the method returns. What could be causing this change? I suspect it may be related to the cause of the error I'm getting.

I'm trying to run this from a JUCE project, so I'm linking the libraries in Projucer and compiling with Visual Studio. I'm not sure whether the errors stem from bad linking or from a coding error.

My settings in Projucer are:

External Libraries to Link:

E:\Programming\Downloads\libtorch\lib\c10.lib
E:\Programming\Downloads\libtorch\lib\c10_cuda.lib
E:\Programming\Downloads\libtorch\lib\caffe2_nvrtc.lib
E:\Programming\Downloads\libtorch\lib\torch.lib
E:\Programming\Downloads\libtorch\lib\torch_cpu.lib
E:\Programming\Downloads\libtorch\lib\torch_cuda.lib

Header Search Paths:

E:\Programming\Downloads\libtorch\include\
E:\Programming\Downloads\libtorch\include\torch\csrc\api\include

Extra Library Search Paths:

E:\Programming\Downloads\libtorch\lib

Best answer, by trialNerror:

I would say this is caused by your use of torch::from_blob, though I cannot be sure because I have no computer to test it on at the moment.

Basically, torch::from_blob does not take ownership of the underlying data you give it. This means you have to ensure that the data lives at least as long as the tensor created by from_blob. Here, when you return ratingTensor, you leave the function, so all of its local variables are destroyed, including ratingArray, which owns the memory that ratingTensor points to. Consequently, the returned tensor points into a stack frame that no longer exists (a dangling pointer). Reading it is undefined behavior, and the garbage values you printed for target_val are one symptom of that.
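To illustrate the ownership problem without needing libtorch, here is a standard C++ analogy. The names BorrowedView and makeRating are hypothetical: BorrowedView plays the role of a from_blob tensor (it stores a pointer but owns nothing), and its clone() plays the role of Tensor::clone() by copying the values into storage the result owns. It is a sketch of the idea, not libtorch's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a tensor created with torch::from_blob:
// it records where the data lives but neither owns nor copies it.
struct BorrowedView {
    const float* data;
    std::size_t size;

    // Stand-in for Tensor::clone(): copies the borrowed values into storage
    // owned by the returned object, so it no longer depends on `data`.
    std::vector<float> clone() const {
        return std::vector<float>(data, data + size);
    }
};

// Mirrors getRatingTensor(): the array lives on this function's stack frame,
// so only an owning copy may safely leave the function.
std::vector<float> makeRating(int rating) {
    assert(rating >= 0 && rating < 3);
    float ratingArray[3] = {0.0f, 0.0f, 0.0f};
    ratingArray[rating] = 1.0f;
    BorrowedView view{ratingArray, 3};
    // Returning `view` itself would leave a dangling pointer to ratingArray.
    return view.clone();
}
```

Calling makeRating(0) returns an owned {1, 0, 0} that stays valid after the function's stack frame is gone, which is exactly what the unowned from_blob tensor cannot guarantee.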

This should be solved by simply cloning ratingTensor before returning it, since clone() copies the data into memory that the new tensor owns:

return ratingTensor.clone();
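Separately from the dangling pointer, the target passed to torch::nll_loss is probably in the wrong format, which may explain why the exception is raised on that exact line. nll_loss expects the target as class indices (a kLong tensor of shape {N}, here {1}), not a one-hot float row of shape {1, 3} like the one getRatingTensor() builds, and as far as I know it throws a c10::Error when given the latter. In libtorch, the one-hot row can be converted with target.argmax(1), which returns the indices as kLong. This plain C++ sketch (helper names are hypothetical; it assumes the network output already holds log-probabilities, e.g. from log_softmax) shows what the loss computes for a single sample: the negated log-probability at the target index.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: converts a one-hot row such as {1, 0, 0} into the
// class index that nll_loss expects as its target (like argmax along dim 1).
std::size_t oneHotToIndex(const std::vector<float>& oneHot) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < oneHot.size(); ++i) {
        if (oneHot[i] > oneHot[best]) best = i;
    }
    return best;
}

// Hypothetical helper: negative log-likelihood for one sample, assuming
// logProbs already holds log-probabilities for each class.
float nllForSample(const std::vector<float>& logProbs, std::size_t target) {
    assert(target < logProbs.size());
    return -logProbs[target];
}
```

With a batch of one sample and the default mean reduction, nll_loss reduces to this single value; with more samples it averages them.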