Inconsistent output between Python and LibTorch C++ when exporting for iOS


I've trained the HuggingFace RoBERTa model on my data (it's a very particular usage, hence the small model/vocabulary!) and tested it successfully in Python. I exported the traced model to LibTorch for iOS, but the prediction results on device do not match those in Python (the argmax token indices differ). My conversion script:

# torch = 1.5.0
# transformers = 3.2.0
import torch
from transformers import RobertaConfig, RobertaForMaskedLM

config = RobertaConfig(
    vocab_size=858,
    max_position_embeddings=258,
    num_attention_heads=6,
    num_hidden_layers=4,
    type_vocab_size=1,
    torchscript=True,
)

model = RobertaForMaskedLM(config=config).from_pretrained('./trained_RoBERTa')
model.cpu()
model.eval()

example_input = torch.LongTensor(1, 256).random_(0, 857).cpu()
traced_model = torch.jit.trace(model, example_input)
traced_model.save('./exports/trained_RoBERTa.pt')

I have had problems in the past with another (vision) model that I trained in Python on a GPU and converted to LibTorch for iOS; those were solved by adding map_location={'cuda:0': 'cpu'} to the torch.load() call in my conversion script. So I'm wondering: 1) does that make sense as a possible explanation in this situation, and 2) how can I add the map_location option when loading with the .from_pretrained() syntax?

Just in case my Obj-C++ handling of the prediction results is to blame, here's the Obj-C++ code run on device:

- (NSArray<NSArray<NSNumber*>*>*)predictText:(NSArray<NSNumber*>*)tokenIDs {
    try {
        long count = tokenIDs.count;
        long* buffer = new long[count];
        for(int i=0; i < count;  i++) {
            buffer[i] = tokenIDs[i].intValue;
        }
        at::Tensor tensor = torch::from_blob(buffer, {1, (int64_t)count}, at::kLong);
        torch::autograd::AutoGradMode guard(false);
        at::AutoNonVariableTypeMode non_var_type_mode(true);
        auto outputTuple = _impl.forward({tensor}).toTuple();

        auto outputTensor = outputTuple->elements()[0].toTensor();
        auto sizes = outputTensor.sizes();
        // len will be tokens * vocab size -- sizes[1] * sizes[2] (sizes[0] is batch_size = 1)
        auto positions = sizes[1];
        auto tokens = sizes[2];
        float* floatBuffer = outputTensor.data_ptr<float>();
        if (!floatBuffer) {
            return nil;
        }
        // MARK: This is probably a slow way to create this 2D NSArray
        NSMutableArray* results = [[NSMutableArray alloc] initWithCapacity: positions];
        for (int i = 0; i < positions; i++) {
            NSMutableArray* weights = [[NSMutableArray alloc] initWithCapacity: tokens];
            for (int j = 0; j < tokens; j++) {
                [weights addObject:@(floatBuffer[i*positions + j])];
            }
            [results addObject:weights];
        }
        return [results copy];
    } catch (const std::exception& exception) {
        NSLog(@"%s", exception.what());
    }
    return nil;
}

Note that my init code in iOS does call eval() on the TorchScript model.

UPDATE: One observation: the way I attempted to use my config when loading the trained model above results in the torchscript flag not being set. I assume it's ignoring my config entirely and getting it from the pretrained file. So I've moved the flag to from_pretrained('./trained_RoBERTa', torchscript=True), as outlined in the docs. Same problem with the output on iOS, mind you...
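
The ignored config is consistent with from_pretrained being a classmethod: called on an instance, it still receives only the class, so whatever config that instance was built with never reaches it. Below is a minimal sketch of that behaviour in plain Python. ToyConfig and ToyModel are hypothetical stand-ins, not the transformers classes, and the keyword-override loop only imitates the documented torchscript=True usage.

```python
class ToyConfig:
    """Hypothetical stand-in for a model config."""

    def __init__(self, torchscript=False):
        self.torchscript = torchscript


class ToyModel:
    """Hypothetical stand-in for a pretrained model class."""

    def __init__(self, config):
        self.config = config

    @classmethod
    def from_pretrained(cls, path, **kwargs):
        # A classmethod receives the class, never the instance it was
        # called on, so it cannot see a config held by that instance.
        saved_config = ToyConfig()  # stand-in for the config stored at `path`
        for key, value in kwargs.items():
            setattr(saved_config, key, value)  # keyword overrides win
        return cls(saved_config)


# Pattern from the conversion script: the instance's config is discarded.
ignored = ToyModel(ToyConfig(torchscript=True)).from_pretrained("./trained_RoBERTa")
# Pattern from this update: the flag is passed to from_pretrained directly.
applied = ToyModel.from_pretrained("./trained_RoBERTa", torchscript=True)

print(ignored.config.torchscript)  # False
print(applied.config.torchscript)  # True
```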

UPDATE 2: I thought I'd try testing the traced model in Python. Not sure it's expected that this should work, but the output does match the same test in the original model:

traced_test = traced_model(input)
pred = torch.argmax(traced_test[0], dim=2).squeeze(0)
pred_str = tokenizer.decode(pred[1:-1].tolist())
print(pred_str)

Which makes me think there's something going wrong with the iOS Obj-C++ execution. The code that loads the traced model/export does call .eval() on the model, btw (I realize that comes up as a possible explanation for differing outputs):

- (nullable instancetype)initWithFileAtPath:(NSString*)filePath {
    self = [super init];
    if (self) {
        try {
            auto qengines = at::globalContext().supportedQEngines();
            if (std::find(qengines.begin(), qengines.end(), at::QEngine::QNNPACK) != qengines.end()) {
                at::globalContext().setQEngine(at::QEngine::QNNPACK);
            }
            _impl = torch::jit::load(filePath.UTF8String);
            _impl.eval();
        } catch (const std::exception& exception) {
            NSLog(@"%s", exception.what());
            return nil;
        }
    }
    return self;
}

UPDATE 3: Uhhhmmm... This is definitely a face-palm moment (following a wasted weekend)... I decided to return a flat NSArray from Obj-C and do the 2D array reshape in Swift, and aside from a shift of one token (I think it's just the [CLS]), the output is now correct. I guess my Obj-C really is that rusty. Sadly, I still don't see the issue, but it's working now so I'm going to surrender.
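
One likely culprit in the Obj-C++ loop is the index floatBuffer[i*positions + j]. For a contiguous row-major buffer of shape (positions, vocab), element (i, j) sits at offset i * vocab + j, so the stride should be the row length (sizes[2]), not the row count (sizes[1]). The sketch below illustrates this in plain Python on toy data; reshape_row_major and argmax are hypothetical helper names, and the 3 x 5 buffer is made up rather than real model output.

```python
def reshape_row_major(flat, rows, cols):
    """Split a flat row-major buffer into `rows` lists of `cols` values."""
    if len(flat) != rows * cols:
        raise ValueError("buffer length does not match rows * cols")
    # Element (i, j) lives at offset i * cols + j: the stride is the row length.
    return [[flat[i * cols + j] for j in range(cols)] for i in range(rows)]


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


# Toy stand-in for the logits: 3 positions, 5 vocabulary entries.
positions, vocab = 3, 5
expected = [1, 4, 2]  # the "winning" token index at each position
flat = [0.0] * (positions * vocab)
for i, best in enumerate(expected):
    flat[i * vocab + best] = 1.0

correct = reshape_row_major(flat, positions, vocab)

# Same loop shape as the Obj-C++ snippet, which strides by the row count.
strided_by_rows = [
    [flat[i * positions + j] for j in range(vocab)] for i in range(positions)
]

print([argmax(row) for row in correct])          # [1, 4, 2]
print([argmax(row) for row in strided_by_rows])  # [1, 0, 3]
```

With the wrong stride, every row after the first starts at the wrong offset, which would explain argmax indices that differ from Python. The remaining one-token shift may simply come from the Python test slicing pred[1:-1] while the device output keeps position 0, though that is an assumption.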
