Corrupt contents in NodeJS ref-struct upon garbage collection

When one ref-struct instance is nested within another, one of the properties of the nested object gets corrupted after a manual garbage collection.

See this minimal code reproduction: https://github.com/hunterlester/minimum-ref-struct-corruption

Notice that the value of name is corrupted on the 2nd line of the log output, but not on the 3rd:

Running garbage collection...
authGranted object afte gc:  { name: '�_n9a\u0002', 'ref.buffer': <Buffer@0x00000261396F3910 18 86 6c 39 61 02 00 00> }
Unnested access container entry after gc:  { name: 'apps/net.maidsafe.examples.mailtutorial', 'ref.buffer': <Buffer@0x00000261396F3B10 60 68 6e 39 61 02 00 00> }
Globally assigned values after gc:  apps/net.maidsafe.examples.mailtutorial  _publicNames

There is 1 answer

Best answer, by vsenko

ref, ref-struct and ref-array are powerful but fragile things, and in combination they can behave in really obscure ways.

There are two nuances with your sample:

  1. Calling makeAccessContainerEntry twice overwrites your global cache: the CStrings cached (in global.x0 and global.x1) during the makeAuthGrantedFfiStruct call are overwritten by the second, direct makeAccessContainerEntry call.

  2. It seems that you should cache each ContainerInfoArray too.
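The first point can be reproduced without ref at all. Here is a minimal sketch using plain Node.js Buffers; sharedCache, makeEntryGlobal and makeEntryScoped are hypothetical names standing in for the global.x0 / global.x1 caching of the sample, not code from the repository:

```javascript
// Hypothetical stand-in for the global.x0 / global.x1 slots used in the sample.
const sharedCache = {};

// Caches the name buffers in fixed shared slots, so every call reuses the same slots.
const makeEntryGlobal = (names) => {
  const buffers = names.map((name) => Buffer.from(name));
  buffers.forEach((buf, i) => { sharedCache['x' + i] = buf; });
  // A real struct would only carry raw addresses here, not JS references.
  return buffers.length;
};

// Returns a cache of its own together with the result, so calls do not interfere.
const makeEntryScoped = (names) => {
  const cache = names.map((name) => Buffer.from(name));
  return { entrySize: cache.length, cache };
};

makeEntryGlobal(['first/a', 'first/b']);
makeEntryGlobal(['second/a', 'second/b']);
// Only the buffers of the second call are still referenced from JS.
console.log(sharedCache.x0.toString()); // second/a

const first = makeEntryScoped(['first/a', 'first/b']);
const second = makeEntryScoped(['second/a', 'second/b']);
// Each result keeps its own buffers alive for as long as the caller holds it.
console.log(first.cache[0].toString(), second.cache[0].toString()); // first/a second/a
```

After the second makeEntryGlobal call nothing in JS references the buffers created by the first call, so GC is free to reclaim them even if native memory still points at them.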

This code should work fine:

const ArrayType = require('ref-array');
const ref = require('ref');
const Struct = require('ref-struct');
const CString = ref.types.CString;

const ContainerInfo = Struct({
  name: CString
});

const ContainerInfoArray = new ArrayType(ContainerInfo);

const AccessContainerEntry = Struct({
  containers: ref.refType(ContainerInfo)
});

const AuthGranted = Struct({
  access_container_entry: AccessContainerEntry
});

const accessContainerEntry = [
  {
    "name": "apps/net.maidsafe.examples.mailtutorial",
  },
  {
    "name": "_publicNames",
  }
];

const makeAccessContainerEntry = (accessContainerEntry) => {
  const accessContainerEntryCache = {
    containerInfoArrayCache: null,
    containerInfoCaches: [],
  };
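  // Keep the array itself alive: the struct below only stores its address.
  accessContainerEntryCache.containerInfoArrayCache = new ContainerInfoArray(accessContainerEntry.map((entry) => {
    // Keep every C string alive as well: ContainerInfo only stores a pointer to it.
    const name = ref.allocCString(entry.name);
    accessContainerEntryCache.containerInfoCaches.push(name);
    return new ContainerInfo({ name });
  }));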
  return {
    accessContainerEntry: new AccessContainerEntry({
      containers: accessContainerEntryCache.containerInfoArrayCache.buffer,
    }),
    accessContainerEntryCache,
  };
};

const makeAuthGrantedFfiStruct = () => {
  const ace = makeAccessContainerEntry(accessContainerEntry);
  return {
    authGranted: new AuthGranted({
      access_container_entry: ace.accessContainerEntry,
    }),
    authGrantedCache: ace.accessContainerEntryCache,
  };
};

const authGranted = makeAuthGrantedFfiStruct();
const unNestedContainerEntry = makeAccessContainerEntry(accessContainerEntry);

if(global.gc) {
  console.log('Running garbage collection...');
  global.gc();
}

console.log('authGranted object after gc: ', authGranted.authGranted.access_container_entry.containers.deref());
console.log('Unnested access container entry after gc: ', unNestedContainerEntry.accessContainerEntry.containers.deref());

As you can see, I added a cache to the output of makeAccessContainerEntry. You should keep it somewhere for as long as you need the data to be protected from garbage collection.

Edit: some background

JS implements high-level memory management: objects are accessed through references, and an object's memory gets released once there are no more references to it.

C has neither references nor GC. It has pointers, which are simply memory addresses of the location where a specific structure or memory block is stored.

ref uses the following technique to bind these two: a C pointer is a Buffer which stores the memory address where the actual data is located in memory. The actual data is usually represented as a Buffer too.
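You can see this in the log output from the question: the 8 bytes printed as 'ref.buffer' for the corrupted object are just an address. A small sketch that decodes them with plain Buffer methods, assuming a 64-bit little-endian platform (which the byte order in the log suggests):

```javascript
// Bytes printed as 'ref.buffer' for the nested object in the question's log.
const pointerBytes = Buffer.from([0x18, 0x86, 0x6c, 0x39, 0x61, 0x02, 0x00, 0x00]);

// Assumption: 64-bit little-endian platform, so the 8 bytes form one address.
const address = pointerBytes.readBigUInt64LE(0);
const formatted = '0x' + address.toString(16).padStart(16, '0');

console.log(formatted); // 0x00000261396c8618
```

The Buffer holds nothing but this number. Whether the memory at that address still contains the string is something neither the Buffer nor GC knows about.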

ref-struct is an addon to ref which implements the ability to interpret underlying memory blocks (Buffers) as structures - user defines types and how they are located in memory, ref-struct attempts to read the corresponding portion of a memory block and obtain the value.
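To give a rough idea of what "interpreting a memory block as a structure" means, here is a sketch with plain Buffer methods. It is not how ref-struct is implemented; the layout (an int32 followed by a double aligned to 8 bytes, little-endian) and the names structLayout and readStruct are assumptions made for the illustration:

```javascript
// Hypothetical layout for: struct { int32_t id; double score; }
// Assumption: little-endian, with 4 bytes of padding so the double is 8-byte aligned.
const structLayout = {
  id: { offset: 0, read: (buf, off) => buf.readInt32LE(off) },
  score: { offset: 8, read: (buf, off) => buf.readDoubleLE(off) },
};
const STRUCT_SIZE = 16;

// Reads every field from its offset in the underlying memory block.
const readStruct = (buf) => {
  const result = {};
  for (const [field, { offset, read }] of Object.entries(structLayout)) {
    result[field] = read(buf, offset);
  }
  return result;
};

const memory = Buffer.alloc(STRUCT_SIZE);
memory.writeInt32LE(42, 0);
memory.writeDoubleLE(1.5, 8);

console.log(readStruct(memory)); // { id: 42, score: 1.5 }
```

A field of pointer type (like name: CString in the sample) would be read the same way, except that the value stored at the offset is an address that still has to be followed.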

ref-array is an addon to ref which implements the ability to interpret underlying memory blocks (Buffers) as arrays - user defines types and how they are located in memory, ref-array attempts to read the corresponding portion of a memory block and obtain the array item.
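The array case boils down to index arithmetic: item i lives at offset i multiplied by the item size. A sketch with plain Buffer methods, assuming little-endian int32 items; ITEM_SIZE and readItem are hypothetical names and not part of ref-array:

```javascript
// Assumption: an array of little-endian int32 values, 4 bytes each.
const ITEM_SIZE = 4;

// Reads item `index` from its offset in the underlying memory block.
const readItem = (buf, index) => buf.readInt32LE(index * ITEM_SIZE);

const arrayMemory = Buffer.alloc(3 * ITEM_SIZE);
[10, 20, 30].forEach((value, i) => arrayMemory.writeInt32LE(value, i * ITEM_SIZE));

console.log(readItem(arrayMemory, 1)); // 20
```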

So if you allocate a Buffer for something, obtain a ref reference to it (a new Buffer that simply holds the memory address of the original Buffer) and then lose the JS reference to the original Buffer, the original Buffer could get released by GC, like this:

const ref = require('ref');

function allocateData() {
  const someData = Buffer.from('sometext');
  return ref.ref(someData);
}

const refReference = allocateData();
// There are no more direct JS references to someData - they are all left in the scope of allocateData() function.

console.log(refReference.deref());
global.gc(); // Requires node --expose-gc. Since there are no more JS references to someData, you would expect GC to release it and use its memory for something else.
console.log(refReference.deref());

Don't rush to test this code, though: both console.log(refReference.deref()); calls will print the same output, because ref keeps a hidden reference to the referenced data inside refReference.

ref-struct and ref-array are aware of such situations and usually hold hidden references to the referenced data correctly too. But the combination of ref-struct and ref-array reveals a bug or an underlying incompatibility, and hidden references sometimes get lost. A workaround is to cache the references yourself, which is the approach I suggested above.
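If passing a cache object around by hand is inconvenient, one possible variation (not what the code above does, just a sketch) is to tie the cached Buffers to the object that depends on them with a WeakMap. keepAlive and retain are hypothetical names, and a plain object stands in for the struct:

```javascript
// Hypothetical helper: ties the lifetime of dependent Buffers to an owner object.
// A WeakMap keeps its values alive only while the key object is reachable.
const keepAlive = new WeakMap();

const retain = (owner, ...buffers) => {
  const held = keepAlive.get(owner) || [];
  held.push(...buffers);
  keepAlive.set(owner, held);
  return owner;
};

// Stand-ins for a struct instance and the C string it points to.
const owner = { kind: 'struct stand-in' };
const nameBuffer = Buffer.from('_publicNames\0');
retain(owner, nameBuffer);

console.log(keepAlive.get(owner).length); // 1
```

As long as owner is reachable, nameBuffer is too. Once owner becomes unreachable, the WeakMap entry no longer keeps the Buffers alive, so nothing has to be cleaned up manually.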