How can I retrieve heap memory usage for string objects and allocated string values in a V8/Node app?


I come from web dev, and my understanding of embedding V8 in C++ is limited (null). So I'm coming here, hoping for some assistance.

My task is to peek into V8's heap while my Node app is executing insecure user code using Node's VM API. Currently I am calling the getHeapStatistics() API directly from my Node app to access used_heap_size, but I need something more granular.

Specifically, I would like to keep track of all strings created in the user code being executed inside Node's VM sandbox. Many of the strings will be concatenated within while loops at runtime. For this, I've arrived at embedding V8 within a C++ app. My questions are as follows:

  1. I have heard primitives (like strings) are kept in the stack, but for dynamically allocated objects like concatenated strings, are they stored in the heap?
  2. Is my problem of accessing strings feasible by embedding V8 in C++? When I say access, I mean: given the JS code let globalVar = "Hello World";, I would like to find out how much memory the String objects in the entire source code take, and also retrieve the value of every string (e.g., "Hello World").
  3. V8's embedding doc alludes to running the JS code through a C++ app, but is it possible to have the Node app execute the user code and have a C++ app peek inside the Node app's heap independently?
  4. I've come across terms like tracing and garbage collector, but because I am new to this, I don't think I'm formulating my question properly. Is there a common term or problem statement that I should be searching?

Answer by jmrk (best answer)

my Node app is executing insecure user code using Node's VM api

The first paragraph on https://nodejs.org/api/vm.html says, in bold:

The vm module is not a security mechanism. Do not use it to run untrusted code.

Code running on Node can e.g. delete or corrupt or infect your files, or steal your passwords and security credentials such as saved cookies, etc. Do you understand this warning?

As for your other questions:

  1. I have heard primitives (like strings) are kept in the stack, but for dynamically allocated objects like concatenated strings, are they stored in the heap?

All strings are stored on the heap.
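
To see this from inside Node, here is a minimal sketch that compares `used_heap_size` from `v8.getHeapStatistics()` before and after building many strings by concatenation. The helper name `measureStringGrowth` and the string contents are made up for illustration, and the exact byte counts will vary with the V8 version and with when the garbage collector happens to run.

```javascript
const v8 = require("v8");

// Hypothetical helper: allocate `count` concatenated strings, keep them alive,
// and report used_heap_size before and after the allocation.
function measureStringGrowth(count) {
  const before = v8.getHeapStatistics().used_heap_size;
  const kept = [];
  for (let i = 0; i < count; i++) {
    kept.push("user-string-number-" + i); // concatenation at runtime
  }
  const after = v8.getHeapStatistics().used_heap_size;
  // Returning `kept` keeps the strings reachable, so they cannot be collected.
  return { before, after, kept };
}

const result = measureStringGrowth(500000);
console.log("heap grew by roughly", result.after - result.before, "bytes");
```

This only gives you one aggregate number for the whole heap, which is exactly the granularity problem described in the question.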

  2. Is my problem of accessing strings feasible by embedding V8 in C++?

Not easily, but with the HeapProfiler API it should be possible to build something to that effect.
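
If you want to experiment before writing any C++, Node can write a heap snapshot with `v8.writeHeapSnapshot()`, in the JSON format that Chrome DevTools loads. The sketch below walks that JSON and adds up the nodes whose type is one of the string types. It is only a rough illustration: `summarizeStrings` is a made-up helper, the field names (`node_fields`, `node_types`, `self_size`) are taken from the snapshot's own `meta` section rather than from a stable documented API, long string values may be truncated in the snapshot, and as far as I know code run through the vm module shares the same heap as the host, so the result mixes your strings with the user's.

```javascript
const v8 = require("v8");
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: count string nodes in a snapshot of the current heap.
function summarizeStrings() {
  const file = path.join(
    os.tmpdir(),
    `strings-${process.pid}-${Date.now()}.heapsnapshot`
  );
  v8.writeHeapSnapshot(file); // synchronous, blocks the event loop
  const snap = JSON.parse(fs.readFileSync(file, "utf8"));
  fs.unlinkSync(file);

  // `nodes` is a flat array; every node occupies `stride` consecutive slots.
  const meta = snap.snapshot.meta;
  const fields = meta.node_fields;
  const stride = fields.length;
  const typeAt = fields.indexOf("type");
  const nameAt = fields.indexOf("name");
  const sizeAt = fields.indexOf("self_size");
  const typeNames = meta.node_types[typeAt];

  const stringTypes = new Set(["string", "concatenated string", "sliced string"]);
  const summary = { count: 0, bytes: 0, values: [] };
  for (let i = 0; i < snap.nodes.length; i += stride) {
    const type = typeNames[snap.nodes[i + typeAt]];
    if (!stringTypes.has(type)) continue;
    summary.count += 1;
    summary.bytes += snap.nodes[i + sizeAt];
    // Only plain strings carry their value as the node name.
    if (type === "string") {
      summary.values.push(snap.strings[snap.nodes[i + nameAt]]);
    }
  }
  return summary;
}

const summary = summarizeStrings();
console.log(summary.count, "strings,", summary.bytes, "bytes (self size)");
console.log(summary.values.slice(0, 5));
```

Taking a snapshot is slow and pauses the program, so this is a debugging aid rather than something to call in a hot loop. Parsing the whole file with `JSON.parse` also stops working once the snapshot gets very large.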

  3. Is it possible to have the Node app executing user code and have a C++ app peek inside the Node app's heap independently?

No. Also, again, running untrusted code in Node is a very bad idea. You'll have to build your own embedding application, which has the benefit that it gives you a chance to build it securely by not giving the code access to your file system, the network, etc. On the flip side, "JavaScript" code often assumes that it will run in a browser, where there are things like window, document, and XMLHttpRequest. None of these exist in V8, so if you need to support all of that, you'll have quite some work ahead of you.
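
You can get a feel for how little a bare context provides with a quick check like the one below. It uses `vm.runInNewContext()` purely as a convenient way to get a fresh global object, not as a sandbox (see the warning above). The list of names is just an example I picked.

```javascript
const vm = require("vm");

// vm is used here only to obtain a fresh context, NOT as a security boundary.
const names = ["window", "document", "XMLHttpRequest", "JSON", "Promise"];
const available = {};
for (const name of names) {
  // `typeof` on an undeclared global yields "undefined" instead of throwing.
  available[name] = vm.runInNewContext(`typeof ${name}`);
}
console.log(available);
```

The browser names come back as "undefined" while language built-ins such as JSON and Promise are present, so anything beyond the language itself would have to be provided by your embedder.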

  4. I've come across terms like tracing and garbage collector, but because I am new to this, I don't think I'm formulating my question properly. Is there a common term or problem statement that I should be searching?

Try "HeapProfiler".