According to the Memory {base} help page in the R 4.1.0 documentation, R keeps two separate memory areas for "fixed" and "variable" sized objects. As I understand it, variable-sized objects are those the user can create in the work environment: vectors, lists, data frames, etc. However, when referring to fixed-sized objects, the documentation is rather obscure:
[Fixed-sized objects are] allocated as an array of cons cells (Lisp programmers will know what they are, others may think of them as the building blocks of the language itself, parse trees, etc.)[.]
Could someone provide an example of a fixed-sized object that is stored in a cons cell? For further reference, I know the function memory.profile() gives a profile of the usage of cons cells. For example, in my session this appears like:
> memory.profile()
       NULL      symbol    pairlist     closure environment     promise    language
          1       23363      623630        9875        2619       13410      200666
    special     builtin        char     logical     integer      double     complex
         47         696       96915       16105      107138       10930          22
  character         ...         any        list  expression    bytecode externalptr
     130101           2           0       50180           1       42219        3661
    weakref         raw          S4
       1131        1148        1132
What do these counts stand for, both numerically and conceptually? For instance, does logical: 16105 refer to 16,105 logical objects (or bytes, or cells) that are stored in the source code/binaries of R?
My purpose is to gain a better understanding of how R manages memory in a given session. Finally, I think I do understand what a cons cell is, both in Lisp and in R, but if an answer needs to address this concept first, it wouldn't hurt to start from there.
Background
At C level, an R object is just a pointer to a block of memory called a "node". Each node is a C struct, either a SEXPREC or a VECTOR_SEXPREC. VECTOR_SEXPREC is for vector-like objects, including strings, atomic vectors, expression vectors, and lists. SEXPREC is for every other type of object.

The SEXPREC struct has three contiguous segments:

The VECTOR_SEXPREC struct has segments (1) and (2) above, followed by:

The VECTOR_SEXPREC struct is followed by a block of memory spanning at least 8+n*sizeof(<type>) bytes, where n is the length of the corresponding vector. The block consists of an 8-byte leading buffer, the vector "data" (i.e., the vector's elements), and sometimes a trailing buffer.

In summary, non-vectors are stored as a node spanning 32 or 56 bytes, while vectors are stored as a node spanning 28 or 36 bytes followed by a block of data of size roughly proportional to the number of elements. Hence nodes are of roughly fixed size, while vector data require a variable amount of memory.
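As a rough illustration of the summary above, the sketch below uses object.size() to show that the reported size of a double vector grows linearly with its length on top of a fixed overhead. The helper vec_bytes and the chosen lengths are hypothetical, picked only for illustration; exact byte counts depend on the platform and R version, so none are shown here.

```r
# Hypothetical helper (not part of R): size in bytes that object.size()
# reports for a double vector of length n.
vec_bytes <- function(n) as.numeric(object.size(numeric(n)))

n <- c(0, 1000, 2000, 4000)              # illustrative lengths
sizes <- vapply(n, vec_bytes, numeric(1))

# Reported size for each length; the length-0 row is the fixed overhead.
print(data.frame(length = n, bytes = sizes))

# Bytes added per extra element between consecutive lengths.
per_element <- diff(sizes) / diff(n)
print(per_element)
```

For vectors of this size, per_element should match the size of a double, while the length-0 row shows the part that does not depend on the data. Very short vectors may be rounded up to small allocation classes, so the relationship is less exact there.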
Answer
R allocates memory for nodes in blocks called Ncells (or cons cells) and memory for vector data in blocks called Vcells. According to ?Memory, each Ncell is 28 bytes on 32-bit systems and 56 bytes on 64-bit systems, and each Vcell is 8 bytes. Thus, the line in ?Memory about separate areas for "fixed" and "variable" sized objects is actually referring to nodes and vector data, not R objects per se.
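To make the Ncell/Vcell split concrete, here is a minimal sketch that compares the "used" column of gc() before and after allocating one long double vector. The helper cells_used and the vector length are assumptions made for illustration, and the exact counts will vary between sessions, so no output is shown.

```r
# Hypothetical helper (not part of R): cells in use after a full
# collection, returned as a named vector with entries Ncells and Vcells.
cells_used <- function() gc()[, "used"]

before <- cells_used()
big <- numeric(1e6)    # one vector node plus roughly 1e6 * 8 bytes of data
after <- cells_used()

# Change in cells in use caused by the allocation.
delta <- after - before
print(delta)
```

If each Vcell is 8 bytes, the Vcells entry of delta should grow by about one cell per element, while the Ncells entry should change only slightly, since a single node was added.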
memory.profile gives the number of cons cells used by all R objects in memory, stratified by object type. Hence sum(memory.profile()) will be roughly equal to gc(FALSE)[1L, "used"], which gives the total number of cons cells in use after a garbage collection.

When you assign a new R object, the number of Ncells and Vcells in use as reported by gc will increase. For example:

You might be wondering why the number of Vcells in use increased, given that x is a language object, not a vector. The reason is that nodes are recursive: they contain pointers to other nodes, which may very well be vector nodes. Here, Vcells were allocated in part because each symbol in x points to a string (+ to "+", a to "a", and so on), and each of those strings is a vector of characters. (That said, it is surprising that ~125000 Vcells were required in this case. That may be an artifact of the Reduce and lapply calls, but I'm not really sure at the moment.)

References
Everything is a bit scattered:
?Memory, ?`Memory-limits`, ?gc, ?memory.profile, ?object.size.