"Empty" array\vector members c++

1.9k views Asked by At

I have to fill an array of 1000 objects by reading data from disk. However, not every objects exits.

Once I declare an array, memory will be reserved for 1000 objects. As I read them one by one, I set the memory to corresponding values. However, there might not be an object for member #276 and it's memory will remain set to whatever was there when the array was declared.

How do I keep the information that certain member of the array is invalid/doesn't exist?

I could somehow set all the bytes of the member to zero, but that might be a valid object.

The obvious solution is adding another array of bytes that will be set to 1 or 0 depending whether the object at that index exists, but it doesn't seem very elegant.

Could this be done with a vector instead? Can it somehow store an empty value?

3

There are 3 answers

3
gsamaras On BEST ANSWER

Could this be done with a vector instead?

No.

Except of course if you use some extra space to store that information (exists or not), or a sentinel value for the non-existed objects. An std::vector has the powerful ability to resize itself according to the number of elements it stores; so if it could satisfy your request, it would lose that ability.

I would use an std::unordered_map, where every key would be the object's index (e.g. #276) and the value would be the actual object. If an object doesn't exist, don't insert that key in the map.

Or an std::map, if you need to iterate over your data efficiently. Choosing between std::map and std::unordered_map.


Either think really hard to find a sentinel value that will mark a cell of your array as empty. For example, if you already have the data somewhere in memory (which I think is not your case though), then you could use an array of pointers, instead of an array that stores whole objects. Then it would be obvious that a NULL pointer would be used for cells that are empty


Another option would be to use an array of pairs, like this: std::pair<myClass, bool>, where the second operand indicates whether the corresponding cell is empty or not.

Moreover, you could use a std::vector<bool> instead, which is very memory efficient (if you decide to follow an approach of an extra data structure), as mentioned in Why does std::vector<bool> has no .data()?. It will however lack in index performance.

0
templatetypedef On

Logically speaking, you need to keep track both of the values that are present and which ones actually have data stored in them. There's no one best way to do this and the choice you make will depend on what you're doing.

In some cases - and it seems like your implementation isn't one of them - you can reserve some special value like nullptr or -1 as a sentinel and use that to mark empty slots. You've already mentioned that this option doesn't exist here, though, so we'll rule that one out.

Another very reasonable option is to store either a parallel bitvector or some auxiliary data per slot marking whether the slot is being used. If you use a bitvector, the extra memory required here is very small compared to what you'll be using for the elements themselves.

The disadvantage to the two above approaches is that if you have a truly huge array - say, millions of elements - you'll be using a ton of memory for unused slots, both for the slots themselves and for any extra bookkeeping. Another option would be to use a sparse data structure like a std::map or std::unordered_map that keys from indices to elements, then to only load elements into the sparse structure that are actually used. The performance cost of looking up an individual element is a bit slower this way, but the memory gains can be significant.

0
Chris Uzdavinis On

First, be sure you really are worrying about enough memory to bother with optimizing away. 1000 objects isn't that many, unless they are huge and you expect them to be sparse. Does their index matter? That is, if you load 2 objects can they be put into element 0,1 of the array, or is their location in the array important, and each object has a specific array index it must use? If that's the case, you may end up with large holes in the array and would need an indicator of which elements are used or not (so I'd not recommend this.) Instead, you might consider an array of pointers that are initialized to null, and then the elements that are used are allocated and the corresponding pointers are set to them at the proper index. If you can compact the array, you might as well use a vector.

Another option is to not put the items in an array, but something like a tree-map, which only holds the elements you insert, but can still be found using a key similar to an array index.

(Note: std::unordered_map is faster than std::map, but hash tables over-allocate memory (frequently if 70% of their allocated space is used they are considered highly loaded) and the entire purpose of the question is reducing memory usage.)