Can I reinterpret std::vector<char> as a std::vector<unsigned char> without copying?


I have a reference to std::vector<char> that I want to use as a parameter to a function which accepts std::vector<unsigned char>. Can I do this without copying?

I have the following function and it works; however, I am not sure whether a copy actually takes place. Could someone help me understand this? Is it possible to use std::move to avoid the copy, or is the data already not being copied?

static void showDataBlock(bool usefold, bool usecolor,
            std::vector<char> &chunkdata)  
{
  char* buf = chunkdata.data();                      
  unsigned char* membuf = reinterpret_cast<unsigned char*>(buf); 
  std::vector<unsigned char> vec(membuf, membuf + chunkdata.size()); 
  showDataBlock(usefold, usecolor, vec);   
} 

I was thinking that I could write:

std::vector<unsigned char> vec(std::move(membuf),
                               std::move(membuf) + chunkdata.size());  

Is this overkill? What actually happens?

There are 6 answers

3
WhiZTiM On BEST ANSWER

...is it possible to use std::move to avoid copy or is it already not being copied

You cannot move between two unrelated containers. A std::vector<char> is not a std::vector<unsigned char>, so there is no legal way to move or convert the contents of one into the other in O(1) time.

You can either copy:

void showData( std::vector<char>& data){
    std::vector<unsigned char> udata(data.begin(), data.end());
    for(auto& x : udata)
        modify( x );
    ....
}

or cast each element at the point of access:

inline unsigned char& as_uchar(char& ch){
    return reinterpret_cast<unsigned char&>(ch);
}

void showDataBlock(std::vector<char>& data){
    for(auto& x : data){
        modify( as_uchar(x) );
    }
}
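
To make the "unrelated containers" point concrete, here is a minimal sketch that checks it at compile time. The helper name to_unsigned is hypothetical and only for illustration; the copy it performs is the same iterator-range copy shown above.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// std::vector<char> and std::vector<unsigned char> are distinct class types,
// so the move constructor of one does not accept the other.
static_assert(!std::is_constructible<std::vector<unsigned char>,
                                     std::vector<char>&&>::value,
              "no move construction across element types");

// Hypothetical helper: the iterator-range constructor copies and converts
// every element, so the cost is O(n) and the source is left untouched.
inline std::vector<unsigned char> to_unsigned(const std::vector<char>& in) {
    std::vector<unsigned char> out(in.begin(), in.end());
    assert(out.size() == in.size());
    return out;
}
```

If the static_assert compiles, the only way to get a std::vector<unsigned char> from a std::vector<char> is an element-wise copy like the one in to_unsigned.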
3
javaLover On

I guess you have written another overload:

showDataBlock(usefold, usecolor, std::vector<unsigned char> & vec);  

You are trying to convert a std::vector<T> into a std::vector<T2>.

There is no way to avoid the copying.

Each std::vector owns its storage; roughly speaking, it holds a raw pointer to a buffer.
The main point is that this buffer cannot be shared among multiple std::vector objects.
I think this is by design.
I also think it is a good thing, since shared ownership would cost CPU time for bookkeeping.

The code ...

std::move(membuf)

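... moves the raw pointer, which actually does nothing (it is the same as passing membuf). std::move is only a cast to an rvalue, and moving a raw pointer just copies the pointer value.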
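
A small sketch to check this, assuming a hypothetical function name move_on_pointer_still_copies; it builds the vector exactly as in the question and then inspects what happened to the pointer and the source.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical demo: returns true when the new vector holds n elements in a
// buffer of its own, i.e. when the construction was a copy.
inline bool move_on_pointer_still_copies(std::vector<char>& chunkdata) {
    unsigned char* membuf = reinterpret_cast<unsigned char*>(chunkdata.data());
    unsigned char* const before = membuf;
    const std::size_t n = chunkdata.size();

    // Same construction as in the question, with std::move on the pointer.
    std::vector<unsigned char> vec(std::move(membuf), std::move(membuf) + n);

    // The pointer is unchanged and the source vector still owns its elements.
    assert(membuf == before);
    assert(chunkdata.size() == n);

    return vec.size() == n && (n == 0 || vec.data() != before);
}
```

The range constructor only sees two pointer values, so it has no way to take over the source buffer; it allocates and copies.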

To optimize, you should first ask why you want to convert from std::vector<char> to std::vector<unsigned char> in the first place.

Would it be a better idea to create a new class C that can present its data as both char and unsigned char (e.g. C::getChar() and C::getUnsignedChar(), perhaps storing only char but providing the conversion as a non-static member function)?

If that doesn't help, I suggest creating a new custom data structure.
I often do that when it is needed.

However, in this case, I don't think any optimization is needed.
The copy is fine unless this is performance-critical code.

10
bolov On

If you have a v1 of type std::vector<T1> and need a v2 of type std::vector<T2> there is no way around copying the data, even if T1 and T2 are "similar" like char and unsigned char.

Use standard library:

std::vector<unsigned char> v2;
std::copy(v1.begin(), v1.end(), std::back_inserter(v2));

The only possible way around it is to somehow work with only one type: either obtain a std::vector<T2> from the start if possible, or work with std::vector<T1> from now on (maybe add an overload that deals with it). Or write generic code (templates) that can deal with any contiguous container.
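
As a rough sketch of the generic-code option: the function below is a hypothetical stand-in for the real display routine (it just sums the byte values so the result is easy to check), and it accepts a container of either element type without building a second vector.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for showDataBlock: works on any container whose
// elements are byte-sized integers (char, signed char, unsigned char).
template <typename Container>
unsigned long sumBytes(const Container& data) {
    using Elem = typename Container::value_type;
    static_assert(std::is_integral<Elem>::value && sizeof(Elem) == 1,
                  "expects a byte-sized integer element type");
    unsigned long total = 0;
    for (Elem e : data) {
        // Each element is converted as it is read; no buffer is copied.
        total += static_cast<unsigned char>(e);
    }
    return total;
}
```

Called with a std::vector<char> or a std::vector<unsigned char>, the template is instantiated once per type, so the caller never has to convert the container.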


I think reinterpret_cast and std::move should make it possible to avoid copy
no, it can't
please elaborate - why not?

A vector can steal resources (move data) only from another vector of the same type. That's how its interface was designed.

To do what you want you would need a release() method that gives up the vector's ownership of the underlying data and returns it as a (unique) pointer, plus a move constructor/assignment that acquires the underlying data from a (unique) pointer. (And even then you would still need a reinterpret_cast, which is... danger zone.)

std::vector has none of those. Maybe it should have. It just doesn't.

4
serup On

I ended up doing something like this :

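// Overload that does the actual work; defined first so the call below can see it.
static bool showDataBlock(bool usefold, bool usecolor, std::vector<unsigned char> &chunkdata)
{
    // showing the data
    return true; // placeholder result
}

// chunkdata is taken by value, so the copy happens during parameter passing.
static void showDataBlock(bool usefold, bool usecolor, std::vector<char> chunkdata)
{
    // Caution: casting between unrelated vector types is not sanctioned by the C++ standard.
    std::vector<unsigned char> &cache = reinterpret_cast<std::vector<unsigned char>&>(chunkdata);
    showDataBlock(usefold, usecolor, cache);
}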

This solution lets me pass the vector either by reference or by value, and it seems to be working. Whether it is the best solution I do not know; however, you all came up with some really good suggestions. Thank you all.

I agree that I cannot avoid the copy, so I let the copy happen through normal by-value parameter passing.

If you find this solution wrong, please provide a better one in a comment rather than just downvoting.

1
m8mble On

As others already pointed out, there is no way around the copy without changing showDataBlock.

I think you have two options:

  1. Extend showDataBlock to work on both char and unsigned char (i.e. make it a template), or
  2. Don't take the container as an argument but an iterator range instead. You could then (when value_type is char) use special iterators that convert each element from char to unsigned char.
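
A minimal sketch of the iterator-range option, under the assumption that the function only needs to read the bytes. The name hexDump is hypothetical; instead of a special converting iterator, this version does the conversion inside the loop, which has the same effect for read-only access.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical display routine taking an iterator range. Each element is
// converted to unsigned char as it is read, so char and unsigned char
// ranges are handled by the same code without copying the container.
template <typename Iter>
std::string hexDump(Iter first, Iter last) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (; first != last; ++first) {
        const unsigned char byte = static_cast<unsigned char>(*first);
        out.push_back(digits[byte >> 4]);    // high nibble
        out.push_back(digits[byte & 0x0F]);  // low nibble
    }
    return out;
}
```

A caller would pass chunkdata.begin() and chunkdata.end(), whichever of the two element types the vector holds.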
0
ichidan On

While unsigned char and char are unrelated types, I think they're similar enough in this case (same-size PODs) to get away with a reinterpret_cast of the entire templated class.

static void showDataBlock(bool usefold, bool usecolor,
            std::vector<char> &chunkdata)  
{
  showDataBlock(usefold, usecolor, reinterpret_cast< std::vector<unsigned char>&>(chunkdata));   
}

However, I tend to find that these problems come from not designing the best architecture. Look at the bigger picture of what this software is supposed to be doing, to identify why you need to work with both signed and unsigned char blocks of data.