C++ remove_if without iterating through whole vector

733 views Asked by At

I have a vector of pointers, pointing to approx 10MB of packets. In that, from first 2MB, I wanna delete all those that matches my predicate. The problem here is remove_if iterates through the whole vector, even though its not required in my use case. Is there any other efficient way?

fn_del_first_2MB
{
    uint32 deletedSoFar = 0;
    uint32 deleteLimit = 2000000;

    auto it = std::remove_if (cache_vector.begin(), cache_vector.end(),[deleteLimit,&deletedSoFar](const rc_vector& item){
    if(item.ptr_rc->ref_count <= 0) {
        if (deletedSoFar < deleteLimit) {
            deletedSoFar += item.ptr_rc->u16packet_size;
        delete(item.ptr_rc->packet);    
        delete(item.ptr_rc);
            return true;
        }
        else    
            return false;
    }
    else
        return false;
    });
    cache_vector.erase(it, cache_vector.end());
}

In the above code, once the deletedSoFar is greater than deleteLimit, any iteration more than that is unwanted.

3

There are 3 answers

2
Jarod42 On BEST ANSWER

You may use your own loop:

void fn_del_first_2MB()
{
    const uint32 deleteLimit = 2000000;

    uint32 deletedSoFar = 0;
    auto dest = cache_vector.begin();
    auto it = dest

    for (; it != cache_vector.end(); ++it) {
        const auto& item = *it;

        if (item.ptr_rc->ref_count <= 0) {
            deletedSoFar += item.ptr_rc->u16packet_size;
            delete(item.ptr_rc->packet);    
            delete(item.ptr_rc);
            if (deletedSoFar >= deleteLimit) {
                ++it;
                break;
            }
        } else if (dest != it) {
            *dest = std::move(*it);
            ++dest;
        }
    }
    cache_vector.erase(dest, it);
}
1
Dietmar Kühl On

There is no need for std::remove_if() to pass the .end() iterator as the second argument: as long as the first argument can reach the second argument by incrementing, any iterators can be passed.

There is somewhat of a complication as your condition depends on the accumulated size of the elements encountered so far. As it turns out, it looks as if std::remove_if() won't be used. Something like this should work (although I'm not sure if this use of std::find_if() is actually legal as it keeps changing the predicate):

std::size_t accumulated_size(0u);
auto send(std::find_if(cache_vector.begin(), cache_vector.end(),
                              [&](rc_vector const& item) {
        bool rc(accumulated_size < delete_limit);
        accumulated_size += item.ptr_rc->u16packet_size;
        return rc;
    });
std::for_each(cache_vector.begin(), send, [](rc_vector& item) {
       delete(item.ptr_rc->packet);    
       delete(item.ptr_rc);
    });
cache_vector.erase(cache_vector.begin(), send);

The std::for_each() could be folded into the use of std::find_if() as well but I prefer to keep things logically separate. For a sufficiently large sequence there could be a performance difference when the memory needs to be transferred to the cache twice. For the tiny numbers quoted I doubt that the difference can be measured.

1
AudioBubble On

Instead of cache_vector.end() put your own iterator marker myIter. With the remove_if option you should follow the erase-remove idiom. Here is an example that affects only the first 4 elements:

#include <iostream>
#include <vector>
#include <algorithm>

int main()
{
    std::vector<int> vec = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    size_t index = 4; // index is something you need to calculate
    auto myIter = vec.begin() + index; // Your iterator instead of vec.end()
    vec.erase(std::remove_if(vec.begin(), myIter, [](int x){return x < 3; }), myIter);
    // modified vector:
    for (const auto& a : vec)
    {
        std::cout << a << std::endl;
    }
    return 0;
}