split a string by a vector of strings

Question

84 views Asked by markt1964 At 01 August 2023 at 06:51

I have an input string, and I also have a vector of separators.

I want to output a vector containing the strings that lie between the separators and, as additional entries, the separators that were found.

So for example given

"AB<sep1>DE<sep1><sep2>" where "<sep1>" and "<sep2>" are separators

The output would be the vector

"AB", "<sep1>", "DE", "<sep1>", "<sep2>"

Is there an efficient way to implement this?

There are 2 answers

Radovm On 01 August 2023 at 08:06

It depends on how fast the implementation needs to be and what behavior you want in corner cases. Here is one example implementation:

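#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    std::string s = "AB<sep1>DE<sep1><sep2>test";
    std::vector<std::string> dels = {"<sep1>", "<sep2>"};

    std::vector<std::string> tokens;

    // Process the delimiters one after another, in the order they are listed.
    for (const auto& del : dels)
    {
        std::size_t pos = 0;
        // Cut off everything up to and including each occurrence of del.
        while ((pos = s.find(del)) != std::string::npos)
        {
            tokens.push_back(s.substr(0, pos));
            s.erase(0, pos + del.length());
        }
    }
    // Whatever remains after the last delimiter is the final token.
    if (!s.empty())
    {
        tokens.push_back(s);
    }

    for (const auto& token : tokens)
    {
        std::cout << token << '\n';
    }
}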

The output is:

AB
DE

test

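So this particular algorithm counts the empty string between <sep1> and <sep2> as a token. It also does not emit the separators themselves, and because it handles the separators in list order, a <sep2> that appears before a later <sep1> stays inside a token.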
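
If the separators should appear in the output, as the question asks, one option is to scan the string once and, at each step, take whichever separator occurs earliest. The following is only a sketch under that assumption, not the code from this answer; the name splitKeepingDelimiters is made up for illustration, and it skips empty pieces instead of emitting them. It searches for every separator again at each step, so it favors simplicity over speed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer above): splits `s` on any of
// `dels`, keeps the delimiters as tokens, and skips empty pieces.
std::vector<std::string> splitKeepingDelimiters(
    const std::string& s, const std::vector<std::string>& dels) {
    std::vector<std::string> tokens;
    std::size_t pos = 0;
    while (pos < s.size()) {
        // Find the delimiter occurrence that starts earliest at or after pos;
        // on a tie, prefer the longer delimiter.
        std::size_t best = std::string::npos;
        std::size_t bestLen = 0;
        for (const auto& del : dels) {
            if (del.empty()) continue;  // an empty delimiter is ignored
            std::size_t found = s.find(del, pos);
            if (found == std::string::npos) continue;
            if (found < best || (found == best && del.size() > bestLen)) {
                best = found;
                bestLen = del.size();
            }
        }
        if (best == std::string::npos) {
            tokens.push_back(s.substr(pos));  // trailing text, no delimiter left
            break;
        }
        if (best > pos) tokens.push_back(s.substr(pos, best - pos));
        tokens.push_back(s.substr(best, bestLen));  // the delimiter itself
        pos = best + bestLen;
    }
    return tokens;
}
```

Because the earliest match wins at every step, the result does not depend on the order in which the separators are listed.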

**chrysante** · Accepted Answer · 2023-08-01T08:47:00+00:00

Here is a possible implementation.

#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

std::vector<std::string> split(std::string const& str,
                               std::vector<std::string> const& seps) {
    // Collect the [begin, end) range of every separator occurrence.
    std::vector<std::pair<std::size_t, std::size_t>> sepPos;
    for (auto const& sep: seps) {
        if (sep.empty()) {
            continue; // an empty separator would loop forever below
        }
        std::size_t pos = 0;
        while ((pos = str.find(sep, pos)) != std::string::npos) {
            sepPos.push_back({ pos, pos + sep.size() });
            pos += sep.size();
        }
    }
    // Put the occurrences in the order they appear in the string.
    std::sort(sepPos.begin(), sepPos.end());
    std::vector<std::string> result;
    std::size_t pos = 0;
    for (auto [begin, end]: sepPos) {
        result.push_back(str.substr(pos, begin - pos));   // text before the separator (may be empty)
        result.push_back(str.substr(begin, end - begin)); // the separator itself
        pos = end;
    }
    // Text after the last separator (may be empty).
    result.push_back(str.substr(pos, str.size() - pos));
    return result;
}

And a minimal test.

#include <iostream>
#include <string>
#include <vector>

#include "Split.h"

int main() {
    auto tokens = split("AB<sep1>DE<sep2>X<sep1>Y", { "<sep1>", "<sep2>" });
    for (auto& token: tokens) {
        std::cout << token << std::endl;
    }
}

It first finds and collects the positions of all instances of separators in the string. Then it sorts them so we can iterate over the positions and extract all the substrings.

Note that this breaks down if one separator contains another separator as a substring, because the recorded ranges then overlap, but I assume that would be a pathological case. It also breaks if a separator is specified twice, since the same range is then recorded twice.
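
If those two cases matter, one possible workaround is to keep the collect-and-sort approach but discard any match that starts inside a match already accepted. The sketch below is an assumption-laden variant, not the answer's own code: the name splitNonOverlapping is hypothetical, ties at the same start position go to the longer separator, and, unlike split above, it leaves out empty strings between adjacent separators.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical variant of split() from the answer above; the name
// splitNonOverlapping is an assumption made for this sketch.
std::vector<std::string> splitNonOverlapping(
    const std::string& str, const std::vector<std::string>& seps) {
    // Collect every occurrence of every separator as a [begin, end) range.
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    for (const auto& sep : seps) {
        if (sep.empty()) continue;
        for (std::size_t pos = str.find(sep); pos != std::string::npos;
             pos = str.find(sep, pos + 1)) {
            ranges.push_back({pos, pos + sep.size()});
        }
    }
    // Order by start; at the same start, put the longer match first.
    std::sort(ranges.begin(), ranges.end(), [](const auto& a, const auto& b) {
        return a.first != b.first ? a.first < b.first : a.second > b.second;
    });
    std::vector<std::string> result;
    std::size_t pos = 0;
    for (const auto& [begin, end] : ranges) {
        if (begin < pos) continue;  // overlaps or repeats an accepted match
        if (begin > pos) result.push_back(str.substr(pos, begin - pos));
        result.push_back(str.substr(begin, end - begin));
        pos = end;
    }
    // Keep any text after the last accepted separator.
    if (pos < str.size()) result.push_back(str.substr(pos));
    return result;
}
```

With this filter, listing "<sep1>" twice or passing both "<" and "<<" as separators no longer corrupts the output, at the cost of sorting some ranges that are later thrown away.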
