Complex algorithm to extract numbers/number range from a string

3.2k views Asked by At

I am working on a algorithm where I am trying the following output:

Given values/Inputs:

char *Var = "1-5,10,12,15-16,25-35,67,69,99-105"; int size = 29;

Here "1-5" depicts a range value, i.e. it will be understood as "1,2,3,4,5" while the values with just "," are individual values.

I was writing an algorithm where end output should be such that it will give complete range of output as:

int list[]=1,2,3,4,5,10,12,15,16,25,26,27,28,29,30,31,32,33,34,35,67,69,99,100,101,102,103,104,105;

If anyone is familiar with this issue then the help would be really appreciated. Thanks in advance!

My initial code approach was as:

if(NULL != strchr((char *)grp_range, '-'))
{
    int_u8 delims[] = "-";
    result = (int_u8 *)strtok((char *)grp_range, (char *)delims);

    if(NULL != result)
    {
        start_index = strtol((char*)result, (char **)&end_ptr, 10);
        result = (int_u8 *)strtok(NULL, (char *)delims);
    }

    while(NULL != result)
    {
        end_index = strtol((char*)result, (char**)&end_ptr, 10);
        result = (int_u8 *)strtok(NULL, (char *)delims);
    }

    while(start_index <= end_index)
    {
        grp_list[i++] = start_index;
        start_index++;
    }
}

else if(NULL != strchr((char *)grp_range, ','))
{
    int_u8 delims[] = ",";
    result = (unison_u8 *)strtok((char *)grp_range, (char *)delims);

    while(result != NULL)
    {
        grp_list[i++] = strtol((char*)result, (char**)&end_ptr, 10);
        result = (int_u8 *)strtok(NULL, (char *)delims);
    }
}

But it only works if I have either "0-5" or "0,10,15". I am looking forward to make it more versatile.

7

There are 7 answers

3
csnate On BEST ANSWER

You're issue seems to be misunderstanding how strtok works. Have a look at this.

#include <string.h>
#include <stdio.h>

int main()
{
        int i, j;
        char delims[] = " ,";
        char str[] = "1-5,6,7";
        char *tok;
        char tmp[256];
        int rstart, rend;

        tok = strtok(str, delims);

        while(tok != NULL) {
                for(i = 0; i < strlen(tok); ++i) {
                        //// range
                        if(i != 0 && tok[i] == '-') {
                                strncpy(tmp, tok, i); 
                                rstart = atoi(tmp);
                                strcpy(tmp, tok + i + 1); 
                                rend = atoi(tmp);
                                for(j = rstart; j <= rend; ++j)
                                        printf("%d\n", j); 
                                i = strlen(tok) + 1;
                        }   
                        else if(strchr(tok, '-') == NULL)
                                printf("%s\n", tok);
                }   

                tok = strtok(NULL, delims);
        }   

        return 0;
}
3
James Kanze On

Stop and think about it: what you actually have is a comma separated list of ranges, where a range can be either a single number, or a pair of numbers separated by a '-'. So you probably want to loop over the ranges, using recursive descent for the parsing. (This sort of thing is best handled by an istream, so that's what I'll use.)

std::vector<int> results;
std::istringstream parser( std::string( var ) );
processRange( results, parser );
while ( isSeparator( parser, ',' ) ) {
    processRange( results, parser );
}

with:

bool
isSeparator( std::istream& source, char separ )
{
    char next;
    source >> next;
    if ( source && next != separ ) {
        source.putback( next );
    }
    return source && next == separ;
}

and

void
processRange( std::vector<int>& results, std::istream& source )
{
    int first = 0;
    source >> first;
    int last = first;
    if ( isSeparator( source, '-' ) ) {
        source >> last;
    }
    if ( last < first ) {
        source.setstate( std::ios_base::failbit );
    } 
    if ( source ) {
        while ( first != last ) {
            results.push_back( first );
            ++ first;
        }
        results.push_back( first );
    }
}

The isSeparator function will, in fact, probably be useful in other projects in the future, and should be kept in your toolbox.

4
Pete Becker On

Don't search. Just go through the text one character at a time. As long as you're seeing digits, accumulate them into a value. If the digits are followed by a - then you're looking at a range, and need to parse the next set of digits to get the upper bound of the range and put all the values into your list. If the value is not followed by a - then you've got a single value; put it into your list.

2
Adam Burry On

One approach:

You need a parser that identifies 3 kinds of tokens: ',', '-', and numbers. That raises the level of abstraction so that you are operating at a level above characters.

Then you can parse your token stream to create a list of ranges and constants.

Then you can parse that list to convert the ranges into constants.

Some code that does part of the job:

#include <stdio.h>

// Prints a comma after the last digit. You will need to fix that up.
void print(int a, int b) {
  for (int i = a; i <= b; ++i) {
    printf("%d, ", i);
  }
}

int main() {
  enum { DASH, COMMA, NUMBER };
  struct token {
    int type;
    int value;
  };

  // Sample input stream. Notice the sentinel comma at the end.
  // 1-5,10,
  struct token tokStream[] = {
    { NUMBER, 1 },
    { DASH, 0 },
    { NUMBER, 5 },
    { COMMA, 0 },
    { NUMBER, 10 },
    { COMMA, 0 } };

  // This parser assumes well formed input. You have to add all the error
  // checking yourself.
  size_t i = 0;
  while (i < sizeof(tokStream)/sizeof(struct token)) {
    if (tokStream[i+1].type == COMMA) {
      print(tokStream[i].value, tokStream[i].value);
      i += 2;  // skip to next number
    }
    else { // DASH
      print(tokStream[i].value, tokStream[i+2].value);
      i += 4;  // skip to next number
    }
  }

  return 0;
}
2
Oleg Olivson On

First divide whole string into numbers and ranges (using strtok() with "," delimiter), save strings in array, then, search through array looking for "-", if it present than use sscanf() with "%d-%d" format, else use sscanf with single "%d" format.

Function usage is easily googling.

10
Neil Kirk On

Here is a C++ solution for you to study.

#include <vector>
#include <string>
#include <sstream>
#include <iostream>
using namespace std;

int ConvertString2Int(const string& str)
{
    stringstream ss(str);
    int x;
    if (! (ss >> x))
    {
        cerr << "Error converting " << str << " to integer" << endl;
        abort();
    }
    return x;
}

vector<string> SplitStringToArray(const string& str, char splitter)
{
    vector<string> tokens;
    stringstream ss(str);
    string temp;
    while (getline(ss, temp, splitter)) // split into new "lines" based on character
    {
        tokens.push_back(temp);
    }
    return tokens;
}

vector<int> ParseData(const string& data)
{
    vector<string> tokens = SplitStringToArray(data, ',');

    vector<int> result;
    for (vector<string>::const_iterator it = tokens.begin(), end_it = tokens.end(); it != end_it; ++it)
    {
        const string& token = *it;
        vector<string> range = SplitStringToArray(token, '-');
        if (range.size() == 1)
        {
            result.push_back(ConvertString2Int(range[0]));
        }
        else if (range.size() == 2)
        {
            int start = ConvertString2Int(range[0]);
            int stop = ConvertString2Int(range[1]);
            for (int i = start; i <= stop; i++)
            {
                result.push_back(i);
            }
        }
        else
        {
            cerr << "Error parsing token " << token << endl;
            abort();
        }
    }

    return result;
}

int main()
{
    vector<int> result = ParseData("1-5,10,12,15-16,25-35,67,69,99-105");
    for (vector<int>::const_iterator it = result.begin(), end_it = result.end(); it != end_it; ++it)
    {
        cout << *it << " ";
    }
    cout << endl;
}

Live example

http://ideone.com/2W99Tt

2
P0W On

This is my boost approach :

This won't give you array of ints, instead a vector of ints

Algorithm used: (nothing new)

  • Split string using ,

  • Split the individual string using -

  • Make a range low and high

  • Push it into vector with help of this range

Code:-

#include<iostream>
#include<vector>
#include <boost/algorithm/string.hpp>
#include <boost/lexical_cast.hpp>


int main(){

    std::string line("1-5,10,12,15-16,25-35,67,69,99-105");
    std::vector<std::string> strs,r;
    std::vector<int> v;
    int low,high,i;
    boost::split(strs,line,boost::is_any_of(","));

for (auto it:strs)
{
    boost::split(r,it,boost::is_any_of("-"));

    auto x = r.begin();
    low = high =boost::lexical_cast<int>(r[0]);
    x++;
    if(x!=r.end())
        high = boost::lexical_cast<int>(r[1]);
    for(i=low;i<=high;++i)
      v.push_back(i);
}

for(auto x:v)
  std::cout<<x<<" ";

    return 0;

}