Reordering vector of vectors based on input vector

135 views Asked by At

In a small application, I've been using a std::vector of std::vector<std::string> to temporarily store some data (pulled from a non-SQL database) before processing it and uploading it to a SQL database. Unfortunately, the API from which I am extracting the data does not necessarily return fields in the order specified by a query; e.g. if my query requests fields x, y, z, the data may be returned as y, x, z, or z, y, x, etc... Obviously this is problematic because if the target SQL table's columns are x, y, z, then the data being inserted needs to reflect this.

To account for this random field ordering, I wrote a small function that takes (1) the input data, as returned by the API; and (2) a std::vector<std::string> representing the desired column ordering, as defined in the SQL table - and reorders the elements of each subvector accordingly. Since the first row of the input data is a vector of field names, I'm able to compare it to the correctly ordered vector and determine how each subvector should be reordered:

void fix_order(std::vector<std::vector<std::string>>& data, const std::vector<std::string>& correct) {

  std::size_t width = data[0].size();
  std::vector<int> order_idx(width);

  for (std::size_t i = 0; i < width; i++) {
    std::string tmp(data[0].at(i));
    auto pos = std::find(correct.begin(), correct.end(), tmp);
    order_idx[i] = std::distance(correct.begin(), pos);
  }

  for (std::size_t i = 0; i < data.size(); i++) {
    if (!data[i].empty()) {
      std::vector<std::string> q(width);

      for (unsigned int j = 0; j < width; j++) {
        int new_pos = order_idx[j];
        q[new_pos] = data[i].at(j);
      }
      std::swap(data[i], q);
    }
  }
}

In action, if the input data fields were ordered as second, fourth, first, third, and I passed a vector specifying the correct order as first, second, third, fourth, the transformation looks like this:

Before:
    second  fourth  first   third
    2nd     4th     1st     3rd
    2nd     4th     1st     3rd

After:
    first   second  third   fourth
    1st     2nd     3rd     4th
    1st     2nd     3rd     4th

Although the function produces the desired result, my mixture of loops and STL algorithms feels sloppy and just not very readable in general. In other situations I've typically been able to use std::sort with a custom comparator function for nonstandard sorting, but I was not able to figure out how to adapt this approach here, where the "sorting" is determined by a predefined input, rather than some type of comparison-based logic. Is there a more idiomatic way to accomplish this - i.e. making better use of STL algorithms (not necessarily std::sort) or other C++ idioms?


Here's an online demo to reproduce the situation.

1

There are 1 answers

2
Barry On BEST ANSWER

If you transpose the data, it's as easy as sorting the vectors by the index of the first element in them. This will be slower than your solution but may be more readable:

void fix_order(std::vector<std::vector<std::string>>& data, const std::vector<std::string>& correct) {
    // setup index map, e.g. "first" --> 0
    std::unordered_map<std::string, size_t> idx;
    for (size_t i = 0; i < correct.size(); ++i) {
        idx.insert(std::make_pair(correct[i], i));
    }

    // transpose for efficient sorting 
    auto tp = transpose(std::move(data));

    // sort based on index map
    std::sort(tp.begin(), tp.end(), [&](const std::vector<std::string>& lhs, const std::vector<std::string>& rhs){
        return idx[lhs[0]] < idx[rhs[0]];
    });

    // transpose back to get the form you wanted  
    data = transpose(std::move(tp));
}

Where transpose is just:

std::vector<std::vector<std::string>> transpose(std::vector<std::vector<std::string>>&& data)
{
    std::vector<std::vector<std::string>> result(data[0].size(),
           std::vector<std::string>(data.size()));

    for (size_t i = 0; i < data[0].size(); ++i) {
        for (size_t j = 0; j < data.size(); ++j) {
            result[i][j] = std::move(data[j][i]);
        }
    }

    return result;
}