Concatenate the rows based on number (Google Refine, Excel/Google Spreadsheet)

327 views Asked by At

I have large amount of rows on a csv file, which look like:

name a,1
name b,1
name c,1
name d,2
name e,2

I need to concatenate the rows based on number. Result should be:

name a|name b|name c
name d|name e

How can I do it in Google Refine or in Excel/Google Spreadsheet?

I am thinking of it, but with no solution.

Thank you a lot!

2

There are 2 answers

1
Ettore Rizza On BEST ANSWER

Here is a proposal with Open refine. The only Grel formula i used is:

row.record.cells['myColumn'].value.join('|')

screencast

And here is the JSOn, assuming that your first column is named "myColumn" and the second "number" :

[
  {
    "op": "core/column-addition",
    "description": "Create column test at index 2 based on column number using expression grel:value",
    "engineConfig": {
      "mode": "row-based",
      "facets": [
        {
          "omitError": false,
          "expression": "isBlank(value)",
          "selectBlank": false,
          "selection": [
            {
              "v": {
                "v": false,
                "l": "false"
              }
            }
          ],
          "selectError": false,
          "invert": false,
          "name": "ee",
          "omitBlank": false,
          "type": "list",
          "columnName": "ee"
        }
      ]
    },
    "newColumnName": "test",
    "columnInsertIndex": 2,
    "baseColumnName": "number",
    "expression": "grel:value",
    "onError": "set-to-blank"
  },
  {
    "op": "core/column-move",
    "description": "Move column test to position 0",
    "columnName": "test",
    "index": 0
  },
  {
    "op": "core/blank-down",
    "description": "Blank down cells in column test",
    "engineConfig": {
      "mode": "row-based",
      "facets": [
        {
          "omitError": false,
          "expression": "isBlank(value)",
          "selectBlank": false,
          "selection": [
            {
              "v": {
                "v": false,
                "l": "false"
              }
            }
          ],
          "selectError": false,
          "invert": false,
          "name": "ee",
          "omitBlank": false,
          "type": "list",
          "columnName": "ee"
        }
      ]
    },
    "columnName": "test"
  },
  {
    "op": "core/column-addition",
    "description": "Create column concatenation at index 2 based on column myColumn using expression grel:row.record.cells['myColumn'].value.join('|')",
    "engineConfig": {
      "mode": "row-based",
      "facets": [
        {
          "omitError": false,
          "expression": "isBlank(value)",
          "selectBlank": false,
          "selection": [
            {
              "v": {
                "v": false,
                "l": "false"
              }
            }
          ],
          "selectError": false,
          "invert": false,
          "name": "ee",
          "omitBlank": false,
          "type": "list",
          "columnName": "ee"
        }
      ]
    },
    "newColumnName": "concatenation",
    "columnInsertIndex": 2,
    "baseColumnName": "myColumn",
    "expression": "grel:row.record.cells['myColumn'].value.join('|')",
    "onError": "set-to-blank"
  }
]
0
arkottke On

If you can use Python it would be pretty easy to do this manipulation. In the the code below, the name and group are read from "input.csv", and the grouped names (along with the group) are written to "output.csv". A defaultdict is used to create empty lists to store the group members.

import collections
import csv

grouped = collections.defaultdict(list)
with open('input.csv') as fp:
    reader = csv.reader(fp)
    for row in reader:
        name, group = row
        grouped[group].append(name)


with open('output.csv', 'w', newline='') as fp:
    writer = csv.writer(fp, delimiter='|')
    for key in sorted(grouped.keys()):
        writer.writerow([key] + grouped[key])