OpenRefine/Google Refine - reconcile two datasets

I'm in a situation with two JSON files. File A contains various data, including a numeric id. File B contains all the possible numeric ids, each linked to a VAT number. In the final RDF file, I'd like to either replace the id column from the first file with the associated VAT number, or add another column containing it. Any suggestions are welcome, thank you.

EDIT: File A structure (it's an array of objects structured like this). The field 'suppliers' always contains a single value, which is the id I mentioned before:

{
    "coupon_number": 25422,
    "url": "xxx",
    "title": "Lorem ipsum dolor sit amet, duo ei accusam aliquando rationibus, sed id dolor sensibus delicatissimi.",
    "suppliers": [
        3043
    ]
}

File B structure (another array):

{
    "id": 3043,
    "vatNumber": "03918590401"
}

I need to link 'suppliers' to the VAT number, or replace it with the VAT number.

Best answer, by Ettore Rizza

Basically, you have to create two projects from your JSON files, then perform a kind of VLOOKUP between them.
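To make the "VLOOKUP" idea concrete, here is a minimal Python sketch of what the join does, assuming the two files look like the samples in the question (the inline data below is shortened sample data, and `add_vat_column` is a hypothetical helper, not part of OpenRefine). It indexes file B by `id`, then adds a `vatNumber` field to each file A record using its single supplier id.

```python
import json

# Shortened sample data mirroring the structures in the question:
# file A records carry one supplier id in a list; file B maps id -> vatNumber.
file_a = json.loads("""
[
  {"coupon_number": 25422, "url": "xxx", "title": "Lorem ipsum", "suppliers": [3043]}
]
""")
file_b = json.loads("""
[
  {"id": 3043, "vatNumber": "03918590401"}
]
""")

def add_vat_column(records, suppliers):
    """Return copies of `records` with a new 'vatNumber' key looked up by supplier id.

    Like VLOOKUP, the first match wins and a missing id yields None.
    """
    lookup = {}
    for s in suppliers:
        lookup.setdefault(s["id"], s["vatNumber"])  # keep only the first match per id
    result = []
    for rec in records:
        supplier_id = rec["suppliers"][0]  # the question says there is always exactly one
        result.append({**rec, "vatNumber": lookup.get(supplier_id)})
    return result

joined = add_vat_column(file_a, file_b)
print(joined[0]["vatNumber"])  # prints 03918590401
```

Replacing the ids instead of adding a column would just mean writing the looked-up value into `suppliers` rather than into a new key.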

You mentioned a blog post that explains how to join projects in OpenRefine using the cell.cross() function, but this method is not used much anymore. Most users instead install the VIB-Bits plugin (the first one available for download from this page), which allows you to join projects visually.
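For reference, the cell.cross() approach is typically written in GREL as `cell.cross("Project B", "id")[0].cells["vatNumber"].value`, where "Project B", "id" and "vatNumber" stand for your own project and column names. Its lookup logic can be modeled roughly as follows. This is a Python sketch, not OpenRefine's implementation; `cross`, `vat_for` and `project_b` are hypothetical names, and the second row of `project_b` is made-up filler data.

```python
# Hypothetical stand-in for a second OpenRefine project: a list of row dicts.
project_b = [
    {"id": "3043", "vatNumber": "03918590401"},
    {"id": "1234", "vatNumber": "00000000000"},  # made-up filler row
]

def cross(value, other_rows, column):
    """Rough model of cell.cross(): all rows of the other project whose
    `column` equals `value`, in order; an empty list when nothing matches."""
    return [row for row in other_rows if row.get(column) == value]

def vat_for(value):
    matches = cross(value, project_b, "id")
    # Taking the first match mirrors the [0] index used in the GREL expression;
    # the guard returns None when there is no match.
    return matches[0]["vatNumber"] if matches else None

print(vat_for("3043"))  # prints 03918590401
print(vat_for("9999"))  # prints None
```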

Just unzip the plugin into the webapps\extensions folder of your OpenRefine directory, restart OpenRefine, and choose "Edit Columns" -> "Add column(s) from another project".

Important detail: before making the join, convert the common columns that contain numbers (shown in green) to strings (shown in black), for example with "Edit cells" -> "Common transforms" -> "To text".
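The advice above implies that the join compares values by type, so a number and a string with the same digits do not match. A plain Python analogy illustrates this. It is not OpenRefine code, and the assumption is that OpenRefine's matching behaves similarly; the values come from the question's sample data.

```python
# Keys as they might end up in the two projects: text in one, a number in the other.
vat_by_id = {"3043": "03918590401"}  # id column stored as text (string)
supplier_id = 3043                   # suppliers column stored as a number

print(vat_by_id.get(supplier_id))       # prints None: 3043 (int) != "3043" (str)
print(vat_by_id.get(str(supplier_id)))  # prints 03918590401 once both sides are text
```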

The following screencast shows the operations.
