Is it possible to make summarized table on openrefine?

1.1k views Asked by At

I have be wondering if is it possible to create an aggregation and summary of values on OpenRefine on the same way as it is done on python and R? Example:

Table of medical appoints with 300k records Id-patient | Age | Id-appointment | value

The result of aggregating and summarizing by patient would be: Id-patient | last-age | mean-value

I hope to be clear enough, if that function works on Openrefine it would of a great help.

1

There are 1 answers

1
Ettore Rizza On BEST ANSWER

The answer is "yes but"... It's possible, but a bit complicated. Let's take an example.

Id-patient,Age,Id-appointment,score
1,25,1-1,456
1,26,2-1,895
1,27,3-1,872
1,28,4-1,12
1,29,5-1,87
2,45,1-2,542
2,46,2-2,524
2,52,3-2,78
2,89,4-2,45
2,90,5-2,371

In order to do aggregate calculations per patient, you must first transform each patient into a record. To do this, move the "Id_patient" column to the beginning and use "blank down" (The id must be sorted beforehand with "Sort..." and "Reoder rows permanently").

After that, you can perform calculations on all the values of each record, considered as an array.

All this will be clearer with a screencast :

enter image description here

The formulas used in the demo are:

GREL:

sort(row.record.cells.Age.value)[-1]

GREL:

sum(row.record.cells.score.value) / length(row.record.cells.score.value)

Python/Jython:

def avg(l):
    return sum(l, 0.0) / len(l)

return avg([x for x in row['record']['cells']['score']['value']])

As you can see, you can do a lot of things with Open Refine, especially using Pyhon/Jython. BUT calculations is not its main purpose. Open Refine is designed primarily to explore, clean and enrich data. It's not a spreadsheet software. You could do the same much more easily with Pivot Tables in Excel. Just as you can clean up some messy data with Excel, even if it's not the best tool for that.