OpenRefine: text facet by counting


I have a huge file primarily composed of book metadata (author, title, date, url). I want to operate on author names (which are often repeated: an author can have hundreds of records), but only on the subset of authors that have more than X records.

For example, I have 200 records related to "William Shakespeare", but only 1 record for "John Black", etc. Since this is a classic power law, I have hundreds of thousands of authors, the majority of them with 1-2 records.

Using "Text facet" > "count" is impossible, because my computer freezes.

Is there a query to get a text facet of just some records, based on their count?

magdmartin On BEST ANSWER

Create a custom text facet with the following GREL expression (replace COLUMN_NAME with your actual column name):

facetCount(value, "value", "COLUMN_NAME") > 100

You can edit the comparison (in the example, every value whose count is greater than 100). The facet then shows true and false; select true to keep only the matching rows.

To display only exact count matches, you need to use a double equals sign (==), like this:

facetCount(value, "value", "COLUMN_NAME") == 100
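
If it helps to see what these expressions compute, here is a rough sketch in plain Python, outside OpenRefine. It is only an illustration of the logic (count how often each value occurs in the column, then test that count for every row), not how OpenRefine implements facetCount. The function name facet_count_flags and the sample author list are made up for the example.

```python
from collections import Counter


def facet_count_flags(values, predicate):
    """Count occurrences of each value in the column, then apply
    the predicate to the count of the value on every row."""
    counts = Counter(values)
    return [predicate(counts[v]) for v in values]


# Hypothetical column of author names, one entry per record.
authors = ["William Shakespeare"] * 3 + ["John Black"] + ["Jane Austen"] * 2

# Analogue of: facetCount(value, "value", "COLUMN_NAME") > 2
more_than_two = facet_count_flags(authors, lambda n: n > 2)

# Analogue of: facetCount(value, "value", "COLUMN_NAME") == 2
exactly_two = facet_count_flags(authors, lambda n: n == 2)

# Authors whose rows would fall under "true" in the first facet.
kept = sorted({a for a, keep in zip(authors, more_than_two) if keep})
print(kept)
```

Each row gets true or false depending on the count of its own value, which is why the resulting facet has only two choices no matter how many distinct authors the column holds.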

More details in this video and tutorial on faceting by facet count.