I have an original data set with over 4,500 row entries representing land transactions. One of the existing columns is "Place," and in each cell value I've put the locality name usually as Some Neighborhood
, but sometimes when there is a place I don't know, I'll add additional info from the notary records, e.g., Some Neighborhood, Some County
or even as Some Neighborhood, Some County (nicknamed)
.
I want to create two new columns using the info from this one column. The first column would contain the upper-level administrative division, and the second column the administrative division above that -- kinda like zooming out.
I think I would need to create a new column based on existing column using a sort of logical test or if-if-if- test where I can enumerate the values of Some Neighborhood
that belong to County 1
or County 2
or City 1
, etc. Then, based on this new column, I'd create another that would have State 1
, State 2
, etc., presumably using the same sort of test.
How would I write this using regular expressions in Open Refine GREL? How can I string together multiple possible values that are found in the cells? I tried using cells
and value.contains
but I do not know how to string together multiples like this.
So again to recap, I want to script a way to create a new column that contains new cell values based on a column of existing values, something like
"if Street1
,Street2
, Street5,
but not Street3
, or Street4
, then County1
"
OR
"if Street1
or Street2
or Street5,
then County1
AND THEN IF Street3
or Street4
then County2
"
EDIT: Here's some of the data:
land sale 0.350 carreau 350 gourdes Bullet
land sale 1.000 carreau 700 gourdes Campèche
land sale 0.200 carreau 220 gourdes Bremont
land sale 0.500 carreau 150 gourdes Pierrette
land sale 5.000 carreau 225 gourdes Lagenivrée
land sale 0.125 carreau 200 gourdes Bullet
land sale 1.000 carreau 300 gourdes Tozin
land sale 0.125 carreau 100 gourdes Dufort
land sale 0.250 carreau 135 gourdes Charitte, Savann Brute
land sale 0.500 carreau 300 gourdes Ravines des Roches
land sale 0.500 carreau 80 gourdes Isidore (Nègre Libre)
land sale 0.500 carreau 215 gourdes Nordette (Boures)
land sale 0.250 carreau 200 gourdes Bullet (Morne Montègue)
And here's an example of the expected result with two new columns, let's say "Commune" and "Section":
land sale 0.350 carreau 350 gourdes Limonade Bwadlans Bullet
land sale 1.000 carreau 700 gourdes Limonade Bwadlans Campèche
land sale 0.200 carreau 220 gourdes Limonade Bwadlans Bremont
land sale 0.500 carreau 150 gourdes Limonade Roucou Pierrette
land sale 5.000 carreau 225 gourdes Limonade Roucou Lagenivrée
land sale 0.125 carreau 200 gourdes Limonade Bwadlans Bullet
land sale 1.000 carreau 300 gourdes Quart_Mor Sablé Tozin
land sale 0.125 carreau 100 gourdes Limonade Bwadlans Dufort
land sale 0.250 carreau 135 gourdes Limonade Bwadlans Charitte, Savann Brute
land sale 0.500 carreau 300 gourdes Limonade Bwadlans Ravines des Roches
land sale 0.500 carreau 80 gourdes Limonade Bwadlans Isidore (Nègre Libre)
land sale 0.500 carreau 215 gourdes Limonade Bwadlans Nordette (Boures)
land sale 0.250 carreau 200 gourdes Limonade Bwadlans Bullet (Morne Montègue)
There are a number of approaches and the one you choose may depend on how many values you have in the initial Place column.
I recommend the first approach as a starting point
Approach 1 - use facets:
Approach 2 - use a lookup: If you have a mapping of Places -> Commune & Sections already, you can create this as a separate OpenRefine project, then use 'cross' to lookup a Commune/Section for each Place
cell.cross('place mapping project','Place')[0].cells["Commune"].value
cell.cross('place mapping project','Place')[0].cells["Section"].value
Approach 3 - use conditionals: This is I think the approach you asked for in the first place, but I don't think it is necessarily the best option
if(or(value=="Bullet",value=="Campéche"),"Bwadlans","")
if(or(cells["Place"].value=="Pierrette",cells["Place"].value=="Lagenivrée"),"Roucou")
Note that to use more than two conditions you have to nest 'or' statements e.g.:
or(or(value=="Bullet",value=="Campéche"),value=="Bremont")
Approaches 2 and 3 can be made easier by installing OpenRefine extensions which help with this process (recommend the VIB-BITS extension for Approach 2, and looking at the gokbutils extension with the 'inArray' function for Approach 3 - which would avoid complex 'or' statements)