I have two datasets with city names, states (uf), and years. The main problem is that the city names are written differently in the two datasets. Only the year and the state match reliably between them.
year uf municipality
2013 RO Ariquemes
2018 RO Ariquemes
2020 RO Ariquemes
2017 RO Ariquemes
2015 RO Ariquemes
2019 RO Ariquemes
2016 RO Ariquemes
2014 RO Ariquemes
2018 RO Cabixi
2017 RO Cabixi
2019 RO Cabixi
2013 RO Cabixi
2020 RO Cabixi
2016 RO Cabixi
2014 RO Cabixi
2015 RO Cabixi
2019 RO Cacoal
2018 RO Cacoal
2017 RO Cacoal
count year uf municipality
2 2015 ES Vara de Infância e Juventude - COLATINA
9 2016 ES 1ª Vara da Infância e Juventude - VILA VELHA
3 2014 ES Vara de Infância e Juventude - LINHARES
11 2014 ES 1ª Vara da Infância e Juventude - SERRA
2 2013 ES 2ª Vara - IBIRAÇU
3 2013 ES Vara de Infância e Juventude - ITAPEMIRIM
1 2013 ES 2ª Vara da Comarca de Afonso Cláudio
3 2017 ES Vara de Infância e Juventude - CACHOEIRO DE ITAPEMIRIM
1 2015 ES 2ª Vara - CONCEIÇÃO DA BARRA
1 2013 ES Vara de Infância e Juventude - LINHARES
4 2015 ES Vara de Infância e Juventude - CACHOEIRO DE ITAPEMIRIM
1 2015 ES Vara Única - JAGUARÉ
1 2013 ES 2ª Vara - ALEGRE
1 2013 ES 2ª Vara - PANCAS
2 2014 ES 2ª Vara - PANCAS
11 2018 ES 1ª Vara da Infância e Juventude - SERRA
4 2021 MG 2 VARA CIVEL, CRIMINAL E DA INFANCIA E DA JUVENTUDE DA COMARCA DE GUANHAES
Using R, I would like to merge these datasets by municipality, uf, and year, but I need some way to approximately match the municipality names, which are written differently. I know that it would be something like:
base <- merge(dataset1, dataset2, by=c("year", "municipality", "uf"))
However, as the names in "municipality" are not exactly the same, I keep getting an error. How do I solve this issue?
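One possible direction, as a minimal sketch rather than a definitive solution: build a normalized key from each municipality name and merge on that key together with year and uf. The helper `make_key` and the `key` column are hypothetical names introduced here. The sketch assumes the second dataset always puts the city after the last " - " or after "Comarca de", which holds for the excerpt above but may not hold for the full data. The `dataset1` rows are made-up toy rows, since the excerpt shown only contains RO.

```r
# Toy data modelled on the excerpts above. The dataset1 rows are hypothetical.
dataset1 <- data.frame(
  year = c(2014, 2013, 2021),
  uf = c("ES", "ES", "MG"),
  municipality = c("Linhares", "Pancas", "Guanhães"),
  stringsAsFactors = FALSE
)
dataset2 <- data.frame(
  count = c(3, 1, 4),
  year = c(2014, 2013, 2021),
  uf = c("ES", "ES", "MG"),
  municipality = c(
    "Vara de Infância e Juventude - LINHARES",
    "2ª Vara - PANCAS",
    "2 VARA CIVEL, CRIMINAL E DA INFANCIA E DA JUVENTUDE DA COMARCA DE GUANHAES"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: reduce a free-text name to a comparable key.
make_key <- function(x) {
  x <- sub("^.* - ", "", x)                              # keep text after the last " - "
  x <- sub("^.*COMARCA DE ", "", x, ignore.case = TRUE)  # keep text after "Comarca de"
  # Replace common Portuguese accented letters with plain ASCII letters.
  x <- chartr("ÁÀÂÃÉÊÍÓÔÕÚÇáàâãéêíóôõúç",
              "AAAAEEIOOOUCaaaaeeiooouc", x)
  toupper(trimws(x))
}

dataset1$key <- make_key(dataset1$municipality)
dataset2$key <- make_key(dataset2$municipality)

# Merge on the exact columns plus the normalized key; keep both original names.
base <- merge(dataset1, dataset2, by = c("year", "uf", "key"),
              suffixes = c("_1", "_2"))
print(base)
```

Names that still fail to match after normalization (spelling variants, abbreviations) could probably be handled by comparing keys within each year and uf group using base R's `adist()` or `agrep()`, but that step is not shown here.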