agrep string matching in R

I have two lists of product names. My problem is that "Operating system" matches "system", "cooling system", etc., when it should match only "Operating" or "OS". Similarly, "Key Board" should match "key" or "KB", but not "Mother Board" or just "Board".

How can I give the first word more weight than the second word?

I used agrep() in R. In the first example it also matches "system" and "cooling system". How can I avoid those matches?

Also, is there a function or method that matches "key board" with "KB" and "operating system" with "OS"?

Thanks in advance.

There is 1 answer

Answered by rahul (best answer)

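I have written a function for this. It is not the most optimized way to do it, but it does the job. It matches either the full search string or its initials (the first letter of each word) as a whole word, ignoring case. The inputs are vectors, not lists. Hope this helps.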

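library(stringr)  # provides str_split()

# Returns the indices of inputstring that contain search.string, or its
# initials (first letter of each word), as a whole word; case-insensitive.
stringMatch <- function(search.string, inputstring, pattern = " ") {
  stringsplit <- unlist(str_split(search.string, pattern))

  # Build the initials, e.g. "Operating System" -> "OS"
  firstletter <- paste(substring(stringsplit, 1, 1), collapse = "")
  search.string.l <- tolower(search.string)
  firstletter.l <- tolower(firstletter)

  # Match either the full string or the initials, each bounded by \\b
  matchstring <- grep(paste("\\b", search.string.l, "\\b", "|",
                            "\\b", firstletter.l, "\\b", sep = ""),
                      tolower(inputstring))
  return(matchstring)
}

test1 <- c('hello p', 'helbbo', 'hello test', 'HP')
search.string <- 'HP'
stringMatch(search.string, test1)
[1] 4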
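
For the product names in the question, a possible variation is to compare whole names instead of searching inside them, so that "system" or "Board" alone can never match. The sketch below is only an illustration and is not part of the function above: `matchFirstWord` is a hypothetical helper name, it uses base R only, and it assumes that a candidate counts as a match when it equals the full phrase, the first word, or the initials after lower-casing and trimming.

```r
# Hypothetical helper (an assumption, not from the original answer):
# returns the indices of candidates that equal the full phrase,
# its first word, or its initials, ignoring case and extra spaces.
matchFirstWord <- function(query, candidates) {
  # Split the query into lower-case words
  words <- strsplit(tolower(trimws(query)), "\\s+")[[1]]
  # Initials, e.g. "operating system" -> "os"
  initials <- paste(substring(words, 1, 1), collapse = "")
  # Accepted forms: full phrase, first word only, initials
  accepted <- unique(c(paste(words, collapse = " "), words[1], initials))
  # Normalise candidates the same way before comparing
  cand <- gsub("\\s+", " ", tolower(trimws(candidates)))
  which(cand %in% accepted)
}

products <- c("system", "cooling system", "Operating", "OS", "operating system")
matchFirstWord("Operating system", products)
# [1] 3 4 5

boards <- c("key", "KB", "Mother Board", "Board")
matchFirstWord("Key Board", boards)
# [1] 1 2
```

Because only the first word is added to the accepted forms, the second word ("system", "Board") carries no weight on its own. If small typos also need to be tolerated, the exact comparison could be swapped for an approximate one, at the cost of reintroducing some false positives.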