I have two data frames df1 (4x4) and df2 (4x1). In each, first variable (i.e. Original_items and Reordered) is string. In df1, V2:V4 are numeric. You can see that in df1 and df2, data in the first variable is arranged in a different order. I need to do the following.

Take 1st element of the df2 'Reordered' variable (i.e. Enjoy holidays.), then search through elements of df1 'Original_items' variable to find the exact match.

When match is found, I need to take the entire row of data associated with the matched element in df1 'Original_items' (i.e."Enjoy holidays.", 4,1,3), and append it beside the same element of df2 'Reordered' variable (i.e. "Enjoy holidays"). I need this output in the new data frame, called df_desired, which should be: "Enjoy holidays.", "Enjoy holidays.", 4, ,1 ,3. Please see below illustration of this example.

When this is done, I would like to repeat this process for each element of the df2 'Reordered'variable, so the final result looks like df_desired table below.

Context of the problem. I have around 2,000 items and 1,000 data points associated with each item. As I need to match items and append data in a predefined way, I am trying to think of an efficient solution.

EDIT It was suggested that I could simply rename items in the "Original Variable". While this is true, it is inconvenient to do for a data frame of more than 2,000 items.

Also, it was mentioned that this question maybe only related to merging. I believe merging is needed here only for elements that have been identified as identical across df1 and df2. Therefore, there are two key questions: 1) how to match string variables in this particular case? 2) how to merge/append rows conditionally, i.e. if they have been matched. Thank you for your input and I would be grateful for your help please

I will mention what I tried and figured out so far. I realised

 df1[,1] == df2 [,1] # gives me true or false if rows in column 1 are the 
                       same in both data frames. I tried to set up a double loop, but unsuccessfully   
 for (i in 1:nrow(df1)) {                                                 
      for (j in 1:nrow(df2)){
    if (i==j){
         c <- merge(a,b)        
        } else 
              print("no result")
           }
                 }  

I feel that in the loop I'm not able to specify that I am only working with row values from a single variable "Original_item" in df1

# df1 (4x4 matrix)
  Original_items        V2    V3   V4
  Love birds.            1     5    3
  Eat a lot of food.     2     5    5
  Love birthdays.        2     2    4
  Enjoy holidays.        4     1    3

# df2  (4x1 matrix)
  Reordered                   
  Enjoy holidays. 
  Eat a lot of food.
  Love birds. 
  Love birthdays.      

# df_desired (4x5 matrix)
  Reordered             Original_items            V2   V3   V4    
  Enjoy holidays.       Enjoy holidays.           4     1    3
  Eat a lot of food.    Eat a lot of food.        2     5    5 
  Love birds.           Love birds.               1     5    3
  Love birthdays.       Love birthdays.           2     2    4

1 Answers

1
ianjd On Best Solutions

If i understand correctly, you first want to sort df1$original_items to be in the same order as df2 reorder, then apply that same sorting pattern to the rest of df1 variables. First get your vector of indices of df1 in the sequential order that you desire those rows of df1 to end up in.

#initialize an object to capture the above output
indices <- NULL
for (i in 1:nrow(df1)) {
    indices[i] <- which(df1$Original_items == df2$Reordered[i]))
}

Then, just use this list of indices to reorder the all the rows of df1 and create the new df.

df_desired <- cbind(df2$Reordered, df1[indices, ])