Anonymize data for each distinct row in R


Example

Value

15   
15   
15   
4   
37   
37   
37  

There are three distinct values but 7 rows. I want to anonymize my data so that every row with the same value gets the same label; the desired output is shown after my code. I keep getting the error "replacement has 3 rows, data has 7".

This is the code I'm using

final_df$Value <- paste("Value",seq(1:length(unique(final_df$Value))))

Value

Value 1
Value 1   
Value 1   
Value 2   
Value 3   
Value 3   
Value 3  

Andre Elrico (best answer)

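Create a function that does the job. It relies on rle(), so it numbers consecutive runs of equal values; this matches the desired output as long as equal values sit next to each other, as they do in the example: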

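anon <- function(x) {
    # Lengths of the consecutive runs of equal values
    rl <- rle(x)$lengths
    # Number the runs 1, 2, ... and repeat each number once per row in its run
    ans <- paste("Value", rep(seq_along(rl), rl))
    return(ans)
}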

Call the function:

anon(final_df$Value)

Result:

# [1] "Value 1" "Value 1" "Value 1" "Value 2" "Value 3" "Value 3" "Value 3"
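
For context, the error in the question arises because the right-hand side holds one label per distinct value (3), while the column has 7 rows. If equal values might not be adjacent, one possible alternative is to look up each row's value among the distinct values with match(). This is a minimal sketch and not part of the answer above; the name anon_match, its prefix argument, and the reconstructed final_df are assumptions made for illustration.

```r
# Hypothetical data mirroring the question's example.
final_df <- data.frame(Value = c(15, 15, 15, 4, 37, 37, 37))

# The question's right-hand side has one label per distinct value,
# so it has length 3 and cannot fill a 7-row column.
labels <- paste("Value", seq_along(unique(final_df$Value)))
length(labels)  # 3

# match() gives, for every row, the position of its value among the
# distinct values, so the result has exactly one label per row.
anon_match <- function(x, prefix = "Value") {
  paste(prefix, match(x, unique(x)))
}

final_df$Value <- anon_match(final_df$Value)
final_df$Value
# [1] "Value 1" "Value 1" "Value 1" "Value 2" "Value 3" "Value 3" "Value 3"

# Equal values that are not adjacent still share a label.
anon_match(c(15, 4, 15))
# [1] "Value 1" "Value 2" "Value 1"
```

With the rle()-based function, the last call would instead produce three different labels, because each run is numbered separately.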

Generalization to every column of a data frame (using mtcars as an example):

df1 <- mtcars
df1[] <- lapply(df1, anon)
names(df1)    <- paste0("V", seq_along(names(df1)))
rownames(df1) <- NULL

df1
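
Columns such as mtcars$cyl repeat the same value in non-adjacent rows, so a run-based labelling gives one original value several labels. A hedged sketch of the same generalization using a match()-based helper instead; anon_match, its prefix argument, and the per-column prefixes are assumptions for illustration, not part of the answer above.

```r
# Hypothetical helper: label each value by its position among the distinct values.
anon_match <- function(x, prefix = "Value") {
  paste(prefix, match(x, unique(x)))
}

df2 <- mtcars
new_names <- paste0("V", seq_along(df2))

# Anonymize every column, using the new column name as the label prefix.
df2[] <- Map(anon_match, df2, new_names)
names(df2)    <- new_names
rownames(df2) <- NULL

# cyl (second column) starts 6, 6, 4, 6, 8: equal values keep equal labels.
head(df2$V2, 5)
# [1] "V2 1" "V2 1" "V2 2" "V2 1" "V2 3"

# The number of distinct labels equals the number of distinct original values.
length(unique(df2$V2)) == length(unique(mtcars$cyl))
# [1] TRUE
```

Whether run-based or value-based labels are appropriate depends on what the anonymized data is meant to preserve.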