R: Create specific bins based on data range


I am attempting to repeatedly add a "fixed number" to a numeric vector depending on a specified bin size. However, the "fixed number" is dependent on the data range.

For instance, I have a data range of 10 to 1010, and I wish to separate the data into 100 bins.

Since 1010 - 10 = 1000
and 1000 / 100 (the number of bins specified) = 10,
the ideal data would look like this:
bin1 - 10 (initial data)  
bin2 - 20 (initial data + 10)  
bin3 - 30 (initial data + 20)  
bin4 - 40 (initial data + 30)    
bin100 - 1010 (initial data + 1000) 
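
The arithmetic above can be checked directly in R. This is a minimal sketch; the names `lo`, `hi`, `n_bins`, `bin_width` and `edges` are illustrative and not part of the original post. Note that 100 bins need 101 edges, so 1010 is the upper edge of the last bin.

```r
lo <- 10
hi <- 1010
n_bins <- 100

# Width of a single bin: (1010 - 10) / 100
bin_width <- (hi - lo) / n_bins

# n_bins bins are delimited by n_bins + 1 equally spaced edges
edges <- seq(lo, hi, length.out = n_bins + 1)

head(edges, 4)
```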

The real data is slightly more complex: there is not just one data range but several. Hopefully the example below clarifies this.

# Some fixed values
start <- c(10, 5000, 4857694)
end <- c(1010, 6500, 4897909)

Ideally I wish to get something like

10  20
20  30
30  40
..   ..
5000  5015
5015  5030
5030  5045
..   ..
4857694   4858096 # Note: theoretically these values would have decimal places,
# but I do not want any decimal places
4858096   4858498
..   ..

So far I have been thinking along the lines of the function below, but it seems inefficient because:
1) I would have to retype the expression 100 times (because my number of bins is 100)
2) I can't find a way to repeat the function along my values. In other words, my function can only deal with the range 10-1010 and not the next one, 5000-6500

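# The width of each data range
width <- end - start
# The number of bins required (despite the name, this is a count, not a size)
bin_size <- 100
# The width of a single bin (despite the name, this is a size, not a count)
bin_count <- width/bin_size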
# Create a function
f1 <- function(x, y){
  c(x[1],
    x[1] + y[1],
    x[1] + y[1]*2,
    x[1] + y[1]*3)
}

f1(x = start, y = bin_count)
[1] 10 20 30 40
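
The hand-typed multiples in `f1` could be generated from an integer sequence instead, which removes the need to retype a line per bin. This is only a sketch; the helper `f2` and its argument `n` (the number of bins) are hypothetical names that do not appear in the original post, and it still handles a single start value at a time.

```r
# Hypothetical generalisation of f1:
# x is a single start value, y the width of one bin, n the number of bins
f2 <- function(x, y, n) {
  # 0:n yields the multipliers 0, 1, ..., n, so nothing is typed by hand
  x + y * (0:n)
}

head(f2(10, 10, 100), 4)
```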

Any hints or ideas would be greatly appreciated. Thanks in advance!


There are 2 answers

Learner On BEST ANSWER

After a few hours of trying, I managed to answer my own question, so I thought I would share it. I used the function "bins" from the package "binr" to get the required bins. Below is my attempt; the output is slightly different from the intended output, but it is still okay for my purpose.

library(binr)
# Some fixed values
start <- c(10, 5000, 4857694)
end <- c(1010, 6500, 4897909)

tmp_list_start <- list() # Create an empty list

# Extract the lower bound of each bin from the output of "bins" into a list
for (i in seq_along(start)){
  tmp <- bins(start[i]:end[i],target.bins = 100,max.breaks = 100)
  # The bin labels are the names of tmp$binct; keep the part before the comma
  s <- gsub(",.*", "", names(tmp$binct))
  # Drop the opening bracket, then convert to numeric
  s <- gsub("\\[","",s)
  tmp_list_start[[i]] <- as.numeric(s)
}

# Repeat the same thing, slightly modified, to get the upper bound of each bin
tmp_list_end <- list()
for (i in seq_along(end)){
  tmp <- bins(start[i]:end[i],target.bins = 100,max.breaks = 100)
  # Keep the part after the comma, then drop the closing bracket
  e <- gsub(".*,", "", names(tmp$binct))
  e <- gsub("]","",e)
  tmp_list_end[[i]] <- as.numeric(e)
}

v1 <- unlist(tmp_list_start)
v2 <- unlist(tmp_list_end)

df <- data.frame(start=v1, end=v2)
head(df)
  start end
1    10  20
2    21  30
3    31  40
4    41  50
5    51  60
6    61  70

Pardon my crappy code. Please share if there is a better way of doing this. It would be nice if someone could comment on how to wrap this into a function.
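
One possible way to wrap the per-range loop into a function is sketched below in base R rather than with "binr", so the edges follow the equal-width arithmetic from the question and are not necessarily what "bins" returns. The name `make_bins` and its arguments are illustrative. The sketch assumes that the start and end values are whole numbers and that truncating is an acceptable way to drop the decimal places.

```r
# Hypothetical helper: one row per bin, for every start/end pair
make_bins <- function(start, end, n_bins = 100) {
  per_range <- Map(function(s, e) {
    # Integer division keeps every edge whole: s + floor((e - s) * k / n_bins)
    edges <- s + ((e - s) * (0:n_bins)) %/% n_bins
    # Consecutive edges form the start and end of each bin
    data.frame(start = edges[-length(edges)], end = edges[-1])
  }, start, end)
  # Stack the per-range data frames into a single data frame
  do.call(rbind, per_range)
}

start <- c(10, 5000, 4857694)
end <- c(1010, 6500, 4897909)
df <- make_bins(start, end)

# First bin of each of the three ranges
df[c(1, 101, 201), ]
```

Unlike the output above, consecutive bins here share an edge (10-20, 20-30, ...), which matches the layout asked for in the question.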

Pierre L On

Here's a way that may help with base R:

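# BINS defaults to 100 so that the two-argument calls below run as written
bin_it <- function(START, END, BINS = 100) {
  range <- END-START
  # Width of a single bin
  jump <- range/BINS
  # Bin starts: the first bin starts at START, later bins one past the previous end
  v1 <- c(START, seq(START+jump+1, END, jump))
  # Bin ends: START + jump, START + 2*jump, ..., END
  v2 <- seq(START+jump-1, END, jump)+1
  data.frame(v1, v2)
}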

It uses the function seq to create the vectors of numbers leading up to the ending number. It may not work for every case, but for the ranges you gave it should give the desired output. Only the first few rows of each result are shown below.

bin_it(10, 1010)
      v1   v2
1     10   20
2     21   30
3     31   40
4     41   50
5     51   60

bin_it(5000, 6500)
      v1   v2
1   5000 5015
2   5016 5030
3   5031 5045
4   5046 5060
5   5061 5075

bin_it(4857694, 4897909)
         v1      v2
1   4857694 4858096
2   4858097 4858498
3   4858499 4858900
4   4858901 4859303
5   4859304 4859705
6   4859706 4860107
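
For the third range the bin width is 402.15, so the values returned by `bin_it` carry fractional parts that the default print precision hides. Below is a sketch of one way to drop them, assuming truncation with `floor()` is the rounding wanted (the question only says no decimal places). `bin_it_whole` is an illustrative name, and `bin_it` is reproduced from the answer so that the block runs on its own.

```r
# bin_it from the answer above, with BINS passed explicitly
bin_it <- function(START, END, BINS) {
  range <- END - START
  jump <- range / BINS
  v1 <- c(START, seq(START + jump + 1, END, jump))
  v2 <- seq(START + jump - 1, END, jump) + 1
  data.frame(v1, v2)
}

# Hypothetical wrapper: truncate every column to whole numbers
bin_it_whole <- function(START, END, BINS) {
  floor(bin_it(START, END, BINS))
}

res <- bin_it(4857694, 4897909, 100)
res$v2[1] %% 1   # non-zero, so a fractional part is present

whole <- bin_it_whole(4857694, 4897909, 100)
head(whole, 3)
```

Because `floor()` truncates while printing rounds, some truncated values are one lower than the figures printed above.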