Generating a pseudorandom binary sequence where the same number does not occur more than twice in a row

Question

Generating a pseudorandom binary sequence where the same number does not occur more than twice in a row

187 views Asked by Simon At 13 March 2024 at 17:27

I want to be able to generate a (pseudo)random binary sequence (for example, a sequence of Rs and Ls like RLLRRLR) which is counterbalanced such that the same item does not occur more than twice in a row in the sequence and if I, for example, have a sequence of 20, I get 10 of each. Is there a function in R that can do something like this?

I tried to write a function like this myself. Here is the attempt:

RL_seq <- function(n_L = 10, n_R = 10, max_consec = 2, initial_seq = NULL) {
  while(n_L > 0 | n_R > 0){
    side <- sample(c("R", "L"), 1)
    
    if (side == "R" & n_R > 0 & length(grep("R", tail(initial_seq, max_consec))) != max_consec) {
      initial_seq <- append(initial_seq, side)
      n_R <- n_R - 1
    } else if (side == "L" & n_L > 0 & length(grep("L", tail(initial_seq, max_consec))) != max_consec) {
      initial_seq <- append(initial_seq, side)
      n_L <- n_L - 1
    }
  }
  print(initial_seq)
}

# The function does not stop with the following seed
set.seed(1)
RL_seq()

However, it's up to chance whether the code gets stuck or not. I was also hoping that I could change the rules for the sequence (for example, allowing for 3 consecutive Rs), but the code tends to breaks if I touch the arguments. At this point I would be happy if I could run it with the default arguments and not have it get stuck.

I have searched around but I cannot find an answer.

Original Q&A

There are 6 answers

Gregor Thomas On 13 March 2024 at 18:11

In this approach, we generate n_try candidate sequences with equal numbers of 1s and 0s (you can change these to Rs and Ls at the end, if you like). We then filter out any sequences that don't meet the maximum consecutive value condition.

With this seed, generating 100 candidate sequences of length 10 gives 13 eligible results. (Consider each column in the result a sequence.)

set.seed(47)
n_try = 1e2
out_length = 10
n_l = out_length / 2
n_r = out_length / 2
max_in_a_row = 2
letter_space = c(rep(1, n_l), rep(0, n_r))
r = replicate(n_try, sample(letter_space, size = out_length, replace = FALSE))

r = r[, apply(r, MARGIN = 2, FUN = \(x) sum(diff(x) == 0) <= max_in_a_row)]
r
#       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
#  [1,]    1    0    0    0    0    1    0    0    0     1     0     0     0
#  [2,]    0    1    1    1    1    1    1    1    1     0     1     1     1
#  [3,]    0    0    1    0    0    0    0    0    0     1     0     1     0
#  [4,]    1    0    0    0    1    1    1    1    1     0     0     0     0
#  [5,]    0    1    1    1    0    0    0    0    1     0     1     1     1
#  [6,]    1    1    0    1    1    0    1    1    0     1     0     0     1
#  [7,]    0    0    0    0    0    1    1    0    1     0     1     1     0
#  [8,]    1    1    1    1    1    0    0    1    0     1     0     0     1
#  [9,]    0    0    0    0    1    1    1    1    1     1     1     0     0
# [10,]    1    1    1    1    0    0    0    0    0     0     1     1     1

For a 20-length result I bumped n_try up to 1e5, which resulted in 108 qualifying sequences in under a second.

Nir Graham On 13 March 2024 at 19:23

OP/Simon noted that my approach was fast but overly restrictive; with forcing checks adversly limiting the range of possibilities at the beginnings of the strings; One of Gregors comments referenced the idea of doing half and then doubling; so my final approach leverages that sort of idea; the forced rebalances will only happen in the latter half of the string under constructions. I had a warning through in case it should fail; I leave the older solution beneath the newer one :

library(tidyverse)

RL_v2 <- function(desired_length, max_consec = 2) {
  stopifnot(desired_length %% 2==0)
  half_length <- desired_length / 2
  possible_tokens <- c("R", "L")
  # later will use the following tokens when building an explain list
  #  explain_tokens <- c("1","2","O","F")
  # 1 - we had to add 1/R to balance 
  # 2 - we had to add 2/L to balance
  # we had to choose opposite to previous to not make too long consecutive
  # we had a free choice 
  rlseq <- character(0)
  explain_seq <- character(0)
  while (length(rlseq) < desired_length) {
    
    current_tail <- tail(rlseq, max_consec)
    suppressWarnings(max_tail <- max(rle(current_tail)[1L]$lengths))
    if (max_tail < max_consec) {
      # we are free to choose either R or L ; 
      # must we choose one or the other to maintain 'parity'? 
      sumdiff <- (sum(rlseq==possible_tokens[[1]])-
                    sum(rlseq==possible_tokens[[2]]))
      if(sumdiff<0 & length(rlseq)>=half_length){
        # more 2 than 1 so must add 1 
        rlseq <- c(rlseq, possible_tokens[[1]])
        explain_seq <- c(explain_seq,"1")
      } else if(
        sumdiff>0  & length(rlseq)>=half_length
      ){
        # more 1 than 2 so must add 2
        rlseq <- c(rlseq, possible_tokens[[2]])
        explain_seq <- c(explain_seq,"2")
      } else {
        
        # a blessed free choice
        rlseq <- c(rlseq, sample(possible_tokens, size = 1L))
        explain_seq <- c(explain_seq,"F")
      }
      
    } else {
      # forced opposite
      rlseq <- c(rlseq, setdiff(possible_tokens,tail(rlseq,1L)))
      explain_seq <- c(explain_seq,"O")
    }
  }
  sumdiff <- (sum(rlseq==possible_tokens[[1]])-
                sum(rlseq==possible_tokens[[2]]))
  if(sumdiff!=0){
    warning(paste0("Wasnt able to balance\t:",sumdiff))
    }
  list(seq=rlseq,
       explain=explain_seq)
}

#is RRLRR inside 20 length sequences ? 
sapply(1:200,\(x){
  (res <- RL_v2(20,max_consec = 2))
  str_detect(paste0(res$seq,collapse=""),
             pattern = fixed("RRLRR"))
}) |> table()
# FALSE  TRUE 
# 104    96 

# about half the time 

#are we beginning with RRLL? 
sapply(1:200,\(x){
  (res <- RL_v2(20,max_consec = 2))
  str_detect(paste0(res$seq,collapse=""),
             pattern = "^RRLL")
}) |> table()
# about 10%

EDITED : My method, works constructively by checking the tail of what was chosen, and if we have freedom we use it, and if we dont we take the forced choice, it should run in linear time , making it scaleable. extended to check if the string is unbalanced, and rebalance as it goes. the return is a list of the result, and an explainer that shows when a free choice vs a forced choice was made under the rules .

RLx <- function(desired_length, max_consec = 2) {
  possible_tokens <- c("R", "L")
# later will use the following tokens when building an explain list
#  explain_tokens <- c("1","2","O","F")
                                      # 1 - we had to add 1/R to balance 
                                      # 2 - we had to add 2/L to balance
                                      # we had to choose opposite to previous to not make too long consecutive
                                      # we had a free choice 
  rlseq <- character(0)
  explain_seq <- character(0)
  while (length(rlseq) < desired_length) {
 
    current_tail <- tail(rlseq, max_consec)
    suppressWarnings(max_tail <- max(rle(current_tail)[1L]$lengths))
    if (max_tail < max_consec) {
      # we are free to choose either R or L ; 
      # must we choose one or the other to maintain 'parity'? 
      sumdiff <- (sum(rlseq==possible_tokens[[1]])-
                   sum(rlseq==possible_tokens[[2]]))
      if(sumdiff<0){
        # more 2 than 1 so must add 1 
        rlseq <- c(rlseq, possible_tokens[[1]])
        explain_seq <- c(explain_seq,"1")
      } else if(
        sumdiff>0
      ){
        # more 1 than 2 so must add 2
        rlseq <- c(rlseq, possible_tokens[[2]])
        explain_seq <- c(explain_seq,"2")
      } else {
        # a blessed free choice
      rlseq <- c(rlseq, sample(possible_tokens, size = 1L))
      explain_seq <- c(explain_seq,"F")
      }
    } else {
      # forced opposite
      rlseq <- c(rlseq, setdiff(possible_tokens,tail(rlseq,1L)))
      explain_seq <- c(explain_seq,"O")
    }
}
  list(seq=rlseq,
       explain=explain_seq)
}

(res <- RLx(300,max_consec = 2))

ThomasIsCoding On 14 March 2024 at 00:26

Idea

Actually you can artificially construct a "pseudo" process using rmultinom, say, x_1 + x_2 + ... + x_k = n for both L and R categories, but with a constraint on x_i, e.g., x_i <= l, where k defines the minimum distance for two consecutive Ls separated by Rs, for example.

Once a set of {x_i} survives from the constrained sampling, you can following the following two steps:

make {x_i} and {y_i} := sample({x_i}) for the distributions of consecutive Ls and Rs, respectively
Interleave L and R along with there respective repetitions to construct the sequences, e.g., [rep('L',x_1), rep('R',y_1), rep('L',x_2), rep('R',y_2),..., rep('L',x_k), rep('R',y_k)]

Code

f <- function(Ntot, l = 2) {
   n <- Ntot / 2
   k <- ceiling(n / l) + 1
   repeat {
      d <- c(rmultinom(1, n, rep(1, k)))
      if (max(d) <= l && min(d) > 0) break
   }
   cnts <- c(rbind(d, sample(d)))
   rep(rep(sample(c("L", "R")), length.out = length(cnts)), cnts)
}

Example Output

For reproducibility, you can set set.seed(0) before running the following tests

> (out <- f(60, 3))
 [1] "L" "L" "L" "R" "R" "L" "L" "L" "R" "R" "R" "L" "L" "L" "R" "R" "R" "L" "L"
[20] "L" "R" "R" "R" "L" "L" "L" "R" "R" "R" "L" "L" "L" "R" "R" "R" "L" "L" "R"
[39] "R" "R" "L" "L" "R" "R" "L" "L" "L" "R" "R" "R" "L" "L" "R" "R" "L" "L" "L"
[58] "R" "R" "R"

> table(out)
out
 L  R
30 30

> max(rle(out)$lengths)
[1] 3

and

> (out2 <- f(82, 5))
 [1] "R" "R" "R" "L" "L" "L" "L" "L" "R" "R" "R" "R" "L" "L" "L" "L" "R" "R" "R"
[20] "L" "L" "L" "L" "R" "R" "R" "L" "L" "L" "L" "L" "R" "R" "R" "R" "R" "L" "L"
[39] "L" "R" "R" "R" "R" "R" "L" "L" "L" "R" "R" "R" "R" "R" "L" "L" "L" "L" "R"
[58] "R" "R" "R" "L" "L" "L" "L" "L" "R" "R" "R" "R" "R" "L" "L" "L" "L" "L" "R"
[77] "R" "R" "R" "L" "L" "L"

> table(out2)
out2
 L  R
41 41

> max(rle(out2)$lengths)
[1] 5

user2554330 On 14 March 2024 at 00:30

I'd use an MCMC approach to this. Define a distribution on permutations which has the highest value for sequences that meet your condition, and lower values for each violation of the condition. Start with a random permutation of the symbols, then run Metropolis-Hastings or similar where the moves are swaps of randomly chosen pairs. It will eventually spend most of its time in sequences that you find acceptable.

The samples drawn won't be independent, but averages of functions of them will converge to the population mean.

Or if you don't care too much about the distribution, just run it from a random permutation until you get one value, then start over again. Those sequences will be independent, but might not be uniformly distributed over the acceptable permutations.

Here's some code to use this to generate 5 sequences:

RL_seq <- function(n_L = 10, n_R = 10, max_consec = 2, n = 1,
                   alpha = 0.1 ) {
  
  # alpha is a tuning parameter.  Smaller values make the chain
  # move faster, but it will also move to sequences you don't
  # want, so you might get slower output.
  
  score <- function(seq) {
    # Compute a penalty score for violating the rule
    # An acceptable solution has score 0
    lens <- rle(seq)$lengths
    lens <- lens - max_consec
    sum(lens[lens > 0])
  }
  X <- sample(c(rep("L", n_L), rep("R", n_R)))
  Xscore <- score(X)
  
  result <- matrix(NA, n, length(X))
  count <- 0
  while (count < n) {
    Y <- X
    # Randomly select a pair of entries that are different.
    repeat {
      i <- sample(length(X), 2)
      if (Y[i[1]] != Y[i[2]])
        break
    }
    # Swap them
    temp <- Y[i[1]]
    Y[i[1]] <- Y[i[2]]
    Y[i[2]] <- temp
    Yscore <- score(Y)

    # Always accept the proposal if it is at least as good, 
    # and sometimes if it is worse, using the Metropolis ratio
    
    if (Xscore >= Yscore || runif(1) < exp(alpha * (Xscore - Yscore))) {
      X <- Y
      Xscore <- Yscore
    } 
    if (Xscore == 0) {
      count <- count + 1
      result[count,] <- X
    }
  }
  result
}

set.seed(1)
RL_seq(10, 10, n = 5)
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
#> [1,] "R"  "L"  "R"  "R"  "L"  "R"  "L"  "L"  "R"  "R"   "L"   "L"   "R"   "L"  
#> [2,] "R"  "R"  "L"  "R"  "L"  "R"  "L"  "R"  "L"  "R"   "L"   "L"   "R"   "R"  
#> [3,] "R"  "R"  "L"  "L"  "R"  "L"  "R"  "R"  "L"  "R"   "R"   "L"   "L"   "R"  
#> [4,] "R"  "R"  "L"  "L"  "R"  "L"  "R"  "R"  "L"  "R"   "R"   "L"   "L"   "R"  
#> [5,] "R"  "R"  "L"  "L"  "R"  "R"  "L"  "R"  "L"  "R"   "R"   "L"   "L"   "R"  
#>      [,15] [,16] [,17] [,18] [,19] [,20]
#> [1,] "R"   "R"   "L"   "L"   "R"   "L"  
#> [2,] "L"   "L"   "R"   "R"   "L"   "L"  
#> [3,] "L"   "R"   "L"   "L"   "R"   "L"  
#> [4,] "L"   "L"   "R"   "L"   "R"   "L"  
#> [5,] "L"   "L"   "R"   "L"   "R"   "L"

^{Created on 2024-03-13 with reprex v2.1.0}

In this case, all 5 sequences are different, but they are quite similar.

jblood94 On 22 March 2024 at 20:02

Below is a function, rbinvec2 that efficiently samples uniformly with replacement from the set of valid vectors of length m. It is able to generate 100K valid samples of a vector of length 101 in less than 1 second.

When I have time, I plan on updating the answer to include a derivation of the underlying PMF.

The function

library(RcppAlgos) # for `permuteGeneral`
library(Rfast) # for `colShuffle`

rbinvec2 <- function(n, m) {
  # handle small `m` values with `RcppAlgos`
  if (m == 1) {
    return(sample(0:1, n, 1))
  } else if (m == 2) {
    return(matrix(c(0:1, 1:0), 2, 2)[sample(1:2, n, 1),])
  } else if (m == 3) {
    return(permuteGeneral(0:1, 3, TRUE, NULL, 2, 7)[sample(6, n, 1),])
  }
  
  odd <- m%%2 == 1
  kk <- (m - sqrt(m^2%%8))/2 # maximum number of doubles
  k <- 0:kk
  mk <- m - k
  mk1 <- (mk + 1)%/%2
  mk <- mk%/%2
  m1 <- m - 1
  m2 <- m%/%2
  # calculate the exact (log) PMF of `k`, including the special cases when `m`
  # is odd
  p <- cbind(lchoose(mk1, k%/%2) + lchoose(mk, (k + 1)%/%2), -Inf)
  i <- seq(1, kk + 1, 2)
  if (odd) { # special cases
    p[-i, 1] <- p[-i, 1] + log(2)
    p[i, 2] <- lchoose(mk1[i], -1:(kk%/%2 - 1)) + lchoose(mk[i], 1:(kk%/%2 + 1))
  }
  # sample the PMF
  k <- array(c(rmultinom(1, n, exp(p - max(p[1,])))), dim(p))
  # k[-i,] <- rowSums(k[-i,])
  s <- rowSums(k)
  b <- vector("list", kk + 1) # initialize the list of compositions
  # special case: no doubles
  if (s[1]) b[[1]] <- rep(1, m*s[1])
  s[1] <- 0L
  
  if (odd) { # odd `m`
    for (i in which(s != 0L)) {
      b[[i]] <- if (i%%2L) {
        # even number of doubles
        j1 <- (m - i)/2 # 1 partition length
        j0 <- j1 + 1 # 0 partition length
        i2 <- (i - 1)/2
        d0 <- j0 - i2
        d1 <- j1 - i2
        c(
          if (k[i, 1]) { # balanced doubles--1 more 0 than 1
            rbind(
              # 0 composition
              c(colShuffle(matrix(rep.int(1:2, c(d0, i2)), j0, k[i, 1]))),
              # 1 composition
              c(rbind(
                colShuffle(matrix(rep.int(1:2, c(d1, i2)), j1, k[i, 1])),
                0L
              ))
            )
          } else integer(0)
          ,
          if (k[i, 2]) { # unbalanced doubles--1 more 1 than 0
            rbind( 
              # 0 composition
              c(colShuffle(matrix(rep.int(1:2, c(d0 + 1, i2 - 1)), j0, k[i, 2]))),
              # 1 composition
              c(rbind(
                colShuffle(matrix(rep.int(1:2, c(d1 - 1, i2 + 1)), j1, k[i, 2])),
                0L
              ))
            )
          } else integer(0)
        )
      } else {
        # odd number of doubles
        i2 <- i/2 - 1 # min number of (0,0)
        j <- m2 - i2 # even/odd composition lengths
        idx <- sample(1:2, s[i], 1) # extra 0, or extra 1?
        bb <- cbind(rep.int(1:2, c(j - i2, i2)), # extra 0
                    rep.int(1:2, c(j - i2 - 1, i2 + 1))) # extra 1
        c(
          rbind(
            c(colShuffle(matrix(bb[,idx], j))), # 0 compositions
            c(colShuffle(matrix(bb[,3 - idx], j))) # 1 compositions
          )
        )
      }
    }
  } else { # even `m`
    for (i in which(s != 0L)) {
      i2 <- (i - 1)%/%2 # number of (0,0)
      j <- m2 - i2 # 0 composition length
      b[[i]] <- c(
        if (i%%2L) {
          # even number of doubles--equivalent partition for 0 and 1
          matrix(
            colShuffle(matrix(rep.int(1:2, c(j - i2, i2)), j, 2*s[i])),
            2, j*s[i], 1
          )
        } else { # odd number of doubles--1 more double 1 than double 0
          rbind(
            # 0-composition
            c(colShuffle(matrix(rep.int(1:2, c(j - i2, i2)), j, s[i]))),
            # 1 composition
            c(rbind(
              colShuffle(
                matrix(rep.int(1:2, c(j - i2 - 2, i2 + 1)), j - 1, s[i])
              ), 0L
            ))
          )
        }
      )
    }
  }
  
  abs(
    matrix(
      # convert the compositions to 0's and 1's
      rep.int(rep(0:1, length.out = sum(lengths(b))), unlist(b)), n, m, 1
      # shuffle the n samples and perform a full 0 <-> 1 swap in random vectors
    )[sample(n),] - sample(0:1, n, 1)
  )
}

Example usage

rbinvec2(10, 8) # 10 samples of vector of length 8
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#>  [1,]    0    1    0    1    0    1    0    1
#>  [2,]    0    1    0    1    0    0    1    1
#>  [3,]    0    1    0    0    1    0    1    1
#>  [4,]    0    0    1    1    0    1    1    0
#>  [5,]    0    0    1    0    1    0    1    1
#>  [6,]    1    0    1    0    1    1    0    0
#>  [7,]    1    0    1    0    1    0    0    1
#>  [8,]    1    0    1    0    1    0    1    0
#>  [9,]    1    0    1    0    1    0    1    0
#> [10,]    1    0    0    1    0    0    1    1
rbinvec2(10, 9)# 10 samples of vector of length 9
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
#>  [1,]    0    0    1    1    0    0    1    0    1
#>  [2,]    0    1    1    0    1    1    0    0    1
#>  [3,]    1    0    0    1    0    1    1    0    0
#>  [4,]    1    0    0    1    0    1    0    1    0
#>  [5,]    1    1    0    0    1    1    0    0    1
#>  [6,]    1    1    0    1    0    0    1    0    0
#>  [7,]    0    1    0    1    0    0    1    0    1
#>  [8,]    1    0    1    0    0    1    0    0    1
#>  [9,]    1    1    0    0    1    0    1    1    0
#> [10,]    0    0    1    0    0    1    1    0    1

Timing

system.time(rbinvec2(1e6, 10)) # 1M samples of vector of length 10
#>    user  system elapsed 
#>    1.52    0.12    1.75
system.time(rbinvec2(1e5, 101)) # 100K samples of vector of length 101
#>    user  system elapsed 
#>    0.69    0.09    0.82

Verify correct behavior

library(data.table) # for checks

fcheck1 <- function(m) {
  # returns the number of valid vectors of length `m`
  
  # generate all balanced permutations
  x <- if (m%%2) {
    rbind(
      permuteGeneral(0:1, m, TRUE, c((m + 1)%/%2, m%/%2)),
      permuteGeneral(0:1, m, TRUE, c(m%/%2, (m + 1)%/%2))
    )
  } else {
    permuteGeneral(0:1, m, TRUE, c(m%/%2, m%/%2))
  }
  # count the permutations with no more than two sequential repeated values
  sum(!rowSums(x[,-1:-2] == x[,c(-1, -m)] & x[,c(-1, -m)] == x[,-m:(1 - m)]))
}

fcheck2 <- function(n, m) {
  # return the unique samples along with their counts
  setorder(as.data.table(rbinvec2(n, m))[,.(.N), eval(paste0("V", 1:m))])[]
}

fcheck1(6) # the number of valid vectors of length 6
#> [1] 14
fcheck2(1e6, 6)
#>        V1    V2    V3    V4    V5    V6     N
#>     <int> <int> <int> <int> <int> <int> <int>
#>  1:     0     0     1     0     1     1 71195
#>  2:     0     0     1     1     0     1 71249
#>  3:     0     1     0     0     1     1 71237
#>  4:     0     1     0     1     0     1 71588
#>  5:     0     1     0     1     1     0 71469
#>  6:     0     1     1     0     0     1 71482
#>  7:     0     1     1     0     1     0 71242
#>  8:     1     0     0     1     0     1 71368
#>  9:     1     0     0     1     1     0 71638
#> 10:     1     0     1     0     0     1 71606
#> 11:     1     0     1     0     1     0 71645
#> 12:     1     0     1     1     0     0 71867
#> 13:     1     1     0     0     1     0 71107
#> 14:     1     1     0     1     0     0 71307
fcheck1(7)
#> [1] 36 # the number of valid vectors of length 7
fcheck2(1e6, 7)
#>        V1    V2    V3    V4    V5    V6    V7     N
#>     <int> <int> <int> <int> <int> <int> <int> <int>
#>  1:     0     0     1     0     0     1     1 27612
#>  2:     0     0     1     0     1     0     1 27728
#>  3:     0     0     1     0     1     1     0 27451
#>  4:     0     0     1     1     0     0     1 27707
#>  5:     0     0     1     1     0     1     0 27689
#>  6:     0     0     1     1     0     1     1 27460
#>  7:     0     1     0     0     1     0     1 27891
#>  8:     0     1     0     0     1     1     0 27562
#>  9:     0     1     0     1     0     0     1 27805
#> 10:     0     1     0     1     0     1     0 27947
#> 11:     0     1     0     1     0     1     1 27761
#> 12:     0     1     0     1     1     0     0 27893
#> 13:     0     1     0     1     1     0     1 28044
#> 14:     0     1     1     0     0     1     0 27899
#> 15:     0     1     1     0     0     1     1 27756
#> 16:     0     1     1     0     1     0     0 27477
#> 17:     0     1     1     0     1     0     1 27769
#> 18:     0     1     1     0     1     1     0 27756
#> 19:     1     0     0     1     0     0     1 27955
#> 20:     1     0     0     1     0     1     0 27726
#> 21:     1     0     0     1     0     1     1 27931
#> 22:     1     0     0     1     1     0     0 27896
#> 23:     1     0     0     1     1     0     1 27652
#> 24:     1     0     1     0     0     1     0 27720
#> 25:     1     0     1     0     0     1     1 28005
#> 26:     1     0     1     0     1     0     0 28072
#> 27:     1     0     1     0     1     0     1 27802
#> 28:     1     0     1     0     1     1     0 27579
#> 29:     1     0     1     1     0     0     1 27593
#> 30:     1     0     1     1     0     1     0 27846
#> 31:     1     1     0     0     1     0     0 27745
#> 32:     1     1     0     0     1     0     1 27760
#> 33:     1     1     0     0     1     1     0 27840
#> 34:     1     1     0     1     0     0     1 27704
#> 35:     1     1     0     1     0     1     0 27884
#> 36:     1     1     0     1     1     0     0 28083
#>        V1    V2    V3    V4    V5    V6    V7     N

The distribution of each unique vector is consistent with uniform sampling on the full set of valid vectors for both cases.

**Jon Spring** · Accepted Answer · 2024-03-13T18:07:59+00:00

Here, I first determine the mix of elements, then repeatedly shuffle until there's a draw without too many consecutive picks. I'm sure there's a more computationally efficient approach (e.g. where we don't overwrite seq every iteration), but this might suffice. For example, it takes ~0.002 sec for max_consec = 2, or a few seconds to find a draw with no repeats (max_consec = 1), even if that takes 100k+ draws to find.

Any approach (like the two suggested so far) which relies on random draws will perform poorly for larger sequences, since it will become vanishingly unlikely to happen upon a sequence with few enough repeated strings by chance.

RL_seq <- function(n_L = 10, n_R = 10, max_consec = 2) {
  # make a sequence that is all the Ls then all the Rs
  seq = c(rep("L", n_L), rep("R", n_R))
  consec = max_consec + 1 # we know the first attempt won't work
  while(consec > max_consec) {
    # overwrite the sequence with a shuffle of it
    seq = sample(seq, length(seq), replace = FALSE)
    # what is the maximum "run length encoding" (rle) length of the sequence?
    consec = max(rle(seq)[1]$lengths)
  }
  seq
}


RL_seq()

TechQA.

Generating a pseudorandom binary sequence where the same number does not occur more than twice in a row

There are 6 answers

Idea

Code

Example Output

The function

Example usage

Timing

Verify correct behavior

Related Questions in R

Related Questions in ALGORITHM

Related Questions in SEQUENCE

Popular Questions

Trending Questions