how to generate a vector which satisfy some conditions?

608 views Asked by At

everyone! how to generate a vector which satisfy some conditions?
Problem: generate a vector a such that length(a)=400000 which is made up of 8 elements:0, 5, 10, 50, 500, 5000, 50000, 300000. Each element appears a set number of times, namely 290205, 100000, 8000, 1600, 160, 32, 2, 1, respectively. Further, a is blocked into 4,000 "groups" of 100 consecutive elements; call them a_k, k=1,...,4000. These groups must satisfy the following:

  1. The sum of every group exceeds 150, i.e. sum_i a_k_i>150 for all k.
  2. The elements 5, 10 and 50 appear between 25 and 29 times in each group, i.e. for all k, the set {i|a_i_k in (5,10,50)} has magnitude between 25 and 29.
  3. 0 never appears more than 8 times in a row in any group.

I have tried this many times, but it does not seem to work: My current code is as follows:

     T <- 4*10^(5)   # data size  
            x <- c(0, 5, 10, 50, 500, 5000, 50000, 300000)      #seed vector  
            t <- c(290205, 100000, 8000, 1600, 160, 32, 2, 1)   #frequency  
            A <- matrix(0, 4000, 100)    #4000 groups  
            k <- rep(0, times = 8)        #record the number of seeds   
            for(m in 1:4000) {        
            p <- (t - k)/(T - 100*(m - 1))      #seed probability  
            A[, m] <- sample(x, 100, replace = TRUE, prob = p)  #group m   
            sm <- 0         
            i <- 0    
              for(j in 1:92) {  
                  if(sum(A[m,j:j + 8])==0){  
                     if(A[m,j] > 0 & A[m,j] < 500) {i <- i+1}  
                        sm <- sm+A[100*m+j]       
                    }  
                   else j <- 0   
                }                
                       if (sm >= 150 & i > 24 & i < 30 & j != 0) {    
                           m <- m + 1  
                           for (n in seq_len(x)) {  
                               k[n] <- sum(A[, m+1] == x[n]) + k[n]  
                            }  
                        }  
            }  
4

There are 4 answers

4
Pierre L On

I can start it off and maybe someone can help get to the next step. My approach is to start with the constraints and let sample work out the numbers.

set.seed(77)
choose <- c(0,5,10,50,500,5000,50000,300000)
freqs <- c(290205,100000,8000,1600,160,32,2,1)
probs <- freqs/sum(freqs)
check.sum <- function(vec) sum(vec) >= 150
check.interval <- function(vec) abs(sum(vec %in% c(5,10,50))-27)<=2
check.runs <- function(vec, runmax=8) max(rle(vec)$lengths[rle(vec)$values==0]) <= runmax

check.all <- function(vector) {
  logicals <- c(check.sum(vector), 
                check.runs(vector),
                check.runs(vector)
                )
  return(all(logicals))

}

nums <- NULL
res <- list()
for(i in 1:4000) {
  nums <- numeric(100)
  while(!check.all(nums)) {nums <- sample(choose, 100, replace=T,prob=probs)}

  res[i] <- list(nums)
}

str(res)
List of 4000
 $ : num [1:100] 1e+01

So this gets you a list of 4,000 groups of 100 numbers that fit the constraints. It only took about two seconds of system time.

Next step is for someone to get a way to build something similar except eliminate 300000 once it is used, and 50000 once it is used twice and so on.

1
MichaelChirico On

How about just doing it by construction? For example:

amat<-matrix(rep(c(rep(rep(c(0,5),c(8,3)),8),
               rep(c(0,NA),c(8,4))),4000),nrow=100)
amat[97:100,1:2205]<-c(rep(10,3),0)
amat[97:98,2206:4000]<-c(5,5)
amat[99:100,2206:2897]<-c(10,10)
amat[99:100,2898]<-c(5,50)
amat[99:100,2899:3307]<-c(5,50)
amat[99:100,3308:3902]<-c(50,50)
amat[which(is.na(amat))]<-rep(c(10,500,5000,5e4,3e5),c(1,160,32,2,1))

a<-c(amat)

This satisfies all your conditions:

Element counts:

>sapply(c(0,5,10,50,500,5000,50000,300000),function(x)length(which(a==x)))
[1] 290205 100000   8000   1600    160     32      2      1

Group sums:

> table(colSums(amat)>=150)

TRUE 
4000 

5,10,50 frequency:

> table(sapply(1:4000,function(x)abs(sum(amat[,x] %in% c(5,10,50))-27)<=2))

TRUE 
4000 

Runs of 0:

> table(sapply(1:4000,function(x)max(rle(amat[,x])$lengths[rle(amat[,x])$values==0])<=8))
#If this is slow, we can just use max(rle(amax[,x]))<=8
#  because there aren't many valid groups with strings of 9+
#  non-0 elements

TRUE 
4000 

if in fact we're never allowed to have strings of 9 0s, we'll need to make a slight adjustment to groups 2:2206, because, e.g. a[100:108]==0

0
Jacky Zhang On

Thanks for everyone! I have figured out my problem.

rm(list = ls())  
media <- matrix(rep(rep(c(0,5,NA),c(72,25,3)),4000),nrow=100)  
media[98:100,1:2400] <-c(10,10,10)  
media[98:99,2401:3200] <-c(50,10)  
media[98:99,3201:4000] <-c(50,0)  
media[100,2401:4000] <-rep(c(0,500,5000,50000,300000),c(1405,160,32,2,1))  
obj1 <- matrix(0,100L,4000)  
obj2 <-obj1  
grp.cond<-function(x) max(rle(x)$lengths[rle(x)$values==0])<=8  
elts<-c(0,5,10,50,500,5000,50000,300000)  
for(i in 1:4000){  
freq<-c(sapply(elts, function(x) length(which(media[,i]==x))))  
while(!grp.cond(obj1[,i])){obj1[,i]<-c(sample(rep(elts,freq)))}  
i<-i+1  
}  
elts1<-c(1:4000)  
freq1<-rep(1,times=4000)  
a1<-sample(rep(elts1,freq1))  
for(i in 1:4000){obj2[,i]<-obj1[,a1[i]]} 
a <- c(obj2)
3
MichaelChirico On

Inspired by @plafort's approach, I've come up with the following that seems to work very quickly and should be capable of generating all vectors satisfying your conditions:

elts<-c(0,5,10,50,500,5000,50000,300000)
freq<-c(290205,100000,8000,1600,160,32,2,1)
ngrp<-4000L

grp.cond1<-function(x)sum(x)>=150
grp.cond2<-function(x)abs(sum(x %in% c(5,10,50))-27)<=2
grp.cond3<-function(x)max(rle(x)$lengths[rle(x)$values==0])<=8

check.all<-function(mat){
  all(sapply(1:ncol(mat),function(y)grp.cond1(mat[,y])),
      sapply(1:ncol(mat),function(y)grp.cond2(mat[,y])),
      sapply(1:ncol(mat),function(y)grp.cond3(mat[,y])))}

while(!check.all(amat)){amat<-matrix(sample(rep(elts,freq)),ncol=ngrp)}
a<-c(amat)

I've also written the code in a way that should be easy to generalize to other element sets/counts, group numbers, and group-wise conditions.

Unfortunately it seems these conditions are pretty stringent, and it may take a long time to generate an acceptable a. I let the while loop run ~1300 times with no success...