I am using the R programming language. I am interested in knowing whether there is a way to estimate the actual run time of a procedure (relative to the "strength" of my computer) without actually running that procedure.
For example, suppose I want to determine how long the procedure below takes to run on my computer:
library(caret)
library(rpart)
#generate data
a = rnorm(80000, 10, 10)
b = rnorm(80000, 10, 5)
c = rnorm(80000, 5, 10)
group <- sample( LETTERS[1:2], 80000, replace=TRUE, prob=c(0.5,0.5))
group_1 <- 1:80000
#put data into a frame
d = data.frame(a,b,c, group, group_1)
d$group = as.factor(d$group)
e <- d
vec1 <- sample(200:300, 5)
vec2 <- sample(400:500,5)
vec3 <- sample(700:800,5)
z <- 0
df <- expand.grid(vec1, vec2, vec3)
df$Accuracy <- NA
for (i in seq_along(vec1)) {
  for (j in seq_along(vec2)) {
    for (k in seq_along(vec3)) {
      # d <- e
      d$group_2 = as.integer(ifelse(d$group_1 < vec1[i], 0,
                             ifelse(d$group_1 > vec1[i] & d$group_1 < vec2[j], 1,
                             ifelse(d$group_1 > vec2[j] & d$group_1 < vec3[k], 2, 3))))
      d$group_2 = as.factor(d$group_2)
      TreeFit <- rpart(group_2 ~ ., data = d[,-5])
      pred <- predict(TreeFit, d[,-5], type = "class")
      con <- confusionMatrix(d$group_2, pred)
      #update results into table
      #final_table[i,j] = con$overall[1]
      z <- z + 1
      df$Accuracy[z] <- con$overall[1]
    }
  }
}
head(df)
I could just "sandwich" that procedure between the following lines of code and determine how long it takes:
start_time <- proc.time()
#copy and paste the entire block of code here
proc.time() - start_time
#results
   user  system elapsed
  51.86    0.36   52.22
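(Equivalently, I believe I could wrap the whole block in a single system.time() call, which does the same sandwich in one step:)
system.time({
  # copy and paste the entire block of code here
})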
But suppose it is a really lengthy procedure and I want to roughly estimate how long it will take my computer to run it before actually running it - is this possible?
Thanks
Since you're using nested loops, instead of timing the whole thing, try timing the first iteration, or a small number of iterations, of the loop.
E.g. instead of iterating over all of vec1, vec2, and vec3, try iterating over only the first few elements of each, or whatever makes sense for your use case (see the sketch below).
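One minimal sketch, keeping the loop body exactly as in your question and assuming the first two elements of each vector are representative of the rest, would truncate each index range:
for (i in seq_along(vec1)[1:2]) {
  for (j in seq_along(vec2)[1:2]) {
    for (k in seq_along(vec3)[1:2]) {
      # ... same loop body as in the full version ...
    }
  }
}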
Once you know the timing for a small subset of the iterations, you can make an educated guess as to how long the full run (or a larger dataset) may take.
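As a rough back-of-the-envelope sketch (it assumes each iteration costs about the same, which may not hold exactly), you can time the truncated loops and scale by the ratio of total to timed iterations:
# time the truncated loops: 2*2*2 = 8 iterations instead of 5*5*5 = 125
t_subset <- system.time({
  for (i in seq_along(vec1)[1:2]) {
    for (j in seq_along(vec2)[1:2]) {
      for (k in seq_along(vec3)[1:2]) {
        # ... same loop body as in the full version ...
      }
    }
  }
})["elapsed"]

# estimated elapsed time (in seconds) for the full 125-iteration run
t_subset * (5 * 5 * 5) / (2 * 2 * 2)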