I have a dataset that consists of nearly 7 million observations and I want to take a random sample of the data to analyze just a subset. I know how to take a random sample of the data:
index <- sample(7009728, 50000)
flights <- flight[index, ]
Is there a way to take a random sample so that, once it has been created, I always get the same random sample each time I run the code? I'm hoping to do this without having to rely on saving my R project.
Simply call set.seed just before you create index. It sets the random number generator's seed, so the same sample is drawn every time the code runs.
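As a minimal sketch of that idea: the seed value 42 is an arbitrary choice, and the small flight data frame here is only a hypothetical stand-in for your real 7-million-row data so that the example runs on its own.

```r
# Hypothetical stand-in for the real dataset (illustration only)
flight <- data.frame(id = 1:1000, delay = (1:1000) %% 60)

# Fix the seed right before sampling; 42 is arbitrary, any fixed integer works
set.seed(42)
index <- sample(nrow(flight), 50)
flights <- flight[index, ]

# Re-running with the same seed reproduces the same index
set.seed(42)
index_again <- sample(nrow(flight), 50)
identical(index, index_again)  # TRUE
```

With your data you would keep sample(7009728, 50000) and only add the set.seed line above it. One caveat: R 3.6.0 changed the default algorithm used by sample, so the same seed may give a different sample on older versions of R.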