How to get 'n' random points between start and end coordinates in python?

2k views Asked by At

I have a starting coordinate (x1,y1) and an ending coordinate (x2, y2). I want to generate 'n' random points between start and end coordinates without any duplicates. How to do this with python?

I know that a simple way would be to generate 'n' x values and 'n' y values. So we get n*n pairs and I choose 'n' among them with no duplicates. This way I mayn't get a uniform distribution of random points. Any other way to do this?

Edit: I require floating point coordinates in the rectangle formed by the start and end coordinates as opposite corners.

1

There are 1 answers

1
Grismar On BEST ANSWER

TL;DR:

from random import uniform


def gen_coords(x1, y1, x2, y2, n):
    result = set()
    # loops for each addition, avoiding duplicates
    while len(result) < n:
        result.add((uniform(x1, x2), uniform(y1, y2)))
    return result

Arguably, practically:

from random import uniform


def gen_coords(x1, y1, x2, y2, n):
    return [(uniform(x1, x2), uniform(y1, y2)) for _ in range(n)]

Considering that the odds of collisions are tiny.

Assuming that "between start and end coordinates" means in a rectangular section between these two corners in a Cartesian coordinate system (i.e. flat, 2D).

And assuming that a "uniform distribution" is achieved sufficiently ignoring the non-uniform distribution of floating point values. (i.e. not the exact same number of floating point values on any interval of equal length, nor a constant distance between floating point values in a continuum)

There's basically three ways of ensuring the randomly generated points are not duplicated:

  1. pick them from a collection of possible values, removing each pick to avoid picking it again;
  2. generate values within the allowed space, checking each pick against previous picks to avoid adding duplicates (and re-picking values until a new one is generated);
  3. generate values and add to the set until the desired set size, removing duplicates after generation if any and repeating the process until done.

The first option can be a good choice if the space from which values are picked is of similar size to the target set size. However, when picking points with random floating point coordinates in some space, this is unlikely.

The second choice is the most straightforward, but can be expensive to compute if the target set size is large, as every new pick causes more comparisons.

The third choice is a bit more involved, but avoids comparisons until a candidate target set has been completed and certainly the best choice if the odds of collisions are small.

As a variant of the second choice, you could pick a target data structure that simply avoids the addition of duplicates altogether, relying on the language / interpreter to perform the checking more efficiently than any algorithm written in the language would be able to.

In Python, this means using a set instead of a list, which is the fastest way to achieve the result and would likely be the way you'd check for duplicates in the third option anyway - so you may as well use it right away and go with the variant of the second option.

Note that both the 2nd and 3rd option have a major flaw in case you're trying to create a set in the range of the selection function that's larger than the domain of the selection function. But for the given problem that's unlikely except for extremely large 'n'.

A solution (pitting the second option against the third):

from random import uniform
from timeit import timeit


def pick_coords_restricted(x1, y1, x2, y2, n):
    result = set()
    # loops for each addition, avoiding duplicates
    while len(result) < n:
        result.add((uniform(x1, x2), uniform(y1, y2)))
    return result


def pick_coords_checked(x1, y1, x2, y2, n):
    result = []
    # loops once for attempt, checking after each iteration
    while len(set(result)) < n:
        if len(result) > 0:
            result = list(set(result))
            result += [(uniform(x1, x2), uniform(y1, y2)) for _ in range(n - len(result))]
        else:
            result = [(uniform(x1, x2), uniform(y1, y2)) for _ in range(n)]
    return result


print(timeit(lambda: pick_coords_restricted(0, 0, 1, 1, 1000), number=10000))
print(timeit(lambda: pick_coords_checked(0, 0, 1, 1, 1000), number=10000))

Result (on my hardware):

4.3799341
3.9363368000000003

I get consistently, but marginally better results for the pick_coords_checked function - I would favour the clarity of the first implementation.