Combination of Non-overlapping interval PAIRS

91 views Asked by At

I recently did a coding challenge where I was tasked to return the number of unique interval pairs that do not overlap when given the starting points in one list and the ending points in one list. I was able to come up with an n^2 solution and eliminated duplicates by using a set to hash each entry tuple of (start, end). I was wondering if there was a more efficient way of approaching this, or this is the best I could do:

def paperCuttings(starting, ending):
    # Pair each start with its corresponding end and sort
    intervals = sorted(zip(starting, ending), key=lambda x: x[1])
    non_overlaps = set()

    print(intervals)

    # Store valid combinations
    for i in range(len(intervals)):
        for j in range(i+1, len(intervals)):
            # If the ending of the first is less than the starting of the second, they do not overlap
            if intervals[i][1] < intervals[j][0]:                
                non_overlaps.add((intervals[i], intervals[j]))

    return len(non_overlaps)

starting = [1,1,6,7]
ending = [5,3,8,10]
print(paperCuttings(starting, ending)) # should return 4

starting2 = [3,1,2,8,8]
ending2 = [5, 3, 7, 10, 10]
print(paperCuttings(starting2, ending2))  # should return 3

I ask because I timed out in some hidden test cases

2

There are 2 answers

1
Cary Swoveland On BEST ANSWER

This is a O(n*log n) solution in Ruby (n being the number of intervals). I will include a detailed explanation that should make conversion of the code to Python straightforward.

I assume that non-overlapping intervals have no points in common, not even endpoints1.

def paperCuttings(starting, ending)
  # Compute an array of unique intervals sorted by the beginning
  # of each interval
  intervals = starting.zip(ending).uniq.sort

  n = intervals.size
  count = 0

  # Loop over the indices of all but the last interval.
  # The interval at index i of intervals will be referred to
  # below as "interval i" 
  (0..n-2).each do |i|
    # intervals[i] is interval i, an array containing its two
    # endpoints. Extract the second endpoint to the variable endpoint
    _, endpoint = intervals[i]
     
    # Employ a binary search to find the index of the first
    # interval j > i for which intervals[j].first > endpoint,
    # where intervals[j].first is the beginning of interval j
    k = (i+1..n-1).bsearch { |j| intervals[j].first > endpoint }

    # k equals nil if no such interval is found, in which case
    # continue the loop the next interval i
    next if k == nil

    # As intervals i and k are non-overlapping, interval i is
    # non-overlapping with all intervals l, k <=l<= n-1, of which
    # there are n-k, so add n-k to count
    count = count + n - k
  end
  # return count
  count
end

Try it.

starting = [1, 1, 6,  7]
ending   = [5, 3, 8, 10]

paperCuttings(starting, ending)
  #=> 4
starting = [3, 1, 2,  8,  8]
ending   = [5, 3, 7, 10, 10]
paperCuttings(starting, ending)
  #=> 3

Here I will explain the calculation

intervals = starting.zip(ending).uniq.sort

for

starting = [3, 1, 2,  8,  8]
ending   = [5, 3, 7, 10, 10]
a = starting.zip(ending)
  #=> [[3, 5], [1, 3], [2, 7], [8, 10], [8, 10]]
b = a.uniq
  #=> [[3, 5], [1, 3], [2, 7], [8, 10]]
b.sort
  #=> [[1, 3], [2, 7], [3, 5], [8, 10]]

The removal of duplicates is required by the statement of the problem.

The elements of b are sorted by their first elements. Had there been two arrays with the same first element the second element would be used as the tie-breaker, though that's not important here.

The documentation for Ruby's binary search method (over a range) is here. Binary searches have a time complexity of O(log n), which accounts for the log term in the overall time complexity of O(n*log n).

1. If intervals that share only a single endpoint are regarded as non-overlapping, change starting[j] >= endpoint to starting[j] > endpoint.

3
Dave On

Idea: Parse the array in O(max(range, n)), and for each start-of-interval, count the number of already ended intervals. Whether or not to include single-point intersections as overlaps is a matter of switching two lines of code, so I added a parameter for this.

If range >> n then this is worthless, but if range ~ n log n or better then this may be useful.

Ruby Code: NOTE: This doesn't eliminate duplicate intervals; feel free to crib that from Cary's answer (without the sort) or however you like in linear time.

def count_non_overlapping_intervals(start_points, end_points, count_single_point_overlaps=true)
  raise "Arrays must be of equal length" unless start_points.length == end_points.length
  
  n = start_points.length
  range = end_points.max - start_points.min + 1

    startpoint_to_count = Hash.new(0)
    endpoint_to_count = Hash.new(0)
    
    start_points.each { |point| startpoint_to_count[point] += 1 }
    end_points.each { |point| endpoint_to_count[point] += 1 }
    
    good_pairs = 0
    
    cum_starts = 0 # cumulative range starts
    cum_ends = 0 # cumulatve range ends
    
    
    (start_points.min..end_points.max).each do |i|
      new_starts = startpoint_to_count[i] || 0
      new_ends = endpoint_to_count[i] || 0
      cum_starts += new_starts
      
      cum_ends += new_ends if count_single_point_overlaps
      good_pairs += new_starts * cum_ends
      cum_ends += new_ends unless count_single_point_overlaps
      
    end
    
    return good_pairs
end

# Example

start_points = [3, 1, 2,  8]

end_points = [5, 3, 7, 10]

>count_non_overlapping_intervals(start_points, end_points, true)
=> 4
> count_non_overlapping_intervals(start_points, end_points, false)
=> 3

For the example above, let's label our pairs: a=[1,3], b=[3,5], c=[2, 7], d=[8,10].

Then, our good pairs are ad, bd, cd, and possibly ab.