Finding consecutive months in a list

1k views Asked by At

I'm interested in finding the largest set in an unordered list of months, that can be returned as an ordered list of distinct, consecutive months.

For example:

consecutive_months(["December", "January", "February", "April"])

Would output:

"December", "January", "February"

And:

consecutive_months(["February", "December", "January"])

Would output:

"December", "January", "February"

The following works but I'm curious if anyone has ideas for a more elegant way:

MONTHS = ["January", "February", "March", 
          "April", "May", "June",
          "July", "August", "September", 
          "October", "November", "December"] 

def consecutive_months(lst_of_months):
    # create two years of months to handle going from Dec to Jan        
    results = []
    for m in set(lst_of_months):
        results.append((m,MONTHS.index(m)))
        results.append((m,MONTHS.index(m)+12))            
    results = sorted(results, key=lambda x: x[1])
    
    # find the longest series of consecutive months
    this_series = []    
    longest_series = []
    for this_month in results:
        if len(this_series) > 0:
            last_month = this_series[-1]
            if last_month[1] + 1 == this_month[1]:
                this_series.append(this_month)
            else:
                this_series = [this_month]
        else:
            this_series = [this_month]           
        if len(this_series) > len(longest_series):
            longest_series = [m for (m,i) in this_series]

    return longest_series

Here is a pastebin with sample inputs and expected outputs.

3

There are 3 answers

1
Carl Cervone On BEST ANSWER

Here are two working approaches from a friend who also looked at the problem. The first is performant and uses the modulo operator so that the list doesn't need to copied onto itself.

month_names = [
    'January', 'February',
    'March', 'April', 'May',
    'June', 'July', 'August',
    'September', 'October',
    'November', 'December'
]
​
# Looks like: {'January': 0, 'February': 1...}
month_name_to_index = {
  value: index
  for index, value
  in enumerate(month_names)
}
​
​
def consecutive_months(list_of_months_by_name):
  if not list_of_months_by_name:
    # If the list is empty, return None.
    return None
​
  month_was_seen = [False] * 12  # Looks like: [False, False, ...]
  
  for month_name in list_of_months_by_name: 
    month_was_seen[month_name_to_index[month_name]] = True
​
  # Seek to first missing month:
  for start_index in range(12):
    if not month_was_seen[start_index]:
      break
​
  # If there is no missing month, return the whole year.
  if month_was_seen[start_index]:
    return {"from": "January", "to": "December", "length": 12}
​
  # Starting from the first missing month, loop around the year
  # and keep track of the longest run using one boolean and four
  # integers.
  running = False
  longest_run_index = None
  longest_run_length = 0
  current_run_index = None
  current_run_length = None
  for offset in range(1, 13):
    index = (start_index + offset) % 12
    if month_was_seen[index]:
      # Continue a run or begin a run.
      if running:
        current_run_length += 1
        continue
      running = True
      current_run_index = index 
      current_run_length = 1
      continue
    if running:
      # End the run.
      running = False
      if current_run_length > longest_run_length:
        longest_run_index = current_run_index 
        longest_run_length = current_run_length
​
  return {
    "from": month_names[longest_run_index],
    "to": month_names[(longest_run_index + longest_run_length - 1) % 12],
    "length": longest_run_length
  }

The second is a clever one-liner:

MONTH_NAMES = [
    'January', 'February',
    'March', 'April', 'May',
    'June', 'July', 'August',
    'September', 'October',
    'November', 'December'
]
​
def consecutive_months(list_of_months_by_name):
  return max(
    (
      len(segment)-segment.index(":")-1,
      (MONTH_NAMES*2)[
          int(segment[:segment.index(":")])+1
          :
          int(segment[:segment.index(":")]) + len(segment) - segment.index(":")
      ]
    )
    for segment in 
    "".join([
      "x" if month_name in list_of_months_by_name else f",{index}:"
      for index, month_name in enumerate(MONTH_NAMES*2)
    ]).split(",")
    if ":" in segment
  )[1] if set(MONTH_NAMES) - set(list_of_months_by_name) else MONTH_NAMES

Both algorithms return the expected results for the test data. Thanks AV!

3
trincot On

I noted one issue with your code: when all 12 months appear in the input, then the output lists all months twice. This is easy to fix, just do:

return longest_series[:12]

I would go for a solution where the input is translated to a kind of "bitmap" of 12 bits, where a 1 indicates the corresponding month is in the input, while a 0 indicates it is not.

If represented as a string of 12 characters, you can use a regular expression to easily identify the sequences of "1".

I would also do some preprocessing of the months list, so you both have the list and the dictionary version of it, and have the list doubled, so you can slice from it across the 12-boundary.

Here is the suggested code:

import re

months = ["January", "February", "March", 
          "April", "May", "June",
          "July", "August", "September", 
          "October", "November", "December"]

# Also create a dictionary to get a month's index
month_nums = { month: num for num, month in enumerate(months) }
# ... and double the months list, to ease slicing across the 12-boundary
months += months

def consecutive_months(given_months):
    # Deal with boundary case
    if not given_months:
        return []

    # Convert input to 12 bits in string format
    lst = ["0"] * 12
    for m in given_months:
        lst[month_nums[m]] = "1"
    bits = "".join(lst)
    
    # Identify the longest chunk of consecutive "1" in that doubled string
    _, start, end = max((j-i, i, j) 
        for i, j in (match.span(0)
            for match in re.finditer("1+", bits + bits)
        )
    )
 
    # Using the found span, extract the corresponding month names
    return months[start:end][:12]
8
bruce On

The month string is just a symbol, its essence is still the corresponding number behind it, from 1 to 12, month after month.

Strings for two month cannot be compared directly. If you convert them to numbers, the next month's number can be calculated by adding 1 (except for January after December), and the comparison between numbers is certainly greater than string.

My optimized code is as follows:

MONTHS = ["January", "February", "March",
          "April", "May", "June",
          "July", "August", "September",
          "October", "November", "December"]

month_num_dict = {month: num for num, month in enumerate(MONTHS, start=1)}


def consecutive_months(month_list: list) -> list:
    # Deal with boundary case
    if len(month_list) == 0:
        return month_list

    # A list of twice length is required only when the first and last months end to end
    first_month_num = month_num_dict[month_list[0]]
    last_month_num = month_num_dict[month_list[-1]]
    last_month_next_num = last_month_num + 1 if last_month_num != 12 else 1

    month_list = month_list * 2 if last_month_next_num == first_month_num else month_list

    # Initialize list of candidates and longest series
    candidate = [month_list[0], ]
    longest_series = [month_list[0], ]

    for i in range(len(month_list) - 1):
        month = month_list[i]
        month_num = month_num_dict[month]

        next_month = month_list[i + 1]
        next_month_num = month_num_dict[next_month]

        expected_next_month_num = month_num + 1 if month_num != 12 else 1

        if expected_next_month_num == next_month_num:
            candidate.append(next_month)

            # At the end of the traversal, decide whether to update the longest series
            # according to the length of the candidate.
            if i == len(month_list) - 2 and len(candidate) > len(longest_series):
                longest_series = candidate
        else:
            # When the length of the new candidate is greater than the old, update the longest series
            if len(candidate) > len(longest_series):
                longest_series = candidate

            # Generate next candidate month list
            candidate = [next_month, ]
    
    # Deal with all 12 months input list
    if len(longest_series) > 12:
        return MONTHS

    return longest_series

If you are worried that the handwritten MONTHS list may be wrong, you can also get them by time.strftime:

import time
import locale

locale.setlocale(locale.LC_ALL, "en_US.UTF-8")

month_num_dict = {
    time.strftime("%B", time.strptime(str(num), "%m")): num
    for num in range(1, 13)
}

MONTHS = list(month_num_dict.keys())

Of course, in order to set back to the original locale and ensure thread safety, you can add a thread mutex, and the code can refer to this answer, my complete code contains all the test data, as you can see here.