Issue with Python's string find method returning -1


I'm trying to write a function that takes a string and returns the start and end positions of each token. The function works fine with tokens = query_string.split(), but when I add the string lower method, as in my code below, the first tuple comes back wrong: I get [(-1, 2), (5, 6), (8, 8), (10, 13)] rather than the desired output of [(0, 3), (5, 6), (8, 8), (10, 13)].

The string I used for testing is 'This is a test'.

def token_position_list(query_string):
    """
    :param query_string: a string representing a query
    :return: a list of tuples, where each tuple holds the start and end positions of each token
    """
    token_positions = []
    tokens = query_string.lower().split()
    current_position = 0
    for token in tokens:
        start_position = query_string.find(token, current_position)
        end_position = start_position + len(token) - 1
        token_positions.append((start_position, end_position))
        current_position = end_position + 1
    return token_positions

Can anyone explain why adding lower() does this and how I can fix it?


There is 1 answer

Answered by Barmar

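All your tokens are lowercase, but query_string is still mixed-case, and find is case-sensitive. So it won't find a token if the original string has any uppercase letters in that token. In that case find returns -1, and your end position is then computed from that -1 (for 'this': -1 + 4 - 1 = 2), which is where the (-1, 2) comes from.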
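
To make the failure concrete, here is a small check using the test string from the question. It is only an illustration of the first loop iteration; the values are taken from the question's code.

```python
query_string = 'This is a test'
token = 'this'  # first token produced by query_string.lower().split()

# str.find is case-sensitive and returns -1 when the substring is absent.
start_position = query_string.find(token, 0)
end_position = start_position + len(token) - 1

print(start_position, end_position)  # -1 2
```

The later tokens contain no uppercase letters, so they are still found at the right positions.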

You should convert query_string to lowercase and process that.

def token_position_list(query_string):
    """
    :param query_string: a string representing a query
    :return: a list of tuples, where each tuple holds the start and end positions of each token
    """
    token_positions = []
    # Lowercase first so the tokens and the searched string use the same case
    query_string = query_string.lower()
    tokens = query_string.split()
    current_position = 0
    for token in tokens:
        start_position = query_string.find(token, current_position)
        end_position = start_position + len(token) - 1
        token_positions.append((start_position, end_position))
        current_position = end_position + 1
    return token_positions
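
As an alternative sketch (not part of the answer above), the positions can be read directly from regex matches, so nothing has to be searched for afterwards. The name token_spans is hypothetical, and this assumes that a token is any run of non-whitespace characters, which should give roughly the same tokens as split() with no arguments. It also does not depend on lower() keeping every character at the same index, which is not guaranteed for all Unicode characters.

```python
import re


def token_spans(query_string):
    """Return (start, end) positions of each whitespace-separated token.

    Hypothetical helper: end is inclusive, to match the tuples in the question.
    """
    # \S+ matches one run of non-whitespace characters per token.
    # m.end() is exclusive, so subtract 1 to get the last character's index.
    return [(m.start(), m.end() - 1) for m in re.finditer(r'\S+', query_string)]


print(token_spans('This is a test'))  # [(0, 3), (5, 6), (8, 8), (10, 13)]
```

If the lowercase tokens are needed as well, m.group().lower() gives each one without changing the positions.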