In Python, can you find a substring using a for loop and equivalence (==) ? No regex

5.3k views Asked by At

Here is my problem: Write a program that takes two lines of input, we call the first needle and the second haystack. Print the number of times that needle occurs as a substring of haystack. I am encouraged to use a loop and the equivalence operator.

I'm not making much progress - here is my code after 4 hours...

..after two days I've got this...

needle = 'sses'
haystack = 'assesses'
count = 0                  # initialize the counter
index = haystack.index(needle) # get the first character in the substring
string1 = haystack[index:len(needle) + index] # get the whole substring

for position in range(0,len(haystack)): # loop through the string
    if haystack[position:len(needle) + index] == string1: # match the 1st substring
    count += 1 # iterate the counter
print (count)

...my problem is, how do I get the for loop to count the second occurance of the string?

Thanks

Tim

Finally, the 'correct' answer:

needle = input()
haystack = input()
count = 0
if needle in haystack:
    index = haystack.index(needle)
    string1 = haystack[index:len(needle) + index]
    for position in range(0,len(haystack)):
        if haystack[position:len(needle) + position] == string1:
            count += 1
 print (count)
1

There are 1 answers

1
AudioBubble On

Let's break down your code line by line:

needle = 'sses'
haystack = 'assesses'
count = 0                  # initialize the counter

Fine so far, this is only initialization.

index = haystack.index(needle) # get the first character in the substring

This line already is a problem, index raises a ValueError if it doesn't find the substring. Your program would crash in this case. You should instead use haystack.find(needle) which does the same, but instead of raising a ValueError it return -1 if the substring isn't found.

I do however not understand why you use this line at all. Your following loop will loop through the whole haystack and will also find the first appearance of needle.

string1 = haystack[index:len(needle) + index] # get the whole substring

This line is only valid if needle was found in the previous line. Also guess what string1 will be after this line? You are extracting the part of haystack where you previously found the substring needle. So the result will be string1 == needle, which won't help you in any way.

for position in range(0,len(haystack)): # loop through the string

ok, you loop through all positions in the string.

    if haystack[position:len(needle) + index] == string1: # match the 1st substring

So here I don't get why you want to find the first occurence again, which you already found before. Don't you want to check whether there is a match at position, no matter whether it is the first or second or third... one? So I would guess that haystack[position:len(needle) + index] is supposed to extract the substring of haystack that starts at position position and has length of needle. But why is there + index then? What has the first occurence (saved in index) to do with this? Don't you mean + position here? Finally you are comparing to string1 which as I said will be (if your code makes it to this line) equal to needle. So why not directly compare to needle?

    count += 1 # iterate the counter

This line is wrong indented in your posted code, it should be one deeper than the if-statement.

Finally you have to consider in your for-loop, that if position gets to the end of haystack there might not be a substring with length len(needle) starting at position anymore. So you probably want to stop iterating prior to that. (EDIT: I just notice that the code will run correctly anyway. It is not necessary to address this in python, because using indexes out of bounds of the string is allowed, but it would be in other languages.)

I guess this is an exercise, but if it wasn't there would be a much easier way to do this in python: count = haystack.count(needle). There is a small difference to your proposed algorithm though. string.count(substring) will return the number of non-overlapping matches, while your current code would find the number of non-overlapping and overlapping matches. The exercise as posted by you is unclear which of both is meant. But if you are supposed to find only non-overlapping results you need to consider this in your for-loop, too.

There are also several improvements one could make to the style and performance of your code, but I will not get into this, as you seem to have trouble getting it to work at all.