Python - How to find out how many times the user said the word "the" or "The"

sentence2 = raw_input("Enter the sentence on the StringLab3 WS: ")

sentence.split(sentence2)
for word in default_sentence:
    if word == (chr(84)+chr(104)+chr(101)) or (chr(116)+chr(104)+chr(101)):
        words += 1

print "The amount of times 'the' or 'The' appear is a total of", words, "times."

This is what I have now. The output is currently 961 for the sentence:

This is a day of national consecration. And I am certain that on this day my fellow Americans expect that on my induction into the Presidency, I will address them with a candor and a decision which the present situation of our people impels. This is preeminently the time to speak the truth, the whole truth, frankly and boldly. Nor need we shrink from honestly facing conditions in our country today. This great Nation will endure, as it has endured, will revive and will prosper. So, first of all, let me assert my firm belief that the only thing we have to fear is fear itself, nameless, unreasoning, unjustified terror which paralyzes needed efforts to convert retreat into advance. In every dark hour of our national life, a leadership of frankness and of vigor has met with that understanding and support of the people themselves which is essential to victory. And I am convinced that you will again give that support to leadership in these critical days.

We're supposed to have the user input this. Any advice?


There are 5 answers

13
wnnmaw On

I'd recommend this:

map(lambda word: word.lower(), paragraph.split()).count("the")

Output:

>>> paragraph = "This is a day of national consecration. And I am certain that on this day my fellow Americans expect that on my induction into the Presidency, I will address them with a candor and a decision which the present situation of our people impels. This is preeminently the time to speak the truth, the whole truth, frankly and boldly. Nor need we shrink from honestly facing conditions in our country today. This great Nation will endure, as it has endured, will revive and will prosper. So, first of all, let me assert my firm belief that the only thing we have to fear is fear itself, nameless, unreasoning, unjustified terror which paralyzes needed efforts to convert retreat into advance. In every dark hour of our national life, a leadership of frankness and of vigor has met with that understanding and support of the people themselves which is essential to victory. And I am convinced that you will again give that support to leadership in these critical days."
>>> map(lambda word: word.lower(), paragraph.split()).count("the")
7

Since my solution may look weird, here's a little explanation from left to right:

map(function, target): This applies the function to every element of target, so target must be a list or some other iterable. In this case we're mapping a lambda function, which can be a little scary, so read below about that.

.lower(): Returns the lowercase version of whatever string it's applied to, word in this case. This is done to ensure that "the", "The", "THE", "ThE", and so on are all counted.

.split(): This splits a string (paragraph) into a list by the separator supplied in the parentheses. When no separator is given (as here), the string is split on whitespace, and runs of consecutive whitespace are treated as a single separator.

.count(item): This counts the instances of item in the list it's applied to. Note that this is not the most efficient way to count things (gotta go regex if you care about speed).
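To make the steps above concrete, here is a rough sketch that runs them one at a time. It assumes Python 3, where map returns an iterator rather than a list, so it is wrapped in list() before calling .count(). The sample string is made up for illustration and is not the asker's input.

```python
# Python 3 sketch of the one-liner, one step at a time.
# `paragraph` is a short made-up sample, not the asker's input.
paragraph = "The cat saw the dog chase THE other cat"

# Step 1: split on runs of whitespace.
words = paragraph.split()

# Step 2: lowercase every word. list() is needed in Python 3 because
# map returns an iterator, which has no .count() method.
lowered = list(map(lambda word: word.lower(), words))

# Step 3: count exact matches in the list.
count = lowered.count("the")

print(words)
print(lowered)
print(count)  # 3
```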

The scary lambda function:

lambda functions are not easy to explain or understand. It's taken me quite a while to get a grip on what they are and when they're useful. I found this tutorial to be rather helpful.

My best attempt at a tl;dr is that lambda functions are small, anonymous functions that can be used for convenience. I know this is, at best, incomplete, but I think it should suffice for the scope of this question.
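If it helps, here is a small sketch (Python 3 assumed) putting the lambda next to the same function written with def. The names lower_word and lower_word_def are made up for the comparison.

```python
# A lambda is an expression that builds a small function without a def block.
lower_word = lambda word: word.lower()

# The same function written with def (the name is made up for this sketch).
def lower_word_def(word):
    return word.lower()

sample = ["The", "THE", "the"]

# Both behave identically when handed to map.
assert list(map(lower_word, sample)) == list(map(lower_word_def, sample))

# For a plain method call, the method itself can be passed, with no lambda.
print(list(map(str.lower, sample)))  # ['the', 'the', 'the']
```

The lambda is only a convenience here: it saves naming a function that is used once.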

0
Barmar On

The problem is this line:

if word == (chr(84)+chr(104)+chr(101)) or (chr(116)+chr(104)+chr(101)):

Comparisons in most programming languages cannot be abbreviated the way they can in English. You can't write "equal to A or B" as short for "equal to A or equal to B"; you need to write it out:

if word == (chr(84)+chr(104)+chr(101)) or word == (chr(116)+chr(104)+chr(101)):

What you wrote is parsed as:

if (word == (chr(84)+chr(104)+chr(101))) or (chr(116)+chr(104)+chr(101)):

Since the second expression in the or is always true (it's a string, and all non-empty strings are true), the if always succeeds, so you count all the words, not just "the" and "The".

There's also no good reason to use that verbose chr() syntax; just write:

if word == "the" or word == "The":
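To see why the original test always succeeds, here is a quick sketch you can paste into an interpreter (Python 3 assumed). The word "day" is just an arbitrary non-matching example.

```python
word = "day"  # arbitrary word that is neither "the" nor "The"

# The chr() calls in the question really do spell these literals.
assert chr(84) + chr(104) + chr(101) == "The"
assert chr(116) + chr(104) + chr(101) == "the"

# `or` returns its first truthy operand, so this is the string "the",
# not a boolean, and a non-empty string counts as true in an if.
buggy = word == "The" or "the"
print(repr(buggy))  # 'the'
print(bool(buggy))  # True

# Spelling out both comparisons gives the intended test.
fixed = word == "The" or word == "the"
print(fixed)  # False

# A membership test is a common shorthand for the same check.
print(word in ("The", "the"))  # False
```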

There are other bugs in your code. The split line should be:

default_sentence = sentence2.split()
4
Nick Beeuwsaert On

You can do it like this, using regexes:

#!/usr/bin/env python
import re
input_string = raw_input("Enter your string: ")
print("Total occurrences of the word 'the': %d" % len(re.findall(r'\b(T|t)he\b', input_string)))

If you want it to be case insensitive, the call to re.findall can just be changed to re.findall(r'\bthe\b', input_string, re.I).
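One practical difference worth noting: the \b word boundary still matches when "the" is followed by punctuation, whereas a plain whitespace split leaves the punctuation attached. Here is a rough sketch of both regex variants, assuming Python 3. The helper name count_the and the sample string are made up for illustration.

```python
import re

def count_the(text, ignore_case=False):
    """Count standalone occurrences of 'the'/'The' (any case if ignore_case)."""
    if ignore_case:
        pattern = re.compile(r"\bthe\b", re.IGNORECASE)
    else:
        pattern = re.compile(r"\b[Tt]he\b")
    return len(pattern.findall(text))

# Made-up sample with punctuation and near-misses ("then", "other").
sample = "The truth, the whole truth: THE end of the; then other."

print(count_the(sample))                    # 3 -> "The", "the", "the;"
print(count_the(sample, ignore_case=True))  # 4 -> also counts "THE"

# A whitespace split keeps "the;" as one token, so it is not counted.
print(sample.lower().split().count("the"))  # 3
```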

1
Chris Barker On

Your code isn't working because you wrote

if word == (chr(84)+chr(104)+chr(101)) or (chr(116)+chr(104)+chr(101)):
# evaluates to: if word == "The" or "the":
# evaluates to: if False or "the":
# evaluates to: if "the":

Instead of

if (word == (chr(84)+chr(104)+chr(101))) or (word == (chr(116)+chr(104)+chr(101))):
# evaluates to: if (word == "The") or (word == "the")

More importantly, as Barmar pointed out, using the string literal 'the' is much more readable.

So you might want something like this:

count = 0
for word in default_sentence.split():
    if word == 'the' or word == 'The':
        count += 1

wnnmaw has an equivalent one-liner which works almost as well. map(lambda word: word.lower()) doesn't quite work, because by OP's spec, we only want to count 'the' and 'The', not 'THE'.
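A tiny sketch of that difference, assuming Python 3 and a made-up sample string:

```python
sample = "The end of the line is THE END"  # made-up input

# Exact match: only "the" and "The" are counted.
exact = sum(1 for word in sample.split() if word in ("the", "The"))

# Lowercasing first: "THE", "tHe", and so on are counted as well.
any_case = sample.lower().split().count("the")

print(exact)     # 2
print(any_case)  # 3
```

Which one is right depends on how strictly you read the assignment.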

0
abarnert On

The simplest implementation, and probably also the fastest, is:

sentence.lower().split().count('the')

Take the paragraph, turn it into lowercase, split it into words, and count how many of those words are 'the'. Almost a direct translation from the problem description.


The first problem with your attempt is that you read user input into a variable named sentence2, then use it as a separator to split some other variable named sentence, throwing away the result, then loop over yet another variable named default_sentence. That isn't going to work. Python won't guess what you mean just because variable names are kind of similar. You have to write those first three lines so that they all refer to the same names, as in the corrected code below.

The second problem is that your or expression doesn't mean what you think it does. This has been explained in dozens of other questions; you can start at What's going on with my if else statement and, if that doesn't explain it, see the related links and duplicates from there.

If you solve both of those problems, your code actually works:

sentence = raw_input("Enter the sentence on the StringLab3 WS: ")
default_sentence = sentence.split()
words = 0
for word in default_sentence:
    if word in ((chr(84)+chr(104)+chr(101)), (chr(116)+chr(104)+chr(101))):
        words += 1

print "The amount of times 'the' or 'The' appear is a total of", words, "times."

I don't know why everyone else is over-complicating this in the name of efficiency, by replacing the count with an explicit sum over a comprehension or using regexps or using map to call lower after the split instead of before or… but they're actually making things slower as well as harder to read. Which is usually the case with micro-optimizations like this… For example:

In [2829]: %timeit paragraph.lower().split().count('the')
100000 loops, best of 3: 14.2 µs per loop
In [2830]: %timeit sum([1 for word in paragraph.lower().split() if word == 'the'])
100000 loops, best of 3: 18 µs per loop
In [2831]: %timeit sum(1 for word in paragraph.lower().split() if word == 'the')
100000 loops, best of 3: 17.8 µs per loop
In [2832]: %timeit re.findall(r'\bthe\b', paragraph, re.I)
10000 loops, best of 3: 38.3 µs per loop
In [2834]: %timeit list(map(lambda word: word.lower(), paragraph.split())).count("the")
10000 loops, best of 3: 49.6 µs per loop
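If you don't have IPython handy, a comparison along these lines can be approximated with the standard timeit module. This is only a sketch under assumptions: it targets Python 3, uses one sentence of the speech repeated ten times rather than the full paragraph, and the absolute numbers will differ by machine and Python version.

```python
import re
import timeit

# One sentence from the speech, repeated; not the full paragraph used above.
paragraph = ("This is preeminently the time to speak the truth, "
             "the whole truth, frankly and boldly. ") * 10

# Each candidate returns the count so the results can be sanity-checked.
candidates = {
    "lower/split/count": lambda: paragraph.lower().split().count("the"),
    "sum over generator": lambda: sum(
        1 for w in paragraph.lower().split() if w == "the"),
    "re.findall": lambda: len(re.findall(r"\bthe\b", paragraph, re.I)),
    "map + lambda": lambda: list(
        map(lambda w: w.lower(), paragraph.split())).count("the"),
}

timings = {}
for name, func in candidates.items():
    # Total seconds for 1000 calls; lower is faster.
    timings[name] = timeit.timeit(func, number=1000)
    print("%-20s %.4f s" % (name, timings[name]))
```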