I'm stuck on a simple problem with the mrjob MapReduce framework: I want to get the average number of words per line in a given paragraph, and this is what I have so far:
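from mrjob.job import MRJob

class LineAverage(MRJob):
    def mapper(self, _, line):
        # Emit the word count of this line, plus a 1 so lines can be counted.
        numwords = len(line.split())
        yield "words", numwords
        yield "lines", 1

    def reducer(self, key, values):
        # Sum all values for each key: total words and total lines.
        yield key, sum(values)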
With this code, after the reduce step I get the total number of lines and the total number of words in the text, but I don't know how to get the average by computing:
words/TotalOfLines
I'm new to this programming model, so if anyone can illustrate this with an example, it would be much appreciated.
In the meantime, thank you very much for your attention and participation.
In the end, the answer was simple: I was actually sending the reducer a number of values equal to the number of lines. So in the reducer I just had to count the number of values for the key.
So the mapper sends a pair ("words", x) for each line, and the shuffle step produces a tuple ("words": x1, x2, x3, ..., xnumberOfLines), which is the input to the reducer. I then just have to count the number of values for the key, and that's it: I have the number of lines. Dividing the sum of the values by that count gives the average.
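For anyone who wants to see it spelled out, here is a minimal sketch of the idea in plain Python, with no mrjob needed to run it. The standalone `mapper`, `shuffle` and `reducer` functions and the sample `lines` are my own stand-ins for illustration, not mrjob's API; `shuffle` only imitates the grouping that the framework does for you. As far as I understand, the reducer receives its values as a one-pass iterator, so the sketch adds up and counts in a single loop rather than calling `sum()` and then trying to count afterwards.

```python
from collections import defaultdict


def mapper(line):
    # Emit one ("words", n) pair per line; the "lines" key is no longer needed.
    yield "words", len(line.split())


def shuffle(pairs):
    # Hypothetical stand-in for the framework's shuffle step: group values by key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    # Accumulate the sum and the count in one pass over the values.
    total = 0
    count = 0
    for value in values:
        total += value
        count += 1
    # count equals the number of lines, so this is the average words per line.
    yield key, total / count


# Made-up sample input with 3, 2 and 5 words per line.
lines = ["one two three", "four five", "six seven eight nine ten"]
pairs = [pair for line in lines for pair in mapper(line)]
results = [out
           for key, values in shuffle(pairs).items()
           for out in reducer(key, iter(values))]
print(results)
```

In the actual job, the same loop would go inside `reducer(self, key, values)` of the `LineAverage` class, and the mapper would only need to yield the "words" pair.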
I hope this is helpful to someone.