Map function fails in mapreduce run in EMR


I am running my own MapReduce tasks on Amazon EMR. The map tasks are failing, and I cannot find the reason for the failures. Here is the mapper script (map_zip_hospi.py):

import fileinput
import csv

myDict = {}
csvreader = csv.reader(fileinput.input(mode='rb'), delimiter=',')
# Count how many rows share each value in column 6 (the zip code).
for newline in csvreader:
    key = newline[6]
    myDict[key] = myDict.get(key, 0) + 1

# Emit one tab-separated "key<TAB>count" line per distinct value.
for key in myDict:
    print '%s\t%s' % (key, myDict[key])

The map task reads the CSV file given as input and creates key, value pairs from the data in two columns. The reduce task aggregates them and prints the result.
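The reducer is not shown in this post. As a rough illustration of the aggregation step described above, here is a minimal sketch that assumes the reducer receives tab-separated "key<TAB>count" lines, as the mapper prints them. The function name `reduce_counts` and the sample zip codes are made up for the example; in a real streaming job the lines would come from `sys.stdin` rather than a list.

```python
from collections import defaultdict


def reduce_counts(lines):
    """Sum the counts for each key from 'key<TAB>count' lines."""
    totals = defaultdict(int)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        key, _, value = line.partition("\t")
        totals[key] += int(value)
    return dict(totals)


# Made-up sample standing in for mapper output that would arrive on stdin.
sample = ["10001\t2\n", "94105\t1\n", "10001\t3\n"]
for key, total in sorted(reduce_counts(sample).items()):
    print("%s\t%d" % (key, total))
```

Because several mappers can emit a partial count for the same key, the reducer adds the values together instead of counting lines.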

The following is the stderr from the map task when #!/usr/bin/env python is NOT added at the top of the script. If it is added, stderr is blank and yet the map task fails:

/mnt/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1417478068297_0008/container_1417478068297_0008_01_000005/./map_zip_hospi.py: line 1: import: command not found
/mnt/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1417478068297_0008/container_1417478068297_0008_01_000005/./map_zip_hospi.py: line 2: import: command not found
/mnt/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1417478068297_0008/container_1417478068297_0008_01_000005/./map_zip_hospi.py: line 3: myDict: command not found
/mnt/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1417478068297_0008/container_1417478068297_0008_01_000005/./map_zip_hospi.py: line 4: syntax error near unexpected token `('
/mnt/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1417478068297_0008/container_1417478068297_0008_01_000005/./map_zip_hospi.py: line 4: `csvreader = csv.reader(fileinput.input(), delimiter=',')'

I can see from the console that the map tasks are failing. Can someone help me find the errors in my code?


There is 1 answer

Sushwanth On

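I missed a very trivial thing, and I see that many others make the same mistake. Without a shebang line, the script is run by the shell instead of Python, which is why stderr shows "import: command not found". The following must be the very first line of the Python script: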

#!/usr/bin/env python
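To show where that line goes, here is a hedged sketch of the mapper with the shebang in place. It is an illustration, not the exact script from the question: the names `count_column` and `ZIP_COLUMN` are invented for the example, skipping rows with fewer than seven fields is an added assumption, and the `from __future__` import is only there so the same file runs under Python 2 and Python 3.

```python
#!/usr/bin/env python
# The shebang above must be the very first line of the file.
from __future__ import print_function

import csv
import sys
from collections import Counter

ZIP_COLUMN = 6  # column index used in the question's script


def count_column(lines, column=ZIP_COLUMN):
    """Count how often each value appears in the given CSV column."""
    counts = Counter()
    for row in csv.reader(lines):
        # Assumption: rows that are too short are skipped, not treated as errors.
        if len(row) > column:
            counts[row[column]] += 1
    return counts


if __name__ == "__main__":
    # Hadoop streaming feeds the input split to the mapper on stdin.
    for key, count in sorted(count_column(sys.stdin).items()):
        print("%s\t%d" % (key, count))
```

Keeping the counting logic in a function makes it possible to try the mapper on a few lines locally before submitting the job.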