Differences between enumerate(fileinput.input(file)) and enumerate(file)


I'm looking for some help with my code, which is right below:

for file in file_name :
    if os.path.isfile(file):
        for line_number, line in enumerate(fileinput.input(file, inplace=1)):
            print file
            os.system("pause")
            if line_number ==1:
                line = line.replace('Object','#Object')
                sys.stdout.write(line)

I wanted to modify some previously extracted files in order to plot them with matplotlib. To do so, I remove some lines and comment out others.

My problem is the following:

  • Using for line_number, line in enumerate(fileinput.input(file, inplace=1)): gives me only 4 of the 5 previously extracted files (even though the file_name list contains 5 references!)

  • Using for line_number, line in enumerate(file): gives me all 5 previously extracted files, BUT I don't know how to modify a file in place without creating another one...

Do you have any idea what causes this? Is this normal behavior?


There are 2 answers

0
Andy Kubiak On

Assuming you're still having trouble, my typical approach is to open a file read-only, read its contents into a variable, close the file, make an edited variable, open the file to write (wiping out original file), and finally write the edited contents.

I like this approach since I can simply change the file_name that gets written out if I want to test my edits without wiping out the original file.

Also, I recommend naming containers using plural nouns, like @Martin Evans suggests.

import os

file_names = ['file_1.txt', 'file_2.txt', 'file_3.txt', 'file_4.txt', 'file_5.txt']
file_names = [x for x in file_names if os.path.isfile(x)] # see @Martin's answer again

for file_name in file_names:
    # Open read-only and put contents into a list of line strings
    with open(file_name, 'r') as f_in:
        lines = f_in.read().splitlines()

    # Put the lines you want to write out in out_lines
    out_lines = []
    for index_no, line in enumerate(lines):
        if index_no == 1:
            out_lines.append(line.replace('Object', '#Object'))
        # elif ...:  (further conditions would go here)
        else:
            out_lines.append(line)

    # Uncomment to write to different file name for edits testing
    # with open(file_name + '.out', 'w') as f_out:
    #     f_out.write('\n'.join(out_lines) + '\n')

    # Write out the file, clobbering the original
    # (splitlines() dropped the line endings, so restore the final newline too)
    with open(file_name, 'w') as f_out:
        f_out.write('\n'.join(out_lines) + '\n')

The downside of this approach is that each file must be small enough to fit into memory twice (lines + out_lines).
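If memory is a concern, the same edit can be done one line at a time by writing to a temporary file and then swapping it in for the original. The following is only a sketch under some assumptions: it targets Python 3.3+ (for os.replace), and the names rewrite_in_place, transform and comment_second_line are made up for illustration.

```python
import os
import tempfile


def rewrite_in_place(file_name, transform):
    """Rewrite file_name line by line.

    transform(index_no, line) returns the line to write,
    or None to drop the line. (Hypothetical helper.)
    """
    directory = os.path.dirname(os.path.abspath(file_name))
    # Create the temp file next to the original so the final rename
    # stays on the same filesystem
    fd, tmp_name = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, 'w') as f_out, open(file_name, 'r') as f_in:
            for index_no, line in enumerate(f_in):
                new_line = transform(index_no, line)
                if new_line is not None:
                    f_out.write(new_line)
        # Swap the edited copy in for the original (Python 3.3+)
        os.replace(tmp_name, file_name)
    except Exception:
        # Leave the original untouched and clean up the temp file
        os.remove(tmp_name)
        raise


def comment_second_line(index_no, line):
    # Same edit as above: index 1 is the second line of the file
    if index_no == 1:
        return line.replace('Object', '#Object')
    return line
```

Calling rewrite_in_place(file_name, comment_second_line) for each entry in file_names would then replace the read/edit/write steps above. Note that the rewritten file gets the temp file's permissions, which may differ from the original's.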

Best of luck!

1
Martin Evans On

There are a number of things that might help you.

Firstly, file_name appears to be a list of file names. It would be better named file_names, and then you could use file_name for each entry. You have verified that it does hold 5 entries.

The enumerate() function is used when iterating over a list of items to provide both an index and the item on each pass of the loop. This saves you from having to use a separate counter variable, e.g.

for index, item in enumerate(["item1", "item2", "item3"]):
    print index, item

would print:

0 item1
1 item2
2 item3
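One detail worth keeping in mind with the question's check line_number == 1: enumerate() counts from 0, so index 1 is the second item, not the first. A small illustration (the sample list is invented for the example):

```python
lines = ["header", "Object 1", "Object 2"]

# enumerate() counts from 0 by default, so index 1 is the SECOND line
second = [line for index, line in enumerate(lines) if index == 1]

# start=1 makes the counter match human-style line numbers
first = [line for number, line in enumerate(lines, start=1) if number == 1]

print(second)
print(first)
```

Here second holds "Object 1" and first holds "header".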

This is not really required here, as you have chosen to use the fileinput library, which is designed to take a list of files and iterate over all of the lines in all of the files in a single loop. As such, you need to tweak your approach a bit. Assuming your list of files is called file_names, you could write something like the following:

import fileinput
import os
import sys

# Keep only the entries that are existing files
file_names = [file_name for file_name in file_names if os.path.isfile(file_name)]

# Iterate all lines in all files
for line in fileinput.input(file_names, inplace=1):
    if fileinput.filelineno() == 1:
        line = line.replace('Object','#Object')
        sys.stdout.write(line)  

The main point here is that it is better to pre-filter any entries that are not files before passing the list to fileinput. I will leave it up to you to fix the output.
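For reference, one possible way to complete the loop above is sketched here. With inplace enabled, fileinput redirects standard output into the file being processed, so any line that is not written back is lost. That means the write has to happen for every line, not only the edited one, and stray print statements inside the loop end up in the file. The function name comment_first_object is made up for this example.

```python
import fileinput
import os
import sys


def comment_first_object(file_names):
    # Keep only the entries that are existing files
    file_names = [name for name in file_names if os.path.isfile(name)]
    if not file_names:
        return  # an empty list would make fileinput fall back to standard input

    try:
        for line in fileinput.input(file_names, inplace=True):
            # filelineno() starts at 1 for each file; use == 2 instead to
            # match the question's enumerate() index of 1 (the second line)
            if fileinput.filelineno() == 1:
                line = line.replace('Object', '#Object')
            # stdout is redirected into the current file, so every line
            # that should be kept must be written back
            sys.stdout.write(line)
    finally:
        # Restore stdout and release the module-level fileinput state
        fileinput.close()
```

Calling comment_first_object(file_names) would then edit every existing file in the list in a single pass.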

fileinput provides a number of functions to help you figure out which file or line number is currently being processed.
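As a rough illustration of those helpers, the sketch below collects, for each line, the current file name, the line number within that file, and the cumulative line number across all files. The function name describe_lines is invented for the example, and file_names is assumed to be a non-empty list of readable files.

```python
import fileinput


def describe_lines(file_names):
    """Return (file name, line number in file, cumulative line number) per line."""
    rows = []
    try:
        for line in fileinput.input(file_names):
            rows.append((
                fileinput.filename(),    # file currently being read
                fileinput.filelineno(),  # restarts at 1 for each file
                fileinput.lineno(),      # keeps counting across files
            ))
    finally:
        # Release the module-level fileinput state
        fileinput.close()
    return rows
```

fileinput.isfirstline() is another option when only the first line of each file matters.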