Get first x lines from file, and match on substring


I have a file I'm downloading via FTP. It's a very large file, so for now I only want the first, say, 20 lines to work with. I want to write those 20 lines to a new file on my local machine and, in the process, match on a value within each line.

The file is pipe-delimited and the beginning of each line looks like this:

9999-12-31|XX|...

I want to only write to the output file when the value of that second field is XX, otherwise, ignore it.

Here are the basics of my code:

def writeline(line):
    file.write(line + "\n")

file = open(localDir + fileName, "w+")
ftp.retrlines("RETR '" + remotePath + "'", writeline)

All of this code works fine to download the file if I want to output the entire file. I tried to put a while loop into my writeline function, but it would just write each line the number of times I specified in my loop, which makes sense in hindsight. It seems like the while loop needs to be somehow in the retrlines function.

I'm pretty new to Python, so I appreciate any help you can provide and for your patience with my noob question.

Update: OK, it looks like I can match on the substring with:

line[11:13]

but that still leaves me with the problem of trying to get only the first x lines to work with.
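
As a quick check of that slice, here is a small sketch that compares it with a delimiter-based lookup, which does not depend on the first field being exactly 10 characters wide. The sample line's trailing fields and the helper name second_field are made up for illustration.

```python
# Sample line in the format shown above; the fields after XX are invented.
sample = "9999-12-31|XX|more|fields"


def second_field(line, sep="|"):
    """Return the second delimited field, or None if the line has fewer than two fields."""
    parts = line.split(sep, 2)  # split at most twice; the rest of the line stays in one piece
    return parts[1] if len(parts) > 1 else None


print(sample[11:13])         # fixed-offset slice
print(second_field(sample))  # delimiter-based lookup
```

The slice is fine as long as every line starts with a 10-character date; the split version is probably the safer choice if that is not guaranteed.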

There are 2 answers

farhawa On

Try reading the file a different way, something like:

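def writeline(out, line, count):
    # Write the line only if the second field is 'XX'; return the running match count.
    if line[11:13] == 'XX':
        out.write(line + "\n")
        count += 1
    return count

# Read the already-downloaded file into a list of lines.
lines = open(localDir + fileName).read().splitlines()
# "filtered_" is a placeholder prefix for the output file name.
out = open(localDir + "filtered_" + fileName, "w")
count = 0
for line in lines:
    if count >= 20:  # stop after 20 matching lines
        break
    count = writeline(out, line, count)
out.close()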
Ali SAID OMAR On

If you want to work on your entire file, here are functions that filter the file and write the matching lines to another file:

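def get(f, pattern="XX", index=1, sep="|", limit=100):
    # Yield up to `limit` lines of file f whose field at `index` equals `pattern`.
    count = 0
    with open(f) as in_:
        for line in in_:
            fields = line.rstrip("\n").split(sep)
            if len(fields) > index and fields[index] == pattern:
                count += 1
                yield line
                if count == limit:
                    return

def write_matches(outf, inf):
    # Copy the matching lines from inf to outf.
    with open(outf, "w") as out:
        for line in get(inf):
            out.write(line)

f = localDir + fileName  # path to the downloaded input file
write_matches("out.txt", f)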
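
Both approaches above read a file that is already on disk, so the whole download still has to finish first. If the goal is to stop the FTP transfer itself, one possible sketch is to raise an exception from the retrlines callback once enough matching lines have been written. The names LimitReached, make_limited_writer and download_matching are hypothetical, the limit is assumed to be at least 1, and the sketch counts matching lines rather than raw lines. Aborting mid-transfer can leave the control connection out of step with the server, so this version simply closes the connection afterwards.

```python
import ftplib  # the ftp argument below is expected to be a connected ftplib.FTP


class LimitReached(Exception):
    """Raised by the callback to abort the transfer once enough lines are written."""


def make_limited_writer(out, limit=20, pattern="XX"):
    """Build a retrlines callback that writes matching lines and stops after `limit` of them."""
    written = 0

    def callback(line):
        nonlocal written
        fields = line.split("|", 2)
        if len(fields) > 1 and fields[1] == pattern:
            out.write(line + "\n")
            written += 1
            if written >= limit:
                raise LimitReached

    return callback


def download_matching(ftp, retr_cmd, local_path, limit=20):
    """Write up to `limit` matching lines returned by retr_cmd to local_path."""
    with open(local_path, "w") as out:
        try:
            ftp.retrlines(retr_cmd, make_limited_writer(out, limit))
        except LimitReached:
            # The transfer was cut short on purpose; close the connection
            # instead of reusing it for further commands.
            ftp.close()
```

With the variables from the question, a call might look like download_matching(ftp, "RETR '" + remotePath + "'", localDir + fileName). If the connection is needed afterwards, reconnecting is likely the simplest option.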