Hey there skbio team.
So I need to allow either DNA or RNA MSAs. When I do the following and leave out the alignment_fh.close(), skbio reads the 'non header' line in the except block, which makes me think I need to close the file first so it will start at the beginning. But if I add alignment_fh.close(), I cannot get it to read the file. I've tried opening it via a variety of methods, but I believe TabularMSA.read() should allow files OR file handles. Thoughts? Thank you!
    try:
        aln = skbio.TabularMSA.read(alignment_fh, constructor=skbio.RNA)
    except:
        alignment_fh.close()
        aln = skbio.TabularMSA.read(alignment_fh, constructor=skbio.DNA)
You're correct: scikit-bio generally supports reading and writing files using open file handles or file paths.
The issue you're running into is that your first TabularMSA.read() call reads the entire contents of the open file handle, so that when the second TabularMSA.read() call is hit within the except block, the file pointer is already at the end of the open file handle. This is why you're getting an error message hinting that the file is empty.
This behavior is intentional; when scikit-bio is given an open file handle, it will read from or write to the file but won't attempt to manage the handle's file pointer (that type of management is up to the caller of the code).
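To see the file-pointer behavior in isolation, here is a minimal sketch that uses only Python's standard library. The `io.StringIO` object stands in for an open alignment file, and its FASTA-like contents are made up for illustration.

```python
import io

# Stand-in for an open alignment file (contents are illustrative only).
fh = io.StringIO(">seq1\nACGU\n>seq2\nAC-U\n")

first = fh.read()    # consumes everything; the pointer is now at the end
second = fh.read()   # nothing left to read, so this is an empty string

fh.seek(0)           # move the pointer back to the start
third = fh.read()    # the full contents are available again

# Closing a handle does not rewind it; a closed handle cannot be read at all.
fh.close()
try:
    fh.read()
    closed_read_failed = False
except ValueError:
    closed_read_failed = True

print(repr(second))        # the second read returned no data
print(third == first)      # rewinding restored access to the contents
print(closed_read_failed)  # reading after close() raised ValueError
```

This is why adding `alignment_fh.close()` made things worse: the handle needs to be rewound with `seek(0)`, not closed.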
Now, when asking scikit-bio to read a file path (i.e. a string containing the path to a file on disk or accessible at some URI), scikit-bio will handle opening and closing the file handle for you, so that's often the easier way to go.
You can use file paths or file handles to accomplish your goal. In the following examples, suppose aln_filepath is a str pointing to your alignment file on disk (e.g. "/path/to/my/alignment.fasta").
With file paths: You can simply pass the file path to both TabularMSA.read() calls; no open() or close() calls are necessary on your part.
With file handles: You'll need to open a file handle and reset the file pointer within your except block before reading a second time.
Note: In both examples, I've used except ValueError instead of a "catch-all" except statement. I recommend catching specific error types (e.g. ValueError) instead of any exception because the code could be failing in different ways than what you're expecting. For example, with a "catch-all" except statement, users won't be able to interrupt your program with Ctrl-C because KeyboardInterrupt will be caught and ignored.