Using Biopython SeqIO.convert over an entire directory


I have 51 files of metagenomic sequence data that I would like to convert from FASTQ to FASTA using a Biopython script on Windows. The function SeqIO.convert easily converts a single, explicitly named file, but I can't figure out how to convert an entire directory. It's not really too many files to do individually, but I'm trying to learn.

I'm brand new to Biopython, so please forgive my ignorance. This convo was helpful, but I'm still not able to convert the directory from fastq to fasta.

Here's the code I've been trying to run:

#modules- 
import sys
import re
import os
import fileinput

from Bio import SeqIO

#define directory
Directory = "FastQ"

#convert files  
def process(filename):
  return SeqIO.convert(filename, "fastq", "files.fa", filename + ".fasta",   "fasta", alphabet= IUPAC.ambiguous_dna)

1 answer

Answered by cnluzon

You need to iterate over the files in the directory and convert each one. Assuming your directory is FastQ and you run the script from the folder that contains it (you are using a relative path), you would need something like:

import os
from Bio import SeqIO

def process(directory):
    for f in os.listdir(directory):
        if f.endswith(".fastq"):
            path = os.path.join(directory, f)  # listdir returns bare names
            SeqIO.convert(path, "fastq", path.replace(".fastq", ".fasta"), "fasta")

Then call the function from the main part of your script:

my_directory = "FastQ"
process(my_directory)

I think that should work.
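
For anyone who wants something a bit more defensive, here is a minimal sketch of the same idea using pathlib. The function names `fastq_to_fasta_pairs` and `convert_directory` are made up for illustration, and it assumes your files end in `.fastq`. The `alphabet` argument is left out on purpose: it is not needed for a FASTQ to FASTA conversion, and `Bio.Alphabet` was removed in Biopython 1.78, so passing it fails on recent versions. Biopython is imported lazily so the path logic can be checked on its own.

```python
from pathlib import Path


def fastq_to_fasta_pairs(directory, in_ext=".fastq", out_ext=".fasta"):
    """Return sorted (input, output) path pairs for each in_ext file in directory."""
    folder = Path(directory)
    return [
        (path, path.with_suffix(out_ext))
        for path in sorted(folder.iterdir())
        if path.is_file() and path.suffix.lower() == in_ext
    ]


def convert_directory(directory, convert=None):
    """Convert every FASTQ file in directory to FASTA; return total records written.

    `convert` defaults to Bio.SeqIO.convert; it is a parameter only so the
    loop can be exercised without Biopython installed.
    """
    if convert is None:
        from Bio import SeqIO  # imported here so the helper above has no dependency
        convert = SeqIO.convert

    total = 0
    for src, dst in fastq_to_fasta_pairs(directory):
        # SeqIO.convert returns the number of records it converted
        total += convert(str(src), "fastq", str(dst), "fasta")
    return total


if __name__ == "__main__":
    print(convert_directory("FastQ"), "records converted")
```

Each output file is written next to its input, so `FastQ/sample1.fastq` becomes `FastQ/sample1.fasta`. Files with other extensions are skipped rather than overwritten.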