I have two FASTQ files, F1.fastq and F2.fastq. F2.fastq is a smaller file containing a subset of the reads in F1.fastq. I want the reads in F1.fastq that ARE NOT in F2.fastq. The following Python code does not seem to work. Can you suggest edits?
needed_reads = []
reads_array = []
chosen_array = []
for x in Bio.SeqIO.parse("F1.fastq","fastq"):
    reads_array.append(x)
for y in Bio.SeqIO.parse("F2.fastq","fastq"):
    chosen_array.append(y)
for y in chosen_array:
    for x in reads_array:
        if str(x.seq) != str(y.seq) : needed_reads.append(x)
output_handle = open("DIFF.fastq","w")
SeqIO.write(needed_reads,output_handle,"fastq")
output_handle.close()
You can use sets to accomplish this: convert list1 to a set and list2 to a set, and then compute set(list1) - set(list2). That gives you the items in list1 that are not in list2. Sample code:
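A minimal sketch of the set idea, assuming reads can be matched by their ID. The `Read` namedtuple and the `reads_not_in_subset` helper are hypothetical stand-ins made up for illustration, not part of Biopython; records returned by `SeqIO.parse` expose `.id` and `.seq` in a similar way. The sketch builds the sets from IDs rather than from the record objects themselves, and it uses set membership to filter so that the full records and their original order are kept.

```python
from collections import namedtuple

# Hypothetical stand-in for a parsed read (assumption for illustration only).
Read = namedtuple("Read", ["id", "seq"])


def reads_not_in_subset(all_reads, subset_reads):
    """Return the reads from all_reads whose id does not occur in subset_reads."""
    # A set gives fast membership tests, so each file is scanned only once.
    subset_ids = {read.id for read in subset_reads}
    return [read for read in all_reads if read.id not in subset_ids]


# Small in-memory example data standing in for F1.fastq and F2.fastq.
f1_reads = [Read("r1", "ACGT"), Read("r2", "GGCC"), Read("r3", "TTAA")]
f2_reads = [Read("r2", "GGCC")]

# Plain set difference on the IDs, as described above (result is unordered).
needed_ids = set(r.id for r in f1_reads) - set(r.id for r in f2_reads)

# Same result as full records, in the order they appear in the first list.
needed_reads = reads_not_in_subset(f1_reads, f2_reads)
print([read.id for read in needed_reads])
```

With Biopython, the same helper should work if you pass it the records from `SeqIO.parse("F1.fastq", "fastq")` and `SeqIO.parse("F2.fastq", "fastq")`, and then write the result with `SeqIO.write(needed_reads, "DIFF.fastq", "fastq")`. If your read IDs are not unique, you could key the set on `str(read.seq)` instead.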