The code below extracts short subsequences from every sequence using a window size of 4. How can I shift the window by a step size of 2 and extract 4 base pairs each time?
Example code
from Bio import SeqIO

with open("testA_out.fasta", "w") as f:
    for seq_record in SeqIO.parse("testA.fasta", "fasta"):
        i = 0
        while (i + 4) < len(seq_record.seq):
            f.write(">" + str(seq_record.id) + "\n")
            f.write(str(seq_record.seq[i:i+4]) + "\n")
            i += 2
Example Input of testA.fasta
>human1
ACCCGATTT
Example output of testA_out.fasta
>human1
ACCC
>human1
CCGA
>human1
GATT
The problem with this output is that one T is left out, and I would like to include it as well. How can I produce the output below? I would also like a reverse extraction, from the end of the sequence back towards the start, to include base pairs that are left out when extracting from start to end. Can anyone help me?
Expected output
>human1
ACCC
>human1
CCGA
>human1
GATT
>human1
ATTT
>human1
CGAT
>human1
CCCG
You can use a for loop with range, using the third step parameter of range. This way, it's a bit cleaner than using a while loop. If the data cannot be divided evenly by the chunk size, then the last chunk will be smaller.
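A minimal sketch of the range-based idea, written against plain strings so it runs without Biopython. The helper name sliding_windows and its size and step parameters are hypothetical and not part of Biopython. Note that it differs from a plain range(0, len(seq), step) loop, which would yield a shorter final chunk: it keeps only full-length windows and adds a second pass from the end of the sequence to pick up the trailing bases, as the question asks.

```python
def sliding_windows(seq, size=4, step=2):
    """Return full-length windows, scanning left to right and then right to left.

    Hypothetical helper for illustration; `seq` can be a str or a Bio.Seq.Seq.
    """
    seq = str(seq)
    # Forward pass: the third argument of range() is the step size.
    forward = [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]
    # Reverse pass: anchor each window at its end position and walk backwards,
    # so bases skipped at the tail of the forward pass are covered.
    reverse = [seq[end - size:end] for end in range(len(seq), size - 1, -step)]
    return forward + reverse


if __name__ == "__main__":
    # Print the windows in FASTA style for the example record from the question.
    for window in sliding_windows("ACCCGATTT"):
        print(">human1")
        print(window)
```

To produce the FASTA file, the same call could be made with seq_record.seq inside the SeqIO.parse loop, writing each window under the record's id. When len(seq) - size is a multiple of step, the reverse pass repeats the forward windows, so removing duplicates may be worthwhile in that case.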