Remove sequence by coordinates with Biopython


Hello

I have a sequence in sequence.fasta, which I load with:

from Bio import SeqIO
record_dict = SeqIO.to_dict(SeqIO.parse("sequence.fasta", "fasta"))

The file contains:

>sequence1 
AAACCCGGGTTTAAACCCGGGTTTGGGTTTGGG

I know how to select a specific part of this sequence by its coordinates:

print(record_dict["sequence1"].seq[coordinate_start:coordinate_end])
print(record_dict["sequence1"].seq[3:7])

and I get:

CCCG

But what if I would like to remove this part from the sequence

>sequence1 
AAACCCGGGTTTAAACCCGGGTTTGGGTTTGGG 

and get

>sequence1 
AAACGTTTAAACCCGGGTTTGGGTTTGGG

Does someone have an idea?

Thanks for your help

Here is a better example:

ACCGCTTTGAATCCGAGCTAG
           ---- ----

I want to remove two parts:

TCCG and GCTA, which correspond to the coordinates

11:14 and 16:19 (0-based, both ends inclusive)

In the end I would like to remove both and get:

>seq
ACCGCTTTGAAAG

There is 1 answer

Nathan (BEST ANSWER)

You could do this by taking the two parts you want and adding those back together:

sequence_1 = 'AAACCCGGGTTTAAACCCGGGTTTGGGTTTGGG'
sequence_1a = sequence_1[:4]
sequence_1b = sequence_1[8:]
sequence_2 = sequence_1a + sequence_1b
print(sequence_2)

>>> AAACGTTTAAACCCGGGTTTGGGTTTGGG

Note that I've added 1 to both of your indices (3:7 becomes 4:8) in order to cut out the correct part.
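
To see why the shift matters, here is a small check of what each slice actually selects. It only relies on standard Python slicing, which is 0-based and stops just before the end index; the variable names are illustrative.

```python
sequence_1 = 'AAACCCGGGTTTAAACCCGGGTTTGGGTTTGGG'

# Python slices are 0-based and stop just before the end index.
original_slice = sequence_1[3:7]   # the indices from the question
shifted_slice = sequence_1[4:8]    # the indices used in this answer

print(original_slice)  # CCCG
print(shifted_slice)   # CCGG

# Dropping the shifted slice gives the sequence asked for.
remaining = sequence_1[:4] + sequence_1[8:]
print(remaining)  # AAACGTTTAAACCCGGGTTTGGGTTTGGG
```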

If you want to do this for multiple parts, you can loop over a list of coordinates:

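sequence_1 = 'ACCGCTTTGAATCCGAGCTAG'
# (start, end) pairs to delete; both ends are inclusive, as in the question
indexes_to_delete = [(11, 14), (16, 19)]
output_sequence = ''
start_value = 0
for start_delete, end_delete in indexes_to_delete:
    # keep everything between the previous deletion and this one
    output_sequence += sequence_1[start_value:start_delete]
    # resume just after the inclusive end of the deleted part
    start_value = end_delete + 1
# keep whatever follows the last deletion
output_sequence += sequence_1[start_value:]
print(output_sequence)

>>> ACCGCTTTGAAAG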
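
If you need this in more than one place, the loop could be wrapped in a small function. This is only a sketch: the name remove_regions is made up for illustration, and it assumes the coordinates are 0-based, inclusive at both ends (TCCG at 11 to 14 in the question's example), and do not overlap.

```python
def remove_regions(sequence, regions):
    """Return `sequence` with every (start, end) region removed.

    Assumes 0-based coordinates with both ends inclusive,
    and regions that do not overlap.
    """
    kept = sequence[:0]  # empty piece of the same type as the input
    position = 0
    for start, end in sorted(regions):
        # keep the stretch before this region, then skip past its end
        kept += sequence[position:start]
        position = end + 1
    # keep whatever follows the last region
    kept += sequence[position:]
    return kept


result = remove_regions('ACCGCTTTGAATCCGAGCTAG', [(16, 19), (11, 14)])
print(result)  # ACCGCTTTGAAAG
```

Because the function only slices and concatenates, it should also work on a Biopython Seq such as record_dict["sequence1"].seq, although I have only run it on plain strings here.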