I am a beginner at programming and Python, so apologies if this is a simple question.
I have large files that contain lines like this, for example:
10000 7
20000 1
30000 2
60000 3
What I want is a file that also contains the 'missing' lines, like this:
10000 7
20000 1
30000 2
40000 0
50000 0
60000 3
The files are rather large, as I am working with whole-genome sequence data. The first column is a position in the genome and the second column is the number of SNPs I find within that 10 kb window. I don't think this background is really relevant, though: I just want to write a simple Python script that adds these lines to the file using if/else statements.
So if the position does not equal the position of the previous line + 10000, the 'missing' line is written; otherwise the existing line is written.
I foresee one problem with this, namely when several lines in a row are missing (as in my example). Does anyone have a smart solution for this?
Many thanks!
How about this:
Instead of the last print statement you would write the values to a new file.
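For reference, here is a minimal sketch of one way to fill the gaps and write straight to a new file. It is not necessarily the same as the code above. It assumes whitespace-separated integer columns, positions sorted in ascending order, and a fixed step of 10000; the names `fill_missing`, `fill_file`, and the `step` parameter are made up for illustration. A `while` loop (rather than a single if/else) handles several missing lines in a row, and the input is read line by line, so large files do not need to fit in memory.

```python
def fill_missing(lines, step=10000):
    """Yield (position, count) pairs, adding (position, 0) for skipped windows.

    Assumes positions are sorted ascending and are multiples of `step`.
    """
    expected = None  # position we expect next; unknown until the first row
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        pos, count = int(parts[0]), int(parts[1])
        # Emit one zero-count row per window skipped since the previous row.
        while expected is not None and expected < pos:
            yield expected, 0
            expected += step
        yield pos, count
        expected = pos + step


def fill_file(in_path, out_path, step=10000):
    """Stream in_path to out_path, writing the filled-in rows."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for pos, count in fill_missing(src, step):
            dst.write(f"{pos} {count}\n")
```

Something like `fill_file("lines.txt", "filled.txt")` would then produce the new file (the output name is hypothetical). Note that this sketch only fills gaps between existing lines; windows missing before the first line or after the last one are not added.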
Edit: I ran this code on this sample. Output below.
lines.txt
Output