I have a file with the structure:

N1H3O1 C2H2
C1H4 H201
C1H1N1 N1H3
C2N1O1P1H3 P5

What I am trying to do is to compute the sum of the coefficients in each of the formulae. Thus, the desired output is:

1+3+1 5 2+2 4
1+4 5 2+1 3
1+1+1 3 1+3 4
2+1+1+1+3 8 5 5

What I did is a simple replacement of each letter with "+" and then deleting the first " +".

However, I would like to know how to do this in a more proper way in sed, using its branching and flow-control commands.

1 Answer

Emma On

The problem with your input is the 0 (zero) that is used instead of the letter O in H201. This makes it difficult to design a regular expression, because the zero cannot be told apart from a digit of a coefficient: H201 reads as H with coefficient 201 rather than as H2O1.

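Other than that, you might be able to capture the numbers with a group such as ([^A-Z]+), which matches each run of characters that are not uppercase letters. Note that it also matches the space between two formulae, so ([0-9]+) may be a tighter choice.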
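
To make the effect of that pattern concrete, here is a small sketch in Python. Python and the helper name `digit_runs` are my own choices for illustration and are not part of the question. It applies the pattern to a single formula and also shows how the zero in H201 gets absorbed into the number.

```python
import re

def digit_runs(formula):
    # Hypothetical helper: return every maximal run of characters
    # that are not uppercase letters, i.e. the coefficients.
    return re.findall(r"[^A-Z]+", formula)

print(digit_runs("N1H3O1"))  # ['1', '3', '1']
print(digit_runs("H201"))    # ['201'] because the zero is just another digit
```

The second call is why the typo matters: no pattern based on letters and digits alone can know that the 0 was meant to be an O.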

However, you may not want to do this task with a regular expression alone. Apart from that 0, your data is well structured, so you could write a short script to do it.
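
As one possible shape for such a script, here is a minimal Python sketch. It is not a sed solution, and the names `COEFF`, `summarize`, and `summarize_line` are hypothetical. It assumes every element symbol is an uppercase letter, optionally followed by a lowercase letter, and always followed by an explicit coefficient, as in your sample.

```python
import re

# Assumption: an element symbol followed by its coefficient, e.g. "H3" or "Na2".
COEFF = re.compile(r"[A-Z][a-z]?(\d+)")

def summarize(formula):
    # "N1H3O1" -> "1+3+1 5": the coefficients joined by "+", then their sum.
    coeffs = COEFF.findall(formula)
    total = sum(int(c) for c in coeffs)
    return "+".join(coeffs) + " " + str(total)

def summarize_line(line):
    # Process every whitespace-separated formula on one input line.
    return " ".join(summarize(f) for f in line.split())

# Sample lines taken from the question.
for line in ["N1H3O1 C2H2", "C2N1O1P1H3 P5"]:
    print(summarize_line(line))
```

For the two sample lines this prints `1+3+1 5 2+2 4` and `2+1+1+1+3 8 5 5`. The entry H201 would come out as `201 201` until the zero is corrected to the letter O, so the input needs that fix first.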