Apologies if this has already been asked and answered but I couldn't find a satisfactory answer.
I have a list of chemical formulas including, in this order: C, H, N and O. And I would like to pull the number after each of these letters. The problem is that not all the formulas contain an N. All contain a C, H and O however. And the number can be either single, double or (in the case of H only) triple digit.
Thus the data looks like this:
- C20H37N1O5
- C10H12O3
- C20H19N3O4
- C23H40O3
- C9H13N1O3
- C14H26O4
- C58H100N2O9
I'd like each element number for the list in separate columns. So in the first example it would be:
20 37 1 5
I've been trying:
=IFERROR(MID(LEFT(A2,FIND("H",A2)-1),FIND("C",A2)+1,LEN(A2)),"")
to separate out the C#. However, after this I get stuck as the H# is flanked by either an O or N.
Is there an excel formula or VBA that can do this?
Use Regular Expressions
This is a good task for regular expressions (regex). Because VBA doesn't support regular expressions out of the box we need to reference a Windows library first.
Add reference to regex under Tools then References
and selecting Microsoft VBScript Regular Expression 5.5
Add this function to a module
Use the function like this in a cell formula
E.g. in cell B2:
=ChemRegex($A2,B$1)
and copy it to the other cellsRecognize also chemical formulas with multiple occurrences of elements like
CH₃OH
orCH₂COOH
Note that the code above cannot count something like
CH3OH
where elements occur more than once. Then only the firstH3
is count the last is omitted.If you need also to recognize formulas in the format like
CH3OH
orCH2COOH
(and summarize the occurrences of the elements) then you need to change the code to recognize these too …Recognize also chemical formulas with 2 letter elements like
NaOH
orCaCl₂
In addition to the change above for multiple occurrences of elements use this pattern:
Note that they need to be in the correct upper/lower letter case.
CaCl2
works but notcacl2
orCACL2
.Note that this doesn't proof if these letter combinations are existing elements of the periodic table. So this will also recognize eg.
Xx2Zz5Q
as fictive elementsXx = 2
,Zz = 5
andQ = 1
.To accept only combinations that exist in the periodic table use the following pattern:
Recognize also chemical formulas with prenthesis like
Ca(OH)₂
Therefore another RegEx is needed to handle the parenthesis and multiply them.
This will also recognize parenthesis. Note that it does not recognize nested parenthesis.
Also have a look at a similar question: Determine total number of atoms in a chemical formula