For context: I have extracted the 'chemical contents' from a number of different forms into free-text files.
The end goal is to organize the extracted files into a structured database like this: final structured data
The problem is that the extracted text files come in different formats. Some have each chemical and its corresponding value lined up row by row (which is good): good example
Some are laid out by column (not too bad, I guess): not too good but okay
And some look like this (which is a real headache): headache example
So my question is: are there any suggestions on how to read and organize the extracted text into the structured database (as shown at the beginning) more efficiently, other than defining every possible template for reading those extracted texts?
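To make the question concrete, here is a minimal sketch of the kind of template-free approach I have in mind. Instead of relying on the layout, it picks out known chemical names and numeric values in reading order and pairs them up. Everything in it is an assumption for illustration: the `KNOWN_CHEMICALS` vocabulary, the `extract_contents` helper, and the two sample strings are made up and are not taken from my real files.

```python
import re

# Hypothetical vocabulary; the real list would come from the target database.
KNOWN_CHEMICALS = {"C", "Si", "Mn", "P", "S", "Cr", "Ni"}

# A token is either a run of letters or an unsigned decimal number.
TOKEN_RE = re.compile(r"[A-Za-z]+|\d+(?:\.\d+)?")


def extract_contents(text):
    """Pair chemical names with values by order of appearance.

    Assumes names and values appear in the same relative order, which
    holds for both row-by-row and column-by-column layouts.
    """
    names, values = [], []
    for token in TOKEN_RE.findall(text):
        if token in KNOWN_CHEMICALS:
            names.append(token)
        elif token[0].isdigit():
            values.append(float(token))
        # Any other word (labels, units, headings) is ignored.
    if len(names) != len(values):
        raise ValueError(
            f"found {len(names)} names but {len(values)} values"
        )
    return dict(zip(names, values))


# Made-up samples of the two easier layouts.
row_wise = "C 0.12\nSi 0.30\nMn 1.45"
column_wise = "C Si Mn\n0.12 0.30 1.45"

print(extract_contents(row_wise))
print(extract_contents(column_wise))
```

Both samples give the same name-to-value mapping, but I suspect this breaks down on the messier files, for example when a value is missing or when other numbers (dates, IDs) are mixed into the text, which is why the sketch raises an error when the counts do not match.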
I'm really new to text processing, so any help would be greatly appreciated.