Python NLP: parsing unstructured data


For context, I have extracted the 'chemical contents' from a number of different forms into free text.

The end goal is to organize the extracted files into a structured database like this: [image: final structured data]

The problem is that the extracted text files come in different formats. Some have each chemical and its corresponding value lined up row by row, which is good: [image: good example]

Some are laid out by column, which I guess is not too bad: [image: not too good but okay]

And some look like this, which is kind of a headache: [image: headache example]
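To make the first two layouts concrete, here is a minimal sketch of how they might be normalized into the same (chemical, value) pairs without a dedicated template for each file. Everything in it is an assumption for illustration: the sample strings and chemical names are made up (the real extracts only appear in the screenshots), and `parse_rows`, `parse_columns` and `parse` are hypothetical helper names, not part of any library.

```python
import re

# Hypothetical samples; the real extracted text is not reproduced here.
ROW_TEXT = """Water 70.5
Ethanol 20
Glycerin 9.5"""

COLUMN_TEXT = """Water
Ethanol
Glycerin
70.5
20
9.5"""

# A line that is only a number (optionally followed by %).
NUMBER = re.compile(r"^\d+(?:\.\d+)?%?$")
# A line that is a name, whitespace, then a number (optionally followed by %).
ROW = re.compile(r"^(?P<name>.+?)\s+(?P<value>\d+(?:\.\d+)?)\s*%?$")


def parse_rows(text):
    """Row layout: each line holds a chemical name followed by its value."""
    pairs = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            pairs.append((match.group("name"), float(match.group("value"))))
    return pairs


def parse_columns(text):
    """Column layout: all names are listed, then all values in the same order."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    values = [float(ln.rstrip("%")) for ln in lines if NUMBER.match(ln)]
    names = [ln for ln in lines if not NUMBER.match(ln)]
    if len(names) != len(values):
        raise ValueError("names and values do not line up")
    return list(zip(names, values))


def parse(text):
    """Try the row layout first; fall back to the column layout if nothing matched."""
    return parse_rows(text) or parse_columns(text)


print(parse(ROW_TEXT))
print(parse(COLUMN_TEXT))
```

Both samples come out as the same list of pairs, which could then be written to the database. The third layout would not be handled by this sketch.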

So my question is: are there any suggestions on how to read and organize the extracted text into the structured database shown at the beginning more efficiently, other than defining every possible template for the extracted texts?

I'm really new to text processing, so any help would be greatly appreciated.


There are 0 answers