I have been searching for a solution for a long time but haven't been able to find one. There are several similar questions and answers, but they didn't help me.
Basically
- I have some Word documents (xxx.docx) containing some images.
- Those images are in WMF format (as I can see when I check them manually) and they basically contain tabular information.
- I need to collect those tables.
So the task reduces to extracting the image and getting the table from it using computer vision.
[1] When I try to collect the image, python-docx can't detect it as an image. I then found that the "aspose.words" library can detect it as an image object (even though it is not in a usual image format) and can write it out in EMF format (xxx.emf). [If there is any other way, please mention it.]
[2] Now I have the image (xxx.emf) in a folder, so the next task is to get the content of the image, which is entirely tabular information. However, I can't read this format in Python.
So getting the EMF image and reading it is not my target; the target is to get the table from the image into Excel. Please help me with these steps, or suggest other approaches that meet the requirement. If anyone needs the docx, it is available here in a repo. Thank you.
Word and Excel files are actually just zipped archives, so you can unzip them with 7zip to get at their contents.
Your WMF file is in the unzipped content (embedded images normally live under word/media/). Copy it to the current directory and rename it for simpler access.
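If you would rather stay in Python than call 7zip, the same unzip-and-copy step can be sketched with the standard-library zipfile module. This is only a minimal sketch: the function name extract_metafiles is made up for illustration, and it assumes the images sit under word/media/ with a .wmf or .emf extension, which is the usual layout but not something I have checked against your file.

```python
import zipfile
from pathlib import Path

# Vector image formats that are embedded as files but are not ordinary bitmaps.
METAFILE_SUFFIXES = {".wmf", ".emf"}


def extract_metafiles(docx_path, out_dir="."):
    """Copy every WMF/EMF found under word/media/ out of a .docx.

    Hypothetical helper for illustration; returns the list of written paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            member = Path(name)
            is_media = name.startswith("word/media/")
            if is_media and member.suffix.lower() in METAFILE_SUFFIXES:
                # Flatten the path: drop the word/media/ prefix.
                target = out_dir / member.name
                target.write_bytes(archive.read(name))
                extracted.append(target)
    return extracted
```

Calling extract_metafiles("xxx.docx", "images") should leave the metafiles in an images folder, ready for the conversion step.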
You can then convert that to a PNG with either Inkscape or LibreOffice.
I think the result is a little messed up on my system because I lack your fonts.
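The LibreOffice conversion can also be driven from Python. The following is a rough sketch rather than a tested recipe for your file: the helper names are invented for illustration, and it assumes the LibreOffice launcher is on your PATH as soffice (on some systems it is called libreoffice instead, so pass that name if needed).

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(src, out_dir=".", soffice="soffice"):
    """Return the LibreOffice command line that renders `src` as a PNG.

    Hypothetical helper; `soffice` is an assumed executable name.
    """
    return [
        soffice,
        "--headless",            # run without opening the GUI
        "--convert-to", "png",   # target format
        "--outdir", str(out_dir),
        str(src),
    ]


def convert_to_png(src, out_dir=".", soffice="soffice"):
    """Run the conversion and return the path where the PNG is expected."""
    if shutil.which(soffice) is None:
        raise FileNotFoundError(f"{soffice!r} was not found on PATH")
    subprocess.run(build_convert_command(src, out_dir, soffice), check=True)
    # LibreOffice keeps the input's base name and swaps the extension.
    return Path(out_dir) / (Path(src).stem + ".png")
```

For example, convert_to_png("image.wmf") would be expected to write image.png next to it, where image.wmf stands for whatever name you gave the extracted file.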