I have searched Google for answers but could not find a single module that converts doc/pdf/docx/rtf to text.
Is there a Python module that can convert doc, pdf, docx and rtf files to text?
One module to rule them all: textract.
It supports text extraction from many file types, including all of the ones listed in the question.
PDF example
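A minimal sketch, assuming textract is installed (`pip install textract`); `pdf_to_text` is a hypothetical helper name and `report.pdf` a hypothetical input file. `textract.process()` returns the extracted text as bytes, so the result is decoded. Passing `method="pdfminer"` selects the pdfminer backend instead of the default `pdftotext` command-line tool.

```python
def pdf_to_text(path: str) -> str:
    """Extract the text of a PDF file with textract (hypothetical helper)."""
    # Imported inside the function so this code can be loaded even where
    # textract is not installed yet (pip install textract).
    import textract

    # process() returns bytes; method="pdfminer" uses the pdfminer backend
    # rather than the default pdftotext command-line tool.
    raw = textract.process(path, method="pdfminer")
    return raw.decode("utf-8")


# Usage (hypothetical file):
# print(pdf_to_text("report.pdf"))
```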
See the textract Python package documentation: http://textract.readthedocs.io/en/latest/python_package.html
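Because textract picks its parser from the file extension, the same `process()` call should cover every format in the question. The following is a minimal sketch, assuming textract is installed; `to_text`, `convert_folder` and `SUPPORTED` are hypothetical names, and the extension whitelist simply mirrors the four formats asked about. Note that textract delegates some formats to external programs (for example antiword for .doc and unrtf for .rtf), which may need to be installed separately.

```python
from pathlib import Path

# Hypothetical whitelist: just the formats from the question;
# textract itself supports many more.
SUPPORTED = {".doc", ".docx", ".pdf", ".rtf"}


def to_text(path: str) -> str:
    """Return the text of a .doc, .docx, .pdf or .rtf file as a str."""
    ext = Path(path).suffix.lower()
    if ext not in SUPPORTED:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    # Lazy import: textract is a third-party package (pip install textract).
    import textract

    # textract chooses its parser from the file extension and returns bytes.
    return textract.process(path).decode("utf-8")


def convert_folder(folder: str) -> dict[str, str]:
    """Map each supported file name in `folder` to its extracted text."""
    return {
        p.name: to_text(str(p))
        for p in sorted(Path(folder).iterdir())
        if p.suffix.lower() in SUPPORTED
    }
```

Files with other extensions are skipped by `convert_folder` and rejected by `to_text`, so a stray `.txt` or image in the folder does not reach textract.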