Autoformat Text with Machine Learning


I am currently working on optimizing the workflow of an agency.

The agency receives around 30-40 PDF/Word documents, which have to be converted into InDesign files that are then printed in newspapers. It's always the same pattern: job adverts with a logo, the job position and some text. The same customers send us their adverts every week. Our employees usually take the layout of an existing file and copy-paste the new text into it.

We apply some fixed formatting rules, such as no words split across lines and a set distance between the job title and the first paragraph. One important goal is to keep the height as small as possible in order to reduce costs for our clients. Because many of our employees are new or work part-time, we have high staff turnover. We therefore want to standardize the process so that a new advert needs only a few small changes.
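To make these rules concrete, here is a rough Python sketch of how the height of an advert could be estimated when no word may be split across lines. It only uses the standard `textwrap` module; the function names and the values for `chars_per_line`, `line_height_mm` and `title_gap_mm` are placeholders I made up, not our real InDesign settings, and a real layout would use font metrics rather than a character count.

```python
import textwrap


def wrap_without_breaking(text, chars_per_line):
    """Wrap text to a fixed width without ever splitting a word."""
    return textwrap.wrap(
        text,
        width=chars_per_line,
        break_long_words=False,   # never cut a long word in the middle
        break_on_hyphens=False,   # never break at an existing hyphen
    )


def estimate_height_mm(title, paragraphs, chars_per_line=40,
                       line_height_mm=3.5, title_gap_mm=2.0):
    """Rough height estimate: wrapped lines times line height, plus title gap.

    All numeric defaults are hypothetical placeholders.
    """
    n_lines = len(wrap_without_breaking(title, chars_per_line))
    for paragraph in paragraphs:
        n_lines += len(wrap_without_breaking(paragraph, chars_per_line))
    return n_lines * line_height_mm + title_gap_mm


if __name__ == "__main__":
    print(wrap_without_breaking("Senior Software Engineer", 10))
    print(estimate_height_mm("Nurse", ["We are hiring."],
                             line_height_mm=4.0, title_gap_mm=2.0))
```

Comparing this estimate for a few candidate column widths would be one simple way to pick the variant with the smallest height.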

Do you see a possibility to improve the process, for example with NLTK? I am thinking of training an algorithm that recognizes the "job title", "bullet points", logo etc. and automatically proposes a formatting for the text.
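As a starting point, this is the kind of recognition I have in mind, sketched with plain rules instead of a trained model. It uses only Python's `re` module rather than NLTK, and the heuristics (the first short line without a trailing period is the job title, lines starting with a bullet marker or a number are bullet points) as well as the names `label_lines` and `BULLET_RE` are my own assumptions, not something tested on our real adverts.

```python
import re

# Lines starting with -, *, a bullet character, a dash, or "1." / "1)" count as bullets.
BULLET_RE = re.compile(r"^\s*(?:[-*•–]|\d+[.)])\s+")


def label_lines(text, max_title_len=60):
    """Label each non-empty line as 'job_title', 'bullet' or 'paragraph'.

    Purely heuristic baseline; max_title_len is a hypothetical threshold.
    """
    labeled = []
    title_found = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if BULLET_RE.match(raw):
            # Strip the bullet marker and keep only the item text.
            labeled.append(("bullet", BULLET_RE.sub("", raw).strip()))
        elif (not title_found and len(line) <= max_title_len
              and not line.endswith(".")):
            labeled.append(("job_title", line))
            title_found = True
        else:
            labeled.append(("paragraph", line))
    return labeled


if __name__ == "__main__":
    advert = (
        "Software Engineer (m/f)\n\n"
        "We are a growing company.\n"
        "- Python experience\n"
        "2) Team player\n"
    )
    for label, content in label_lines(advert):
        print(label, "|", content)
```

If rules like these already cover most of the recurring customers, a trained classifier might only be needed for the adverts they get wrong.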

A colleague told me to just write a script that formats the InDesign document.

What do you think? Thanks so far.

Here is a brief example:

Example Picture
