How to perform parallel processes for different groups in a folder?


I have a folder containing a lot of images. I have code that converts these images to black and white and then uses tesseract to convert them into text files. I have been using the following command to split the files into subgroups:

N=400   # number of files in each folder
i=0; for f in *; do d=dir_$(printf %03d $((i/N+1))); mkdir -p "$d"; mv "$f" "$d"; let i++; done

This command works well for splitting up the files (it puts the grouped files into different folders), but because I plan to use this procedure on a very large number of files, I would like to make it less time consuming (moving the files into folders takes too long). Is there a way to specify a subgroup of files to run a process on, and use & to run multiple instances at once? For example, I would like to run a process on the first 400 files in a folder and then use " & " to run that same process on files 401-800.

Here is the code that I am using for the conversion:

parallel -j 5 convert {} "-resample 200 -colorspace Gray" {.}BW.png ::: *.png ; parallel -j 5 tesseract {} {} -l tla -psm 6 ::: *BW.png ; rm *BW.png

By group I simply mean the first 400 files; the second group would be the following 400 files, and so on.
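
The grouping described above can be sketched without moving any files by letting xargs do the chunking. This is only a minimal sketch, assuming an xargs that supports -0 and -P (the GNU and BSD versions do). The helper name run_in_groups and the script ./process.sh are hypothetical placeholders, not part of the original commands.

```shell
# Sketch: feed file names to a command in fixed-size groups, running
# several groups at once. run_in_groups is a hypothetical helper name.
# Usage: <NUL-separated names on stdin> | run_in_groups SIZE NJOBS COMMAND [ARGS...]
run_in_groups() {
    size=$1
    njobs=$2
    shift 2
    # -0: names on stdin are NUL-separated (safe for spaces in names)
    # -n: pass at most $size names to each invocation (one "group")
    # -P: keep up to $njobs invocations running at the same time
    xargs -0 -n "$size" -P "$njobs" "$@"
}

# Example (assumes ./process.sh accepts file names as arguments):
# printf '%s\0' *.png | run_in_groups 400 5 ./process.sh
```

Here -P plays the role of the " & ": each group of 400 names becomes one invocation, and up to 5 of them run concurrently. Because printf is a shell builtin, expanding *.png this way should not run into the "too many arguments" limit that applies to external commands.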


There are 2 answers

Adrian (best answer)

My whole problem was running my code on a directory with a very large number of files. To get rid of the error stating that there are too many arguments, I used this code, which I put together from previous posts by Ole Tange:

ls ./ | grep -v 'BW\.png$' | parallel -j 60 convert {} "-resample 100 -colorspace Gray" {.}BW.png; ls ./ | grep 'BW\.png$' | parallel -j 60 tesseract {} {} -l tla -psm 6; find . -name "*BW.png" -print0 | xargs -0 rm
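
A possible variant of the listing step, in case file names contain spaces: a minimal sketch that uses find with NUL-separated output instead of ls and grep. The helper name list_sources is hypothetical, and -maxdepth is assumed to be available (it is in GNU and BSD find).

```shell
# Sketch: list the source images (every *.png in the current directory
# except the intermediate *BW.png files) as NUL-separated names, so that
# names with spaces survive and no long argument list is ever built.
# list_sources is a hypothetical helper name.
list_sources() {
    find . -maxdepth 1 -type f -name '*.png' ! -name '*BW.png' -print0
}

# Possible use with GNU parallel (-0 reads NUL-separated input):
# list_sources | parallel -0 -j 60 convert {} "-resample 100 -colorspace Gray" {.}BW.png
```

The names come out with a leading ./ (for example ./scan1.png), which convert and tesseract accept like any other path.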

Thanks to everyone that contributed.

Eugeniu Rosca

I would let Make take care of the multiprocessing, using a Makefile like this:

Makefile:

EXT_IN          := .jpg
EXT_OUT         := .txt
FILES_IN        := $(wildcard *$(EXT_IN))
FILES_OUT       := $(addsuffix $(EXT_OUT), $(basename $(FILES_IN)))

.PHONY: all

$(FILES_OUT):
	@echo Generating $@ from $(addsuffix $(EXT_IN), $(basename $@))
	# Do your conversion here!

all: $(FILES_OUT)
	@echo "Processing finished!"

Running:

$ make all -j 8
Generating file1.txt from file1.jpg
Generating file2.txt from file2.jpg
Generating file3.txt from file3.jpg
Generating file4.txt from file4.jpg
Generating file5.txt from file5.jpg
Generating file6.txt from file6.jpg
Processing finished!