Processing the same file in multiple threads


I am trying to reduce how long it takes to process a file in Python. My idea is to split the task across n threads.

For example, suppose I have a file with 1300 items in it. I want each thread to process every nth item. The items have no dependency on one another, so order doesn't matter here.

So the workflow would be something like this for each thread:

1) open file
2) iterate through items
3) if nth item then process, otherwise continue
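The stride scheme above can be sketched generically with the standard `threading` module. This is a minimal, self-contained example, assuming a plain list stands in for the file's items and doubling an item stands in for the real processing:

```python
import threading

def worker(items, thread_id, num_threads, results, lock):
    # Each worker handles every num_threads-th item, offset by thread_id,
    # so every item is processed by exactly one thread.
    for index, item in enumerate(items):
        if index % num_threads != thread_id:
            continue
        value = item * 2  # placeholder for the real per-item processing
        with lock:
            results.append(value)

def run(items, num_threads=4):
    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(items, i, num_threads, results, lock))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(sorted(run(list(range(10)))))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Results arrive in nondeterministic order, hence the `sorted` call and the lock around the shared list.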

I am using the threading library to do this but I am not seeing any performance improvements.

Here is the pseudocode:

def driver(self):
        threads = []
        num_threads = 10  # just picked 10 as a test
        for i in range(num_threads):
            threads.append(threading.Thread(target=self.workerFunc, args=(filepath, i, num_threads)))

        for thread in threads:
            thread.start()

        for thread in threads:
            thread.join()

def workerFunc(self, filepath, thread_id, num_threads):
        with open(filepath, 'rb') as file:
                obj = ELFFile(file)
                for index, item in enumerate(obj.items):
                        if index % num_threads != thread_id:
                                continue
                        process this item

Since every thread is just reading the file, it should be able to scan through the file freely without caring about what other threads are doing or getting blocked by them, right?

What am I overlooking here?

The only thing I can think of is that the library I'm using to parse these files (pyelftools' ELFFile) has some internal lock, but I can't find one. Or is there something fundamentally flawed with my plan?
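One possibility worth testing: in CPython, the GIL allows only one thread to execute Python bytecode at a time, so CPU-bound work (such as parsing) does not speed up with threads regardless of core count. The same fan-out can be tried with processes instead. This is a sketch only, with a hypothetical `process_item` standing in for the real per-item work:

```python
from multiprocessing import Pool

def process_item(index_item):
    # Hypothetical placeholder: square the item.
    # Real code would parse/process the nth file item here.
    index, item = index_item
    return item * item

def run(items, workers=4):
    # Each process has its own interpreter, so CPU-bound work
    # runs in parallel; Pool.map preserves input order.
    with Pool(workers) as pool:
        return pool.map(process_item, enumerate(items))

if __name__ == "__main__":
    print(run([1, 2, 3, 4]))  # [1, 4, 9, 16]
```

Note that ELFFile objects are not picklable, so each process would need to open the file and construct its own ELFFile, much like the per-thread workflow described above.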

EDIT: just to note, there are 32 CPUs on the system I am running this on.
