Spring Batch Jobs do not release memory


I am running around 18,000 Spring Batch jobs in parallel, each with one step. Each step reads from a file, converts and manipulates the values, and writes them to a MongoDB and a MySQL database; nothing unusual. After all of the jobs have finished, memory consumption sits at 20 GB and never drops. I construct my Spring Batch members as follows:

@Autowired
public ArchiveImportManager(final JobRepository jobRepository, final BlobStorageConfiguration blobConfiguration,
        final JobBuilderFactory jobBuilderFactory, final StepBuilderFactory stepBuilderFactory,
        final ArchiveImportSettings settings) {
    this.jobBuilderFactory = jobBuilderFactory;
    this.stepBuilderFactory = stepBuilderFactory;
    this.jobLauncher = new SimpleJobLauncher();
    final ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
    threadPoolTaskExecutor.setCorePoolSize(THREAD_POOL_SIZE);
    threadPoolTaskExecutor.setMaxPoolSize(THREAD_POOL_SIZE);
    threadPoolTaskExecutor.setQueueCapacity(THREAD_POOL_QUEUE);
    threadPoolTaskExecutor.initialize();
    this.jobLauncher.setTaskExecutor(threadPoolTaskExecutor);
    this.jobLauncher.setJobRepository(jobRepository);
}

I create one job as follows:

private Job createImportJob(final ArchiveResource archiveResource, final int current, final int archiveSize) {

    final String name = "ImportArchiveJob[" + current + "|" + archiveSize + "]"
            + new Date(System.currentTimeMillis());
    final Step step = this.stepBuilderFactory
            .get(name)
            .<ArchiveResource, ArchiveImportSaveData> chunk(1)
            .reader(getReader(archiveResource, current, archiveSize))
            .processor(getProcessor(current, archiveSize))
            .writer(getWriter(current, archiveSize))
            .build();

    return this.jobBuilderFactory
            .get(name)
            .flow(step)
            .end()
            .build();

}

And start all jobs in a loop:

private void startImportJobs(final List<ArchiveResource> archives) {
    final int size = archives.size();
    for (int i = 0; i < size; i++) {
        final ArchiveResource ar = archives.get(i);
        final Job j = createImportJob(ar, i, size);
        try {

            this.jobLauncher.run(j, new JobParametersBuilder()
                    .addDate("startDate", new Date(System.currentTimeMillis()))
                    .addString("progress", "[" + i + "|" + size + "]")
                    .toJobParameters());
        } catch (final JobExecutionAlreadyRunningException e) {
            log.info("Already running", e);
        } catch (final JobRestartException e) {
            log.info("Restarted", e);
        } catch (final JobInstanceAlreadyCompleteException e) {
            log.info("ALready completed", e);
        } catch (final JobParametersInvalidException e) {
            log.info("Parameters invalid", e);
        }
    }
}

Do I have to release the memory somehow, or delete the jobs after they have finished? I do not understand why memory consumption stays that high.

Best regards

1 Answer

Answered by Jonathan:

Taking that figure from htop and trying to derive anything from it is not a great idea, because of the way Java manages memory.

Java allocates memory from the OS and manages that memory internally. This is tied to concepts such as garbage collection and the generational memory model.

Basically, if you free memory by removing references to objects within your application, that memory is not released at once. Only when the memory already allocated by Java is full is a garbage collection cycle triggered. That cycle will not (necessarily) release the memory back to the OS; as a first step, it makes the memory available to your Java program again while still holding on to it as far as the OS is concerned.
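The effect is easy to reproduce with a minimal sketch (the class name and allocation sizes are made up for illustration, System.gc() is only a hint to the VM, and you should run it with a large enough heap, e.g. -Xmx2g):

import java.util.ArrayList;
import java.util.List;

public class HeapRetentionDemo {

    public static void main(final String[] args) {
        final Runtime rt = Runtime.getRuntime();

        // Allocate roughly 1 GB in 1 MB blocks.
        List<byte[]> blocks = new ArrayList<>();
        for (int i = 0; i < 1024; i++) {
            blocks.add(new byte[1024 * 1024]);
        }
        print("after allocation", rt);

        // Drop all references and suggest a collection.
        blocks = null;
        System.gc();
        print("after System.gc()", rt);
    }

    private static void print(final String label, final Runtime rt) {
        final long reservedFromOs = rt.totalMemory();
        final long usedInternally = reservedFromOs - rt.freeMemory();
        System.out.printf("%s: reserved=%d MB, used=%d MB%n", label,
                reservedFromOs / (1024 * 1024), usedInternally / (1024 * 1024));
    }
}

Even after the references are dropped and a collection has run, the amount reserved from the OS typically stays where it was; only the internally free portion grows.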

If a heuristic within the Java VM determines that you have too much memory allocated, it will release memory back to the OS, but this is not behavior you should depend on.
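How willingly the VM gives memory back can be influenced with JVM options. As a hedged example (the exact effect of these flags depends on the garbage collector and JDK version, and importer.jar is just a placeholder for your application), capping the heap and tightening the free-ratio thresholds encourages the VM to shrink the heap:

java -Xmx8g -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=30 -jar importer.jar

The hard -Xmx cap is the most reliable lever: the heap can then never grow beyond that limit, although native memory and metaspace come on top of it.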

This is why you still see 20 GB reserved by the Java process. And without a closer look inside the application, you cannot even tell whether that memory has been freed internally or is filled up with dead objects.

If you want a better understanding of the memory footprint of your application, I suggest the following. Tools like JConsole or JVisualVM (for the latter you will need the Visual GC plugin) let you inspect the internals of the memory allocated by the Java VM. Within that memory, look specifically at the area called old or tenured generation; everything else is not relevant to your question (search for the term generational memory management if you are curious).

If you want to trigger a garbage collection to remove objects that are already dead but not yet cleaned up, either explicitly invoke System.gc() in your application or trigger it via JConsole or JVisualVM (both have a button for it). The memory consumption directly after a garbage collection is the number you are actually looking for.
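If you would rather read that number from inside the application than from a GUI tool, a minimal sketch using the standard java.lang.management API looks like this:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapAfterGc {

    public static void main(final String[] args) {
        final MemoryMXBean memory = ManagementFactory.getMemoryMXBean();

        // Suggest a full collection; effectively equivalent to System.gc().
        memory.gc();

        // Heap usage right after the collection: roughly the live objects.
        final MemoryUsage heap = memory.getHeapMemoryUsage();
        System.out.printf("live heap=%d MB, committed from OS=%d MB%n",
                heap.getUsed() / (1024 * 1024),
                heap.getCommitted() / (1024 * 1024));
    }
}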