Serializable or Externalizable with huge datasets: which one is faster and more practical?


Alright, before I come to my question, I want to point out that I know the difference between Serializable and Externalizable, so there is no need to explain it!

What I am basically trying to do is save a class with all its data to a file. Java 9 is out and the JVM is very fast, but there are still people (whose opinion I trust) who say that using Serializable on a huge amount of data is very inefficient compared to using Externalizable.

If I had only about 10 fields of ordinary data types such as integers or booleans, I would definitely use Serializable.

But now I have a bit more data to store and load, e.g. a 3-dimensional byte array containing around 3.3 million entries, and I think it would be very inefficient to save data like this via the reflection-based default mechanism that Serializable uses. Since I am not 100% sure that Externalizable is more efficient for storing such a huge amount of data, I would like to make sure before I start using it. My program does not need to save the data fast, but it has to load it very fast, and not only once: it does some calculations first and then loads data multiple times while running, because it needs different datasets depending on its current state. So my basic idea is to load the byte array via asynchronous multithreading in the Externalizable#readExternal() function.
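
To make the asynchronous loading idea concrete, here is a minimal sketch. Note that it does not put threads inside readExternal(); it swaps in a simpler technique and moves the whole load onto a background thread with CompletableFuture, so other calculations can continue meanwhile. The class name AsyncLoadSketch and the helpers save, load, loadAsync and roundTrip are made up for this illustration and are not part of my project.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;

public class AsyncLoadSketch {

    // Writes any serializable object to the given file (hypothetical helper).
    static void save(Object data, Path file) {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeObject(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the object back on the calling thread.
    static Object load(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Starts the load on a thread of the common ForkJoinPool and returns immediately.
    static CompletableFuture<Object> loadAsync(Path file) {
        return CompletableFuture.supplyAsync(() -> load(file));
    }

    // Saves a small 3-D array, loads it in the background and returns the loaded copy.
    static byte[][][] roundTrip() {
        try {
            Path file = Files.createTempFile("dataset", ".dat");
            byte[][][] ids = new byte[4][8][4];
            ids[1][2][3] = 0x7f;
            save(ids, file);

            CompletableFuture<Object> pending = loadAsync(file);
            // ... other calculations could run here while the file is being read ...
            byte[][][] loaded = (byte[][][]) pending.join(); // blocks until the load is done

            Files.delete(file);
            return loaded;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[][][] loaded = roundTrip();
        System.out.println("Value read back: " + loaded[1][2][3]);
    }
}
```

This only hides the loading time behind other work; it does not make deserialization itself any faster.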

Please correct me if I am wrong and Externalizable is not the more efficient way here, because I want the program to run as smoothly as possible while it is loading the data!

Kind regards,

Fabian Schmidt!

There is 1 answer

Fabian Schmidt On

What I have done now is compare the time it takes to save and load via reflection (Serializable) and via my own implementation (Externalizable).

The code for the test:

Main Class (Comparision.class)

package de.cammeritz.chunksaver.util;

import java.io.File;

/**
 * Created by Fabian / Cammeritz on 20.10.2017 at 03:15.
 */

public class Comparision {

    public static void main(String args[]) {

        long start;
        long end;

        //Preparing datasets

        DataSerializable dataSerializable = createSerializable();
        DataExternalizable dataExternalizable = createExternalizable();

        //Storage files

        File sFile = new File(System.getProperty("user.dir"), "sFile.dat");
        File eFile = new File(System.getProperty("user.dir"), "eFile.dat");

        //Saving via reflection

        start = System.currentTimeMillis();

        FileUtil.save(dataSerializable, sFile);

        end = System.currentTimeMillis();

        System.out.println("Time taken to save via reflection in milliseconds: " + (end - start));

        //Saving via my own code

        start = System.currentTimeMillis();

        FileUtil.save(dataExternalizable, eFile);

        end = System.currentTimeMillis();

        System.out.println("Time taken to save via my own code in milliseconds: " + (end - start));

        //Loading via reflection

        start = System.currentTimeMillis();

        dataSerializable = (DataSerializable) FileUtil.load(sFile);

        end = System.currentTimeMillis();

        System.out.println("Time taken to load via reflection in milliseconds: " + (end - start));

        //Loading via my own code

        start = System.currentTimeMillis();

        dataExternalizable = (DataExternalizable) FileUtil.load(eFile);

        end = System.currentTimeMillis();

        // This block measures loading, so the label says "load"
        System.out.println("Time taken to load via my own code in milliseconds: " + (end - start));

    }

    private static DataSerializable createSerializable() {
        DataSerializable data = new DataSerializable(7);
        for (int cx = 0; cx < data.getSideSize(); cx++) {
            for (int cz = 0; cz < data.getSideSize(); cz++) {
                for (int x = 0; x < data.getX(); x++) {
                    for (int y = 0; y < data.getY(); y++) {
                        for (int z = 0; z < data.getZ(); z++) {
                            data.setValue(cx, cz, x, y, z, (byte) 0x7f);
                        }
                    }
                }
            }
        }
        return data;
    }

    private static DataExternalizable createExternalizable() {
        DataExternalizable data = new DataExternalizable(7);
        for (int cx = 0; cx < data.getSideSize(); cx++) {
            for (int cz = 0; cz < data.getSideSize(); cz++) {
                for (int x = 0; x < data.getX(); x++) {
                    for (int y = 0; y < data.getY(); y++) {
                        for (int z = 0; z < data.getZ(); z++) {
                            data.setValue(cx, cz, x, y, z, (byte) 0x7f);
                        }
                    }
                }
            }
        }
        return data;
    }

}
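
FileUtil is not shown in this post. As a rough sketch of what such a helper could look like: it is assumed here that it simply wraps ObjectOutputStream and ObjectInputStream in buffered file streams, and that it rethrows I/O errors as unchecked exceptions, since main() above neither catches nor declares any. The bodies below are a guess for illustration, not the code that produced the timings.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the FileUtil class used by Comparision.
public final class FileUtil {

    private FileUtil() {
        // utility class, no instances
    }

    // Serializes the object (Serializable or Externalizable) into the file.
    public static void save(Object data, File file) {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserializes one object from the file; the caller casts the result.
    public static Object load(File file) {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Whether the real helper buffers its streams is not known from this post, and that choice alone can change the measured times.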

Serialization via reflection:

package de.cammeritz.chunksaver.util;

import java.io.Serializable;

/**
 * Created by Fabian / Cammeritz on 20.10.2017 at 02:59.
 */

public class DataSerializable implements Serializable {

    private final int x = 16;
    private final int y = 256;
    private final int z = 16;

    private byte[][][][][] ids = null;
    private int sideSize;

    public DataSerializable(int sideSize) {
        this.sideSize = sideSize;
        ids = new byte[sideSize][sideSize][16][256][16];
    }

    public int getX() {
        return x;
    }

    public int getY() {
        return y;
    }

    public int getZ() {
        return z;
    }

    public int getSideSize() {
        return sideSize;
    }

    public byte getValue(int cx, int cz, int x, int y, int z) {
        return ids[cx][cz][x][y][z];
    }

    public void setValue(int cx, int cz, int x, int y, int z, byte value) {
        ids[cx][cz][x][y][z] = value;
        return;
    }

}

Serialization via my own implementation:

package de.cammeritz.chunksaver.util;

import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

/**
 * Created by Fabian / Cammeritz on 20.10.2017 at 02:58.
 */

public class DataExternalizable implements Externalizable {

    private final int x = 16;
    private final int y = 256;
    private final int z = 16;

    private byte[][][][][] ids = null;
    private int sideSize;

    public DataExternalizable() {

    }

    public DataExternalizable(int sideSize) {
        this.sideSize = sideSize;
        ids = new byte[sideSize][sideSize][16][256][16];
    }

    public int getX() {
        return x;
    }

    public int getY() {
        return y;
    }

    public int getZ() {
        return z;
    }

    public int getSideSize() {
        return sideSize;
    }

    public byte getValue(int cx, int cz, int x, int y, int z) {
        return ids[cx][cz][x][y][z];
    }

    public void setValue(int cx, int cz, int x, int y, int z, byte value) {
        ids[cx][cz][x][y][z] = value;
        return;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeObject(ids);
    }

    @Override
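    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        ids = (byte[][][][][]) in.readObject();
        // sideSize is not written by writeExternal, so restore it from the array;
        // otherwise getSideSize() would return 0 after loading
        sideSize = ids.length;
    }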
}

Basically I can agree with what @markspace said above ("I think the idea that Serializable is slow is fairly old. Modern JVMs like 7 and 8 implement a lot of speed-ups to help Serializable run much faster. I would start with that and only investigate further if it was in fact running slower than acceptable") and also with what @EJP said ("I think @markspace is right on the money here. You don't need it to be as fast as possible, you need it to be fast enough. In the old days we had to make sort-merges fast enough so they didn't run into a second operator shift. Any faster than that there was really no payback.")

The problem with the test is that the results are confusing, and they also show that I will definitely use Externalizable here.

Results from 3 test runs with the same values and the exact dataset sizes I will need later in my project (the fourth line of each run is the load time of my own implementation, although the label in the output says "save"):

Time taken to save via reflection in milliseconds: 746
Time taken to save via my own code in milliseconds: 812
Time taken to load via reflection in milliseconds: 3191
Time taken to save via my own code in milliseconds: 2811

Time taken to save via reflection in milliseconds: 755
Time taken to save via my own code in milliseconds: 934
Time taken to load via reflection in milliseconds: 3545
Time taken to save via my own code in milliseconds: 2671

Time taken to save via reflection in milliseconds: 401
Time taken to save via my own code in milliseconds: 784
Time taken to load via reflection in milliseconds: 3065
Time taken to save via my own code in milliseconds: 2627

What confuses me is that the reflection implementation saves significantly faster than my own implementation, but in return takes roughly 0.4 to 0.9 seconds longer to load the data.

The point is that this difference is very significant for what I am planning, since saving speed does not really matter but loading has to be quick. So the outcome clearly tells me that I should use Externalizable here.
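
One caveat about the numbers: each operation was timed exactly once with System.currentTimeMillis(), and the three runs already differ noticeably from each other (for example 401 ms versus 755 ms for the same save). A sketch of a slightly more careful measurement, with a few untimed warm-up runs and the median of several timed runs, could look like this. TimingSketch, measure and median are names invented for this example, and the task in main is only a placeholder for a call like FileUtil.load(eFile).

```java
import java.util.Arrays;

public class TimingSketch {

    // Runs the task 'warmups' times without timing, then 'runs' times with timing.
    // Returns the duration of every timed run in nanoseconds.
    static long[] measure(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    // Median of the measured values (the upper middle element for even counts).
    static long median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload: filling an array of the dataset's size.
        byte[] buffer = new byte[7 * 7 * 16 * 256 * 16];
        long[] times = measure(() -> Arrays.fill(buffer, (byte) 0x7f), 3, 10);
        System.out.println("Median in microseconds: " + median(times) / 1_000);
    }
}
```

For file I/O the operating system's cache still influences such numbers, so this reduces noise but does not remove it.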

But can anyone tell me why exactly the reflection way saves faster, and how I could improve my own implementation of saving the data?
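
One thing that might be worth measuring: writeExternal() above still calls out.writeObject(ids), so the array itself goes through the regular object serialization in both variants. A byte[][][][][] is an array of arrays, and with sideSize 7 it contains 7 x 7 x 16 x 256 = 200,704 innermost byte[16] arrays, each of which is written and read as a separate object. The sketch below instead keeps everything in one flat byte[] and writes it with a single bulk call. FlatData, index and roundTrip are hypothetical names for this illustration, and whether this is actually faster for my dataset is an assumption that has not been tested here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

public class FlatData implements Externalizable {

    private static final int X = 16;
    private static final int Y = 256;
    private static final int Z = 16;

    private int sideSize;
    private byte[] ids; // all five dimensions stored in one flat array

    // Externalizable requires a public no-arg constructor.
    public FlatData() {
    }

    public FlatData(int sideSize) {
        this.sideSize = sideSize;
        this.ids = new byte[sideSize * sideSize * X * Y * Z];
    }

    public int getSideSize() {
        return sideSize;
    }

    // Maps the five coordinates to one position in the flat array.
    private int index(int cx, int cz, int x, int y, int z) {
        return (((cx * sideSize + cz) * X + x) * Y + y) * Z + z;
    }

    public byte getValue(int cx, int cz, int x, int y, int z) {
        return ids[index(cx, cz, x, y, z)];
    }

    public void setValue(int cx, int cz, int x, int y, int z, byte value) {
        ids[index(cx, cz, x, y, z)] = value;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(sideSize);
        out.write(ids); // one bulk write instead of many small array objects
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        sideSize = in.readInt();
        ids = new byte[sideSize * sideSize * X * Y * Z];
        in.readFully(ids); // one bulk read into the preallocated array
    }

    // In-memory round trip, only to show that the format is read back correctly.
    static FlatData roundTrip(FlatData data) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(data);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (FlatData) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        FlatData data = new FlatData(7);
        data.setValue(6, 6, 15, 255, 15, (byte) 0x7f);
        FlatData loaded = roundTrip(data);
        System.out.println("Value read back: " + loaded.getValue(6, 6, 15, 255, 15));
    }
}
```

The price of the flat layout is that the index has to be computed by hand on every access, which is why it is hidden behind getValue and setValue.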

Thanks to all!