Gson: How to skip rows in a JSON array while parsing with the streaming API

I am trying to parse a huge JSON array using Gson's streaming API, where each run has to process just 10 objects at a time.

So the first run processes the first 10 objects, the second run starts from the 11th, the third from the 21st, and so on. You get the drill.

The JSON array has this format:

[
  { "key1": "value1"},
  { "key2": "value2"},
  { "key3": "value3"},
  { "key4": "value4"},
  ..........
  .........
  ..........
  { "key10": "value10"},
  ..........
  .........
  ..........
  { "key20": "value20"},
  ..........
  .........
  ..........
 ]

I am trying the code below, but it doesn't work properly: it always parses from the start. This is what I am doing:

public static void readJsonStream(int skipRows) {
    JsonReader reader = null;
    String FILENAME = "/examples/uh_data.json";
    final InputStream stream = UHReportParser.class.getClass().getResourceAsStream(FILENAME);
    try {
        reader = new JsonReader(new InputStreamReader(stream, "UTF-8"));
        Gson gson = new GsonBuilder().create();

        // Read file in stream mode
        reader.beginArray();
        int count = 1;
        while (reader.hasNext()) {



            if (count++<=skipRows){
                continue;
            } else if(count>skipRows+10){
                break;
            }

            else{

                UserData data = null;

                // Read data into object model
                data = gson.fromJson(reader, UserData.class);  //starts from one again
                String description = data.getDescription();

                }

        }
    } catch (UnsupportedEncodingException ex) {
        ex.printStackTrace();
    } catch (IOException ex) {
        if (reader != null) {
            try {
                reader.close();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }
}

What should be modified here? How can I achieve the desired results?

There is 1 answer

Lyubomyr Shaydariv answered:

I didn't analyze your algorithm in depth, but it doesn't actually skip values in the "skip" phase: continue only increments the counter, and since nothing calls reader.skipValue(), the reader never advances past the first element. I would also refactor your JSON stream reader to make it as clean as I can, which makes such a method easier to reuse. Consider the following methods:

static void readArrayBySkipAndLimitFromBegin(final JsonReader jsonReader, final int skip, final int limit,
        final Consumer<? super JsonReader> callback)
        throws IOException {
    readArrayBySkipAndLimit(jsonReader, skip, limit, true, false, callback);
}

static void readArrayBySkipAndLimit(final JsonReader jsonReader, final int skip, final int limit, final boolean processBegin,
        final boolean processEnd, final Consumer<? super JsonReader> callback)
        throws IOException {
    // the JSON stream can be already processed somehow
    if ( processBegin ) {
        jsonReader.beginArray();
    }
    // just skip the `skip`
    for ( int i = 0; i < skip && jsonReader.hasNext(); i++ ) {
        jsonReader.skipValue();
    }
    // and limit to the `limit` just passing the JsonReader instance to its consumer elsewhere
    for ( int i = 0; i < limit && jsonReader.hasNext(); i++ ) {
        callback.accept(jsonReader);
    }
    // in case you need it ever...
    if ( processEnd ) {
        while ( jsonReader.hasNext() ) {
            jsonReader.skipValue();
        }
        jsonReader.endArray();
    }
}
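
The difference between the loop in the question and the skip loop above can be reproduced without Gson on the classpath. The sketch below is an analogy, not Gson code: it assumes java.util.Iterator as a stand-in for JsonReader, with next() playing the role of skipValue() or gson.fromJson(...) (each of which consumes one element), and every class and method name in it is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class SkipLimitSketch {

    // Hypothetical helper: the numbers 1..n, standing in for the array elements.
    static List<Integer> oneTo(final int n) {
        final List<Integer> numbers = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            numbers.add(i);
        }
        return numbers;
    }

    // Simplified version of the loop in the question: `continue` consumes nothing,
    // so the "skipped" iterations leave the source at its first element.
    static List<Integer> pageWithoutConsuming(final Iterator<Integer> source, final int skip, final int limit) {
        final List<Integer> page = new ArrayList<>();
        int count = 0;
        while (source.hasNext() && page.size() < limit) {
            if (count++ < skip) {
                continue; // hasNext() alone does not advance the source
            }
            page.add(source.next());
        }
        return page;
    }

    // Same shape as the two `for` loops above: consume `skip` elements, then read up to `limit`.
    static List<Integer> pageWithConsuming(final Iterator<Integer> source, final int skip, final int limit) {
        for (int i = 0; i < skip && source.hasNext(); i++) {
            source.next(); // discard the element, as skipValue() would
        }
        final List<Integer> page = new ArrayList<>();
        for (int i = 0; i < limit && source.hasNext(); i++) {
            page.add(source.next());
        }
        return page;
    }

    public static void main(final String... args) {
        System.out.println(pageWithoutConsuming(oneTo(32).iterator(), 10, 10)); // page starts at 1 again
        System.out.println(pageWithConsuming(oneTo(32).iterator(), 10, 10));    // page starts at 11
    }
}
```

The only difference between the two methods is whether the skipped iterations consume an element, which is what jsonReader.skipValue() does in the method above.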

Here is a JSON document I was using to test it (32 array elements in total):

[
    {"key1": "value1"},
    {"key2": "value2"},
    ...
    {"key31": "value31"},
    {"key32": "value32"}
]

Now, test it:

private static final Gson gson = new Gson();
private static final Type mapOfStringToStringType = new TypeToken<Map<String, String>>() {}.getType();

public static void main(final String... args)
        throws IOException {
    // read up to 2B+ entries, every 10 rows
    for ( int i = 0; i >= 0; i += 10 ) {
        System.out.print("Step #" + i / 10 + ": ");
        final Collection<Map<String, String>> maps = new ArrayList<>();
        // consume and close
        try ( final JsonReader jsonReader = Resources.getPackageResourceJsonReader(Q50737654.class, "array.json") ) {
            // consume the JSON reader, parse each array page element and add it to the result collection
            readArrayBySkipAndLimitFromBegin(jsonReader, i, 10, jr -> maps.add(gson.fromJson(jr, mapOfStringToStringType)));
        }
        System.out.println(maps);
        if ( maps.isEmpty() ) {
            break;
        }
    }
    System.out.println("Done");
}

Example output:

Step #0: [{key1=value1}, {key2=value2}, {key3=value3}, {key4=value4}, {key5=value5}, {key6=value6}, {key7=value7}, {key8=value8}, {key9=value9}, {key10=value10}]
Step #1: [{key11=value11}, {key12=value12}, {key13=value13}, {key14=value14}, {key15=value15}, {key16=value16}, {key17=value17}, {key18=value18}, {key19=value19}, {key20=value20}]
Step #2: [{key21=value21}, {key22=value22}, {key23=value23}, {key24=value24}, {key25=value25}, {key26=value26}, {key27=value27}, {key28=value28}, {key29=value29}, {key30=value30}]
Step #3: [{key31=value31}, {key32=value32}]
Step #4: []
Done

As you can see, it's really easy.
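
One thing to keep in mind: the test above reopens the resource on every step and skips the first i elements again, so later pages re-read everything before them. That is needed when each page is handled in a separate run, but if all pages are processed within one run, a single pass may be cheaper. Here is a minimal sketch of that idea, again assuming java.util.Iterator as a stand-in for JsonReader; forEachPage and the class name are hypothetical and not part of Gson.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

final class SinglePassPagingSketch {

    // Hypothetical helper: drains `source` once, handing out pages of at most `limit` elements.
    // Returns the number of pages passed to the consumer.
    static <T> int forEachPage(final Iterator<T> source, final int limit,
            final Consumer<? super List<T>> pageConsumer) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        int pages = 0;
        while (source.hasNext()) {
            final List<T> page = new ArrayList<>(limit);
            // nothing is skipped here: the source is already positioned after the previous page
            for (int i = 0; i < limit && source.hasNext(); i++) {
                page.add(source.next());
            }
            pageConsumer.accept(page);
            pages++;
        }
        return pages;
    }

    public static void main(final String... args) {
        final List<Integer> elements = new ArrayList<>();
        for (int n = 1; n <= 32; n++) {
            elements.add(n);
        }
        // prints three pages of 10 elements and a last page of 2
        forEachPage(elements.iterator(), 10, page -> System.out.println(page));
    }
}
```

With the methods above, the analogous approach would presumably be to call jsonReader.beginArray() once and then call readArrayBySkipAndLimit(jsonReader, 0, 10, false, false, callback) repeatedly while jsonReader.hasNext() returns true.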