How to get all records from multiple API calls?

942 views Asked by At

I'm currently writing a Java program to obtain some lists from the LTA DataMall API. The problem is that each call to the LTA DataMall API returns a maximum of 50 records, and I want to get all records (which could only be done 50 at a time) and write it to a CSV file instead of just the first 50 records only.

For this example, I'm going to use the BusStops API.

A typical API call from LTA DataMall

According to this userguide, API responses typically return a maximum of 50 records per call.

Say that I'm calling for the BusStops API, of which the GET request will be something like http://datamall2.mytransport.sg/ltaodataservice/BusStops. The resulting response will look something like this:

{
    "odata.metadata": "http://datamall2.mytransport.sg/ltaodataservice/$metadata#BusStops",
    "value": [
        {
            "BusStopCode": "01012",
            "RoadName": "Victoria St",
            "Description": "Hotel Grand Pacific",
            "Latitude": 1.29684825487647,
            "Longitude": 103.85253591654006
        },
        {
            "BusStopCode": "01013",
            "RoadName": "Victoria St",
            "Description": "St. Joseph's Ch",
            "Latitude": 1.29770970610083,
            "Longitude": 103.8532247463225
        },
        {
            "BusStopCode": "01019",
            "RoadName": "Victoria St",
            "Description": "Bras Basah Cplx",
            "Latitude": 1.29698951191332,
            "Longitude": 103.85302201172507
        },
        {
            "BusStopCode": "01029",
            "RoadName": "Nth Bridge Rd",
            "Description": "Cosmic Insurance Bldg",
            "Latitude": 1.2966729849642,
            "Longitude": 103.85441422464267
        }
        // and so on, up till ...
        {
            "BusStopCode": "02099",
            "RoadName": "Raffles Blvd",
            "Description": "Marina Ctr Ter",
            "Latitude": 1.29101661693418,
            "Longitude": 103.86255772172497
        },
        {
            "BusStopCode": "02101",
            "RoadName": "Raffles Ave",
            "Description": "Bef Temasek Ave",
            "Latitude": 1.28939197625331,
            "Longitude": 103.8618029276249
        }
    ]
}

Keep in mind that each request returns 50 records, so to get another 50 records, I have to append $skip to the URL, so to retrieve the 51st to 100th record, the URL will be like this: http://datamall2.mytransport.sg/ltaodataservice/BusStops?$skip=49.

FYI, there are 5296 BusStop records in total.

The Problem

With that in mind, I'm trying to write the above response to a CSV file. The method below is created for this purpose:

public static void writeBusStops() {
        String file = "./csv/bus_stops.csv";
        // mainQueryItems is to get the JsonObjects. They will also be written to the CSV file as headers.
        String[] mainQueryItems = new String[]{"BusStopCode", "RoadName",
            "Description", "Latitude", "Longitude"};
        Iterator<JsonElement> jIter;
        int count = 0;

        try (CSVWriter writer = new CSVWriter(new FileWriter(file))) {
            writer.writeNext(mainQueryItems); // write the header

            // ApiLTA.getBusStops(0) just gets the raw JSON from LTA's DataMall API from the 1st to 50th record.
            // To get the 51st to 100th record, change '0' to '49', i.e. ApiLTA.getBusStops(49).
            // "value" is the JsonArray I want to get, all others like "odata.metadata" are redundant.
            jIter = jsonIterator(ApiLTA.getBusStops(0),"value");

            while (jIter.hasNext()) { // each bus stop
                JsonObject jObject = jIter.next().getAsJsonObject();

                // Get the individual items
                String[] svcItems = new String[mainQueryItems.length];
                for (int i = 0; i < mainQueryItems.length; i++) {
                    svcItems[i] = getStringItem(jObject, mainQueryItems[i]);
                }

                // Write the items to the CSV file
                writer.writeNext(svcItems);

                count++;

                if ((count + 1) % 50 == 0) {
                    System.out.println("new " + count); // for debug
                    jIter = jsonIterator(ApiLTA.getBusStops(count), "value");
                }
            }
            System.out.println("done"); // for debug
        } catch (IOException ex) {
            Logger.getLogger(ApiLTADeserialiser.class.getName()).log(Level.SEVERE, null, ex);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

EDIT: For those interested in the jsonIterator method...

private static Iterator<JsonElement> jsonIterator(String jsonToParse, String k)
        throws JsonSyntaxException, IllegalStateException {
    // Parse the raw JSON
    JsonParser parser = new JsonParser();
    JsonElement raw = parser.parse(jsonToParse); // The jsonToParse is simply, the URL of the GET request

    // Get the array we need directly and pass it to an iterator
    JsonArray jArray = raw.getAsJsonObject().getAsJsonArray(k);

    return jArray.iterator();
}

I am using gson-2.7 and opencsv.3.8 as my external packages.

The intended output in the CSV file was to be:

"BusStopCode","RoadName","Description","Latitude","Longitude"
"01012","Victoria St","Hotel Grand Pacific","1.29684825487647","103.85253591654006"
"01013","Victoria St","St. Joseph's Ch","1.29770970610083","103.8532247463225"
"01019","Victoria St","Bras Basah Cplx","1.29698951191332","103.85302201172507"
"01029","Nth Bridge Rd","Cosmic Insurance Bldg","1.2966729849642","103.85441422464267"
// and so on
"99009","Changi Village Rd","Changi Village Ter","1.38969812175274","103.98762553601895"
"99011","Loyang Ave","Bef Sch Of Commando","1.38328110134472","103.97812168830427"
// and so on

However, the output became:

"BusStopCode","RoadName","Description","Latitude","Longitude"
"01012","Victoria St","Hotel Grand Pacific","1.29684825487647","103.85253591654006"
"01013","Victoria St","St. Joseph's Ch","1.29770970610083","103.8532247463225"
"01019","Victoria St","Bras Basah Cplx","1.29698951191332","103.85302201172507"
"01029","Nth Bridge Rd","Cosmic Insurance Bldg","1.2966729849642","103.85441422464267"
// and so on
"02089","Raffles Blvd","Pan Pacific Hotel","1.29152694444975","103.8592061110504"
"02099","Raffles Blvd","Marina Ctr Ter","1.29101661693418","103.86255772172497"
"01012","Victoria St","Hotel Grand Pacific","1.29684825487647","103.85253591654006"
"01013","Victoria St","St. Joseph's Ch","1.29770970610083","103.8532247463225"
// and so on

Which is basically, repeating the first 50 records itself.

What do you all feel would be a better way of making multiple API calls to populate all the records instead of repeating the first 50 records itself? How then can I accommodate this method to get less than 50 records?

1

There are 1 answers

0
Lyubomyr Shaydariv On

I don't have access to that API and didn't take a look into how to obtain the API key, but I made a simple Spring-based mock which, I hope, works exactly the way your API works.

First off, $skip=49 looks a bit weird to me. Having no experience in the API, I believe it should be $skip=50.

Second, you might want to redesign your reading and writing routines separating their responsibilities. What if you redesign your data producing and consuming methods to make them a bit more flexible?

produce()

The following method creates a special iterable that generates an iterator over any objects and checks the out-of-bounds state itself encapsulating the iteration state. It accepts three parameters: slice size (50 in your case), a mapper that accepts the "skipped" state and must produce a new value for the upcoming iterator next(), and a predicate that checks whether the producing iterator can be considered finished.

private static <T> Iterable<T> produce(final int slice, final IntFunction<T> mapper, final Predicate<? super T> predicate) {
    return () -> new Iterator<T>() {
        private boolean hasMore = true;
        private int i;

        @Override
        public boolean hasNext() {
            return hasMore;
        }

        @Override
        public T next()
                throws NoSuchElementException {
            if ( !hasMore ) {
                throw new NoSuchElementException();
            }
            final T next = mapper.apply(i * slice);
            hasMore = predicate.test(next);
            i++;
            return next;
        }
    };
}

getValuesFrom() and hasNotEmptyValues()

These two methods are really simple and just allow to examine the incoming JSON objects.

private static JsonArray getValuesFrom(final JsonElement root) {
    final JsonObject rootObject = root.getAsJsonObject();
    return rootObject.get("values").getAsJsonArray();
}

private static boolean hasNotEmptyValues(final JsonElement root) {
    return getValuesFrom(root).size() != 0;
}

consumeJsonValues()

This is a special, not really generic, method that accepts the produced/generated content and delegates its product to the consumer upon some special rules:

  • The given iterable is converted to a stream in order to use Java 8 Stream API facilities. (Using Google Guava alternatives is fine as well and can be used in non-Java 8 environments).
  • The stream is filtered against empty values since the produce() method returns empty value as the "finished" marker.
  • The filtered out stuff now is flatMapped to flatten the values stored in the root object of responses.
  • Then it's mapped to JsonObject
  • And finally forEach JsonObject is delegated to the given consumer passed as the second parameter
private static void consumeJsonValues(final Iterable<? extends JsonElement> jsonElements, final Consumer<? super JsonObject> consumer) {
    stream(jsonElements.spliterator(), false)
            .filter(LtaMockControllerClient::hasNotEmptyValues)
            .flatMap(root -> stream(getValuesFrom(root).spliterator(), false))
            .map(JsonElement::getAsJsonObject)
            .forEach(consumer);
}

readBusStops()

private static JsonElement readBusStops(final Gson gson, final int skip) {
    try ( final InputStream inputStream = new URL("http://127.0.0.1:9000/ltaodataservice/BusStops?$skip=" + skip).openStream();
          final Reader reader = new InputStreamReader(inputStream) ) {
        return gson.fromJson(reader, JsonElement.class);
    } catch ( final IOException ex ) {
        throw new RuntimeException(ex);
    }
}

How does it look like after all?

As you might notice earlier, all the stuff is located in LtaMockControllerClient. Consider the following main() method:

private static final Gson gson = new Gson();

public static void main(final String... args)
        throws IOException {
    try ( final CSVWriter writer = new CSVWriter(new PrintWriter(new OutputStreamWriter(out))) ) {
        writer.writeNext(new String[]{ "BusStopCode", "RoadName", "Description", "Latitude", "Longitude" });
        consumeJsonValues(
                produce(50, skip -> readBusStops(gson, skip), LtaMockControllerClient::hasNotEmptyValues),
                v -> writer.writeNext(new String[]{
                        v.getAsJsonPrimitive("BusStopCode").getAsString(),
                        v.getAsJsonPrimitive("RoadName").getAsString(),
                        v.getAsJsonPrimitive("Description").getAsString(),
                        v.getAsJsonPrimitive("Latitude").getAsString(),
                        v.getAsJsonPrimitive("Longitude").getAsString()
                })
        );
    }
}

The code above is pretty easy, literally:

  • Open a CSV writer (for example, to System.out).
  • Write the header line.
  • Make a producer/consumer pipeline to read from the remote service (see what the produce() method accepts) and delegating the parsed result to a consumer that just writes the parsed properties to the result CSV file.

In the code above there is localhost:9000, as this is just a local mock server to behave like the LTA service. It's just a regular REST controller written in Spring MVC:

@RestController
@RequestMapping("/ltaodataservice")
public class LtaMockController {

    private static final Gson gson = new Gson();

    private static final List<BusStopDto> busStops = generateBusStops(5296);

    @RequestMapping(method = GET, value = "/BusStops", produces = "application/json")
    public String get(
            @RequestParam(value = "$skip", defaultValue = "0") final int from
    ) {
        final int size = busStops.size();
        final List<BusStopDto> values = from <= size
                ? busStops.subList(from, min(from + 50, size))
                : emptyList();
        return gson.toJson(new ResponseDto("http://datamall2.mytransport.sg/ltaodataservice/$metadata#BusStops", values));
    }

    private static List<BusStopDto> generateBusStops(final int count) {
        final List<BusStopDto> busStops = new ArrayList<>();
        for ( int i = 1; i <= count; i++ ) {
            busStops.add(new BusStopDto("code:" + i, "road:" + i, "description:" + i, i, i));
        }
        return unmodifiableList(busStops);
    }

    private static final class ResponseDto {

        @SerializedName("odata.metadata")
        @SuppressWarnings("unused")
        private final String odataMetadata;

        @SerializedName("values")
        @SuppressWarnings("unused")
        private final List<BusStopDto> values;

        private ResponseDto(final String odataMetadata, final List<BusStopDto> values) {
            this.odataMetadata = odataMetadata;
            this.values = values;
        }

    }

    private static final class BusStopDto {

        @SerializedName("BusStopCode")
        @SuppressWarnings("unused")
        private final String busStopCode;

        @SerializedName("RoadName")
        @SuppressWarnings("unused")
        private final String roadName;

        @SerializedName("Description")
        @SuppressWarnings("unused")
        private final String description;

        @SerializedName("Latitude")
        @SuppressWarnings("unused")
        private final double latitude;

        @SerializedName("Longitude")
        @SuppressWarnings("unused")
        private final double longitude;

        private BusStopDto(final String busStopCode, final String roadName, final String description, final double latitude, final double longitude) {
            this.busStopCode = busStopCode;
            this.roadName = roadName;
            this.description = description;
            this.latitude = latitude;
            this.longitude = longitude;
        }

    }

}

Both client and server components produce the following output (5296 fake bus stops and 5297 lines in total):

"BusStopCode","RoadName","Description","Latitude","Longitude"
"code:1","road:1","description:1","1.0","1.0"
"code:2","road:2","description:2","2.0","2.0"
"code:3","road:3","description:3","3.0","3.0"
"code:4","road:4","description:4","4.0","4.0"
"code:5","road:5","description:5","5.0","5.0"
"code:6","road:6","description:6","6.0","6.0"
"code:7","road:7","description:7","7.0","7.0"
...
"code:5291","road:5291","description:5291","5291.0","5291.0"
"code:5292","road:5292","description:5292","5292.0","5292.0"
"code:5293","road:5293","description:5293","5293.0","5293.0"
"code:5294","road:5294","description:5294","5294.0","5294.0"
"code:5295","road:5295","description:5295","5295.0","5295.0"
"code:5296","road:5296","description:5296","5296.0","5296.0"