I'm currently writing a Java program to obtain some lists from the LTA DataMall API. The problem is that each call to the LTA DataMall API returns a maximum of 50 records, and I want to get all records (which could only be done 50 at a time) and write it to a CSV file instead of just the first 50 records only.
For this example, I'm going to use the BusStops
API.
A typical API call from LTA DataMall
According to this userguide, API responses typically return a maximum of 50 records per call.
Say that I'm calling for the BusStops
API, of which the GET request will be something like http://datamall2.mytransport.sg/ltaodataservice/BusStops
. The resulting response will look something like this:
{
"odata.metadata": "http://datamall2.mytransport.sg/ltaodataservice/$metadata#BusStops",
"value": [
{
"BusStopCode": "01012",
"RoadName": "Victoria St",
"Description": "Hotel Grand Pacific",
"Latitude": 1.29684825487647,
"Longitude": 103.85253591654006
},
{
"BusStopCode": "01013",
"RoadName": "Victoria St",
"Description": "St. Joseph's Ch",
"Latitude": 1.29770970610083,
"Longitude": 103.8532247463225
},
{
"BusStopCode": "01019",
"RoadName": "Victoria St",
"Description": "Bras Basah Cplx",
"Latitude": 1.29698951191332,
"Longitude": 103.85302201172507
},
{
"BusStopCode": "01029",
"RoadName": "Nth Bridge Rd",
"Description": "Cosmic Insurance Bldg",
"Latitude": 1.2966729849642,
"Longitude": 103.85441422464267
}
// and so on, up till ...
{
"BusStopCode": "02099",
"RoadName": "Raffles Blvd",
"Description": "Marina Ctr Ter",
"Latitude": 1.29101661693418,
"Longitude": 103.86255772172497
},
{
"BusStopCode": "02101",
"RoadName": "Raffles Ave",
"Description": "Bef Temasek Ave",
"Latitude": 1.28939197625331,
"Longitude": 103.8618029276249
}
]
}
Keep in mind that each request returns 50 records, so to get another 50 records, I have to append $skip
to the URL, so to retrieve the 51st to 100th record, the URL will be like this: http://datamall2.mytransport.sg/ltaodataservice/BusStops?$skip=49
.
FYI, there are 5296 BusStop
records in total.
The Problem
With that in mind, I'm trying to write the above response to a CSV file. The method below is created for this purpose:
public static void writeBusStops() {
String file = "./csv/bus_stops.csv";
// mainQueryItems is to get the JsonObjects. They will also be written to the CSV file as headers.
String[] mainQueryItems = new String[]{"BusStopCode", "RoadName",
"Description", "Latitude", "Longitude"};
Iterator<JsonElement> jIter;
int count = 0;
try (CSVWriter writer = new CSVWriter(new FileWriter(file))) {
writer.writeNext(mainQueryItems); // write the header
// ApiLTA.getBusStops(0) just gets the raw JSON from LTA's DataMall API from the 1st to 50th record.
// To get the 51st to 100th record, change '0' to '49', i.e. ApiLTA.getBusStops(49).
// "value" is the JsonArray I want to get, all others like "odata.metadata" are redundant.
jIter = jsonIterator(ApiLTA.getBusStops(0),"value");
while (jIter.hasNext()) { // each bus stop
JsonObject jObject = jIter.next().getAsJsonObject();
// Get the individual items
String[] svcItems = new String[mainQueryItems.length];
for (int i = 0; i < mainQueryItems.length; i++) {
svcItems[i] = getStringItem(jObject, mainQueryItems[i]);
}
// Write the items to the CSV file
writer.writeNext(svcItems);
count++;
if ((count + 1) % 50 == 0) {
System.out.println("new " + count); // for debug
jIter = jsonIterator(ApiLTA.getBusStops(count), "value");
}
}
System.out.println("done"); // for debug
} catch (IOException ex) {
Logger.getLogger(ApiLTADeserialiser.class.getName()).log(Level.SEVERE, null, ex);
} catch (Exception e) {
e.printStackTrace();
}
}
EDIT: For those interested in the jsonIterator
method...
private static Iterator<JsonElement> jsonIterator(String jsonToParse, String k)
throws JsonSyntaxException, IllegalStateException {
// Parse the raw JSON
JsonParser parser = new JsonParser();
JsonElement raw = parser.parse(jsonToParse); // The jsonToParse is simply, the URL of the GET request
// Get the array we need directly and pass it to an iterator
JsonArray jArray = raw.getAsJsonObject().getAsJsonArray(k);
return jArray.iterator();
}
I am using gson-2.7
and opencsv.3.8
as my external packages.
The intended output in the CSV file was to be:
"BusStopCode","RoadName","Description","Latitude","Longitude"
"01012","Victoria St","Hotel Grand Pacific","1.29684825487647","103.85253591654006"
"01013","Victoria St","St. Joseph's Ch","1.29770970610083","103.8532247463225"
"01019","Victoria St","Bras Basah Cplx","1.29698951191332","103.85302201172507"
"01029","Nth Bridge Rd","Cosmic Insurance Bldg","1.2966729849642","103.85441422464267"
// and so on
"99009","Changi Village Rd","Changi Village Ter","1.38969812175274","103.98762553601895"
"99011","Loyang Ave","Bef Sch Of Commando","1.38328110134472","103.97812168830427"
// and so on
However, the output became:
"BusStopCode","RoadName","Description","Latitude","Longitude"
"01012","Victoria St","Hotel Grand Pacific","1.29684825487647","103.85253591654006"
"01013","Victoria St","St. Joseph's Ch","1.29770970610083","103.8532247463225"
"01019","Victoria St","Bras Basah Cplx","1.29698951191332","103.85302201172507"
"01029","Nth Bridge Rd","Cosmic Insurance Bldg","1.2966729849642","103.85441422464267"
// and so on
"02089","Raffles Blvd","Pan Pacific Hotel","1.29152694444975","103.8592061110504"
"02099","Raffles Blvd","Marina Ctr Ter","1.29101661693418","103.86255772172497"
"01012","Victoria St","Hotel Grand Pacific","1.29684825487647","103.85253591654006"
"01013","Victoria St","St. Joseph's Ch","1.29770970610083","103.8532247463225"
// and so on
Which is basically, repeating the first 50 records itself.
What do you all feel would be a better way of making multiple API calls to populate all the records instead of repeating the first 50 records itself? How then can I accommodate this method to get less than 50 records?
I don't have access to that API and didn't take a look into how to obtain the API key, but I made a simple Spring-based mock which, I hope, works exactly the way your API works.
First off,
$skip=49
looks a bit weird to me. Having no experience in the API, I believe it should be$skip=50
.Second, you might want to redesign your reading and writing routines separating their responsibilities. What if you redesign your data producing and consuming methods to make them a bit more flexible?
produce()
The following method creates a special iterable that generates an iterator over any objects and checks the out-of-bounds state itself encapsulating the iteration state. It accepts three parameters: slice size (50 in your case), a mapper that accepts the "skipped" state and must produce a new value for the upcoming iterator
next()
, and a predicate that checks whether the producing iterator can be considered finished.getValuesFrom()
andhasNotEmptyValues()
These two methods are really simple and just allow to examine the incoming JSON objects.
consumeJsonValues()
This is a special, not really generic, method that accepts the produced/generated content and delegates its product to the consumer upon some special rules:
filter
ed against empty values since theproduce()
method returns empty value as the "finished" marker.flatMap
ped to flatten the values stored in the root object of responses.map
ped toJsonObject
forEach
JsonObject is delegated to the given consumer passed as the second parameterreadBusStops()
How does it look like after all?
As you might notice earlier, all the stuff is located in
LtaMockControllerClient
. Consider the followingmain()
method:The code above is pretty easy, literally:
System.out
).produce()
method accepts) and delegating the parsed result to a consumer that just writes the parsed properties to the result CSV file.In the code above there is
localhost:9000
, as this is just a local mock server to behave like the LTA service. It's just a regular REST controller written in Spring MVC:Both client and server components produce the following output (5296 fake bus stops and 5297 lines in total):