Extract fields from JSON

4.1k views Asked by At

I have a JSON object of the form:

{"apps":{"app":[{"id":"application_1481567788061_0002","user":"root","name":"wordcount.py","queue":"default","state":"FAILED","finalStatus":"FAILED","progress":0.0,"trackingUI":"History", "diagnostics":"Application application_1481567788061_0002 failed 2 times due to AM Container for appattempt_1481567788061_0002_000002 exited with  exitCode: 255\nFor more detailed output, check application tracking page:http://sandbox:8088/proxy/application_1481567788061_0002/Then, click on links to logs of each attempt.\nDiagnostics: Exception from container-launch.\nContainer id: container_1481567788061_0002_02_000001\nExit code: 255\nStack trace: ExitCodeException exitCode=255: \n\tat org.apache.hadoop.util.Shell.runCommand(Shell.java:538)\n\tat org.apache.hadoop.util.Shell.run(Shell.java:455)\n\tat org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)\n\tat org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)\n\tat org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)\n\tat org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:262)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\n\tat java.lang.Thread.run(Thread.java:744)\n\n\nContainer exited with a non-zero exit code 255\nFailing this attempt. Failing the application.","clusterId":1481567788061,"applicationType":"SPARK","applicationTags":"","startedTime":1481568051052,"finishedTime":1481568079289,"elapsedTime":28237,"amHostHttpAddress":"sandbox:8042","allocatedMB":-1,"allocatedVCores":-1,"runningContainers":-1,"memorySeconds":55598,"vcoreSeconds":27,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0},{"id":"application_1481567788061_0001","user":"root","name":"pi.py","queue":"default","state":"FINISHED","finalStatus":"SUCCEEDED","progress":100.0,"trackingUI":"History","diagnostics":"","clusterId":1481567788061,"applicationType":"SPARK","applicationTags":"","startedTime":1481567853324,"finishedTime":1481567888648,"elapsedTime":35324,"amContainerLogs":"http://sandbox:8042/node/containerlogs/container_1481567788061_0001_01_000001/root","amHostHttpAddress":"sandbox:8042","allocatedMB":-1,"allocatedVCores":-1,"runningContainers":-1,"memorySeconds":138031,"vcoreSeconds":66,"preemptedResourceMB":0,"preemptedResourceVCores":0,"numNonAMContainerPreempted":0,"numAMContainerPreempted":0}]}}

I would like to extract from it a List[Application], where application is:

case class Application(id: String, user: String, name: String)

I imported spray-json. If message is a string containing the JSON component, I want to do something like:

  val json: JsValue = message.parseJson
  val jobsJson = json.first.first
  val jobs = jobsJson.map(job => Application(job(0), job(1), job(2)))

But this is not correct because I can't use json.first.

So how can I extract fields nested in the JSON object? Is there another library that makes things easier?

1

There are 1 answers

3
Buzz On BEST ANSWER

Note: This answer is about play-json and not spray-json library.

You should be able to get data out of the json object with \ or \\ A single slash will look in the next lever down for what ever you are looking for while a double slash will look through the whole object. Say you had the following json stored in a variable called obj:

{"foo":"bar","num":3, "value":{"num":4}}

using obj\num you would just get 3. But with obj\\num you would get in iterator with both 3 and 4 in it.

try this link for a little more information.