What is curl doing differently from httr::POST that causes a 400 Bad Request?

I'm trying to query data from the Materials Project web API in R.

The documentation provides an example query, shown in both curl and Python. I've copied the curl command below.

curl -s --header "X-API-KEY: <YOUR-API-KEY>" \
    https://materialsproject.org/rest/v2/query \
    -F criteria='{"elements": {"$in": ["Li", "Na", "K"], "$all": ["O"]}, "nelements": 2}' \
    -F properties='["formula", "formation_energy_per_atom"]'

From reading the httr quickstart guide, it seems to me I should be able to reproduce this query with:

library(httr)
POST(url = "https://www.materialsproject.org/rest/v2/query",
     config = add_headers("X-API-KEY" = "<YOUR-API-KEY>"),
     body = list(criteria = "{'elements': {'$in': ['Li', 'Na', 'K'], '$all': ['O']}, 'nelements': 2}",
                 properties = "['formula', 'formation_energy_per_atom']"),
     encode = "multipart",
     verbose())

But while the curl command returns JSON data from the Materials Project database, my R query returns an HTTP/1.1 400 BAD REQUEST. What is curl doing differently from httr in the code above?

I've tried adding -v to the curl command and comparing its output to the verbose() output above, but curl doesn't show what it's putting in the multipart form.

> Expect: 100-continue
> Content-Type: multipart/form-data; boundary=------------------------d2ef2f3982185118
> 
< HTTP/1.1 100 Continue
< HTTP/1.1 200 OK
< Date: Tue, 27 Dec 2016 21:18:58 GMT
< Server: Apache/2.2.15 (CentOS)
< Vary: Accept-Encoding,User-Agent
< Connection: close
< Transfer-Encoding: chunked
< Content-Type: application/json

Meanwhile httr shows:

-> Content-Type: multipart/form-data; boundary=----------------------------5b4873dbc9cd
-> 
<- HTTP/1.1 100 Continue
>> ------------------------------5b4873dbc9cd
>> Content-Disposition: form-data; name="criteria"
>> 
>> {'elements': {'$in': ['Li', 'Na', 'K'], '$all': ['O']}, 'nelements': 2}
>> ------------------------------5b4873dbc9cd
>> Content-Disposition: form-data; name="properties"
>> 
>> ['formula', 'formation_energy_per_atom']
>> ------------------------------5b4873dbc9cd--

There is 1 answer

Answer by hrbrmstr (best answer)

It's truly a terrible, poorly thought out & lazily implemented API. They seem to like Python so it's unsurprising this would be the case.

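The following works. The visible difference from your request is that each form field carries valid JSON with double-quoted strings (built here with jsonlite::toJSON), whereas your body used single-quoted, Python-style literals: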

library(httr)
library(jsonlite)

list(
  criteria=toJSON(list(
    elements=list(
      `$in`=c("Li", "Na", "K"),
      `$all`=c("O")  # letter O (oxygen), as in the curl example; the logged run below sent the digit "0"
    ),
    nelements=unbox(2)
  )),
  properties=toJSON(c("formula", "formation_energy_per_atom"))
) -> params

POST(url="https://www.materialsproject.org/rest/v2/query",
     add_headers(`X-API-KEY`=Sys.getenv("MATERIALS_PROJECT_API_KEY")),
     body=params,
     encode="multipart", verbose()) -> res

and here's the verbose() output to prove it:

-> POST /rest/v2/query HTTP/1.1
-> Host: www.materialsproject.org
-> User-Agent: libcurl/7.51.0 r-curl/2.3 httr/1.2.1
-> Accept-Encoding: gzip, deflate
-> Accept: application/json, text/xml, application/xml, */*
-> X-API-KEY: wouldntyouliketoknow
-> Content-Length: 344
-> Expect: 100-continue
-> Content-Type: multipart/form-data; boundary=------------------------34f08173ce0a7818
-> 
<- HTTP/1.1 100 Continue
>> --------------------------34f08173ce0a7818
>> Content-Disposition: form-data; name="criteria"
>> 
>> {"elements":{"$in":["Li","Na","K"],"$all":["0"]},"nelements":2}
>> --------------------------34f08173ce0a7818
>> Content-Disposition: form-data; name="properties"
>> 
>> ["formula","formation_energy_per_atom"]
>> --------------------------34f08173ce0a7818--

<- HTTP/1.1 200 OK
<- Date: Wed, 28 Dec 2016 02:08:08 GMT
<- Server: Apache/2.2.15 (CentOS)
<- Vary: Accept-Encoding,User-Agent
<- Content-Encoding: gzip
<- Content-Length: 258
<- Connection: close
<- Content-Type: application/json
<- 

It's super picky about the structure of the JSON in the form fields. They really should have just accepted a JSON body and been done with it. But half-REDACTED is the way of Python folk.
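
To see the pickiness concretely, here is a minimal sketch that checks the two criteria strings from this thread with jsonlite::validate(). It assumes the server parses each form field with a strict JSON parser, which nothing in the thread confirms; the variable names are only for illustration.

```r
library(jsonlite)

# The criteria string from the failing httr request (single-quoted).
single_quoted <- "{'elements': {'$in': ['Li', 'Na', 'K'], '$all': ['O']}, 'nelements': 2}"

# The criteria string from the working curl command (double-quoted).
double_quoted <- '{"elements": {"$in": ["Li", "Na", "K"], "$all": ["O"]}, "nelements": 2}'

# validate() returns TRUE for well-formed JSON and FALSE otherwise;
# isTRUE() drops the error attributes attached to a FALSE result.
single_ok <- isTRUE(validate(single_quoted))
double_ok <- isTRUE(validate(double_quoted))

print(single_ok)  # FALSE: JSON strings must use double quotes
print(double_ok)  # TRUE
```

Single quotes are fine in a Python dict literal but not in JSON, so a strict parser on the other end would reject the first string outright.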

Oh gosh, I just noticed it's a CentOS server supplying the replies. Yep. Those folks really do like pain.
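
A side note on the unbox(2) in the params list above: this small sketch shows why it's there. By default toJSON() emits every R vector as a JSON array, so a bare 2 would be sent as [2]. The variable names are illustrative, and whether the API would actually reject [2] is an assumption I haven't tested.

```r
library(jsonlite)

# Default behaviour: a length-one vector still becomes a JSON array.
boxed <- toJSON(list(nelements = 2))

# unbox() marks a single value as a JSON scalar.
unboxed <- toJSON(list(nelements = unbox(2)))

# auto_unbox = TRUE unboxes every length-one vector, which also turns
# the one-element `$all` array into a plain string.
auto <- toJSON(list(`$all` = c("O"), nelements = 2), auto_unbox = TRUE)

print(boxed)    # {"nelements":[2]}
print(unboxed)  # {"nelements":2}
print(auto)     # {"$all":"O","nelements":2}
```

That last line is why the answer unboxes nelements by hand rather than reaching for auto_unbox: the `$all` field needs to stay an array even with a single element.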