DSpace 5.5 API returns 500 when called with Python 3 requests from a script, but 200 in the browser and the Python console


I am trying to send a GET request to the DSpace 5.5 REST API to check whether an item with a given handle is present in DSpace.

When I tested it in the browser, it worked fine (status code 200, and I got the data for the requested item).

Then I tested the same request with the Python 3 requests module in the Python console. Again, the DSpace API returned the correct status code (200) and JSON data in the response.

So I put the tested function into my script, and suddenly the DSpace API started returning error code 500. In the DSpace log I came across this error message:

org.dspace.rest.RestIndex @ REST Login Success for user: [email protected]
2017-01-03 15:38:34,326 ERROR org.dspace.rest.Resource @ Something get wrong. Aborting context in finally statement.
2017-01-03 15:38:34,474 ERROR org.dspace.rest.Resource @ Something get wrong. Aborting context in finally statement.
2017-01-03 15:38:34,598 ERROR org.dspace.rest.Resource @ Something get wrong. Aborting context in finally statement.

According to the DSpace documentation, the request should look like this:

GET /handle/{handle-prefix}/{handle-suffix}

This points to the handle endpoint of the REST API on our DSpace server, so the whole request should be sent to https://dspace.cuni.cz/rest/handle/123456789/937 (I think you can test it yourself).

In the browser I get the following response:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
 <item>
  <expand>metadata</expand>
  <expand>parentCollection</expand>
  <expand>parentCollectionList</expand>
  <expand>parentCommunityList</expand>
  <expand>bitstreams</expand>
  <expand>all</expand>
  <handle>123456789/937</handle>
  <id>1423</id>
  <name>Komparace vývoje české a slovenské pravicové politiky od roku 1989 do současnosti</name>
  <type>item</type>
  <archived>true</archived>
  <lastModified>2016-12-20 17:52:30.641</lastModified>
  <withdrawn>false</withdrawn>
 </item>

When testing in the Python console, my code looked like this:

from urllib.parse import urljoin
import requests

def document_in_dspace(handle):
    url = 'https://dspace.cuni.cz/rest/handle/'
    r_url = urljoin(url, handle)
    print(r_url)
    r = requests.get(r_url)

    if r.status_code == requests.codes.ok:
        print(r.text)
        print(r.reason)
        return True
    else:
        print(r.reason)
        print(r.text)
        return False

After calling this function in the Python console with document_in_dspace('123456789/937'), the output was:

https://dspace.cuni.cz/rest/handle/123456789/937
{"id":1423,"name":"Komparace vývoje české a slovenské pravicové politiky od roku 1989 do současnosti","handle":"123456789/937","type":"item","link":"/rest/items/1423","expand":["metadata","parentCollection","parentCollectionList","parentCommunityList","bitstreams","all"],"lastModified":"2016-12-20 17:52:30.641","parentCollection":null,"parentCollectionList":null,"parentCommunityList":null,"bitstreams":null,"archived":"true","withdrawn":"false"}
OK
True

So I added this function to my script (without any changes), but now the DSpace API returns status code 500 when the function is called.

Details of the implementation are below:

def get_workflow_process(document):
    if document.document_in_dspace(handle=document.handle) is True:
        return 'delete'
    else:
        return None

wf_process = get_workflow_process(document)
log.msg("Document:", document.doc_id, "Workflow process:", wf_process)

And the output is:

2017-01-04 11:08:45+0100 [-] DSPACE API response code: 500
2017-01-04 11:08:45+0100 [-] Internal Server Error
2017-01-04 11:08:45+0100 [-] 
2017-01-04 11:08:45+0100 [-] False
2017-01-04 11:08:45+0100 [-] Document: 28243 Workflow process: None

Can you suggest what might be causing this and how to solve it? I am quite surprised that this works in the Python console but not in the actual script, and I can't figure it out by myself. Thank you!

There are 1 answers

Jakub Řihák On BEST ANSWER

I think I figured it out. The problem was probably a trailing newline character in the handle argument passed to the document_in_dspace function. The updated function looks like this:

def document_in_dspace(handle):
    url = 'https://dspace.cuni.cz/rest/handle/' # TODO: Move to config

    hdl = handle.rstrip()
    prefix, suffix = str(hdl).split(sep='/')

    r_url = url + prefix + '/' + suffix
    log.msg("DSpace API request url is:", r_url)

    r = requests.get(r_url, timeout=1)

    if r.status_code == requests.codes.ok:
        log.msg("DSPACE API response code:", r.status_code)
        log.msg("Document with handle", handle, "found in DSpace!")
        log.msg("Document handle:", handle)
        log.msg("Request:\n", r.request.headers)
        log.msg("\n")
        log.msg(r.reason)
        return True
    else:
        log.msg("DSPACE API response code:", r.status_code)
        log.msg("Document with handle", handle, "not found in DSpace!")
        log.msg("Document handle:", handle)
        log.msg("Request:\n", r.request.headers)
        log.msg("\n")
        log.msg(r.reason)
        return False

Basically, I call .rstrip() on the handle string to get rid of any unwanted trailing whitespace, then split the handle into its prefix and suffix parts (just to be sure), and build the request URL (r_url) by joining all the parts together.
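The cleanup step described above can also be isolated into a small helper that does not touch the network. This is only a sketch: build_handle_url and BASE_URL are hypothetical names that are not part of the original script, it strips whitespace on both ends rather than only at the end, and it percent-encodes the two parts with urllib.parse.quote, which the function above does not do. Printing the raw value with repr() is a quick way to spot a hidden newline.

```python
from urllib.parse import quote

# Base URL taken from the question; BASE_URL itself is a hypothetical name.
BASE_URL = 'https://dspace.cuni.cz/rest/handle/'


def build_handle_url(handle, base_url=BASE_URL):
    """Return the REST URL for a handle, ignoring surrounding whitespace."""
    cleaned = str(handle).strip()  # drops '\n', '\r', spaces on both ends
    prefix, sep, suffix = cleaned.partition('/')
    if not (prefix and sep and suffix):
        raise ValueError('Unexpected handle format: %r' % (handle,))
    # Percent-encode each part so stray characters cannot alter the path.
    return base_url + quote(prefix) + '/' + quote(suffix)


raw = '123456789/937\n'       # e.g. a value read from a file, newline included
print(repr(raw))              # repr() makes the trailing newline visible
print(build_handle_url(raw))
```

Raising ValueError on a malformed handle is a design choice for the sketch; returning False instead would match the behaviour of the function above more closely.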

I will make the function prettier in the future, but at least this now works as intended.

The output is as follows:

2017-01-04 15:06:16+0100 [-] Checking if document with handle 123456789/937
 is in DSpace...
2017-01-04 15:06:16+0100 [-] DSpace API request url is: https://dspace.cuni.cz/rest/handle/123456789/937
2017-01-04 15:06:16+0100 [-] DSPACE API response code: 200
2017-01-04 15:06:16+0100 [-] Document with handle 123456789/937
 found in DSpace!
2017-01-04 15:06:16+0100 [-] Document handle: 123456789/937

2017-01-04 15:06:16+0100 [-] Request:
 {'Accept-Encoding': 'gzip, deflate', 'User-Agent': 'python-requests/2.11.1', 'Connection': 'keep-alive', 'Accept': '*/*'}
2017-01-04 15:06:16+0100 [-] 
2017-01-04 15:06:16+0100 [-] OK

Note, however, that the DSpace API seems to return response code 500, rather than 404, when no item with the given handle is present in the repository.
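Because of this, a plain True/False result cannot tell "item is missing" apart from "the server really failed". One possible way to keep the two apart is sketched below. It is only an illustration: classify_lookup and the outcome labels are hypothetical names, and treating 500 as "unknown" rather than "missing" is an assumption about what is safest, not documented DSpace behaviour.

```python
def classify_lookup(status_code):
    """Map an HTTP status code from the handle endpoint to a lookup outcome."""
    if status_code == 200:
        return 'found'
    if status_code == 404:
        return 'missing'
    # This DSpace 5.5 instance was observed to answer 500 for an unknown
    # handle, but 500 can also signal a genuine server error, so the
    # sketch reports it (and every other code) as 'unknown'.
    return 'unknown'


for code in (200, 404, 500):
    print(code, classify_lookup(code))
```

The caller can then decide whether 'unknown' should be retried, logged, or treated like a missing item.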