How do I return a large number of columns from a MySQL database using AWS Lambda?


A brief background: I have a serverless MySQL database on AWS RDS with a table of 70 columns and 7000+ rows. Many of the columns are defined as VARCHAR(2000) or BIGINT.

I'm trying to return all the details in this table through an AWS Lambda API; the following is the handler function I'm using to achieve this:

def handler(event, context):
    try:

        #Storing data to be returned
        dataReturn=[]

        #Query the database
        with conn.cursor() as cur:
            cur.execute("SELECT <names of all 70 columns> FROM <table_name>")
            row_headers=[x[0] for x in cur.description] #this will extract row headers
            rv = cur.fetchall()
            json_data=[]
            for result in rv:
                json_data.append(dict(zip(row_headers,result)))
            #print(json.dumps(json_data))
            conn.commit()

        return {
            'statusCode': 200,
            'headers': {
            "Access-Control-Allow-Origin": "*",
            "Access-Control-Allow-Methods": 'GET, POST, PUT, DELETE, OPTIONS'
            },
            'body': json.dumps(json_data, cls=DecimalEncoder)
        }
    except Exception: 
        print(traceback.format_exc())
        return {
            'statusCode': 502,
            'headers': {
            "Access-Control-Allow-Origin": "*",
            "Access-Control-Allow-Methods": 'GET, POST, PUT, DELETE, OPTIONS'
            },
            'body': json.dumps("Server Error Occured", cls=DecimalEncoder)
        }

However, when I run this it gives me a 502 server error with the following logs:

Traceback (most recent call last):
  File "/var/task/getInstrumentsDetails.py", line 55, in handler
    print(json.dumps(json_data))
  File "/var/lang/lib/python3.7/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/var/lang/lib/python3.7/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/var/lang/lib/python3.7/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/var/lang/lib/python3.7/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Decimal is not JSON serializable

As a sidenote, I am able to return 24 columns from this table using SELECT <names_of_24_columns> FROM <table_name>, but it fails if I try to return more than 24 columns.

Is this an issue related to the amount of memory that AWS Lambda can read/return from the RDS? Please suggest a suitable fix to this problem.

Thank you


There are 3 answers

Ionut Ticus

I don't see the definition of your DecimalEncoder above, but you should be able to use something as simple as:

import decimal
import json

class DecimalEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, decimal.Decimal):
            return float(obj)
        elif isinstance(obj, bytes):
            if len(obj) == 1:
                return ord(obj)
            else:
                return obj.decode()
        return json.JSONEncoder.default(self, obj)

Note that if you want to print the result of json.dumps you need to pass the custom encoder there as well: print(json.dumps(json_data, cls=DecimalEncoder)).
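For instance, a quick sanity check with hypothetical sample values (the keys here are made up, apart from Is_Record_Day_Wrkng from your question) shows what the encoder produces:

# Hypothetical row exercising the types the encoder handles;
# assumes the DecimalEncoder class defined above
sample = [{
    "price": decimal.Decimal("12.50"),   # Decimal -> float
    "Is_Record_Day_Wrkng": b"\x00",      # single byte -> ord() -> 0
    "label": b"ABC",                     # longer bytes -> decode() -> "ABC"
}]

print(json.dumps(sample, cls=DecimalEncoder))
# [{"price": 12.5, "Is_Record_Day_Wrkng": 0, "label": "ABC"}]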

Regarding the memory question: Lambda can use 128 MB to 3008 MB, in 64 MB increments. Having 7000+ rows with many VARCHAR columns can consume a considerable amount of memory, so first make sure your code is correct by adding a LIMIT clause to your query, then try running it against the whole table.
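A minimal sketch of that debugging step, reusing your handler's query with the same placeholder names:

# Sanity-check the query and JSON encoding on a small slice first
with conn.cursor() as cur:
    cur.execute("SELECT <names of all 70 columns> FROM <table_name> LIMIT 100")
    row_headers = [x[0] for x in cur.description]
    json_data = [dict(zip(row_headers, row)) for row in cur.fetchall()]

# If this works end to end, raise the LIMIT (or drop it entirely)
body = json.dumps(json_data, cls=DecimalEncoder)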

Note: you have a dataReturn variable you never use, and a conn.commit() statement that should not be necessary when reading (SELECTing) data.

Update: based on the new data, it seems your issue is with bytes, not Decimal: TypeError: Object of type bytes is not JSON serializable

For example, the column Is_Record_Day_Wrkng is a byte with the value \x00; if these byte values actually represent numbers you can use ord to encode them in JSON, but if not you should use something like bytes.decode (see the updated encoder above).
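To illustrate the two options on hypothetical values:

flag = b"\x00"     # e.g. the Is_Record_Day_Wrkng column
ord(flag)          # -> 0: interpret a single byte as a number

label = b"hello"   # hypothetical bytes that actually hold UTF-8 text
label.decode()     # -> 'hello': interpret the bytes as a string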

cd3k

Lambda functions have a cap on the amount of data they will transfer by default, but you can change it in API Gateway in your AWS console.

E.J. Brennan

I suspect the problem isn't the number of columns, but that after 24 columns, your 25th column is a Decimal type - at least, that is what the error message is actually reporting.

Your return statement specifies an encoder, but your print statement does not. The print is commented out in your code above, but the error message you are showing comes from a call without the encoder:

Traceback (most recent call last):
  File "/var/task/getInstrumentsDetails.py", line 55, in handler
    print(json.dumps(json_data))
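
If that is the case, a one-line fix is to pass the encoder to the debug print as well, mirroring your return statement:

print(json.dumps(json_data, cls=DecimalEncoder))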