I am implementing a website where people can play board games together. Their actions are passed through a simple websocket which merely relays messages to the other clients. I used to implement the websocket in Node and serve it from an AWS EC2 instance, but I decided to try moving to a serverless setup using the API Gateway WebSocket API.
I have followed this tutorial which describes nearly the same use-case as mine.
After the change to serverless, the latency significantly increased (from <100ms to ~500ms). I tried decreasing it by saving the list of clients and opening the connection to the websocket outside of my handler (so that it doesn't get re-initialized every time the lambda is called):
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();

// Both values are cached across warm invocations of the same Lambda container.
let active_connections = null;
let endpoint = null;

// Reload the list of connection ids from the table.
const get_active_connections = async () => {
  const data = await dynamodb.scan({ TableName: process.env.TABLE_NAME }).promise();
  active_connections = data.Items.map(item => item.id.S);
};

const connect_endpoint = async () => {
  endpoint = new AWS.ApiGatewayManagementApi({ endpoint: process.env.ENDPOINT });
};

// Refresh the cache only when it is empty or does not contain the caller yet.
const init = async (myConnectionId) => {
  const promisesToRun = [];
  if (!active_connections || !active_connections.includes(myConnectionId)) {
    promisesToRun.push(get_active_connections());
  }
  if (!endpoint) {
    promisesToRun.push(connect_endpoint());
  }
  await Promise.all(promisesToRun);
};

const send_message = async (connectionId, message) => {
  try {
    await endpoint.postToConnection({ ConnectionId: connectionId, Data: JSON.stringify(message) }).promise();
  } catch (e) {
    if (e.statusCode === 410) {
      // The connection is gone: remove the stale id from the table.
      await dynamodb.deleteItem({ TableName: process.env.TABLE_NAME, Key: { id: { S: connectionId } } }).promise();
    } else {
      throw e;
    }
  }
};

exports.handler = async (event, context) => {
  await init(event.requestContext.connectionId);
  const message = JSON.parse(event.body);
  // Relay the message to every cached connection in parallel.
  const postCalls = active_connections.map((id) => send_message(id, message));
  await Promise.all(postCalls);
  return { statusCode: 200 };
};
This brought the latency down to acceptable levels whenever my Lambda is not cold-starting. However, the Lambda is destroyed relatively quickly (1-2 s after opening the connection).
Are there any other tricks or flat out different solutions that would allow me to keep latency low for longer periods?
I can see you're using a DynamoDB scan. This will definitely impact the performance of your application.
DynamoDB read operations consume RCUs (read capacity units): 1 RCU covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads of that size.
It is important to understand that when you perform a scan, DynamoDB reads every item in the table and only applies any filters afterwards, so a scan consumes RCUs in proportion to the size of the whole table.
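To make the difference concrete, here is a rough back-of-the-envelope sketch of the arithmetic above. The function names and the table size (10,000 records of about 100 bytes) are hypothetical, and the numbers are only an estimate: DynamoDB computes the consumed capacity itself, per request.

```javascript
// Rough RCU estimate; real consumption is computed by DynamoDB per request.
const RCU_BLOCK_BYTES = 4 * 1024;

// Number of 4 KB blocks needed to cover `bytes` (at least one per read).
function blocksFor(bytes) {
  return Math.max(1, Math.ceil(bytes / RCU_BLOCK_BYTES));
}

// Capacity consumed by reading one item of `itemBytes` by its key.
function getItemRcu(itemBytes, stronglyConsistent = false) {
  const blocks = blocksFor(itemBytes);
  return stronglyConsistent ? blocks : blocks / 2;
}

// Capacity consumed by scanning a table holding `tableBytes` of data.
// Filters do not reduce this, because they are applied after the read.
function scanRcu(tableBytes, stronglyConsistent = false) {
  const blocks = blocksFor(tableBytes);
  return stronglyConsistent ? blocks : blocks / 2;
}

// Hypothetical table: 10,000 connection records of about 100 bytes each.
const itemBytes = 100;
const itemCount = 10000;
console.log('Single-item read:', getItemRcu(itemBytes), 'RCU');
console.log('Full scan:', scanRcu(itemBytes * itemCount), 'RCU');
```

With these assumed sizes the scan costs a couple of hundred times more than a keyed read, and it pays that cost on every cache refresh.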
To get around this, people design their DynamoDB tables so that items can be retrieved by a partition key and, optionally, a sort key. Using GetItem (or Query, for several items under one partition key) instead consumes RCUs only for the items that are actually read.
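As a sketch of what that could look like for your use case, suppose the table were keyed by a partition key `roomId` and a sort key `connectionId`. That schema is an assumption of mine, not something in your current table, which appears to be keyed by `id` only. The helper below takes the client as a parameter, expected to be an AWS SDK v2 `AWS.DynamoDB` instance, and reads only the connections of one game.

```javascript
// Assumed (hypothetical) schema: partition key `roomId`, sort key `connectionId`.
// `dynamodb` is expected to be an AWS SDK v2 `AWS.DynamoDB` client, e.g.
//   const AWS = require('aws-sdk');
//   getRoomConnections(new AWS.DynamoDB(), process.env.TABLE_NAME, 'room-42');
async function getRoomConnections(dynamodb, tableName, roomId) {
  const ids = [];
  let startKey;
  do {
    const params = {
      TableName: tableName,
      KeyConditionExpression: 'roomId = :room',
      ExpressionAttributeValues: { ':room': { S: roomId } },
      ProjectionExpression: 'connectionId',
    };
    if (startKey) {
      // Continue from where the previous page stopped.
      params.ExclusiveStartKey = startKey;
    }
    const data = await dynamodb.query(params).promise();
    ids.push(...data.Items.map((item) => item.connectionId.S));
    startKey = data.LastEvaluatedKey; // undefined once the last page is read
  } while (startKey);
  return ids;
}
```

Unlike the scan, this only touches the items stored under the given `roomId`, so the cost no longer grows with the total number of connected players.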
The performance issues are likely caused by your available RCUs being depleted. You can verify this in CloudWatch by looking at the read throttle events for the DynamoDB table in question.
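If you prefer to check this from code rather than the console, a minimal sketch could sum the `ReadThrottleEvents` metric for the table over a recent window. The helper name and the 60-minute default are my own choices; `cloudwatch` is expected to be an AWS SDK v2 `AWS.CloudWatch` client.

```javascript
// Sums read throttle events for one table over the last `minutes` minutes.
// `cloudwatch` is expected to be an AWS SDK v2 `AWS.CloudWatch` client, e.g.
//   const AWS = require('aws-sdk');
//   countReadThrottles(new AWS.CloudWatch(), process.env.TABLE_NAME);
async function countReadThrottles(cloudwatch, tableName, minutes = 60) {
  const end = new Date();
  const start = new Date(end.getTime() - minutes * 60 * 1000);
  const data = await cloudwatch.getMetricStatistics({
    Namespace: 'AWS/DynamoDB',
    MetricName: 'ReadThrottleEvents',
    Dimensions: [{ Name: 'TableName', Value: tableName }],
    StartTime: start,
    EndTime: end,
    Period: 60, // one datapoint per minute
    Statistics: ['Sum'],
  }).promise();
  // Minutes without throttling may simply be missing from the datapoints.
  return data.Datapoints.reduce((total, point) => total + point.Sum, 0);
}
```

A result above zero suggests reads are being throttled; zero would point towards other causes, such as cold starts.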