Shared ephemeral file storage between Fargate tasks per Step Function run


I am trying to run a service that does some analysis on 3rd-party code using AWS Step Functions and 3 separate Fargate tasks that need some kind of shared storage. Two of the tasks are trusted, but one will be running arbitrary, unchecked 3rd-party code. The results of each step need to be accessible to the following step without compromising data security.

The flow of the service is this:

  1. Step 1 pulls code from storage (originally uploaded by a user, can be S3) and does some operations on it. This step can be trusted, as the 3rd-party code is never run.
  2. Step 2 runs the code and somehow needs to store the output. This step must be treated as untrusted because the code is user-provided and may be malicious. That means it cannot be granted S3 read access to pull the code (since it could pull another user's code), and it cannot be granted S3 write access. Somehow it needs to get the code from step 1 and eventually send its output to step 3.
  3. Step 3 does an operation on the result of step 2 and uploads the results to S3 to be pulled by a separate web service. Step 3 is trusted and can therefore be granted S3 access.

My thought on how to deal with this was to see if there is some kind of shared ephemeral storage that could be created at the beginning of the Step Functions execution, shared between the Fargate tasks, and then deleted as soon as the execution terminates. Each Fargate task should only be able to access data from the shared storage associated with that particular execution, or we run into the same issue as if we had given all 3 tasks S3 access.

Is something like this possible? Everything I could find on EFS seemed to require manually creating mount targets and access points with dedicated IAM roles, which wouldn't work for this use case since these executions are user-triggered. Is there some other way to do this that would allow isolation between tasks, but access to some shared resources (which should themselves be isolated per execution)?


1 Answer

fedonev

I understand your two requirements to be: (1) isolate the file artefacts per step function execution and (2) accommodate reading and writing by both trusted and untrusted sources.

Consider the following, relying only on S3:

  1. In a Pass state, capture a unique ID that identifies the execution: generate a UUID with the States.UUID() intrinsic function, or use the execution ID from the context object ($$.Execution.Id).
  2. A Lambda task stages the input files to S3. Use the execution's random ID as the S3 objects' prefix: /<random-id>/input.zip.
  3. Another Lambda task creates two presigned URLs, which the untrusted Fargate Task 2 will use to read the input and write its output. The untrusted task is never given S3 credentials; it just gets time-limited URLs for those two objects (see the first sketch after this list).
  4. Do the work. Your trusted tasks can access the /<random-id>/ S3 files directly; the untrusted task can only use the presigned URLs.
  5. A Lambda task deletes all the files under the /<random-id>/ prefix (see the second sketch at the end of this answer).
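
Here is a minimal sketch of the presigned-URL step (item 3), assuming a Python Lambda using boto3. The bucket name, key names, TTL, and event shape are illustrative assumptions, not anything mandated above:

```python
import os

import boto3

s3 = boto3.client("s3")

# Hypothetical names: the bucket env var and the 15-minute expiry are assumptions.
BUCKET = os.environ["ARTIFACT_BUCKET"]
URL_TTL_SECONDS = 900


def handler(event, context):
    # The Task state passes the per-execution random ID in the Lambda input,
    # e.g. {"RunId": "<value of States.UUID() or $$.Execution.Id>"}.
    run_id = event["RunId"]

    # URL the untrusted Fargate task uses to download the staged input.
    get_url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": f"{run_id}/input.zip"},
        ExpiresIn=URL_TTL_SECONDS,
    )

    # URL the untrusted task uses to upload its result. It receives only these
    # two URLs, no S3 credentials and no access to other prefixes.
    put_url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": BUCKET, "Key": f"{run_id}/output.zip"},
        ExpiresIn=URL_TTL_SECONDS,
    )

    return {"GetUrl": get_url, "PutUrl": put_url}
```

The Task state can hand the two URLs to the Fargate container, e.g. as environment variables in the ECS task's container overrides, so the untrusted task's role needs no S3 permissions at all.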

Requirement #1 is satisfied by the random prefix; requirement #2 by the presigned URLs.
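
And a sketch of the cleanup step (item 5), under the same assumptions (Python Lambda, boto3, illustrative bucket and input names). Running it as the final state, and also from a Catch branch, keeps the prefix from outliving failed executions:

```python
import os

import boto3

s3 = boto3.client("s3")
BUCKET = os.environ["ARTIFACT_BUCKET"]  # same hypothetical bucket as above


def handler(event, context):
    run_id = event["RunId"]

    # List every object under this execution's prefix and delete them in
    # batches; delete_objects accepts up to 1000 keys per call, which matches
    # the page size of list_objects_v2.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=f"{run_id}/"):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3.delete_objects(Bucket=BUCKET, Delete={"Objects": keys})

    return {"DeletedPrefix": f"{run_id}/"}
```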