Delete a huge number of files in Rackspace using fog


I have millions of files in Rackspace Cloud Files. I would like to delete a subset of them by passing a list of file names, instead of deleting them one by one, which is very slow. Is there a way to do this with fog? Right now I have a script that deletes each file individually, but something with better performance would be nice.

connection = Fog::Storage.new({
  :provider           => 'Rackspace',
  :rackspace_username => "xxxx",
  :rackspace_api_key  => "xxxx",
  :rackspace_region   => :iad  
})

dir = connection.directories.select {|d| d.key == "my_directory"}.first

CloudFileModel.where(duplicated: 1).each do |record| 
    f = record.file.gsub("/","")
    dir.files.destroy(f) rescue nil
    puts "deleted #{record.id}"
end

There are 2 answers

Answer by wicz (accepted answer)

Yes, you can, with fog's delete_multiple_objects request. Its documentation says:

Deletes multiple objects or containers with a single request.

To delete objects from a single container, container may be provided and object_names should be an Array of object names within the container.

To delete objects from multiple containers or delete containers, container should be nil and all object_names should be prefixed with a container name.

Containers must be empty when deleted. object_names are processed in the order given, so objects within a container should be listed first to empty the container.

Up to 10,000 objects may be deleted in a single request. The server will respond with 200 OK for all requests. response.body must be inspected for actual results.
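For millions of files, the 10,000-per-request limit means the name list has to be split into batches. Here is a minimal Ruby sketch, assuming conn is a fog Rackspace storage connection like the one in the question (any object responding to delete_multiple_objects(container, names) works). The method name delete_in_batches is hypothetical, not part of fog. It collects each response body instead of trusting the status code, since the server answers 200 OK regardless; Swift's bulk-delete middleware reports fields such as "Number Deleted" and "Errors", but check what your fog version actually returns.

```ruby
# Maximum number of names accepted per bulk-delete request, per the
# documentation quoted above.
BULK_DELETE_LIMIT = 10_000

# Hypothetical helper (not part of fog): deletes object_names from
# container in batches of at most batch_size names.
# Assumes conn responds to delete_multiple_objects(container, names) and
# returns a response with a #body, as fog's Rackspace connection does.
# Returns one body per request so the caller can inspect the real results,
# because the HTTP status is 200 OK even when individual deletions fail.
def delete_in_batches(conn, container, object_names, batch_size: BULK_DELETE_LIMIT)
  object_names.each_slice(batch_size).map do |batch|
    conn.delete_multiple_objects(container, batch).body
  end
end
```

In the question's case, the names could be built from the CloudFileModel records with the same gsub the original script uses, and passed in together with "my_directory" as the container.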

Examples:

Delete objects from a container

object_names = ['object', 'another/object']
conn.delete_multiple_objects('my_container', object_names)

Delete objects from multiple containers

object_names = ['container_a/object', 'container_b/object']
conn.delete_multiple_objects(nil, object_names)

Delete a container and all its objects

object_names = ['my_container/object_a', 'my_container/object_b', 'my_container']
conn.delete_multiple_objects(nil, object_names)
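The last example depends on ordering: every object name carries the container prefix, and the container itself comes last so it is already empty when the server reaches it. A minimal Ruby sketch of building such a list; the helper name container_deletion_names is hypothetical, not part of fog.

```ruby
# Hypothetical helper (not part of fog): builds the object_names array for
# deleting a container and everything in it, for use with a nil container
# argument to delete_multiple_objects.
def container_deletion_names(container, object_names)
  # Objects first, each prefixed with "<container>/" ...
  prefixed = object_names.map { |name| "#{container}/#{name}" }
  # ... then the (by then empty) container itself, last.
  prefixed + [container]
end

p container_deletion_names('my_container', ['object_a', 'object_b'])
# → ["my_container/object_a", "my_container/object_b", "my_container"]
```

If the resulting list is longer than 10,000 names and is split with each_slice, the container name stays at the very end and so lands in the final batch; if any object deletion in an earlier batch fails, the container deletion will fail too, which is another reason to inspect every response body.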
Answer by Sam Harwell

To my knowledge, the algorithm included here is the most reliable and fastest way to delete a Cloud Files container along with all the objects it contains. You could adapt it to your case by passing in the names of the items to delete instead of calling ListObjects. At the time of writing, no server-side functionality (i.e. a bulk operation) can meet your needs in a timely manner: bulk operations are rate limited to 2-3 delete operations per second, so at 3 per second you should expect at least 55 minutes for every 10,000 items you delete (10,000 / 3 ≈ 3,333 seconds).
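The same arithmetic as a tiny Ruby calculation, comparing the bulk rate quoted here with the 100 deletions per second per container that this answer cites for individual deletes. The rates are the ones claimed in this answer, not measurements, and deletion_minutes is just an illustrative name.

```ruby
# Rough wall-clock estimate, in minutes, for deleting `items` objects at a
# sustained `rate` of deletions per second.
def deletion_minutes(items, rate)
  items.fdiv(rate) / 60.0
end

puts format('%.1f', deletion_minutes(10_000, 3))    # prints 55.6 (bulk, best case)
puts format('%.1f', deletion_minutes(10_000, 2))    # prints 83.3 (bulk, worst case)
puts format('%.1f', deletion_minutes(10_000, 100))  # prints 1.7 (per-object limit)
```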

The following code shows the basic algorithm (slightly simplified from the syntax that is actually required in the .NET SDK). It assumes that no other clients are adding objects to the container at any point after execution of this method begins.

Note that you will be rate limited to at most 100 delete operations per second for each container that holds files. If multiple containers are involved, distribute your concurrent requests round-robin across the containers, and tune the concurrency level until the request rate approaches that hard limit. This algorithm has let me sustain deletion rates of over 450 objects/second over long runs when multiple containers were involved.
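A rough Ruby sketch of the round-robin plus fixed-concurrency idea, for readers using fog rather than the .NET SDK. All names are hypothetical. delete_one is any callable that deletes a single object; with fog it might wrap the connection's delete_object(container, name) request, but that wiring is an assumption, and whether one fog connection can safely be shared across threads is not established here, so one connection per worker is the conservative choice. This sketch does no per-container rate limiting; it only interleaves the work and caps concurrency.

```ruby
# Interleave (container, name) pairs so consecutive requests hit
# different containers: a1, b1, a2, b2, ... until every list is drained.
def round_robin(objects_by_container)
  queues = objects_by_container.map { |container, names| names.map { |n| [container, n] } }
  result = []
  until queues.all?(&:empty?)
    queues.each { |q| result << q.shift unless q.empty? }
  end
  result
end

# Drain the interleaved work with `concurrency` worker threads.
# delete_one: hypothetical callable taking (container, name).
def delete_concurrently(objects_by_container, concurrency, delete_one)
  work = Queue.new
  round_robin(objects_by_container).each { |pair| work << pair }
  workers = Array.new(concurrency) do
    Thread.new do
      loop do
        pair = begin
          work.pop(true) # non-blocking; raises ThreadError once empty
        rescue ThreadError
          break
        end
        delete_one.call(*pair)
      end
    end
  end
  workers.each(&:join)
end
```

Because the deletions are I/O-bound HTTP requests, Ruby threads can overlap them even under MRI's global lock; the answer's suggested range of 25 to 300 connections would map to the concurrency argument.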

public static void DeleteContainer(
  IObjectStorageProvider provider,
  string containerName)
{
  while (true)
  {
    // The only reliable way to determine if a container is empty is
    // to list its objects
    ContainerObject[] objects = provider.ListObjects(containerName);
    if (!objects.Any())
      break;

    // the iterations of this loop should be executed concurrently.
    // depending on connection speed, expect to use 25 to upwards of 300
    // concurrent connections for best performance.
    foreach (ContainerObject obj in objects)
    {
      try
      {
        provider.DeleteObject(containerName, obj.Name);
      }
      catch (ItemNotFoundException)
      {
        // a 404 can happen if the object was deleted on a previous iteration,
        // but the internal database did not fully synchronize prior to calling
        // List Objects again.
      }
    }
  }

  provider.DeleteContainer(containerName);
}
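For the Ruby/fog setup in the question, the same loop might look like the sketch below. It is written against a hypothetical client that responds to list_objects(container) (returning an array of names), delete_object(container, name) and delete_container(container), plus a hypothetical ObjectNotFound error standing in for a 404. With fog, these would presumably wrap its Rackspace storage requests (for example delete_object and delete_container), but check the fog-rackspace documentation for the exact calls and exception class. Like the C# above, the inner loop runs sequentially here even though the answer recommends running it concurrently.

```ruby
# Hypothetical stand-in for a 404 raised when an object is already gone.
class ObjectNotFound < StandardError; end

# Delete every object in `container`, then the container itself.
# `client` is a hypothetical object exposing list_objects, delete_object
# and delete_container, mirroring IObjectStorageProvider above.
def delete_container(client, container)
  loop do
    # Listing is the only reliable emptiness check, so re-list each pass.
    names = client.list_objects(container)
    break if names.empty?

    names.each do |name|
      begin
        client.delete_object(container, name)
      rescue ObjectNotFound
        # Already deleted on an earlier pass, but the listing had not
        # synchronized yet; safe to ignore.
      end
    end
  end
  client.delete_container(container)
end
```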